On a machine with a single processor, the load average reported by top is easy to interpret: if it is above 1.0, there are jobs waiting. But on a multicore system with n physical cores and l*n logical cores (on my Intel CPU n = 6 and l*n = 12, so nproc prints 12), should I divide the load average by the output of nproc and check whether the result is above 1 to tell whether jobs are, on average, waiting? Or is it better to use htop to judge whether a multicore system is carrying too much load?
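To make the calculation concrete, here is a minimal sketch in Python of the division I have in mind. The function names `load_per_cpu` and `is_oversubscribed` are my own, made up for illustration. I am also assuming that `os.cpu_count()` matches what nproc prints, which may not hold if the process is restricted to a subset of the CPUs.

```python
import os


def load_per_cpu(load_avg, logical_cpus):
    """Return the load average divided by the number of logical CPUs."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return load_avg / logical_cpus


def is_oversubscribed(load_avg, logical_cpus):
    """True if, on average, there is more than one job per logical CPU."""
    return load_per_cpu(load_avg, logical_cpus) > 1.0


if __name__ == "__main__":
    # Assumption: os.cpu_count() agrees with nproc on this machine.
    cpus = os.cpu_count() or 1
    try:
        # The same 1, 5 and 15 minute averages that top shows.
        one_min, five_min, fifteen_min = os.getloadavg()
    except OSError:
        print("load average is not available on this platform")
    else:
        print(f"logical CPUs: {cpus}")
        print(f"1-minute load per CPU: {load_per_cpu(one_min, cpus):.2f}")
        print(f"oversubscribed: {is_oversubscribed(one_min, cpus)}")
```

With this normalization, a load average of 10 on a single CPU would count as oversubscribed, while the same value on 12 logical CPUs would not.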
I think my method was wrong but my conclusion was right: I saw a load average above 10 in top, used ps to check which process was expensive, and found an overflow in a running program. But if nproc on that machine prints a number greater than 10, then the load average alone would not really have been cause for investigation, had I known that. Do you agree?