surely you would expect the schedule latency to be smaller not three times bigger
And that is the reason why it gets "corrected" with _NONE, _LOG, or _LINEAR.
The concept of SMP also works when you look at it as splitting, and not adding, a CPU. Then you don't gain performance overall, but still have better responsiveness.
This short function ("period") uses both min_granularity and latency. I reformatted a bit. I don't think you have to know anything about C language to understand - there is even an unlikely hint:
    static u64 __sched_period(unsigned long nr_running)
    {
        if (unlikely(nr_running > sched_nr_latency))
            return nr_running * sysctl_sched_min_granularity;
        else
            return sysctl_sched_latency;
    }
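To see what this does with concrete numbers, here is a user-space sketch of the same logic. It assumes the unscaled defaults quoted in this answer (6 ms latency, 0.75 ms minimum granularity) and takes sched_nr_latency as their ratio, 8. The function name and the constants are my own and are not read from a running kernel.

```c
#include <stdint.h>

/* Assumed unscaled defaults in nanoseconds (illustrative, not read from a kernel). */
#define SCHED_LATENCY_NS          6000000ULL /* 6 ms */
#define SCHED_MIN_GRANULARITY_NS   750000ULL /* 0.75 ms */
#define SCHED_NR_LATENCY          8UL        /* latency / min_granularity */

/* Same decision as __sched_period(), without the unlikely() hint. */
uint64_t sched_period_ns(unsigned long nr_running)
{
    /* Too many tasks to fit: stretch the period so each task gets min_granularity. */
    if (nr_running > SCHED_NR_LATENCY)
        return nr_running * SCHED_MIN_GRANULARITY_NS;

    /* Otherwise all runnable tasks share one latency period. */
    return SCHED_LATENCY_NS;
}
```

With these assumed values, up to 8 runnable tasks share a 6 ms period, while 16 tasks stretch it to 12 ms.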
In the end it is more about the word than the thing. Wikipedia, on CFS:
...the atomic units by which an individual process' share of the
CPU was allocated (thus making redundant the previous notion of
timeslices)
That redundant word is still in kernel/sched/fair.c:
* (to see the precise effective timeslice length of your workload,
* run vmstat and monitor the context-switches (cs) field)
The values 6ms, 0.75ms (= 1/8 of 6ms) and 24ms (= 6ms _LOG-corrected for ncpus=8) can IMHO be interpreted as periods, i.e. timeslices. If you convert them to Hertz, they roughly match the Kconfig.hz range, which goes from 100HZ (server) to 1000HZ (high responsiveness).
1/.00075 s
1333.3 Hz
More than a thousand min-granularity "slices" fit in a second.
1/.006 s
166.6 Hz
The 166 uncorrected latency "slices" per second lie between the 100HZ "server" and the 250HZ "compromise".
1/.024 s
41.6 Hz
With the log correction for 8 cores, each core can cut its context switching by a factor of 4, and the "effective latency" still remains low.
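The three conversions above are just the reciprocal of the period. A tiny helper makes that explicit (the function name is mine, chosen for illustration):

```c
/* Convert a period given in milliseconds to a frequency in Hertz. */
double period_ms_to_hz(double period_ms)
{
    return 1000.0 / period_ms;
}
```

Feeding it 0.75, 6 and 24 reproduces the roughly 1333 Hz, 167 Hz and 42 Hz figures shown above.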
Compare it to a barber shop where you want to guarantee that no new customer has to wait longer than 10 minutes. This means you have to preempt the customer currently in the seat every 10 minutes, at least for the time it takes to say hello.
A shop with four seats and four barbers can relax that 10-minute slice. With each barber working in their own booth, they only have to stop and peek every 40 minutes, and on average a newly arrived customer will still wait only 10 minutes, as before.
That would be the full, "linear" correction of latency: multiply by N.
But in the worst case, all four check for new customers at the same time, because they started simultaneously. If a customer enters one minute after that, he might have to wait 39 minutes before he gets served.
So as a compromise you multiply not by N, but by log(N).
1 + ilog(N)
Here ilog is the integer base-2 logarithm. This gives 1+ilog(4) = 1+2 = 3, so the 4 barbers can extend their slice from 10 to 30 minutes (instead of 40). Together they achieve a 10-minute latency.
Quadruple the shop to 16 barbers and the slice extends only to 50 minutes, since 1+ilog(16) = 5. The "correction" is logarithmic, plus this 1.
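The three corrections can be sketched in a few lines of C. This is an illustration of the arithmetic only: the enum and function names are made up for this answer, the integer logarithm is a plain loop, and the sketch does not model any cap the kernel may place on the CPU count.

```c
/* floor(log2(n)) for n > 0: index of the highest set bit. */
unsigned int ilog2_u(unsigned int n)
{
    unsigned int log = 0;

    while (n >>= 1)
        log++;
    return log;
}

/* Hypothetical names standing in for the _NONE, _LOG and _LINEAR modes. */
enum scaling { SCALING_NONE, SCALING_LOG, SCALING_LINEAR };

/* Factor by which the base slice is multiplied for n barbers (or CPUs). */
unsigned int scaling_factor(enum scaling mode, unsigned int n)
{
    switch (mode) {
    case SCALING_NONE:
        return 1;
    case SCALING_LINEAR:
        return n;
    case SCALING_LOG:
    default:
        return 1 + ilog2_u(n);
    }
}
```

For the barbers, 10 * scaling_factor(SCALING_LOG, 4) gives 30 minutes and 10 * scaling_factor(SCALING_LOG, 16) gives 50 minutes. For the scheduler values above, 6 * scaling_factor(SCALING_LOG, 8) gives the 24 ms.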