The full tickless mode that is activated with e.g. nohz_full=cpux-cpuy is indeed only effective if there is just one runnable task on each nohz_full CPU:
Adaptive-ticks does not do anything unless there is only one
runnable task for a given CPU, even though there are a number
of other situations where the scheduling-clock tick is not
needed.
(cf. Documentation/timers/NO_HZ.txt)
Thus, if you check a nohz_full CPU with ps, it makes sense to explicitly look for runnable tasks, e.g.:
$ ps -e -L -o cpuid,pid,lwp,state,pri,rtprio,class,wchan:20,comm \
    | awk -v cpu="$mycpunr" '$1 == cpu'
(here $mycpunr holds the number of the CPU to inspect; look at the state column, where R marks a runnable task)
That means it's ok to have some additional tasks on a nohz_full CPU as long as they aren't runnable.
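To list only the runnable threads, the state check can be folded into the awk step. A minimal sketch, assuming procps-ng ps (which accepts cpuid as an alias for psr); the helper name filter_runnable is hypothetical:

```shell
#!/bin/sh
# filter_runnable CPU: read "cpuid pid lwp state comm" rows on stdin and
# print only the rows on CPU whose state is R (running or runnable).
# filter_runnable is a hypothetical helper name, not an existing tool.
filter_runnable() {
    awk -v cpu="$1" '$1 == cpu && $4 ~ /^R/'
}

# Only query the live system when mycpunr is set, e.g. mycpunr=3.
if [ -n "${mycpunr:-}" ]; then
    # The trailing "=" on each column suppresses the ps header line.
    ps -e -L -o cpuid=,pid=,lwp=,state=,comm= | filter_runnable "$mycpunr"
fi
```

Ideally this prints exactly one line for a nohz_full CPU: the thread you placed there.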
With just nohz_full=, nothing stops the kernel from scheduling user/kernel threads on the selected CPUs. Thus, one usually also isolates those CPUs to avoid any interference by other threads, for example with:
nohz_full=cpux-cpuy isolcpus=cpux-cpuy
(cf. Linux Kernel Parameters)
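After a reboot it is worth confirming that the kernel actually accepted these options. A small sketch, assuming a kernel that exposes /sys/devices/system/cpu/isolated and, when built with CONFIG_NO_HZ_FULL, /sys/devices/system/cpu/nohz_full; show_isolation is a hypothetical helper name:

```shell
#!/bin/sh
# show_isolation: print the boot command line and the CPU lists the
# kernel reports as isolated and nohz_full. show_isolation is a
# hypothetical helper; missing sysfs files are reported, not fatal.
show_isolation() {
    printf '%-11s %s\n' "cmdline:" "$(cat /proc/cmdline 2>/dev/null)"
    for f in isolated nohz_full; do
        path=/sys/devices/system/cpu/$f
        if [ -r "$path" ]; then
            printf '%-11s %s\n' "$f:" "$(cat "$path")"
        else
            printf '%-11s %s\n' "$f:" "(not available)"
        fi
    done
}

show_isolation
```

With the options above, both the isolated and the nohz_full line should show cpux-cpuy.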
With those options, a thread on an isolated nohz_full CPU can still be interrupted, e.g. by timers and RCU operations.
Thus, if you want to minimize the latency of your isolated thread, you need to disable other sources of interruptions.
You can check /proc/timer_list for timers that are still active on isolated CPUs.
Common examples of timers that may show up on an isolated CPU are watchdog_timer_fn and a timer related to the machine check exception (MCE) functionality.
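Since /proc/timer_list is long and grouped by CPU, a filter helps. A rough sketch, assuming the usual layout in which each CPU section starts with a "cpu: N" line and each queued hrtimer appears as " #N: <addr>, callback, S:.."; this layout is not a stable ABI and may differ between kernel versions, and reading the file usually requires root. timers_on_cpu is a hypothetical helper name:

```shell
#!/bin/sh
# timers_on_cpu CPU [FILE]: count the hrtimer callbacks queued on CPU
# according to /proc/timer_list (or FILE, e.g. a saved copy).
# timers_on_cpu is a hypothetical helper; the parsing assumes the usual
# "cpu: N" section headers and " #N: <addr>, callback, S:.." lines.
timers_on_cpu() {
    awk -F', ' -v want="$1" '
        BEGIN { cpu = -1 }
        /^cpu:/ { split($0, f, " "); cpu = f[2]; next }
        cpu == want && /^ #[0-9]+:/ { print $2 }   # $2 is the callback
    ' "${2:-/proc/timer_list}" | sort | uniq -c
}
```

For example, timers_on_cpu 3 prints one line per callback name (such as watchdog_timer_fn) with the number of matching timers on CPU 3.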
You can disable those interruptions with further kernel options, e.g.:
nowatchdog mce=ignore_ce
Looking at the /proc/interrupts counters is a good way to check for hardware-induced interruptions. Softirqs are another source of interruptions, so one also has to check the /proc/softirqs counters.
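Because both files count everything since boot, comparing two snapshots taken while the isolated workload runs is more telling. A sketch, assuming the usual layout of both files (a header row of CPUn names, then one row per source with its name in the first column); the helpers cpu_counts and irq_deltas are hypothetical. The CPU column is located by name in the header rather than by position, because /proc/interrupts may omit CPUs that are offline:

```shell
#!/bin/sh
# cpu_counts CPU FILE: print "name count" for one CPU column of a table
# laid out like /proc/interrupts or /proc/softirqs.
cpu_counts() {
    awk -v want="CPU$1" '
        # Header row: find the column named CPU<n>. Data rows carry an
        # extra leading name field, hence the +1.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == want) col = i + 1; next }
        # Skip rows too short to have this column (e.g. "ERR:" totals).
        col && NF >= col { print $1, $col }
    ' "$2"
}

# irq_deltas CPU FILE SECONDS: print the sources whose counter on CPU
# increased during SECONDS, together with the increase.
irq_deltas() {
    snap=$(mktemp) || return 1
    cpu_counts "$1" "$2" > "$snap"
    sleep "$3"
    cpu_counts "$1" "$2" | awk '
        NR == FNR { old[$1] = $2; next }   # first input: earlier snapshot
        $2 - old[$1] > 0 { print $1, $2 - old[$1] }
    ' "$snap" -
    rm -f "$snap"
}
```

For example, irq_deltas 3 /proc/interrupts 10 lists the interrupt sources that fired on CPU 3 during ten seconds; the same call with /proc/softirqs covers softirqs.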
For example, to minimize RCU-related interruptions on isolated CPUs, one can offload RCU callbacks to kernel threads, migrate those threads to a non-isolated CPU, and free the isolated CPU from having to notify the callback threads. The last part is done by adding the kernel option:
rcu_nocb_poll
That option requires rcu_nocbs= to be effective, but nohz_full= already implies rcu_nocbs= for the specified CPUs.
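Put together, the options discussed so far could look like this for isolating, say, CPUs 2-3. The CPU range is only an example, and the sketch assumes a GRUB-based distribution that reads GRUB_CMDLINE_LINUX from /etc/default/grub (a file sourced as shell). The GRUB configuration then has to be regenerated (e.g. with update-grub or grub2-mkconfig, depending on the distribution) before rebooting. Note that mce=ignore_ce only applies to x86:

```shell
# /etc/default/grub (excerpt): example values for isolating CPUs 2-3.
# Append these to any options already present in the variable.
# nohz_full= implies rcu_nocbs= for the same CPUs, so it is not repeated.
GRUB_CMDLINE_LINUX="nohz_full=2-3 isolcpus=2-3 rcu_nocb_poll nowatchdog mce=ignore_ce"
```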
Note that you have to move the offloaded RCU callback threads to a housekeeping CPU yourself, by explicitly setting the CPU affinities of those threads. For example, with tuna (to CPU 0):
# tuna -U -t 'rcu*' -c 0 -m
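Without tuna, the same can be done with taskset from util-linux. A sketch under these assumptions: the offloaded callback threads are the ones whose names start with rcuo (e.g. rcuop/3), which is narrower than tuna's 'rcu*' pattern; it runs as root; and pin_to_housekeeping is a hypothetical helper. Setting DRY_RUN=1 only prints the commands. Kernel threads that are bound to a specific CPU reject the change, which is reported rather than treated as fatal:

```shell
#!/bin/sh
# pin_to_housekeeping CPU PID...: restrict each PID to CPU with taskset.
# pin_to_housekeeping is a hypothetical helper; DRY_RUN=1 only prints
# the taskset commands instead of running them.
pin_to_housekeeping() {
    cpu=$1
    shift
    for pid in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo taskset -pc "$cpu" "$pid"
        else
            # Kernel threads bound to one CPU reject this; keep going.
            taskset -pc "$cpu" "$pid" > /dev/null ||
                echo "could not move PID $pid" >&2
        fi
    done
}

# Example (as root): move all offloaded RCU callback threads to CPU 0.
#   pin_to_housekeeping 0 $(pgrep '^rcuo')
```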
The kernel document Documentation/kernel-per-CPU-kthreads.txt describes further sources of interruptions (a.k.a. OS Jitter) and shows how to locate them by running your test load with tracing enabled.
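As a starting point, such tracing can be restricted to the isolated CPUs through tracefs. A sketch, assuming tracefs is mounted at /sys/kernel/tracing (older systems use /sys/kernel/debug/tracing, selectable here via TRACEFS), root privileges, and isolated CPUs numbered below 32 so the mask fits in a single 32-bit word; cpulist_to_mask and trace_cpus are hypothetical helpers. It records IRQ, softirq, timer and scheduler-switch events, which covers the interruption sources discussed above:

```shell
#!/bin/sh
# cpulist_to_mask LIST: convert a CPU list such as "2-3,5" into the hex
# mask format expected by tracing_cpumask (here: 2c). CPUs must be < 32.
cpulist_to_mask() {
    mask=0
    for part in $(echo "$1" | tr ',' ' '); do
        lo=${part%-*}   # "2-3" -> 2, "5" -> 5
        hi=${part#*-}   # "2-3" -> 3, "5" -> 5
        i=$lo
        while [ "$i" -le "$hi" ]; do
            mask=$((mask | (1 << i)))
            i=$((i + 1))
        done
    done
    printf '%x\n' "$mask"
}

# trace_cpus LIST SECONDS: trace the given CPUs while your load runs.
# Events stay enabled afterwards; disable them again if needed.
trace_cpus() {
    T=${TRACEFS:-/sys/kernel/tracing}
    cpulist_to_mask "$1" > "$T/tracing_cpumask" || return 1
    : > "$T/trace"                      # clear the ring buffer
    for ev in irq timer sched/sched_switch; do
        echo 1 > "$T/events/$ev/enable"
    done
    echo 1 > "$T/tracing_on"
    sleep "$2"                          # start the test load meanwhile
    echo 0 > "$T/tracing_on"
    cat "$T/trace"
}
```

For example, trace_cpus 2-3 10 > /tmp/jitter.txt while the workload runs on CPUs 2-3; events in the output that don't belong to your own thread are candidate sources of jitter.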