This depends on your specific workload and on details of your architecture (e.g., do the processes read the same memory, which may sit on separate NUMA nodes?), so the best approach is to measure the performance you actually get.
Running 24 CPU-intensive tasks on 8 execution units forces the scheduler to repeatedly switch tasks on and off CPUs, whereas running exactly as many tasks as execution units should, in principle, require less switching. The direct cost of the context-switch code itself is probably not that high, but when a task is scheduled off a CPU, the contents of that CPU's caches are effectively lost to it: the cached data is likely to be useless to the newly scheduled task, which will evict it with its own. If this happens often enough, it can noticeably degrade performance.
In your case, it may make sense to benchmark three setups against each other:
- run 24 concurrent tasks;
- run 8 tasks at a time;
- run 8 tasks at a time, assigning a specific CPU to each of them.
You can do (3) on Linux using, e.g., cpusets or taskset(1).
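As a rough sketch of such a benchmark in Python: the worker below is a hypothetical CPU-bound placeholder (replace it with your real task), the same total amount of work is split across a varying number of processes, and pinning uses `os.sched_setaffinity`, a Linux-only API, guarded so the script still runs elsewhere:

```python
import multiprocessing as mp
import os
import time

def worker(n_chunks, iters, cpu):
    """Placeholder CPU-bound task; substitute your real workload here."""
    if cpu is not None and hasattr(os, "sched_setaffinity"):
        # Pin this process to a single CPU (Linux-only API).
        os.sched_setaffinity(0, {cpu})
    total = 0
    for _ in range(n_chunks):
        for i in range(iters):
            total += i * i

def run(n_procs, pin=False, total_tasks=24, iters=200_000):
    # Split the same total amount of work across n_procs processes,
    # so the three setups are comparable.
    per_proc = total_tasks // n_procs
    cpus = sorted(os.sched_getaffinity(0)) if hasattr(os, "sched_getaffinity") else []
    procs = [
        mp.Process(
            target=worker,
            args=(per_proc, iters, cpus[i % len(cpus)] if pin and cpus else None),
        )
        for i in range(n_procs)
    ]
    start = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"(1) 24 processes:         {run(24):.3f} s")
    print(f"(2)  8 processes:         {run(8):.3f} s")
    print(f"(3)  8 processes, pinned: {run(8, pin=True):.3f} s")
```

With a placeholder workload like this, the differences may be small; the comparison only becomes meaningful once the worker does something close to your real tasks (same memory footprint, same access patterns).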