The OS's "CPU usage" measurement is based on whether a user-space process/thread (task) is running on a CPU, even if its instructions just waste time. If a task isn't using CPU time, the OS could be running something else on that CPU.
That is not the case when user-space busy-waits, even if it can force the CPU into a low-power state. CPU power states are irrelevant, as marcelm commented: all that matters is whether you've told the OS this task should sleep. The OS can put the CPU to sleep if there are no other tasks.
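To see that distinction in the accounting, here is a minimal sketch (not a benchmark) that compares the CPU time charged to a process while it busy-waits versus while it sleeps. The helper names `cpu_time_used`, `busy_wait` and `blocking_wait` are made up for illustration, and the 0.2 second duration is arbitrary.

```python
import time

def cpu_time_used(wait, duration):
    """Return the process CPU seconds consumed while wait(duration) runs."""
    start = time.process_time()
    wait(duration)
    return time.process_time() - start

def busy_wait(duration):
    """Spin in user space until `duration` seconds of wall-clock time pass."""
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        pass  # the task stays runnable the whole time, so it is charged CPU time

def blocking_wait(duration):
    """Tell the OS this task should sleep; the CPU is free for other work."""
    time.sleep(duration)

busy = cpu_time_used(busy_wait, 0.2)
asleep = cpu_time_used(blocking_wait, 0.2)
print(f"busy-wait: {busy:.3f} s CPU, sleep: {asleep:.3f} s CPU")
```

On an otherwise idle machine I'd expect the busy-wait figure to be close to the full 0.2 seconds and the sleep figure to be close to zero, although the exact numbers depend on the scheduler and timer resolution.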
Even the x86 pause instruction doesn't let the OS schedule a different task on the CPU. Nor would the recent tpause or umonitor / umwait, although those can put the CPU into C0.1 or C0.2 power-save state.
Also, nop is a true NOP in x86-64: it needs no back-end execution unit. xchg eax,eax is not a NOP in 64-bit mode because it zero-extends EAX into RAX, so it can't use the 0x90 encoding in 64-bit mode, only in 16 or 32-bit mode. That's why nop has its own entry in the Intel manuals. Of course, even in 32-bit mode, modern CPUs recognize 0x90 as a no-op and special-case it, along with other long-nop opcodes.
But again, that's not the relevant thing. CPU usage is about whether the CPU can go into a power-save state or run something else because the kernel knows that this process doesn't have anything to run.
hlt is a privileged instruction (only valid in ring 0, aka CPL=0), as you can see in the exceptions section of Intel's manual entry: #GP(0) If the current privilege level is not 0.
User-space can't put the CPU to sleep until the next interrupt on its own, only the kernel can do that (if the scheduler decides there's nothing else to do while this user-space task is sleeping).
Or at least it couldn't until the WAITPKG CPU feature (Tremont and Alder Lake), although umwait lets the kernel set a duration limit for sleeps initiated by user-space, so the OS can keep them from being too long. They're also limited in how deep they can sleep, even shallower than hlt (C1), which is important for wake-up latency. Perhaps a real-time OS knows some critical interrupt is coming soon and wants the CPU to not have any extra wakeup latency for it. Or it doesn't want to let anything put the CPU to sleep. (umwait has control knobs for the kernel to set, hlt doesn't.)
User-space can waste CPU time until the next interrupt, though, and that interrupt is the only way for a pre-emptive multitasking kernel to regain control of the CPU, unless user-space makes a system call or triggers a CPU exception. (Perhaps a system call like yield(), specifically intended to hint the OS to context-switch to some other task that's waiting for a free CPU core.)
This question IMO belongs on Stack Overflow, with tags [assembly][operating-system][x86][cpu-architecture]. It would at least be a better fit there; unix.SE is mostly about using it, not OS theory / concepts.
If possible, instead of lots of tiny sleeps, use poll or select to wait for activity on a file descriptor. Then your process (and the CPU) can stay asleep instead of continually waking up to make another system call.
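A minimal sketch of that approach, assuming a Unix-like system where Python's `select.poll` is available. The pipe, the `wait_for_data` helper and the timer-driven producer are stand-ins I made up for whatever file descriptor and event source you actually have.

```python
import os
import select
import threading

def wait_for_data(fd, timeout_ms):
    """Block in poll() until fd is readable or the timeout expires."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    # The kernel keeps this thread asleep until there is activity on fd
    # (or the timeout runs out), so no CPU time is burned while waiting.
    events = poller.poll(timeout_ms)
    return bool(events)

read_fd, write_fd = os.pipe()

# Hypothetical producer: writes one short message after 100 ms.
producer = threading.Timer(0.1, os.write, args=(write_fd, b"ready"))
producer.start()

ready = wait_for_data(read_fd, 2000)
message = os.read(read_fd, 5) if ready else b""

producer.join()
os.close(read_fd)
os.close(write_fd)
print(ready, message)
```

The waiting thread makes one system call and is woken once, instead of waking up repeatedly just to find that nothing has happened yet.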
If you are going to busy-wait, putting a sleep inside that loop makes it less bad, but it's still not great compared to sleeping or blocking in a way that the OS will wake you up when there's something to do.
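As a rough illustration of the difference, this sketch counts how many times a sleep-in-a-loop waiter wakes up before a flag is set, compared with blocking on the same kind of flag. `threading.Event` stands in for whatever condition you're waiting on; the function names, the 10 ms interval and the 200 ms delay are arbitrary choices of mine.

```python
import threading
import time

def wait_by_polling(flag, interval):
    """Check the flag in a loop, sleeping between checks; return the wakeup count."""
    wakeups = 0
    while not flag.is_set():
        time.sleep(interval)  # less bad than spinning, but still periodic wakeups
        wakeups += 1
    return wakeups

def wait_by_blocking(flag):
    """Block until the flag is set; this thread is woken only when that happens."""
    flag.wait()

polled_flag = threading.Event()
threading.Timer(0.2, polled_flag.set).start()
polling_wakeups = wait_by_polling(polled_flag, 0.01)

blocked_flag = threading.Event()
threading.Timer(0.2, blocked_flag.set).start()
wait_by_blocking(blocked_flag)

print(f"polling loop woke up {polling_wakeups} times; blocking wait returned once")
```

The polling version also notices the event up to one interval late, so shortening the interval to improve latency costs more wakeups. The blocking version doesn't have that trade-off.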
As an analogy, imagine processes like people waiting in line at a bank. The bank tellers are CPUs; they carry out requests from customers (processes).
- sleep() is like stepping away from a teller and sitting down in a chair, getting back into line after the indicated duration. (If most people are doing this, the line will usually be empty: CPUs idle so a new task can run right away).
- Running NOPs in a loop would be like talking about the weather, keeping the teller occupied but not doing any banking. The teller can't serve another customer because you're keeping them occupied doing nothing.
(Unlike real banks, pre-emptive multi-tasking involves a teller forcing a customer to go to the back of the line even though they aren't finished with everything they want to do.)