
I am trying to learn operating system concepts. Here are two simple Python snippets:

while True:
    pass

and this one:

from time import sleep
while True:
    sleep(0.00000001)

Question: Why is CPU usage 100% when running the first code, but only about 1% to 2% when running the second? I know it may sound stupid, but why can't we implement something like sleep in user space without using the sleep system call?

NOTE: I have tried to understand the sleep system call from the Linux kernel source, but TBH I didn't understand what happens there. I also searched for the NOP assembly instruction, and it turns out that it is not really doing nothing but doing something useless (like xchg eax, eax), and maybe this is the cause of the 100% CPU usage, but I am not sure.

What exactly is the assembly code behind the sleep system call that we can't run in user space? Is it something like HLT?

I also tried to use HLT assembly in code like this:

section     .text
global      _start 
_start: 
    hlt
halter:
    jmp     _start
section     .data
msg     db  'Hello world',0xa  
len     equ $ - msg   

but after running this code I see a kernel general protection fault like this:

[15499.991751] traps: hello[22512] general protection fault ip:401000 sp:7ffda4e83980 error:0 in hello[401000+1000]

I don't know; maybe this is related to protection rings, or is my code wrong? The other question is whether the OS uses HLT or other privileged instructions underneath the sleep system call.

Peter Cordes
  • Incidentally, your `sleep(0.00000001)` is doing no such thing. Even ignoring the time taken for a system call and timer interrupt, your process gave up its scheduled time-slot and will be rescheduled through a queue mechanism. `sleep` is usually the _minimum_ interval (unless a signal is received): Linux gives no real-time guarantees. – Paul_Pedant Jul 10 '22 at 11:19
  • There's a pretty great blog post on this exact topic regarding Linux: https://blog.donkeysharp.xyz/post/what-happens-when-a-process-goes-to-sleep/ – Mavrik Jul 10 '22 at 23:45
  • python code has to go through an interpreter which obscures the relationship between the code and what the processor is actually executing. idk if the sleep function from time even uses a sleep syscall? – qwr Jul 11 '22 at 00:43
  • @Paul_Pedant what about `sleep(0.00000000000001)` ? it is even less than nanoseconds (order of these days cpu speed) – Mojtaba Kamyabi Jul 11 '22 at 06:12
  • @qwr is there any other way to sleep without calling `sleep` systemcall? BTW I checked it with `strace python myscript.py` linux command and yes it doesn't use `clock_nanosleep` in the system calls !!? I don't know why :/ – Mojtaba Kamyabi Jul 11 '22 at 06:14
  • The limiting factor is the delay caused by switching to kernel mode, and rescheduling the user process. Sleeping for a picosecond will not make any difference to the fixed factors. I would be surprised if anything shorter than 0.00001 second made any further difference whatsoever. Try getting the clock time, sleep for 1 nanosec a million times, then clock time again. – Paul_Pedant Jul 11 '22 at 07:10
  • @MojtabaKamyabi See `man -s 7 time`, high-resolution timers. ".. as accurate as the hardware allows (microsecond accuracy is typical of modern hardware)". That is only the timer feature itself -- add on a system call, a signal, and a reschedule. – Paul_Pedant Jul 11 '22 at 07:16
  • All of this has nothing to do with putting the CPU to sleep (which is what HLT etc is about). It has everything to do with your process yielding time to the OS or not. You'd see the exact same behaviour on a CPU that has no sleep states. – marcelm Jul 11 '22 at 07:39
  • Too bad there's a tag limit of 5 tags; this could use [cpu-usage][cpu-architecture] tags. Maybe doesn't need the [python] tag, though; pretty clearly Python `sleep` is going to result in a `nanosleep` system call eventually. – Peter Cordes Jul 12 '22 at 11:03

5 Answers


> Why is CPU usage 100% when running the first code, but only about 1% to 2% when running the second?

Because the first is a "busy loop": You are always executing code. The second tells the OS that this particular process wants to pause (sleep), so the OS deschedules the process, and if nothing else is using CPU, the CPU becomes idle.
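The difference can be made visible from inside the process by comparing CPU time with wall-clock time. This is only a rough sketch: the helper name `cpu_fraction`, the 0.5 s measurement window, and the 1 ms sleep are arbitrary choices for illustration, and the exact percentages depend on the machine and its load.

```python
import time

def cpu_fraction(body, wall_seconds=0.5):
    """Run body() repeatedly for wall_seconds; return CPU time / wall time."""
    cpu_start = time.process_time()    # CPU time charged to this process
    wall_start = time.perf_counter()   # elapsed real time
    while time.perf_counter() - wall_start < wall_seconds:
        body()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used

def busy():
    pass                # returns immediately, so the loop never leaves the CPU

def sleepy():
    time.sleep(0.001)   # asks the kernel to deschedule us for at least 1 ms

busy_fraction = cpu_fraction(busy)
sleepy_fraction = cpu_fraction(sleepy)
print(f"busy loop:  {busy_fraction:.0%} of one CPU")
print(f"sleep loop: {sleepy_fraction:.0%} of one CPU")
```

The busy variant should report a fraction close to 100%, while the sleeping variant is charged only for the short bursts between wake-ups.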

> I also searched for the NOP assembly instruction, and it turns out that it is not really doing nothing

Well, NOP = no operation: the CPU is actively executing an instruction that has no effect. You can use it to pad code, but not to put the CPU into a low-power state.

> What exactly is the assembly code behind the sleep system call that we can't run in user space?

Modern OSes on x86 CPUs use mwait. Other CPU architectures use other instructions.

> but after running this code I see a kernel general protection fault like this

That's because the OS is supposed to do this in supervisor mode. As I wrote above, the OS needs to be able to keep scheduling processes, so a process itself isn't allowed to put the CPU into idle mode.

> The other question is whether the OS uses HLT or other privileged instructions underneath the sleep system call

Yes, it does. It's not executed during the sleep call itself, though, but inside the scheduler loop, when the scheduler detects that there are no processes that want to run.


> One question about the first part: if I use a very small time slot, e.g. sleep(0.0000000000000001), does the scheduler still switch to the next process?

For the underlying C library and OS calls, see man 3 sleep (resolution in seconds), man usleep (resolution in microseconds), and man nanosleep (resolution in nanoseconds).

No matter what floating point number you use in your python code, you won't get a better resolution than the syscall used by python (whichever variant it is).

The manpages say "suspends execution of the calling thread for (at least) usec microseconds." etc., so I'd assume it gets descheduled even if the delay is zero (and then immediately rescheduled), but I didn't test that, nor did I read the kernel code.
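One way to check how long such tiny sleeps really take is to time them. The following is a minimal sketch, not a rigorous benchmark: `average_sleep` and the repeat count are made up for illustration, and the measured values depend on the kernel, its timer configuration, and the Python version.

```python
import time

def average_sleep(requested_seconds, repeats=200):
    """Return the mean wall-clock duration of time.sleep(requested_seconds)."""
    start = time.perf_counter()
    for _ in range(repeats):
        time.sleep(requested_seconds)
    return (time.perf_counter() - start) / repeats

for requested in (1e-3, 1e-6, 1e-9, 1e-15):
    measured = average_sleep(requested)
    print(f"requested {requested:.0e} s -> measured {measured:.2e} s per call")
```

I'd expect the measured time per call to stop shrinking well before the requested value does, because the fixed cost of the system call and of being rescheduled dominates.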

dirkt
  • One question about the first part: if I use a very small time slot, e.g. sleep(0.0000000000000001), does the scheduler still switch to the next process? Because in this situation CPU usage is still not 100%. – Mojtaba Kamyabi Jul 11 '22 at 06:20
  • That time is a *minimum*. You have asked to sleep for at least that period. The operating system will have its own idea of what the smallest sleep it will allow you to take is. It could easily round it up to 10 milliseconds, for example. – pjc50 Jul 11 '22 at 13:02

The OS's "CPU usage" measurement is based on whether that user-space process/thread (task) is running on a CPU, even if those instructions just waste time. If a task isn't using CPU time, that means the OS could be running something else.

That is not the case when user-space busy-waits, even if it can force the CPU into a low-power state. CPU power states are irrelevant, as marcelm commented: all that matters is whether you've told the OS this task should sleep. The OS can put the CPU to sleep if there are no other tasks.


Even the x86 pause instruction doesn't let the OS schedule a different task on the CPU. Nor would the recent tpause or umonitor / umwait, although those can put the CPU into C0.1 or C0.2 power-save state.

Also, nop is a true NOP in x86-64: it doesn't need a back-end execution unit. xchg eax,eax is not a NOP in 64-bit mode: it zero-extends EAX into RAX, so it can't use the 0x90 encoding in 64-bit mode, only in 16 or 32-bit mode. That's why nop has its own entry in the Intel manuals. Of course, even in 32-bit mode, modern CPUs recognize 0x90 as a no-op and special-case it, along with other long-nop opcodes.

But again, that's not the relevant thing; CPU usage is about whether the CPU can go into a power-save state or run something else, because the kernel knows that this process doesn't have anything to run.


hlt is a privileged instruction (only valid in ring 0, aka CPL=0), as you can see in the exceptions section of Intel's manual entry: #GP(0) if the current privilege level is not 0.

User-space can't put the CPU to sleep until the next interrupt on its own, only the kernel can do that (if the scheduler decides there's nothing else to do while this user-space task is sleeping).

Or not until the WAITPKG CPU feature (Tremont and Alder Lake), although umwait lets the kernel set a duration limit for sleeps initiated by user-space, so the OS can keep them from being too long. They're also limited in how deep they can sleep, even shallower than hlt (C1), which is important for wake-up latency. Perhaps a real-time OS knows some critical interrupt is coming soon and wants the CPU to not have any extra wakeup latency for it. Or doesn't want to let anything put the CPU to sleep. (umwait has control knobs for the kernel to set; hlt doesn't.)

User-space can waste CPU time until the next interrupt, though, and an interrupt is the only way for a pre-emptive multitasking kernel to regain control of the CPU, unless user-space makes a system call or triggers a CPU exception. (Perhaps a system call like yield(), specifically intended to hint the OS to context-switch to some other task that's waiting for a free CPU core.)
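As a rough illustration of the yield idea, here is a sketch using Python's `os.sched_yield()`, which wraps the POSIX `sched_yield` call on Unix-like systems. Treating it as the "yield()" meant above is an assumption, and the helper `yield_loop` and its 0.3 s window are invented for this example.

```python
import os
import time

def yield_loop(wall_seconds=0.3):
    """Call sched_yield() in a loop; return (cpu_seconds, wall_seconds)."""
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    while time.perf_counter() - wall_start < wall_seconds:
        os.sched_yield()   # hint: let another runnable task use this core now
    return time.process_time() - cpu_start, time.perf_counter() - wall_start

cpu_used, wall_used = yield_loop()
print(f"CPU time {cpu_used:.3f} s over {wall_used:.3f} s of wall time")
```

Yielding is not sleeping: if no other task is waiting for the core, the call returns right away, so on an otherwise idle machine this loop would still be expected to show up as close to 100% CPU usage.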


Cross-site duplicate:

And related Q&As:

This question IMO belongs on Stack Overflow, with tags [assembly][operating-system][x86][cpu-architecture]. It would at least be a better fit there; unix.SE is mostly about using Unix, not OS theory / concepts.


If possible, instead of lots of tiny sleeps, use poll or select to wait for activity on a file descriptor. Then your process (and the CPU) can stay asleep instead of continually waking up to make another system call.
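A minimal sketch of that pattern in Python, using `select.select` on a pipe. The timer thread that writes to the pipe is only a hypothetical stand-in for a real event source such as a socket or a device, and `wait_for_data` is a name invented for this example.

```python
import os
import select
import threading
import time

def wait_for_data(read_fd, timeout_seconds):
    """Block in select() until read_fd is readable or the timeout expires."""
    readable, _, _ = select.select([read_fd], [], [], timeout_seconds)
    return os.read(read_fd, 1024) if readable else None

read_fd, write_fd = os.pipe()
# Hypothetical event source: another thread writes to the pipe after 0.2 s.
threading.Timer(0.2, os.write, args=(write_fd, b"wake up")).start()

cpu_start = time.process_time()
data = wait_for_data(read_fd, timeout_seconds=5.0)
cpu_used = time.process_time() - cpu_start
print(f"received {data!r}, CPU time spent waiting: {cpu_used:.4f} s")
os.close(read_fd)
os.close(write_fd)
```

The process makes a single system call and stays blocked until the kernel wakes it up, so the CPU time spent waiting should be close to zero.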

If you are going to busy-wait, putting a sleep inside that loop makes it less bad, but it's still not great compared to sleeping or blocking in a way that the OS will wake you up when there's something to do.


As an analogy, imagine processes like people waiting in line at a bank. The bank tellers are CPUs, they carry out requests from customers (processes).

  • sleep() is like stepping away from a teller and sitting down in a chair, getting back into line after the indicated duration. (If most people are doing this, the line will usually be empty: CPUs idle so a new task can run right away).
  • Running NOPs in a loop would be like talking about the weather, keeping the teller occupied but not doing any banking. The teller can't serve another customer because you're keeping them occupied doing nothing.

(Unlike real banks, pre-emptive multi-tasking involves a teller forcing a customer to go to the back of the line even though they aren't finished with everything they want to do.)

Peter Cordes

Just adding to dirkt's good answer, regarding what you can do, should do, and should not do:

There are two fundamentally different ways for your code to do nothing for a certain time:

A/ Program a timer and release the CPU for other activities, expecting your code to be woken up and able to continue after the timer interrupt.
This is of course the fairest approach when writing programs for a time-shared multitasking system (with regard to CFS in particular, tasks designed to run under the SCHED_OTHER scheduling policy).

B/ Do nothing but keep the CPU. Expecting to be woken up by some timer interrupt is just that: an expectation, in no case a guarantee. At the time the interrupt would fire, the system might be busy:

  • Running some task under some real-time scheduling policy,
  • Running some interrupt handler with all interrupts masked, or some other section of uninterruptible kernel code (there are still some, even in a so-called preemptible Linux kernel).
  • Not to mention that, depending on the timer system chosen, the interrupt might never fire at all.

In any case, the latency (the difference in wall-clock time) between the moment your code expected to run again and the moment it actually runs again is unpredictable.

This is clearly unacceptable in some contexts.
Imagine talking to some device (writing to the device's memory addresses) where strict minimum timings must be guaranteed between two consecutive operations, and strict maximum timings must be guaranteed too, or you face potential underruns or at least unacceptable latencies.

Of course, since you can easily ruin the efficiency of most of the kernel's code by going that way, you should only do it in what is called atomic context, using dedicated in-kernel functions. This is why userspace is prevented from directly using CPU opcodes flagged as privileged (such as HLT).
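The trade-off between approaches A and B can be sketched from user space by measuring how far each one overshoots a requested delay. This is only an illustration under assumptions: the helper names and the 100 microsecond delay are arbitrary, and a user-space spin loop can still be pre-empted, so it does not give the guarantees of in-kernel atomic context.

```python
import time

def overshoot_ns(wait, delay_ns, repeats=200):
    """Return the worst-case overshoot (ns) of wait(delay_ns) over repeats."""
    worst = 0
    for _ in range(repeats):
        start = time.perf_counter_ns()
        wait(delay_ns)
        worst = max(worst, time.perf_counter_ns() - start - delay_ns)
    return worst

def sleep_wait(delay_ns):
    # Approach A: release the CPU and rely on a timer to wake us up.
    time.sleep(delay_ns / 1e9)

def spin_wait(delay_ns):
    # Approach B: keep the CPU and poll the clock until the deadline passes.
    deadline = time.perf_counter_ns() + delay_ns
    while time.perf_counter_ns() < deadline:
        pass

delay_ns = 100_000  # 100 microseconds, an arbitrary illustrative value
print("worst overshoot, sleep:", overshoot_ns(sleep_wait, delay_ns), "ns")
print("worst overshoot, spin: ", overshoot_ns(spin_wait, delay_ns), "ns")
```

On a lightly loaded machine the spinning variant would typically overshoot far less, at the price of occupying a core for the whole wait.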

BTW, and now answering the real question as expressed in the title: I remember sometimes relying on NOPs when I needed to wait for a very precise number of CPU cycles (the cycle count is the only thing you can be sure about with opcodes). Since, on modern architectures, memory and devices sit on buses running at different (and sometimes unrelated) clock rates than the CPU, this method leads to unportable (and hard to debug) code. This is clearly no longer the right way to go.
It is definitely advisable to reserve NOPs for setting breakpoints.

MC68020

It's important to note here that your tests are written in Python, which is an interpreted language. That means the CPU is executing machine instructions of the Python interpreter executable, NOT instructions of your program. pass is not NOP. pass roughly translates to "fetch the next Python line to be executed", and that takes several dozen machine instructions.

On the other hand, Python's sleep() function takes a few instructions to parse the argument, but after that the Python interpreter lets the OS handle the actual sleep. And when the Python interpreter is not running, it doesn't count towards the CPU%.
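To get a feeling for what the interpreter works through on each iteration, the standard `dis` module can print the bytecode that CPython compiles the two loops to. This is a small sketch; the exact opcodes differ between CPython versions, and every bytecode instruction in turn costs many machine instructions inside the interpreter.

```python
import dis

busy_source = "while True:\n    pass\n"
sleep_source = "from time import sleep\nwhile True:\n    sleep(0.00000001)\n"

for name, source in (("busy loop", busy_source), ("sleep loop", sleep_source)):
    print(f"--- {name} ---")
    # compile() only translates the source to bytecode; the loop is never run.
    dis.dis(compile(source, name, "exec"))
```

The busy loop boils down to a jump back to itself, while the sleep loop contains a call that ends up handing control to the OS.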

MSalters
  • You are right, but suppose I write the same thing in assembly, as in the HLT case. If we replace HLT with NOP, it still uses 100% of the CPU. – Mojtaba Kamyabi Jul 11 '22 at 12:32

NOP may better be thought of as Null OPeration. It does something, at least in theory, but it doesn't change anything, except the Program Counter.

They have a variety of uses, notably in older systems to wait a clock cycle for some hardware to become ready, or in execute-then-branch systems like SPARC, where the instruction following a branch is executed before the branch takes effect. It'll be executed if you don't branch, too! You have nothing valuable to do before changing the PC? Just throw a NOP in there!

  • If you look up really old CPU descriptions, e.g. the [6502](http://archive.6502.org/datasheets/mos_6500_mpu_nov_1985.pdf), you'll see `NOP` always has been an abbreviation for "no operation", and never for "null operation". And the 6502 actually has a dedicated opcode for that, it doesn't re-use an existing opcode that just happens to do nothing with the given registers etc. – dirkt Jul 11 '22 at 06:36
  • @dirkt: On the Intel x86/x64 CPU (and compatibles), the NOP instruction is actually an alias for XCHG AX,AX / XCHG EAX,EAX - Opcode $90. – HeartWare Jul 11 '22 at 10:18
  • @HeartWare: No, only in 16 and 32-bit mode. In x86-64 long mode, `0x90` is a NOP, but `xchg eax, eax` has to zero-extend EAX into RAX because you're writing the 32-bit register EAX. So if you write `xchg eax,eax` in your source, an assembler has to use the 2-byte encoding, not `0x90 nop` which is now officially documented separately (along with 0F 1F long nop). https://www.felixcloutier.com/x86/nop. But yes, x86's 1-byte `90 nop` is `xchg ax,ax` or `xchg eax,eax` in 16 and 32-bit modes, just special-cased by hardware sometime *after* 8086. – Peter Cordes Jul 11 '22 at 12:26
  • @PeterCordes: Not (entirely) true. When I enter XCHG EAX,EAX in Delphi 64-bit assembler, it gets translated to opcode $90, just as in 32-bit assembler. Granted, it may be a 32-bit operand size segment in 64-bit size code segment, but still... However, if I enter XCHG RAX,RAX it gets translated to 2-byte opcode $48 $90. – HeartWare Jul 11 '22 at 13:06
  • @HeartWare: That would be a bug in Delphi's assembler. (The default operand-size in 64-bit long mode is still 32-bit; that's why a REX prefix is needed for `add rax, 1`. But if you were assembling for 32-bit *mode*, `rax` wouldn't be a valid register name, and `0x48` would be `dec eax`, so you're definitely assembling for 64-bit mode). `xchg eax,eax` writes both its operands, so it's required to assemble it to something that will actually do that, not a nop. For example `87 c0 xchg eax,eax`. BTW, `$` doesn't mean hex in any mainstream x86 syntaxes; in AT&T it just means immediate. – Peter Cordes Jul 11 '22 at 13:18
  • @PeterCordes: I can't talk to Delphi bug or not - I have to go by what I can see :-). But on another note - would XCHG EAX,EAX (long form, ie. not $90) then semantically be the same as MOVZX RAX,EAX ? Also, I use "$" as hex prefix, as this is a Delphi question :-) – HeartWare Jul 12 '22 at 05:21
  • @HeartWare: You're the only person that's mentioned Delphi on this unix.SE post, but I guess you mean this comment thread. Anyway, there's no such thing as `movzx` with a 32-bit source; the instruction you want for zero-extending 32 EAX to 64-bit RAX is `mov eax,eax`. (See my answer on [MOVZX missing 32 bit register to 64 bit register](https://stackoverflow.com/q/51387571) for details, and ISA design choices). And yes, `xchg eax,eax` is architecturally equivalent to `mov eax, eax`. – Peter Cordes Jul 12 '22 at 05:30
  • @PeterCordes: You're right (about Delphi). I had forgotten, I'd jumped into another site (usually I'm over at the general StackOverflow). My mistake :-). I'm aware that there is no legal instruction MOVZX r64,r32 - thus my "semantically" qualifier :-). 64-bit assembler is not my forte - I grew up on 16/32 bit Intel assembly, and haven't really caught up with the times, but am still interested in the subject. – HeartWare Jul 12 '22 at 05:40
  • @HeartWare I am aware of that. And on other ISAs it's one (or several) other instruction(s) that use a regular opcode with no effect (for example, `BC 0,...` and `BCR 0,...` in the /360 ISA). That's why I wrote "it doesn't re-use an existing opcode that just happens to do nothing with the given registers etc.". But it's still an abbreviation for "NO (O)Peration". – dirkt Jul 12 '22 at 18:06
  • It's an abbreviation, or mnemonic in some jargon, for No OPeration, but it may (or may not) be better thought of as a Null OPeration, perhaps more so in RISC. Which the 6502 wasn't! I don't know if the word Null in lieu of No was even considered; if it had been my call I'd have gone with Null, as it better captures the meaning in my mind. – Robin Hammond Jul 15 '22 at 07:13