As Gilles mentions, anywhere the FPU is liable to be used, the kernel needs to support saving and restoring its state. Since user-space can use the FPU, this needs to be handled in any case on context switches (i.e., when the current CPU switches from one thread to another) — at least, when the previously-running thread used the FPU. So why not extend that to the kernel?
There are a couple of reasons to avoid using the FPU in the kernel:
- from a portability perspective, some architectures don’t support using the FPU in the kernel at all, so generic code can’t rely on it;
- saving and restoring the FPU state is expensive and introduces certain implementation-related constraints (on x86 Linux, pre-emption in particular needs careful consideration here).
Having the kernel avoid using the FPU means that the cost for user-space can be reduced: FPU state need only be restored after a context switch when returning to user-space (as opposed to immediately after a context switch), and not in all cases (only when the threads involved actually use the FPU).
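To make the saving concrete, here is a toy model of deferred restoring. It is only a sketch: the structure, field names, and counters are hypothetical and are not the kernel's own data structures. It assumes that state is saved at switch-out only when the registers actually hold the thread's state, and that it is loaded back only on the way out to user-space.

```c
#include <stdbool.h>

/* Toy model only: names and fields are hypothetical, not kernel structures. */
struct toy_thread {
    bool uses_fpu;       /* thread has FPU state worth preserving */
    bool needs_restore;  /* its state is in memory, not in the registers */
};

static int toy_saves;     /* times FPU state was written to memory */
static int toy_restores;  /* times FPU state was loaded into the registers */

/* Switch away from prev: save its registers only if they are live. */
static void toy_context_switch(struct toy_thread *prev)
{
    if (prev->uses_fpu && !prev->needs_restore) {
        toy_saves++;
        prev->needs_restore = true;
    }
    /* Nothing is loaded for the incoming thread at this point. */
}

/* Called just before a thread resumes running user-space code. */
static void toy_return_to_user(struct toy_thread *t)
{
    if (t->uses_fpu && t->needs_restore) {
        toy_restores++;
        t->needs_restore = false;
    }
}
```

In this model, an FPU-using thread that is switched out and back in twice before it next returns to user-space costs one save and one restore in total, and a thread that never uses the FPU costs neither.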
It is however possible to use the FPU (and MMX/SSE/AVX) in the kernel, in x86-specific code where the benefits outweigh the costs; this is why it ends up being used in the crypto code and in RAID6. These emails from Linus provide some more details. If you want to use the FPU, you need to bracket all the FPU-using code between kernel_fpu_begin and kernel_fpu_end, and make sure it can’t fault or sleep. See arch/x86/include/asm/fpu/api.h and arch/x86/kernel/fpu/core.c for details.
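The bracketing pattern looks roughly like the following. This is a minimal sketch, not code from the kernel: xor_blocks_sketch is a hypothetical helper, and the user-space stand-ins in the #else branch exist only so that the example can be built and run outside a kernel tree. Note also that kernel C code is normally built with FPU/SIMD code generation disabled, so real users put hand-written SSE/AVX instructions inside the bracket; the plain byte loop here merely marks where they would go.

```c
#ifdef __KERNEL__
#include <linux/types.h>
#include <asm/fpu/api.h>        /* kernel_fpu_begin(), kernel_fpu_end() */
#else
/* User-space stand-ins (not kernel code) so the sketch builds anywhere;
 * they only check that begin/end calls are correctly paired. */
#include <assert.h>
#include <stddef.h>
static int fpu_section_depth;
static void kernel_fpu_begin(void) { assert(fpu_section_depth == 0); fpu_section_depth++; }
static void kernel_fpu_end(void)   { assert(fpu_section_depth == 1); fpu_section_depth--; }
#endif

/* Hypothetical helper: XOR len bytes of src into dst. */
static void xor_blocks_sketch(unsigned char *dst, const unsigned char *src,
                              size_t len)
{
    size_t i;

    kernel_fpu_begin();
    /*
     * Inside the bracket: no sleeping and no faulting, so both buffers
     * must already be resident kernel memory. Real code would use SSE/AVX
     * instructions here; a plain byte loop stands in for them.
     */
    for (i = 0; i < len; i++)
        dst[i] ^= src[i];
    kernel_fpu_end();
}
```

Keeping the bracketed region short matters, since the surrounding constraints (no sleeping, no faulting) apply for its whole duration.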
For memcpy, the performance gains don’t outweigh the cost of using the FPU.
(x86 has a rather complex FPU architecture, but it provides all the features needed to make it possible for an operating system to share the FPU: it can trap whenever an FPU instruction is executed, which allows the kernel to optimise for processes which never use the FPU, and it can indicate when the CPU and FPU state are liable to diverge. It also provides instructions to save and restore the FPU state: FSAVE/FRSTOR, FXSAVE/FXRSTOR, and XSAVE/XRSTOR, depending on the FPU’s vintage. FPU support is perhaps the aspect of the 8086 design where the designers had the most foresight.)