In the case of the kernel it's not just a performance thing. A lot of what a kernel has to do is irrelevant to the high-level C execution model, so it isn't exposed by the language, and certainly not by the C standard.
Things like: "Hm, I need to swap my stack pointer and page table over to this other process."
Or writing interrupt handlers.
Atomic operations and memory barriers used to be in that category too, but compiler extensions and newer language standards (C11's <stdatomic.h>, for example) have been catching up on some of that. Though to be honest, a kernel wants enough control that it will likely still go beyond what the standard or compiler extensions provide for these anyway.
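To make the "catching up" part concrete, here is a rough sketch of a spinlock written with nothing but standard C11 atomics. The names demo_spinlock, demo_trylock, demo_lock, demo_unlock and demo_selfcheck are made up for illustration. This is not how Linux or any other kernel implements its locks; it only shows that acquire/release ordering can now be expressed without inline assembly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy spinlock, for illustration only. */
struct demo_spinlock {
    atomic_flag locked;
};

/* Try once to take the lock. test_and_set returns the previous value,
 * so "false" means the flag was clear and we now own the lock.
 * Acquire ordering keeps later loads/stores from moving above this. */
static bool demo_trylock(struct demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Busy-wait until the lock is taken. */
static void demo_lock(struct demo_spinlock *l)
{
    while (!demo_trylock(l)) {
        /* spin */
    }
}

/* Release ordering makes writes done under the lock visible to the
 * next thread that acquires it. */
static void demo_unlock(struct demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Single-threaded sanity check of the lock/unlock state machine.
 * Returns the value of a counter incremented under the lock. */
static int demo_selfcheck(void)
{
    struct demo_spinlock l = { ATOMIC_FLAG_INIT };
    int counter = 0;

    demo_lock(&l);
    assert(!demo_trylock(&l));  /* already held, so trylock fails */
    counter++;
    demo_unlock(&l);

    assert(demo_trylock(&l));   /* free again after unlock */
    counter++;
    demo_unlock(&l);

    return counter;
}
```

A real kernel lock would also have to think about things like disabling interrupts or preemption and fairness between CPUs, none of which the C standard knows anything about, which is the point made above.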
EDIT: I see several hits, just from doing a simple search for 'movq':
https://github.com/torvalds/linux/search?q=movq&unscoped_q=m...
I am not a kernel programmer, though, so I can't say whether it's being used in any capacity on modern x64 systems or whether it's a compatibility mode for low-powered embedded architectures. Maybe someone more knowledgeable can chime in.