I recommend doing some experiments before deciding that no register allocation is unbearably slow. I once tried running Gentoo with everything compiled at -O0, and the user experience with most software wasn't significantly different. The amount of performance-critical C code on a modern PC is surprisingly small. Work like media decoding is usually done in hand-written assembly anyway.
> I recommend doing some experiments before considering no register allocation unbearably slow. I once tried running Gentoo with everything compiled -O0
AFAIK, register allocation is one of the few optimization passes that are always enabled in every compiler, even at -O0, so your experiment proves nothing.
The memory accesses are also easy to see if you disassemble the compiled binary. The performance of a binary built at -O0 is also roughly similar to that of one produced by the Tiny C Compiler, which doesn't implement register allocation at all.