Description
The idea is intuitive, though I'm not sure it has ever been raised in the context of the CLR JIT: why not use X/Y/ZMM registers to hold "hot" local variables and avoid stack memory accesses, just as the general-purpose registers are used to load/store values? (I'm not talking here about operations other than MOVQ/MOVD loads and stores, because that is a much deeper topic which may include auto-vectorization and other fun stuff.)
There are always at least 6 volatile SIMD registers available, and the number of registers used may be increased up to the size of the SIMD register file. With more complex techniques this may provide up to 8 registers on x86/SSE+, up to 16 on x64/SSE+, and up to 32 on x64/AVX-512 (future). These numbers may be achievable in the context of the CLR because at the moment little code in the system assemblies uses vectors, and to my understanding SIMD instructions are otherwise used only for FP operations.
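To make the proposal a bit more concrete, here is a minimal Python sketch of the placement decision it implies: the hottest locals that would otherwise be spilled get a free SIMD register as their home, and the rest fall back to stack slots. This is only a toy model, not how the CLR JIT's register allocator works. The `Local` type, the `weight` hotness estimate, the `assign_spill_homes` function, and the assumption that the registers already in use are the lowest-numbered ones are all hypothetical. The register-file sizes are the ones quoted above.

```python
from dataclasses import dataclass

# SIMD register-file sizes quoted in the issue text (total, not guaranteed free).
SIMD_REGS = {"x86/SSE": 8, "x64/SSE": 16, "x64/AVX-512": 32}


@dataclass
class Local:
    name: str
    weight: int  # hypothetical hotness estimate, e.g. a weighted reference count


def assign_spill_homes(locals_, target, simd_in_use=0):
    """Map each spilled local to a SIMD register if one is free, else a stack slot.

    Simplifying assumption: registers xmm0..xmm(simd_in_use - 1) are taken
    (for FP or vector work) and every register above that is free.
    """
    free = max(SIMD_REGS[target] - simd_in_use, 0)
    homes = {}
    stack_slot = 0
    # Hottest locals first; sorted() is stable, so ties keep their input order.
    for i, loc in enumerate(sorted(locals_, key=lambda l: -l.weight)):
        if i < free:
            # A real JIT would emit MOVD/MOVQ to move the value in and out.
            homes[loc.name] = f"xmm{simd_in_use + i}"
        else:
            homes[loc.name] = f"[stack+{8 * stack_slot}]"
            stack_slot += 1
    return homes


if __name__ == "__main__":
    spilled = [Local("i", 100), Local("tmp", 5), Local("sum", 50)]
    print(assign_spill_homes(spilled, "x64/SSE", simd_in_use=14))
```

With 14 of the 16 x64/SSE registers assumed busy, only the two hottest locals (`i` and `sum`) get a register home and `tmp` stays on the stack. A real allocator would also have to account for calling-convention volatility and the cost of the extra MOVD/MOVQ instructions, which this sketch ignores.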
Even taking into account the store-forwarding mechanisms that modern CPUs apply to memory accesses, a significant speed-up could be achieved. One extra point is that on Hyper-Threaded CPUs the register files of the logical cores are independent of each other, whereas the memory access circuitry is mostly shared between them.
category:design
theme:register-allocator
skill-level:expert
cost:large
impact:large