Consider using SIMD registers for "hot" local variables instead of placing them on stack when out of free GP registers

The idea is intuitive, though I'm not sure it was ever sounded in context of CLR JIT - why not use X/Y/ZMM registers for "hot" local variables to avoid stack memory accesses, just like common GP registers are used to load/store the values? (I'm not talking here about operations other than load/store `MOVQ/MOVD` because it's much deeper topic which may include auto-vectorization and other funny stuff.)

There are always up to at least 6 volatile SIMD registers, and the number of regs used may be increased up to the size of SIMD register file. With more complex techniques this may provide up to 8 regs for x86/SSE+, up to 16 regs for x64/SSE+, up to 32 regs for x64/AVX-512 (future). These numbers may be achievable in the context of CLR due to the fact that at the moment few code in system assemblies uses vectors, and to my understanding SIMD ops are now only used for FP operations otherwise.

Even taking into account the store forwarding mechanisms implemented in modern CPUs when accessing memory, the significant speed-up could be achieved. One extra point is that on HyperThreaded CPUs the register files are independent on each other, whereas memory access circuitry is mostly shared by (sub-)cores.

category:design
theme:register-allocator
skill-level:expert
cost:large
impact:large

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider using SIMD registers for "hot" local variables instead of placing them on stack when out of free GP registers #10444

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Consider using SIMD registers for "hot" local variables instead of placing them on stack when out of free GP registers #10444

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions