On x64 we emit the following code for jump stubs:
mov rax, 123456789abcdef0h
jmp rax
as I understand from runtime/src/coreclr/vm/amd64/cgenamd64.cpp (lines 505 to 507 in 70d20f1):
// mov rax, 123456789abcdef0h    48 b8 xx xx xx xx xx xx xx xx
// jmp rax                       ff e0
while on arm64 we load the target address from memory (a PC-relative load of the data that follows the code):
ldr x16, [pc, #8]
br x16
[target address]
as in runtime/src/coreclr/vm/arm64/cgencpu.h (lines 294 to 296 in eeb79b3):
// +0: ldr x16, [pc, #8]
// +4: br x16
// +8: [target address]
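To make the two layouts concrete, here is a minimal Python sketch that assembles the bytes of each stub as described in the comments above. The function names are hypothetical and are not part of the runtime; the arm64 opcode words are my own encoding of `ldr x16, [pc, #8]` and `br x16`, not values copied from the runtime sources.

```python
import struct

def x64_jump_stub(target):
    """mov rax, imm64 (48 B8 + 8-byte little-endian immediate), then jmp rax (FF E0)."""
    return b"\x48\xb8" + struct.pack("<Q", target) + b"\xff\xe0"

def arm64_jump_stub(target):
    """ldr x16, [pc, #8]; br x16; followed by the 8-byte target address as data."""
    ldr_x16_pc8 = 0x58000050  # LDR (literal), 64-bit: imm19 = 8 / 4 = 2, Rt = 16
    br_x16 = 0xD61F0200       # BR Xn with Rn = 16
    # "<" disables padding, so the layout is exactly 4 + 4 + 8 bytes.
    return struct.pack("<IIQ", ldr_x16_pc8, br_x16, target)

if __name__ == "__main__":
    addr = 0x123456789ABCDEF0
    print(x64_jump_stub(addr).hex(" "))    # 12 bytes, address embedded in the instruction
    print(arm64_jump_stub(addr).hex(" "))  # 16 bytes, address stored as data at +8
```

The difference relevant to this issue is that on x64 the address is part of the instruction stream, while on arm64 it is fetched by a data load from the bytes immediately after the branch.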
I'm wondering whether it would be faster to do what x64 does and emit the constant directly, even if it takes four instructions to materialize it:
mov x8, #9044
movk x8, #9268, lsl #16
movk x8, #61203, lsl #32
movk x8, #43981, lsl #48
br x8
I'm asking because I have a feeling that it could be a bottleneck, if I read it correctly from the TE traces (Plaintext benchmark).

cc @dotnet/jit-contrib @jkotas
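For reference, the `mov`/`movk` sequence proposed above loads the 64-bit target in four 16-bit pieces, lowest first. The sketch below shows that split; `mov_movk_sequence` is a hypothetical helper for illustration only, and it always emits all four pieces on the assumption that a fixed-size stub is wanted (an assembler could skip zero pieces).

```python
def mov_movk_sequence(value, reg="x8"):
    """Return the assembly lines that load a 64-bit value with mov/movk, then branch."""
    # Split the value into four 16-bit chunks, least significant first.
    chunks = [(value >> shift) & 0xFFFF for shift in (0, 16, 32, 48)]
    lines = ["mov {}, #{}".format(reg, chunks[0])]
    for i, chunk in enumerate(chunks[1:], start=1):
        # movk replaces one 16-bit field and keeps the other bits of the register.
        lines.append("movk {}, #{}, lsl #{}".format(reg, chunk, 16 * i))
    lines.append("br {}".format(reg))
    return lines

if __name__ == "__main__":
    for line in mov_movk_sequence(0xABCDEF1324342354):
        print(line)
```

With the value `0xABCDEF1324342354` this reproduces the five lines shown in the proposal. That is five 4-byte instructions (20 bytes) with no data load, compared with 16 bytes for the current arm64 stub.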