Feature
Currently, on the aarch64 backend, the following CLIF instructions...

```clif
; Equivalent to: int64_t *v9; int64_t v10; v4 = v9[v10];
v1 = iconst.i64 3
v2 = ishl.i64 v10, v1  ; v1 = 3
v3 = iadd v9, v2
v4 = load.i64 v3
```

...will generate assembly like this:
```asm
adrp x4, 0x780000
ldr  x4, [x4]
lsl  x5, x3, #3
ldr  x4, [x4, x5]
```
However, the assembly can be converted into a more efficient sequence, folding the shift into the load:

```asm
adrp x4, 0x780000
ldr  x4, [x4]
ldr  x4, [x4, x3, lsl #3]
```
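For clarity, both sequences compute the same effective address, `base + (index << 3)`; the scaled-register form just performs the shift inside the load. A quick sketch of that equivalence (plain Rust, not Cranelift code; the function names are made up for illustration):

```rust
// Sketch: both aarch64 sequences compute the same effective address,
// base + (index << 3); the scaled form folds the shift into the load.
fn two_insn(base: u64, index: u64) -> u64 {
    let scaled = index << 3;      // lsl x5, x3, #3
    base.wrapping_add(scaled)     // address used by: ldr x4, [x4, x5]
}

fn folded(base: u64, index: u64) -> u64 {
    base.wrapping_add(index << 3) // ldr x4, [x4, x3, lsl #3]
}

fn main() {
    for (b, i) in [(0x1000u64, 0u64), (0x1000, 5), (0xdead_0000, 123)] {
        assert_eq!(two_insn(b, i), folded(b, i));
    }
    println!("equivalent");
}
```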
Benefit
The shorter instruction sequence helps improve performance.
In fact, I found this problem while diffing the assembly generated by Cranelift and LLVM: LLVM was around 10% faster than Cranelift in my case.
Implementation
I've walked through the Cranelift codebase and figured out that this addressing mode seems to be represented as AMode::RegScaled, but I'm not sure how to teach the code generator to use RegScaled for ldr.
Is it a matter of editing ISLE rules, or something like that?
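To make the shape of the transformation concrete, here is a minimal sketch of the pattern match the lowering would need: recognize `iadd(base, ishl(index, iconst k))` and fold it into a scaled addressing mode. The `Inst` and `Amode` types below are hypothetical stand-ins, not Cranelift's real IR, AMode, or ISLE machinery:

```rust
// Hypothetical mini-IR for illustration; these are NOT Cranelift's real
// types, just a sketch of the pattern the lowering would need to match.
#[derive(Debug, PartialEq)]
enum Inst {
    Iconst(i64),
    Ishl(Box<Inst>, Box<Inst>), // value << amount
    Iadd(Box<Inst>, Box<Inst>), // base + offset
    Reg(u32),                   // stands in for an SSA value held in a register
}

#[derive(Debug, PartialEq)]
enum Amode {
    // base + (index << shift), like aarch64 `[xN, xM, lsl #shift]`
    RegScaled { base: u32, index: u32, shift: u8 },
    // fallback: assume the address was materialized into a register
    RegOnly(u32),
}

// Fold `iadd(base, ishl(index, iconst k))` into a scaled addressing mode.
// A 64-bit aarch64 `ldr` only permits `lsl #3` (log2 of the 8-byte access
// size) as the scale, so only k == 3 qualifies here.
fn lower_amode(addr: &Inst) -> Amode {
    if let Inst::Iadd(base, off) = addr {
        if let (Inst::Reg(b), Inst::Ishl(idx, amt)) = (&**base, &**off) {
            if let (Inst::Reg(i), Inst::Iconst(k)) = (&**idx, &**amt) {
                if *k == 3 {
                    return Amode::RegScaled { base: *b, index: *i, shift: 3 };
                }
            }
        }
    }
    Amode::RegOnly(0) // placeholder fallback
}

fn main() {
    // Mirrors the CLIF above: v3 = iadd v9, (ishl v10, iconst 3)
    let addr = Inst::Iadd(
        Box::new(Inst::Reg(9)),
        Box::new(Inst::Ishl(
            Box::new(Inst::Reg(10)),
            Box::new(Inst::Iconst(3)),
        )),
    );
    assert_eq!(
        lower_amode(&addr),
        Amode::RegScaled { base: 9, index: 10, shift: 3 }
    );
    println!("folded to RegScaled");
}
```

In Cranelift itself, this kind of match would presumably live in the aarch64 address-mode computation rather than in instruction-by-instruction lowering, so the shift and add never get emitted as standalone instructions.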