Cranelift: Generate load/store using AMode::RegScaled on aarch64 #6742

@maekawatoshiki

Feature

Currently, on the aarch64 backend, the following CLIF instructions...

; Equivalent to: int64_t *v9; int64_t v10; v4 = v9[v10];
v1 = iconst.i64 3
v2 = ishl.i64 v10, v1  ; v1 = 3
v3 = iadd v9, v2
v4 = load.i64 v3

... generate assembly like this:

adrp    x4, 0x780000
ldr     x4, [x4]
lsl     x5, x3, #3
ldr     x4, [x4, x5]

However, the shift can be folded into the load's addressing mode, which gives a shorter and more efficient sequence:

adrp    x4, 0x780000
ldr     x4, [x4]
ldr     x4, [x4, x3, lsl #3]

Benefit

The shorter instruction sequence should improve performance.
I found this while diffing the assembly generated by Cranelift and LLVM, where LLVM's output was around 10% faster than Cranelift's in my case.

Implementation

I've walked through the Cranelift codebase and found that this addressing mode seems to be represented as AMode::RegScaled, but I'm not sure how to teach the code generator to use RegScaled for ldr.
Would this be done by editing the ISLE rules, or something similar?
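
To make the request concrete, here is a rough sketch in Rust of the pattern match I have in mind. It is a toy model only: `Expr`, `ToyAMode`, and `select_amode` are hypothetical names made up for illustration and are not Cranelift's actual IR types, ISLE rules, or lowering API. The idea is that `iadd(base, ishl(index, iconst k))` can become a scaled register offset only when `k` equals log2 of the access size, since the aarch64 register-offset form of ldr only allows a shift of 0 or log2 of the access size.

```rust
/// Toy expression tree standing in for CLIF values.
/// Hypothetical: these are NOT Cranelift's real IR types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// An opaque SSA value such as v9 or v10.
    Value(u32),
    /// An integer constant, e.g. `iconst.i64 3`.
    Iconst(i64),
    /// `ishl x, amount`
    Ishl(Box<Expr>, Box<Expr>),
    /// `iadd x, y`
    Iadd(Box<Expr>, Box<Expr>),
}

/// Toy addressing modes, loosely named after the idea behind
/// AMode::RegScaled. Hypothetical: not the backend's real enum.
#[derive(Debug, Clone, PartialEq)]
enum ToyAMode {
    /// `[addr]`: the whole address is computed into one register first.
    Reg(Expr),
    /// `[base, index]`: plain register offset, no shift.
    RegReg(Expr, Expr),
    /// `[base, index, lsl #log2(access_bytes)]`: the shift is folded in.
    RegScaled(Expr, Expr),
}

/// Pick an addressing mode for a load/store of `access_bytes` bytes.
fn select_amode(addr: &Expr, access_bytes: u32) -> ToyAMode {
    if let Expr::Iadd(lhs, rhs) = addr {
        // iadd is commutative, so try the shift on either side.
        for (base, other) in [(lhs, rhs), (rhs, lhs)] {
            if let Expr::Ishl(index, amount) = &**other {
                if let Expr::Iconst(k) = &**amount {
                    // Fold only when the shift equals log2 of the access size.
                    if access_bytes.is_power_of_two()
                        && *k == i64::from(access_bytes.trailing_zeros())
                    {
                        return ToyAMode::RegScaled((**base).clone(), (**index).clone());
                    }
                }
            }
        }
        // The shift (if any) does not match; it stays a separate instruction.
        return ToyAMode::RegReg((**lhs).clone(), (**rhs).clone());
    }
    // Not an add at all: use the address as a single base register.
    ToyAMode::Reg(addr.clone())
}
```

For the CLIF above, the address is `iadd(v9, ishl(v10, iconst 3))` with an 8-byte load, so this sketch would pick the scaled form with v9 as the base and v10 as the index. A 4-byte load with the same shift of 3 would fall back to the unscaled form and keep the separate lsl. A real rule would presumably also need to check things this toy ignores, such as whether the shifted value has other uses.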
