Skip to content

aarch64: Implement TLS ELF GD Relocations#3026

Merged
cfallin merged 1 commit intobytecodealliance:mainfrom
afonso360:aarch64-elf-tls
Jun 24, 2021
Merged

aarch64: Implement TLS ELF GD Relocations#3026
cfallin merged 1 commit intobytecodealliance:mainfrom
afonso360:aarch64-elf-tls

Conversation

@afonso360
Copy link
Copy Markdown
Contributor

Hey,

I've been working on this, but I'm a bit stuck and need some help.

This PR Implements TLS ELF GD relocations for aarch64. ELF GD Relocations are a bit unusual in aarch64 since the default is TLS descriptors (at least in gcc/clang).

Right now I don't think the relocations are being performed correctly.

The only way I have to test this is to run the following command:
clif-util compile -D --set tls_model=elf_gd --target aarch64 ./filetests/filetests/isa/aarch64/tls-elf-gd.clif

Which emits the following code:

... snipped ...
  20:   00 00 00 90             adrp    x0, #0
  24:   00 00 00 91             add     x0, x0, #0
  28:   00 00 00 94             bl      #0x28
  2c:   1f 20 03 d5             nop
... snipped ...

I suspect that this means that the first two relocations are silently not being done?

The other issue that I have, is that I haven't found a way to create a runtest that tests this.

&mut Inst::EmitIsland { .. } => {}

&mut Inst::ElfTlsGetAddr { .. } => {
// TODO: This is probably wrong
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct. All registers are fixed, so marking them as clobbered in aarch64_get_regs is enough. There are no registers that need to be remapped.

@bjorn3
Copy link
Copy Markdown
Contributor

bjorn3 commented Jun 24, 2021

I suspect that this means that the first two relocations are silently not being done?

clif-util compile shows the relocations separate from the disassembly. I believe you need to use -p to show them at all though.

let mut relocs = PrintRelocs::new(options.print);

@afonso360
Copy link
Copy Markdown
Contributor Author

afonso360 commented Jun 24, 2021

Oh, that's really nice, I didn't see that flag, here's what that outputs:

reloc_external: Aarch64TlsGdAdrPrel21 u1:0 0 at 36
reloc_external: Aarch64TlsGdAddLo12Nc u1:0 0 at 40
reloc_external: Call %ElfTlsGetAddr 0 at 44

So, it is doing the relocations, but i don't think those are the correct offsets to be emitting?

@bjorn3
Copy link
Copy Markdown
Contributor

bjorn3 commented Jun 24, 2021

Looks like it is off by 4 bytes if these offsets are in decimal.

@bjorn3
Copy link
Copy Markdown
Contributor

bjorn3 commented Jun 24, 2021

I think you need to move the relocs before the put4.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:module labels Jun 24, 2021
@afonso360
Copy link
Copy Markdown
Contributor Author

Yeah, that seems to be the right thing to do, we also put relocations before the call instruction and I didn't notice it before:

sink.add_reloc(loc, Reloc::Arm64Call, &info.dest, 0);
sink.put4(enc_jump26(0b100101, 0));

Unfortunately we get the same result, but I did another test, by compiling the following file with gcc and dumping relocations:

__thread int tl;

void bar(int b) {
  tl = b;
}

int foo() {
        return tl;
}

// aarch64-linux-gnu-gcc -O2 -fPIC -ftls-model=global-dynamic -mtls-dialect=trad -c ./reloc.c
// aarch64-linux-gnu-objdump -dr reloc.o

We get:

  10:   90000000        adrp    x0, 0 <bar>
                        10: R_AARCH64_TLSGD_ADR_PAGE21  tl
  14:   91000000        add     x0, x0, #0x0
                        14: R_AARCH64_TLSGD_ADD_LO12_NC tl
  18:   94000000        bl      0 <__tls_get_addr>
                        18: R_AARCH64_CALL26    __tls_get_addr
  1c:   d503201f        nop

Which is not the relocation we are emitting. I'm going to fix that now

@bjorn3
Copy link
Copy Markdown
Contributor

bjorn3 commented Jun 24, 2021

Before linking there is:

    1730:       90000000        adrp    x0, 0 <_ZN67_$LT$mini_core..Option$LT$T$GT$$u20$as$u20$mini_core..PartialEq$GT$2ne17h6ea470a27df2424aE>
                        1730: R_AARCH64_TLSGD_ADR_PREL21        _ZN21mini_core_hello_world3TLS17hbf7bac7ea236bf4cE
    1734:       91000000        add     x0, x0, #0x0
                        1734: R_AARCH64_TLSGD_ADD_LO12_NC       _ZN21mini_core_hello_world3TLS17hbf7bac7ea236bf4cE
    1738:       94000000        bl      0 <__tls_get_addr>
                        1738: R_AARCH64_CALL26  __tls_get_addr
    173c:       d503201f        nop

which seems correct. However after linking it turns into:

    3f34:       580b0520        ldr     x0, 19fd8 <_GLOBAL_OFFSET_TABLE_+0x28>
    3f38:       d53bd041        mrs     x1, tpidr_el0
    3f3c:       8bfff9bd        .inst   0x8bfff9bd ; undefined
    3f40:       d503201f        nop

which has an undefined instruction.

Edit: missed your above comment.

@afonso360
Copy link
Copy Markdown
Contributor Author

I've updated it now, we should be emitting exactly what gcc is doing now.

By the way, can you tell me the steps to reproduce the test you are doing?

@bjorn3
Copy link
Copy Markdown
Contributor

bjorn3 commented Jun 24, 2021

I am trying to run the test suite of cg_clif using TARGET_TRIPLE=aarch64-unknown-linux-gnu ./test.sh and the following patch:

diff --git a/Cargo.toml b/Cargo.toml
index ef68d7ee..f2bf9536 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -17,7 +17,7 @@ cranelift-jit = { git = "https://github.com/bytecodealliance/wasmtime.git", bran
 cranelift-object = { git = "https://github.com/bytecodealliance/wasmtime.git", branch = "main" }
 target-lexicon = "0.12.0"
 gimli = { version = "0.24.0", default-features = false, features = ["write"]}
-object = { version = "0.25.0", default-features = false, features = ["std", "read_core", "write", "archive", "coff", "elf", "macho", "pe"] }
+object = { version = "0.25.3", default-features = false, features = ["std", "read_core", "write", "archive", "coff", "elf", "macho", "pe"] }
 
 ar = { git = "https://github.com/bjorn3/rust-ar.git", branch = "do_not_remove_cg_clif_ranlib" }
 indexmap = "1.0.2"
@@ -25,13 +25,13 @@ libloading = { version = "0.6.0", optional = true }
 smallvec = "1.6.1"
 
 # Uncomment to use local checkout of cranelift
-#[patch."https://github.com/bytecodealliance/wasmtime.git"]
-#cranelift-codegen = { path = "../wasmtime/cranelift/codegen" }
-#cranelift-frontend = { path = "../wasmtime/cranelift/frontend" }
-#cranelift-module = { path = "../wasmtime/cranelift/module" }
-#cranelift-native = { path = "../wasmtime/cranelift/native" }
-#cranelift-jit = { path = "../wasmtime/cranelift/jit" }
-#cranelift-object = { path = "../wasmtime/cranelift/object" }
+[patch."https://github.com/bytecodealliance/wasmtime.git"]
+cranelift-codegen = { path = "../wasmtime/cranelift/codegen" }
+cranelift-frontend = { path = "../wasmtime/cranelift/frontend" }
+cranelift-module = { path = "../wasmtime/cranelift/module" }
+cranelift-native = { path = "../wasmtime/cranelift/native" }
+cranelift-jit = { path = "../wasmtime/cranelift/jit" }
+cranelift-object = { path = "../wasmtime/cranelift/object" }
 
 #[patch.crates-io]
 #gimli = { path = "../" }
diff --git a/example/mini_core_hello_world.rs b/example/mini_core_hello_world.rs
index 6570f2bf..3d2d0303 100644
--- a/example/mini_core_hello_world.rs
+++ b/example/mini_core_hello_world.rs
@@ -294,7 +294,7 @@ struct ExternTypeWrapper {
 
     #[cfg(all(not(jit), target_os = "linux"))]
     unsafe {
-        global_asm_test();
+        //global_asm_test();
     }
 
     // Both statics have a reference that points to the same anonymous allocation.
@@ -309,14 +309,14 @@ struct ExternTypeWrapper {
 }
 
 #[cfg(all(not(jit), target_os = "linux"))]
-global_asm! {
+/*global_asm! {
     "
     .global global_asm_test
     global_asm_test:
     // comment that would normally be removed by LLVM
     ret
     "
-}
+}*/
 
 #[repr(C)]
 enum c_void {
diff --git a/src/lib.rs b/src/lib.rs
index aed25a48..956113b2 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -289,7 +289,7 @@ fn build_isa(sess: &Session, backend_config: &BackendConfig) -> Box<dyn isa::Tar
                 cranelift_codegen::isa::lookup_variant(target_triple, variant).unwrap();
             // Don't use "haswell" as the default, as it implies `has_lzcnt`.
             // macOS CI is still at Ivy Bridge EP, so `lzcnt` is interpreted as `bsr`.
-            builder.enable("nehalem").unwrap();
+            //builder.enable("nehalem").unwrap();
             builder
         }
     };

@bjorn3
Copy link
Copy Markdown
Contributor

bjorn3 commented Jun 24, 2021

mini_core_hello_world runs fine now!

Implement the `TlsValue` opcode in the aarch64 backend for ELF_GD.

This is a little bit unusual as the default TLS mechanism for aarch64 is TLS Descriptors in other compilers.
However currently we only recognize elf_gd so lets start with that as a TLS implementation.
@bjorn3
Copy link
Copy Markdown
Contributor

bjorn3 commented Jun 24, 2021

libstd compiles fine too with a small patch to the build system to fix cross-compiling. (I recently rewrote it from shell scripts to rust) std_example doesn't compile yet. I think due to a select.i128.

@afonso360
Copy link
Copy Markdown
Contributor Author

afonso360 commented Jun 24, 2021

mini_core_hello_world runs fine now!

That's awesome!

Although it would be nice if we had a more definitive test that would catch issues like this (Wrong relocations).
cc: @cfallin Any ideas on this?

@bjorn3 I'm having some issues running cg_clif's test suite, is there anywhere I can ask for a bit more guidance, so we don't pollute this PR?

I think due to a select.i128.

I guess that's next on the i128 impl list

@bjorn3
Copy link
Copy Markdown
Contributor

bjorn3 commented Jun 24, 2021

When commenting out all i128 tests in std_example, it runs fine too.

@bjorn3 I'm having some issues running cg_clif's test suite, is there anywhere I can ask for a bit more guidance, so we don't pollute this PR?

Maybe zulip? You can PM me on either the rust-lang or bytecodealliance zulip.

@afonso360 afonso360 marked this pull request as ready for review June 24, 2021 11:35
@cfallin
Copy link
Copy Markdown
Member

cfallin commented Jun 24, 2021

Although it would be nice if we had a more definitive test that would catch issues like this (Wrong relocations).
cc: @cfallin Any ideas on this?

Yeah, I agree this would be nice; we don't really have a good answer for this right now. Probably the best answer would be an end-to-end test of some sort that uses the relocation and executes on the target platform. In the absence of that, we could at least print relocations in the VCode output and check them (as a golden output) in compile filetests. I'd be happy to review something for that if you would like.

Copy link
Copy Markdown
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks! I'm really happy that cg_clif on aarch64 is so close to working.

Will merge once the whitespace-at-end-of-line nit commented on by bjorn3 is fixed.

@afonso360
Copy link
Copy Markdown
Contributor Author

Oops, it was fixed, but i forgot to mark it as resolved. Thanks

Probably the best answer would be an end-to-end test of some sort that uses the relocation and executes on the target platform. In the absence of that, we could at least print relocations in the VCode output and check them (as a golden output) in compile filetests. I'd be happy to review something for that if you would like.

Hmm, its not something i want to pick up right now, but let me open an issue with this, and maybe someone does

@cfallin cfallin merged commit 652f21e into bytecodealliance:main Jun 24, 2021
@afonso360 afonso360 deleted the aarch64-elf-tls branch June 25, 2021 10:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:module cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants