[AArch64][FastISel] Fallback on atomic stlr/cas with non-reg operands (#133987)
ahmedbougacha merged 1 commit into main on May 8, 2025
Conversation
@llvm/pr-subscribers-backend-aarch64

Author: Ahmed Bougacha (ahmedbougacha)

Changes

This has been a latent bug for almost 10 years, but it is relatively hard to trigger: it needs an address operand that isn't handled by getRegForValue (in the test here, constant-expression casts). When that happens, getRegForValue returns 0, which FastISel happily uses as a register operand all the way to the final assembly, where we either crash on an invalid register or silently emit a corrupt instruction. Unfortunately, FastISel is still enabled at -O0 for at least ILP32/arm64_32.

Full diff: https://github.com/llvm/llvm-project/pull/133987.diff

2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64FastISel.cpp b/llvm/lib/Target/AArch64/AArch64FastISel.cpp
index 5ef439f8224c1..5ddf83f45ac69 100644
--- a/llvm/lib/Target/AArch64/AArch64FastISel.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FastISel.cpp
@@ -2190,6 +2190,8 @@ bool AArch64FastISel::selectStore(const Instruction *I) {
if (isReleaseOrStronger(Ord)) {
// The STLR addressing mode only supports a base reg; pass that directly.
Register AddrReg = getRegForValue(PtrV);
+ if (!AddrReg)
+ return false;
return emitStoreRelease(VT, SrcReg, AddrReg,
createMachineMemOperandFor(I));
}
@@ -5070,12 +5072,16 @@ bool AArch64FastISel::selectAtomicCmpXchg(const AtomicCmpXchgInst *I) {
const MCInstrDesc &II = TII.get(Opc);
- const Register AddrReg = constrainOperandRegClass(
- II, getRegForValue(I->getPointerOperand()), II.getNumDefs());
- const Register DesiredReg = constrainOperandRegClass(
- II, getRegForValue(I->getCompareOperand()), II.getNumDefs() + 1);
- const Register NewReg = constrainOperandRegClass(
- II, getRegForValue(I->getNewValOperand()), II.getNumDefs() + 2);
+ Register AddrReg = getRegForValue(I->getPointerOperand());
+ Register DesiredReg = getRegForValue(I->getCompareOperand());
+ Register NewReg = getRegForValue(I->getNewValOperand());
+
+ if (!AddrReg || !DesiredReg || !NewReg)
+ return false;
+
+ AddrReg = constrainOperandRegClass(II, AddrReg, II.getNumDefs());
+ DesiredReg = constrainOperandRegClass(II, DesiredReg, II.getNumDefs() + 1);
+ NewReg = constrainOperandRegClass(II, NewReg, II.getNumDefs() + 2);
const Register ResultReg1 = createResultReg(ResRC);
const Register ResultReg2 = createResultReg(&AArch64::GPR32RegClass);
diff --git a/llvm/test/CodeGen/AArch64/fast-isel-atomic-fallback.ll b/llvm/test/CodeGen/AArch64/fast-isel-atomic-fallback.ll
new file mode 100644
index 0000000000000..16ef0cbc8a810
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/fast-isel-atomic-fallback.ll
@@ -0,0 +1,50 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=arm64_32-apple-darwin -O0 -fast-isel -verify-machineinstrs \
+; RUN: -aarch64-enable-atomic-cfg-tidy=0 -aarch64-enable-collect-loh=0 \
+; RUN: < %s | FileCheck %s
+
+; FastISel doesn't support cstexprs as operands here, but make
+; sure it knows to fallback, at least.
+
+define void @atomic_store_cstexpr_addr(i32 %val) #0 {
+; CHECK-LABEL: atomic_store_cstexpr_addr:
+; CHECK: ; %bb.0:
+; CHECK-NEXT: adrp x8, _g@PAGE
+; CHECK-NEXT: add x8, x8, _g@PAGEOFF
+; CHECK-NEXT: ; kill: def $w1 killed $w8 killed $x8
+; CHECK-NEXT: adrp x8, _g@PAGE
+; CHECK-NEXT: add x8, x8, _g@PAGEOFF
+; CHECK-NEXT: stlr w0, [x8]
+; CHECK-NEXT: ret
+ store atomic i32 %val, ptr inttoptr (i32 ptrtoint (ptr @g to i32) to ptr) release, align 4
+ ret void
+}
+
+define i32 @cmpxchg_cstexpr_addr(i32 %cmp, i32 %new, ptr %ps) #0 {
+; CHECK-LABEL: cmpxchg_cstexpr_addr:
+; CHECK: ; %bb.0:
+; CHECK-NEXT: mov w8, w0
+; CHECK-NEXT: adrp x10, _g@PAGE
+; CHECK-NEXT: add x10, x10, _g@PAGEOFF
+; CHECK-NEXT: LBB1_1: ; =>This Inner Loop Header: Depth=1
+; CHECK-NEXT: ldaxr w0, [x10]
+; CHECK-NEXT: cmp w0, w8
+; CHECK-NEXT: b.ne LBB1_3
+; CHECK-NEXT: ; %bb.2: ; in Loop: Header=BB1_1 Depth=1
+; CHECK-NEXT: stlxr w9, w1, [x10]
+; CHECK-NEXT: cbnz w9, LBB1_1
+; CHECK-NEXT: LBB1_3:
+; CHECK-NEXT: subs w8, w0, w8
+; CHECK-NEXT: cset w8, eq
+; CHECK-NEXT: ; kill: def $w1 killed $w8
+; CHECK-NEXT: str w8, [x2]
+; CHECK-NEXT: ret
+ %tmp0 = cmpxchg ptr inttoptr (i32 ptrtoint (ptr @g to i32) to ptr), i32 %cmp, i32 %new seq_cst seq_cst
+ %tmp1 = extractvalue { i32, i1 } %tmp0, 0
+ %tmp2 = extractvalue { i32, i1 } %tmp0, 1
+ %tmp3 = zext i1 %tmp2 to i32
+ store i32 %tmp3, ptr %ps
+ ret i32 %tmp1
+}
+
+@g = global i32 0
IanButterworth added a commit to IanButterworth/julia that referenced this pull request on Dec 7, 2025:
FastISel was disabled on AArch64 in 2015 (PR JuliaLang#13393) to fix issue JuliaLang#13321, but that issue was specifically about 32-bit ARM (ARMv7) segfaults during bootstrap. The AArch64 exclusion was added conservatively alongside the ARM fix. AArch64 FastISel has been actively maintained upstream with recent bug fixes:

- llvm/llvm-project#75993 (Jan 2024)
- llvm/llvm-project#133987 (May 2025)

This enables faster instruction selection for JIT compilation on AArch64 at lower optimization levels, reducing compilation latency.