
x86: fix musttail sibcall miscompilation #168956

Merged
rnk merged 11 commits into llvm:main from folkertdev:x86-musttail-sibcalls
Jan 14, 2026

Conversation

@folkertdev
Contributor

@folkertdev folkertdev commented Nov 20, 2025

fixes #56891
fixes #72390
fixes #147813

Currently the x86 backend miscompiles straightforward tail calls when the stack is used for argument passing. This program segfaults on any optimization level:

https://godbolt.org/z/5xr99jr4v

#include <stdint.h>

typedef struct {
    uint64_t x;
    uint64_t y;
    uint64_t z;
} S;

__attribute__((noinline))
uint64_t callee(S s) {
    return s.x + s.y + s.z;
}

__attribute__((noinline))
uint64_t caller(S s) {
    [[clang::musttail]]
    return callee(s);
}

The immediate issue is that caller decides to shuffle values around on the stack, and in the process writes to *rsp, which contains the return address. With the return address trashed, the ret in callee jumps to an invalid address.

caller:
        mov     rax, qword ptr [rsp + 24]
        mov     qword ptr [rsp + 16], rax
        movaps  xmm0, xmmword ptr [rsp + 8]
        movups  xmmword ptr [rsp], xmm0 ; <-- that is just all kinds of wrong
        movaps  xmmword ptr [rsp + 8], xmm0
        mov     qword ptr [rsp + 24], rax
        jmp     callee

However, I think the actual problem is that the x86 backend never considers musttail calls to be sibcalls. For sibcalls, no stack reshuffling is required at all, circumventing the problem here.

This PR essentially copies https://reviews.llvm.org/D131034 (cc @huangjd), but this time I hope we can actually land it and solve this problem.

The aarch64 backend also miscompiled this example, but that appears to have been fixed in LLVM 20.

Tail calls simply not working for any sort of non-trivial argument type is a blocker for tail call support in Rust; see rust-lang/rust#144855 (comment).

@llvmbot
Member

llvmbot commented Nov 20, 2025

@llvm/pr-subscribers-backend-x86

Author: Folkert de Vries (folkertdev)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/168956.diff

4 Files Affected:

  • (modified) llvm/lib/Target/X86/X86ISelLoweringCall.cpp (+12-8)
  • (added) llvm/test/CodeGen/X86/musttail-struct.ll (+57)
  • (modified) llvm/test/CodeGen/X86/musttail-tailcc.ll (-18)
  • (modified) llvm/test/CodeGen/X86/swifttailcc-store-ret-address-aliasing-stack-slot.ll (+2-4)
diff --git a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
index 37d77728882b1..e53e5a2a02cb4 100644
--- a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
+++ b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
@@ -2098,15 +2098,18 @@ X86TargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
       isTailCall = false;
   }
 
-  if (isTailCall && !IsMustTail) {
+  if (isTailCall) {
     // Check if it's really possible to do a tail call.
-    isTailCall = IsEligibleForTailCallOptimization(CLI, CCInfo, ArgLocs,
-                                                   IsCalleePopSRet);
+    IsSibcall = IsEligibleForTailCallOptimization(CLI, CCInfo, ArgLocs,
+                                                  IsCalleePopSRet);
+
+    if (!IsMustTail) {
+      isTailCall = IsSibcall;
 
-    // Sibcalls are automatically detected tailcalls which do not require
-    // ABI changes.
-    if (!IsGuaranteeTCO && isTailCall)
-      IsSibcall = true;
+      // Sibcalls are automatically detected tailcalls which do not require
+      // ABI changes.
+      IsSibcall = IsSibcall && !IsGuaranteeTCO;
+    }
 
     if (isTailCall)
       ++NumTailCalls;
@@ -2128,8 +2131,9 @@ X86TargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
   else if (IsGuaranteeTCO && canGuaranteeTCO(CallConv))
     NumBytes = GetAlignedArgumentStackSize(NumBytes, DAG);
 
+  // A sibcall is ABI-compatible and does not need to adjust the stack pointer.
   int FPDiff = 0;
-  if (isTailCall &&
+  if (isTailCall && !IsSibcall &&
       shouldGuaranteeTCO(CallConv,
                          MF.getTarget().Options.GuaranteedTailCallOpt)) {
     // Lower arguments at fp - stackoffset + fpdiff.
diff --git a/llvm/test/CodeGen/X86/musttail-struct.ll b/llvm/test/CodeGen/X86/musttail-struct.ll
new file mode 100644
index 0000000000000..0ae6b91d4fc02
--- /dev/null
+++ b/llvm/test/CodeGen/X86/musttail-struct.ll
@@ -0,0 +1,57 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s
+; RUN: llc < %s -mtriple=i686-unknown-unknown | FileCheck %s
+
+; Test correct handling of a musttail call with a byval struct argument.
+
+%struct.1xi32 = type { [1 x i32] }
+%struct.3xi32 = type { [3 x i32] }
+%struct.5xi32 = type { [5 x i32] }
+
+declare dso_local i32 @Func1(ptr byval(%struct.1xi32) %0)
+declare dso_local i32 @Func3(ptr byval(%struct.3xi32) %0)
+declare dso_local i32 @Func5(ptr byval(%struct.5xi32) %0)
+declare dso_local i32 @FuncManyArgs(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i8 %6, ptr byval(%struct.5xi32) %7)
+
+define dso_local i32 @test1(ptr byval(%struct.1xi32) %0) {
+; CHECK-LABEL: test1:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    jmp Func1 # TAILCALL
+  %r = musttail call i32 @Func1(ptr byval(%struct.1xi32) %0)
+  ret i32 %r
+}
+
+define dso_local i32 @test3(ptr byval(%struct.3xi32) %0) {
+; CHECK-LABEL: test3:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    jmp Func3 # TAILCALL
+  %r = musttail call i32 @Func3(ptr byval(%struct.3xi32) %0)
+  ret i32 %r
+}
+
+; sizeof(%struct.5xi32) > 16, in x64 this is passed on stack.
+define dso_local i32 @test5(ptr byval(%struct.5xi32) %0) {
+; CHECK-LABEL: test5:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    jmp Func5 # TAILCALL
+  %r = musttail call i32 @Func5(ptr byval(%struct.5xi32) %0)
+  ret i32 %r
+}
+
+; Test passing multiple arguments with different sizes on stack. In x64 Linux
+; the first 6 are passed by register.
+define dso_local i32 @testManyArgs(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i8 %6, ptr byval(%struct.5xi32) %7) {
+; CHECK-LABEL: testManyArgs:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    jmp FuncManyArgs # TAILCALL
+  %r = musttail call i32 @FuncManyArgs(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i8 %6, ptr byval(%struct.5xi32) %7)
+  ret i32 %r
+}
+
+define dso_local i32 @testRecursion(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i8 %6, ptr byval(%struct.5xi32) %7) {
+; CHECK-LABEL: testRecursion:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    jmp testRecursion # TAILCALL
+  %r = musttail call i32 @testRecursion(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i8 %6, ptr byval(%struct.5xi32) %7)
+  ret i32 %r
+}
diff --git a/llvm/test/CodeGen/X86/musttail-tailcc.ll b/llvm/test/CodeGen/X86/musttail-tailcc.ll
index fae698d53b927..f1ffbcb1142c5 100644
--- a/llvm/test/CodeGen/X86/musttail-tailcc.ll
+++ b/llvm/test/CodeGen/X86/musttail-tailcc.ll
@@ -55,15 +55,6 @@ define dso_local tailcc void @void_test(i32, i32, i32, i32) {
 ;
 ; X86-LABEL: void_test:
 ; X86:       # %bb.0: # %entry
-; X86-NEXT:    pushl %esi
-; X86-NEXT:    .cfi_def_cfa_offset 8
-; X86-NEXT:    .cfi_offset %esi, -8
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    movl %esi, {{[0-9]+}}(%esp)
-; X86-NEXT:    movl %eax, {{[0-9]+}}(%esp)
-; X86-NEXT:    popl %esi
-; X86-NEXT:    .cfi_def_cfa_offset 4
 ; X86-NEXT:    jmp void_test # TAILCALL
   entry:
    musttail call tailcc void @void_test( i32 %0, i32 %1, i32 %2, i32 %3)
@@ -77,15 +68,6 @@ define dso_local tailcc i1 @i1test(i32, i32, i32, i32) {
 ;
 ; X86-LABEL: i1test:
 ; X86:       # %bb.0: # %entry
-; X86-NEXT:    pushl %esi
-; X86-NEXT:    .cfi_def_cfa_offset 8
-; X86-NEXT:    .cfi_offset %esi, -8
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    movl %esi, {{[0-9]+}}(%esp)
-; X86-NEXT:    movl %eax, {{[0-9]+}}(%esp)
-; X86-NEXT:    popl %esi
-; X86-NEXT:    .cfi_def_cfa_offset 4
 ; X86-NEXT:    jmp i1test # TAILCALL
   entry:
   %4 = musttail call tailcc i1 @i1test( i32 %0, i32 %1, i32 %2, i32 %3)
diff --git a/llvm/test/CodeGen/X86/swifttailcc-store-ret-address-aliasing-stack-slot.ll b/llvm/test/CodeGen/X86/swifttailcc-store-ret-address-aliasing-stack-slot.ll
index cd669768705e5..b901d22f66392 100644
--- a/llvm/test/CodeGen/X86/swifttailcc-store-ret-address-aliasing-stack-slot.ll
+++ b/llvm/test/CodeGen/X86/swifttailcc-store-ret-address-aliasing-stack-slot.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
 ; RUN: llc %s -o - | FileCheck %s
 
 target triple = "x86_64-apple-macosx"
@@ -24,9 +25,7 @@ define swifttailcc void @test(ptr %0, ptr swiftasync %1, i64 %2, i64 %3, ptr %4,
 ; CHECK-NEXT:    movq {{[0-9]+}}(%rsp), %r15
 ; CHECK-NEXT:    callq _foo
 ; CHECK-NEXT:    movq %r14, (%rax)
-; CHECK-NEXT:    movl [[OFF:[0-9]+]](%rsp), %edx
-; CHECK-NEXT:    movq {{[0-9]+}}(%rsp), %rcx
-; CHECK-NEXT:    movq %rcx, [[OFF]](%rsp)
+; CHECK-NEXT:    movl {{[0-9]+}}(%rsp), %edx
 ; CHECK-NEXT:    movq %rax, %r14
 ; CHECK-NEXT:    movq %r13, %rdi
 ; CHECK-NEXT:    movq %r15, %rsi
@@ -34,7 +33,6 @@ define swifttailcc void @test(ptr %0, ptr swiftasync %1, i64 %2, i64 %3, ptr %4,
 ; CHECK-NEXT:    addq $8, %rsp
 ; CHECK-NEXT:    popq %rbx
 ; CHECK-NEXT:    popq %r15
-; CHECK-NEXT:    addq $16, %rsp
 ; CHECK-NEXT:    jmp _tc_fn ## TAILCALL
 entry:
   %res = tail call ptr @foo()

@folkertdev folkertdev changed the title from "fix x86 musttail sibcall miscompilation" to "x86: fix musttail sibcall miscompilation" on Nov 20, 2025
@github-actions

github-actions Bot commented Nov 20, 2025

🐧 Linux x64 Test Results

  • 188516 tests passed
  • 5006 tests skipped

✅ The build succeeded and all tests passed.

@efriedma-quic
Contributor

As discussed in https://reviews.llvm.org/D131034, this is still broken in cases where the argument lists don't match. If you want an example of how to handle this case, see #109943.

@folkertdev
Contributor Author

As discussed in https://reviews.llvm.org/D131034, this is still broken in cases where the argument lists don't match. If you want an example of how to handle this case, see #109943.

Interesting, I had been looking for something like that. However, #168506 appears to use a different, simpler approach to order the reads and writes. Does that look right (and perhaps preferable) to you?

@efriedma-quic
Contributor

As far as I can tell, LoongArchTargetLowering::LowerCall always copies byval arguments to a temporary, then copies them into the argument list. Which is a legal implementation, but not really optimized.
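
As a rough C sketch (hypothetical names, not code taken from any backend), the "copy via a temporary" strategy looks like this: it stays correct even when the source overlaps the outgoing argument area, because the temporary is a fresh local stack object, but it pays for an extra round trip through memory per byval argument.

#include <string.h>

/* Stand-in for a >16-byte aggregate that is passed on the stack. */
typedef struct { unsigned v[5]; } FiveInts;

/* Copy-via-temp: source -> local temporary -> outgoing argument slot. */
void pass_byval_via_temp(char *outgoing_args, size_t offset, const FiveInts *src) {
    FiveInts tmp;
    memcpy(&tmp, src, sizeof tmp);                    /* step 1: into the temporary */
    memcpy(outgoing_args + offset, &tmp, sizeof tmp); /* step 2: into the arg slot  */
}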

@folkertdev
Contributor Author

Alright, so generally you'd like to see an x86 implementation that more closely resembles the ARM one? Because currently the x86 implementation is quite different (and, as established, very broken).

This will be tough (for me) but I'm willing to give that a go.

@efriedma-quic
Contributor

I'd prefer the more optimized implementation... but really, I'd be okay with any complete implementation that doesn't regress non-musttail cases. I just don't want to leave a lurking miscompile.

@folkertdev folkertdev force-pushed the x86-musttail-sibcalls branch from 839564d to 4636011 on November 22, 2025 19:52
@folkertdev
Contributor Author

Alright, I gave this a go, but I'll need some help polishing this.

Contributor Author

@folkertdev folkertdev left a comment

That was very helpful, thanks!

@folkertdev folkertdev force-pushed the x86-musttail-sibcalls branch 3 times, most recently from 05af31d to 238805f on November 25, 2025 12:44
@folkertdev folkertdev force-pushed the x86-musttail-sibcalls branch from 238805f to 346a911 on November 25, 2025 19:08
Currently this logic is only actually used for musttail calls.
In particular, we don't need to bother with shuffling arguments around
on the stack, because in the x86 backend, only functions that do not
need to move arguments around on the stack are considered sibcalls.
@folkertdev folkertdev force-pushed the x86-musttail-sibcalls branch from 346a911 to e5dbe7b on November 25, 2025 21:49
Contributor

@efriedma-quic efriedma-quic left a comment


Let me know when you're ready for another round of review.

@rnk
Contributor

rnk commented Jan 14, 2026

Bit of a procedural question: how should I handle rebasing and force pushes? For me switching branches between different PRs kind of trashes the cache (ccache helps, but it's still a couple of minutes). So, instead I rebase to whatever latest upstream I've pulled in, but then every push is a force push, and reviewing those is difficult (and sometimes buggy).

The official recommendation is to merge main into the PR branch: https://llvm.org/docs/GitHub.html#updating-pull-requests

But, I agree, it's not very ergonomic and it usually makes rebuilds slow.

Contributor

@rnk rnk left a comment


I think we should work towards merging this because correctness is the first priority.

However, if we want to be more ambitious and generate better code, I think we want to try to use the register file as the temporary storage for our copies. With this PR, we generate this sequence for each byval arg:

  • load from source
  • store to temp
  • load from temp
  • store to byval destination

Really what we want is a sequence of:

  • load all parts of all byval args into virtual registers
  • store all parts of all byval args into destination stack slots

Doing all the loads up front ensures that our writes don't alias our reads. If there's too much register pressure, the allocator will spill what it needs to.

However, implementing that is super complicated, and the common case is just forwarding byval parameters in place, so I think we should just iterate on the PR as is and get this unfortunate correctness issue (I apologize, it was pre-existing!) fixed ASAP.

I had some simplification suggestions to iterate on, but I don't see blockers.
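
To make the aliasing argument concrete, here is a minimal C sketch of the load-all-then-store-all scheme above (hypothetical names, purely illustrative; the real lowering would use SelectionDAG loads into virtual registers): because every value is read before anything is written, the stores cannot clobber data that is still needed, even when the source and destination stack slots overlap.

#include <stdint.h>

/* Shuffle n 8-byte argument pieces within one stack region, where the
   destination slot dst[i] may overlap any of the source slots. */
void shuffle_stack_args(uint64_t *slots, const int *dst, int n) {
    uint64_t tmp[16];              /* stand-in for virtual registers */
    if (n > 16)
        n = 16;                    /* keep the illustration bounded  */
    for (int i = 0; i < n; i++)
        tmp[i] = slots[i];         /* phase 1: all loads up front    */
    for (int i = 0; i < n; i++)
        slots[dst[i]] = tmp[i];    /* phase 2: all stores afterwards */
}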

// Argument value is currently in the outgoing argument area, but not at
// the correct offset, so needs copying via a temporary in local stack
// space.
CopyViaTemp,
Contributor


When I read this, I asked myself: should/could we handle this at the IR level? I believe, but am not 100% confident, that byval for all targets models large objects passed in memory. We could establish the invariant that all LLVM IR going into codegen either a) forwards a byval parameter in place or b) reads from a local stack object.

After reading the rest of the PR, I don't think it's worth the extra complexity of adding a new CodeGenPrepare invariant, but I felt like it was worth raising the option.

; X32-NEXT: mov dword ptr [esp + 8], 2
; X32-NEXT: mov dword ptr [esp + 12], 3
; X32-NEXT: mov dword ptr [esp + 16], 4
; X32-NEXT: mov dword ptr [esp + 24], 0
Contributor


Apparently some LLVM subsystem can do MIR load-store forwarding; that's wild.

@folkertdev folkertdev requested a review from rnk January 14, 2026 10:30
Contributor

@rnk rnk left a comment


I made some suggested edits to fix some final nits.

Thanks for the thorough test cases!

}

if (isTailCall && !IsMustTail) {
if (isTailCall) {
Contributor


I think this block can be simplified to avoid the eligibility check for the "guaranteed" TCO case, and I made some comment suggestions that hopefully explain this better.

Suggested change
if (isTailCall) {
// Check if this tail call is a "sibling" call, which is loosely defined to
// be a tail call that doesn't require heroics like moving the return address
// or swapping byval arguments.
bool IsSibcall = false;
if (isTailCall && !ShouldGuaranteeTCO) {

Contributor Author


Some tailcc tests regress when !ShouldGuaranteeTCO is included in the condition (they start explicitly copying arguments to/from the stack). If a tailcc call happens to have the shape of a sibcall, it should be fine to use that code path right?

Contributor Author


I've included some of your comment changes, but the code changes caused regressions. Can you maybe formulate what change you want to see here?

The variable names are not the clearest here, but I haven't really been able to come up with better ones. I think isTailCall sort of switches meaning here though, from "we want this to be a tail call if possible" to "this should codegen as a tail call".

Contributor Author


To elaborate, because this logic is tricky, the regressions appear in cases like this one:

define dso_local tailcc void @void_test(i32, i32, i32, i32) {
; X64-LABEL: void_test:
; X64:       # %bb.0: # %entry
; X64-NEXT:    jmp void_test # TAILCALL
;
; X86-LABEL: void_test:
; X86:       # %bb.0: # %entry
; X86-NEXT:    jmp void_test # TAILCALL
  entry:
   musttail call tailcc void @void_test( i32 %0, i32 %1, i32 %2, i32 %3)
   ret void
}

With the current logic, both isTailCall and IsSibcall are true; with your suggestion, IsSibcall is false. I think it is harmless to conclude that this is in fact a sibcall, and to use the more optimal codegen that it enables.

Contributor


Got it, this case is IsMustTail && ShouldGuaranteeTCO. If we skip the eligibility check in this case, we don't get the good codegen.

I think more simplifications are possible, but I'll torture the boolean conditionals later.

;
; X86-LABEL: void_test:
; X86: # %bb.0: # %entry
; X86-NEXT: pushl %esi
Contributor


I think this is evidence that this change fixes issue #147813: you've eliminated the extra stores by classifying certain calls as "sibling" calls.

@rnk
Contributor

rnk commented Jan 14, 2026

Apparently llvm 22 branches this Tuesday. So this either makes it or it doesn't (that's life), just making sure I did everything I could.

I think it would be reasonable to cherry pick this to the release. It's a correctness issue.

I'm surprised these bugs lasted so long in the code. I think the story here is that guaranteed TCO support was added to LLVM back when it was an academic research project (late 2000s), when there was a lot more interest in TCO guarantees. As LLVM was adopted in production at Apple, Evan Cheng added the concept of sibcalls and disabled guaranteed TCO by default, which hid these latent correctness issues from testing. Later, when I added musttail (late 2010s), I simply reused all this existing guaranteed TCO code to handle non-sibcall musttail calls, and assumed it was correct and covered by existing tests. Still, the bugs existed, and they survived until now because musttail was mostly used for Microsoft C++ ABI details, which don't use byval on x86_64, until it was added as a Clang frontend attribute for interpreter loop optimization. Interpreter loop optimization is still a very niche use case, and the developers who use the feature carefully optimize the musttail arguments to ensure that everything is passed in registers, not memory. I recall one of the protobuf engineers reported this correctness issue to our team and I wanted to prioritize fixing it, but I was overworked and it was hard to prioritize the backlog.

@rnk
Contributor

rnk commented Jan 14, 2026

cc @nigham @sbenzaquen This PR should fix a Google-internal bug that Sam filed about musttail.

@folkertdev
Contributor Author

I think it would be reasonable to cherry pick this to the release. It's a correctness issue.

That would be neat, and would unblock further experimentation in rustc as well.

I'm surprised these bugs lasted so long in the code.

Thanks for that context. I really had been wondering how this could be so broken in the most-used LLVM backend.

@folkertdev folkertdev force-pushed the x86-musttail-sibcalls branch from 9e48082 to 8faac44 on January 14, 2026 19:18
@folkertdev folkertdev requested a review from rnk January 14, 2026 19:35
Contributor

@rnk rnk left a comment


I think the suggested fixes to avoid truncating integers didn't get applied.

Contributor

@rnk rnk left a comment


Never mind, they were applied; I was looking at the wrong view.

Contributor

@rnk rnk left a comment


(resolving comments)

@rnk rnk merged commit 782bf6a into llvm:main Jan 14, 2026
11 checks passed
@folkertdev
Contributor Author

Awesome, thanks for the reviews!

Is there anything that should be done to nominate this for cherry-picking into LLVM 22?

@rnk
Contributor

rnk commented Jan 14, 2026

LLVM has a substantial amount of post-commit CI, so I would let it bake for a few days to increase our confidence in its correctness. I set a reminder to propose it on Friday. I think these are the docs on the process:
https://releases.llvm.org/18.1.8/docs/GitHub.html#backporting-fixes-to-the-release-branches

Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Jan 18, 2026
c-rhodes pushed a commit to llvmbot/llvm-project that referenced this pull request Jan 20, 2026

(cherry picked from commit 782bf6a)
folkertdev added a commit that referenced this pull request Feb 11, 2026
Basically #168506 but for
riscv, so to be clear the hard work here is @heiher 's. I figured we may
as well get some extra eyeballs on this from riscv too.

Previously the riscv backend could not handle `musttail` calls with more
arguments than fit in registers, or any explicit `byval` or `sret`
parameters/return values. Those have now been implemented.

This is part of my push to get more LLVM backends to support `byval` and
`sret` parameters so that rust can stabilize guaranteed tail call
support. See also:

- #168956
- rust-lang/rust#148748

---------

Co-authored-by: WANG Rui <wangrui@loongson.cn>
llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 11, 2026
kevinwkt pushed a commit to kevinwkt/llvm-project that referenced this pull request Feb 16, 2026