Using the check-cfc tool ( https://github.com/llvm/llvm-project/tree/master/clang/utils/check_cfc ) to spot a codegen difference depending on whether -g is specified or not.

$ cat PowerParser.ii.cc
template <typename, typename = int> class e;
class allocator {
public:
  ~allocator();
};
template <typename, typename> class e {
public:
  e(char *, allocator = allocator());
};
template <typename b, typename c, typename d> bool operator==(e<c, d>, b);
class f {
public:
  f(int *, int *, int *, int, int, int, int);
  e<char> g();
  void j();
};
int h, i;
class k {
  void l();
  bool m_fn4();
  int m;
  int n;
  int q;
  int fmap;
};
void k::l() {
  e<char> o = "";
  for (;;) {
    int p = 0;
    for (;;) {
      if (m_fn4())
        break;
      f a(&q, &fmap, &m, n, h, i, 0);
      if (a.g() == "")
        a.j();
    }
  }
}

$ ./llvm-project/clang/utils/check_cfc/clang++ PowerParser.ii.cc -w -c -O1 -o tmp.ll
Check CFC, checking: dash_g_no_change
PowerParser.ii.cc
Code difference detected with -g
--- /tmp/tmpcdg_LH.o
+++ /tmp/tmpfkaPkW.o
@@ -19,6 +19,6 @@
   28: 4c 8d 73 08            lea 0x8(%rbx),%r14
   2c: 4c 8d 7b 0c            lea 0xc(%rbx),%r15
   30: 4c 8d 64 24 08         lea 0x8(%rsp),%r12
-  35: 66 2e 0f 1f 84 00 00   nopw %cs:0x0(%rax,%rax,1)
-  3c: 00 00 00
-  3f: 90                     nop
+  35: eb 09                  jmp 40 <_ZN1k1lEv+0x40>
+  37: 66 0f 1f 84 00 00 00   nopw 0x0(%rax,%rax,1)
+  3e: 00 00
*** Diff truncated ***
Not sure whether the research below is helpful: comparing the code generated with "-c -O3" and with "-c -O3 -g" shows a small difference. See https://godbolt.org/z/KeVedB

Comparing the output for line 29, e<char> o = "";, the differences are:

baseline:
  jmp 40 <k::l()+0x40>
  nop WORD PTR [rax+rax*1+0x0]
with debug:
  nop WORD PTR cs:[rax+rax*1+0x0]
  nop

I tried to analyze this more deeply, but it is hard for me to find the code where "-O" and "-g" interact when tracing e<char> o = "" down to the .ll. I would very much appreciate any suggestions.
More debug info: using "-mllvm -opt-bisect-limit=2 -c -O3", I found the difference; it seems the SROA pass has some issue there. I will continue to look inside the SROA pass.

BISECT: running pass (1) Simplify the CFG on function (_ZN1k1lEv)
BISECT: running pass (2) SROA on function (_ZN1k1lEv)
BISECT: NOT running pass (3) Early CSE on function (_ZN1k1lEv)
######
< a.o: file format elf64-x86-64
---
> b.o: file format elf64-x86-64
18,55c18,54
<   28: eb 00                  jmp 2a <_ZN1k1lEv+0x2a>
<   2a: 48 89 df               mov %rbx,%rdi
<   2d: e8 00 00 00 00         callq 32 <_ZN1k1lEv+0x32>
<   32: a8 01                  test $0x1,%al
<   34: 0f 85 86 00 00 00      jne c0 <_ZN1k1lEv+0xc0>
<   3a: eb 15                  jmp 51 <_ZN1k1lEv+0x51>
<   3c: 48 89 c3               mov %rax,%rbx
<   3f: 48 8d 7c 24 20         lea 0x20(%rsp),%rdi
<   44: e8 00 00 00 00         callq 49 <_ZN1k1lEv+0x49>
<   49: 48 89 df               mov %rbx,%rdi
<   4c: e8 00 00 00 00         callq 51 <_ZN1k1lEv+0x51>
<   51: 31 c0                  xor %eax,%eax
<   53: 48 89 de               mov %rbx,%rsi
<   56: 48 81 c6 08 00 00 00   add $0x8,%rsi
<   5d: 48 89 da               mov %rbx,%rdx
<   60: 48 81 c2 0c 00 00 00   add $0xc,%rdx
<   67: 44 8b 43 04            mov 0x4(%rbx),%r8d
<   6b: 44 8b 0c 25 00 00 00   mov 0x0,%r9d
<   72: 00
<   73: 8b 04 25 00 00 00 00   mov 0x0,%eax
<   7a: 48 8d 7c 24 18         lea 0x18(%rsp),%rdi
<   7f: 48 89 d9               mov %rbx,%rcx
<   82: 89 04 24               mov %eax,(%rsp)
<   85: c7 44 24 08 00 00 00   movl $0x0,0x8(%rsp)
<   8c: 00
<   8d: e8 00 00 00 00         callq 92 <_ZN1k1lEv+0x92>
<   92: 48 8d 7c 24 18         lea 0x18(%rsp),%rdi
<   97: e8 00 00 00 00         callq 9c <_ZN1k1lEv+0x9c>
<   9c: 48 bf 00 00 00 00 00   movabs $0x0,%rdi
<   a3: 00 00 00
<   a6: e8 00 00 00 00         callq ab <_ZN1k1lEv+0xab>
<   ab: a8 01                  test $0x1,%al
<   ad: 75 02                  jne b1 <_ZN1k1lEv+0xb1>
<   af: eb 0a                  jmp bb <_ZN1k1lEv+0xbb>
<   b1: 48 8d 7c 24 18         lea 0x18(%rsp),%rdi
<   b6: e8 00 00 00 00         callq bb <_ZN1k1lEv+0xbb>
<   bb: e9 6a ff ff ff         jmpq 2a <_ZN1k1lEv+0x2a>
<   c0: e9 63 ff ff ff         jmpq 28 <_ZN1k1lEv+0x28>
---
>   28: 48 89 df               mov %rbx,%rdi
>   2b: e8 00 00 00 00         callq 30 <_ZN1k1lEv+0x30>
>   30: a8 01                  test $0x1,%al
>   32: 0f 85 86 00 00 00      jne be <_ZN1k1lEv+0xbe>
>   38: eb 15                  jmp 4f <_ZN1k1lEv+0x4f>
>   3a: 48 89 c3               mov %rax,%rbx
>   3d: 48 8d 7c 24 20         lea 0x20(%rsp),%rdi
>   42: e8 00 00 00 00         callq 47 <_ZN1k1lEv+0x47>
>   47: 48 89 df               mov %rbx,%rdi
>   4a: e8 00 00 00 00         callq 4f <_ZN1k1lEv+0x4f>
>   4f: 31 c0                  xor %eax,%eax
>   51: 48 89 de               mov %rbx,%rsi
>   54: 48 81 c6 08 00 00 00   add $0x8,%rsi
>   5b: 48 89 da               mov %rbx,%rdx
>   5e: 48 81 c2 0c 00 00 00   add $0xc,%rdx
>   65: 44 8b 43 04            mov 0x4(%rbx),%r8d
>   69: 44 8b 0c 25 00 00 00   mov 0x0,%r9d
>   70: 00
>   71: 8b 04 25 00 00 00 00   mov 0x0,%eax
>   78: 48 8d 7c 24 18         lea 0x18(%rsp),%rdi
>   7d: 48 89 d9               mov %rbx,%rcx
>   80: 89 04 24               mov %eax,(%rsp)
>   83: c7 44 24 08 00 00 00   movl $0x0,0x8(%rsp)
>   8a: 00
>   8b: e8 00 00 00 00         callq 90 <_ZN1k1lEv+0x90>
>   90: 48 8d 7c 24 18         lea 0x18(%rsp),%rdi
>   95: e8 00 00 00 00         callq 9a <_ZN1k1lEv+0x9a>
>   9a: 48 bf 00 00 00 00 00   movabs $0x0,%rdi
>   a1: 00 00 00
>   a4: e8 00 00 00 00         callq a9 <_ZN1k1lEv+0xa9>
>   a9: a8 01                  test $0x1,%al
>   ab: 75 02                  jne af <_ZN1k1lEv+0xaf>
>   ad: eb 0a                  jmp b9 <_ZN1k1lEv+0xb9>
>   af: 48 8d 7c 24 18         lea 0x18(%rsp),%rdi
>   b4: e8 00 00 00 00         callq b9 <_ZN1k1lEv+0xb9>
>   b9: e9 6a ff ff ff         jmpq 28 <_ZN1k1lEv+0x28>
>   be: e9 65 ff ff ff         jmpq 28 <_ZN1k1lEv+0x28>
######
One thing to note is that it is not necessarily the SROA pass that's actually causing the issue. Even though you're telling the compiler to not run any passes after SROA it's still having to do a bunch of work later on in order to actually emit the code. It might just be the case that SROA is perfectly validly making some change to the code which happens to allow a code path later on containing the bug to be triggered which otherwise wouldn't have been.
Then I compared -opt-bisect-limit=<Num> from max to min and found that the difference is introduced by pass 133, "Branch Probability Basic Block Placement on function (_ZN1k1lEv)". With "-g" the pass number is 137. I am not sure whether this is the correct way to debug it.

clang++ -mllvm -opt-bisect-limit=133 PowerParser.cc -c -O1 -o a.o
objdump -d a.o > a.obj
clang++ -mllvm -opt-bisect-limit=137 PowerParser.cc -c -O1 -g -o ag.o
objdump -d ag.o > ag.obj

baseline:
  35: 66 2e 0f 1f 84 00 00   nopw %cs:0x0(%rax,%rax,1)
  3c: 00 00 00
  3f: 90                     nop
  40: eb 0e                  jmp 50 <_ZN1k1lEv+0x50>
  42: 66 2e 0f 1f 84 00 00   nopw %cs:0x0(%rax,%rax,1)
  49: 00 00 00
  4c: 0f 1f 40 00            nopl 0x0(%rax)
with debug:
  35: eb 09                  jmp 40 <_ZN1k1lEv+0x40>
  37: 66 0f 1f 84 00 00 00   nopw 0x0(%rax,%rax,1)

But from this assembly code alone, it is still not easy to find the problem code.
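As an alternative to scanning the limit from max to min, the search can be pictured as a binary search. The sketch below is only an illustration: firstDivergingLimit and the differs callback are hypothetical names, and it assumes the outputs match below some limit and differ at and above it, which may not hold for every bug. In practice differs would build both objects at the given limit and compare their disassembly, and the caller would have to account for the pass numbering shifting under -g (133 versus 137 above).

```cpp
#include <functional>

// Hypothetical helper: returns the smallest limit in [lo, hi] for which
// differs(limit) is true.
// Assumptions: differs(hi) is true, and differs is monotonic (false for
// every limit below the answer, true from the answer upwards).
int firstDivergingLimit(int lo, int hi,
                        const std::function<bool(int)> &differs) {
  while (lo < hi) {
    int mid = lo + (hi - lo) / 2;
    if (differs(mid))
      hi = mid;      // divergence is at mid or earlier
    else
      lo = mid + 1;  // divergence is strictly after mid
  }
  return lo;
}
```

With a stand-in predicate such as [](int n) { return n >= 133; } over the range 0..200, the function returns 133 after about eight probes instead of a linear scan.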
The issue seems to be caused by BranchFolderPass ("Control Flow Optimizer"), at this line:
------------------------
while (I2 != MBB2->end() && I2->isCFIInstruction()) {
https://github.com/llvm/llvm-project/blob/21599876be328ff6b5c6cf09544ade7e337cb48d/llvm/lib/CodeGen/BranchFolding.cpp#L403
-----------------------
When handling the MBB2 instruction list below, debug instructions should also be skipped on the way to SkipTopCFIAndReturn. The first instruction of the MBB is a debug instruction and the second is a CFI instruction; both should be skipped in ComputeCommonTailLength. Otherwise, with "-g", the debug instruction affects a later pass (MachineBlockPlacement).

LLVM_DEBUG output:
---------------------------
MBB2: bb.2 (%ir-block.9):
; predecessors: %bb.1, %bb.2, %bb.6
  successors: %bb.2(0x40000000), %bb.4(0x40000000); %bb.2(50.00%), %bb.4(50.00%)
  liveins: $rbx, $r12, $r14, $r15
  DBG_VALUE 0, $noreg, !"p", !DIExpression(), debug-location !76; PowerParser.cc:29:9 line no:29
  CFI_INSTRUCTION <unserializable cfi directive>, debug-location !77; PowerParser.cc:31:11
  $rdi = COPY renamable $rbx, debug-location !77; PowerParser.cc:31:11
---------------------------
After changing the code as below, the issue is fixed.

while (I2 != MBB2->end() && (I2->isCFIInstruction() || I2->isDebugInstr())) {
  ++I2;
}

If the analysis is correct, I would like to submit a patch to fix the issue.
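To make the effect of the one-line change easier to see, here is a small self-contained model. It is a sketch only: ToyInstr, ToyBlock and both functions are made-up stand-ins, not LLVM's MachineInstr API, and the "without -g" block is an assumption about what bb.2 looks like once the DBG_VALUE is absent.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are not LLVM types.
enum class Kind { Debug, CFI, Real };
struct ToyInstr {
  Kind kind;
  std::string text;
};
using ToyBlock = std::vector<ToyInstr>;

// Models the original loop: step past leading CFI instructions only.
std::size_t skipLeadingCFI(const ToyBlock &block) {
  std::size_t i = 0;
  while (i != block.size() && block[i].kind == Kind::CFI)
    ++i;
  return i;
}

// Models the proposed loop: step past leading CFI and debug instructions.
std::size_t skipLeadingCFIAndDebug(const ToyBlock &block) {
  std::size_t i = 0;
  while (i != block.size() &&
         (block[i].kind == Kind::CFI || block[i].kind == Kind::Debug))
    ++i;
  return i;
}

// Top of bb.2 as printed above with -g: DBG_VALUE, CFI_INSTRUCTION, COPY.
ToyBlock topWithDebug() {
  return {ToyInstr{Kind::Debug, "DBG_VALUE"},
          ToyInstr{Kind::CFI, "CFI_INSTRUCTION"},
          ToyInstr{Kind::Real, "COPY"}};
}

// Assumed shape of the same block without -g: CFI_INSTRUCTION, COPY.
ToyBlock topWithoutDebug() {
  return {ToyInstr{Kind::CFI, "CFI_INSTRUCTION"},
          ToyInstr{Kind::Real, "COPY"}};
}
```

In this model the CFI-only loop stops on COPY without -g but stays on DBG_VALUE with -g, so the two builds see different positions. The loop that also skips debug instructions stops on COPY in both cases.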
Thanks for diagnosing -- that code definitely looks suspicious, the comment from line 378 even explains how problems could occur! It's a little odd that most of ComputeCommonTailLength uses the nearby "countsAsInstruction" helper to skip over debug instructions, but the last two loops don't. It might be worth looking at the history / git blame a little, just to see if there's some other justification; but if your patch fixes the reproducer in this test, it's definitely worth submitting.
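The idea of skipping non-counting instructions through a single helper can be sketched with a toy tail comparison. This is not the real ComputeCommonTailLength: MiniInstr, countsAsRealInstr and commonTailLength are hypothetical names loosely modelled on the helper mentioned above, and instructions are compared by their text only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction for illustration; not LLVM's MachineInstr.
struct MiniInstr {
  bool isDebug;
  bool isCFI;
  std::string text;
};

// Hypothetical predicate: debug and CFI entries do not count as real code.
bool countsAsRealInstr(const MiniInstr &I) { return !(I.isDebug || I.isCFI); }

// Number of matching real instructions at the ends of a and b, walking
// backwards and stepping over entries that do not count.
unsigned commonTailLength(const std::vector<MiniInstr> &a,
                          const std::vector<MiniInstr> &b) {
  std::size_t i = a.size(), j = b.size();
  unsigned len = 0;
  while (true) {
    while (i > 0 && !countsAsRealInstr(a[i - 1]))
      --i;
    while (j > 0 && !countsAsRealInstr(b[j - 1]))
      --j;
    if (i == 0 || j == 0)
      break;  // one block is exhausted
    if (a[i - 1].text != b[j - 1].text)
      break;  // first mismatch ends the common tail
    ++len;
    --i;
    --j;
  }
  return len;
}

// Sample blocks: `plain` has no debug or CFI entries, `annotated` is the
// same code with a DBG_VALUE and a CFI_INSTRUCTION interleaved.
std::vector<MiniInstr> plainA() {
  return {MiniInstr{false, false, "MOV"}, MiniInstr{false, false, "CALL"},
          MiniInstr{false, false, "TEST"}};
}
std::vector<MiniInstr> plainB() {
  return {MiniInstr{false, false, "ADD"}, MiniInstr{false, false, "CALL"},
          MiniInstr{false, false, "TEST"}};
}
std::vector<MiniInstr> annotatedB() {
  return {MiniInstr{false, false, "ADD"}, MiniInstr{true, false, "DBG_VALUE"},
          MiniInstr{false, false, "CALL"},
          MiniInstr{false, true, "CFI_INSTRUCTION"},
          MiniInstr{false, false, "TEST"}};
}
```

Because every step goes through the same predicate, the result is the same whether or not the debug and CFI entries are present, which is the property a -g-invariant pass needs.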
Patch has been submitted to fix this issue: https://reviews.llvm.org/D66467
When writing a MIR test for this patch, I hit an "unnamed alloca" error. I tried compiling the code without running any pass, and the issue still exists. It looks like the sample code itself is not correct in how it handles allocator.

Steps to compile the code without passes:
-----------------------------------------
clang++ -g -w -O1 -S -emit-llvm PowerParser.ii.cc -mllvm -opt-bisect-limit=0 -o test.ll
llc -stop-before=branch-folder test.ll -opt-bisect-limit=0 -o test.mir
llc -o - test.mir -mtriple=x86_64-- -run-pass=branch-folder
error: test.mir:298:20: alloca instruction named '<unnamed alloca>' isn't defined in the function '_ZN1k1lEv'
- { id: 0, name: '<unnamed alloca>', type: default, offset: -48, size: 8,
-----------------------------------------
After running the steps above, there are many "unnamed alloca" entries in test.mir. Could this be an issue with the sample code rather than a codegen issue?
Chris wrote:
> After run test above, there are many "unnamed alloca" in test.mir. Not sure if it is sample code issue, rather then codegen issue?

I've experienced this in the past, I think it's something weird / broken with the MIR representation -- I've never gotten to the bottom of it. Previously I've just fiddled with my test cases until they don't generate any allocas at all.

Note that you don't necessarily need a MIR test input that comes straight from a C input: you can delete and modify the MIR until it stimulates the code path you're trying to test. That means you could delete anything to do with un-named allocas in the MIR output; alternatively, you could copy-and-edit an existing MIR test until it represents the behaviour in branch-folder that you're trying to fix.
closed by commit https://reviews.llvm.org/rGec32dff0b075055b30140c543e9f2bef608adc14
Thanks for working on this! Really happy to have you on board the LLVM project :)
@Greg Bedwell Hi Greg, just feel the LLVM community is very nice and many seniors would take time to kindly help a beginner, especially you, thanks so much :)
It can be reproduced using PowerParser.ii.cc with clang 7809fa20400000fd40b4a4b56696c7fbcd0f0fa9 (committed on 2021-01-06), so I have decided to reopen it.

> clang -w -O1 -c PowerParser.ii.cc -o dbg.o -g
> clang -w -O1 -c PowerParser.ii.cc -o rel.o

Then

> objdump -d dbg.o > dbg_objdump
> objdump -d rel.o > rel_objdump
> diff dbg_objdump rel_objdump

2c2
< dbg.o: file format elf64-x86-64
---
> rel.o: file format elf64-x86-64
26,29c26,29
<   40: eb 0e                  jmp 50 <_ZN1k1lEv+0x50>
<   42: 66 2e 0f 1f 84 00 00   nopw %cs:0x0(%rax,%rax,1)
<   49: 00 00 00
<   4c: 0f 1f 40 00            nopl 0x0(%rax)
---
>   40: e9 0b 00 00 00         jmpq 50 <_ZN1k1lEv+0x50>
>   45: 66 2e 0f 1f 84 00 00   nopw %cs:0x0(%rax,%rax,1)
>   4c: 00 00 00
>   4f: 90                     nop
Both the debug and release builds produce the same LLVM IR once debug info is excluded. It seems like an assembler bug, since passing '-fno-integrated-as' to clang leaves no difference in the machine code.
(In reply to Zhiwei Chen from comment #13)
> It can be reproduced using the PowerParser.ii.cc by clang
> 7809fa20400000fd40b4a4b56696c7fbcd0f0fa9 (committed at 2021-01-06). So I
> decide to reopen it.
>
> > clang -w -O1 -c PowerParser.ii.cc -o dbg.o -g
> > clang -w -O1 -c PowerParser.ii.cc -o rel.o
>
> Then
>
> > objdump -d dbg.o > dbg_objdump
> > objdump -d rel.o > rel_objdump
> > diff dbg_objdump rel_objdump
>
> 2c2
> < dbg.o: file format elf64-x86-64
> ---
> > rel.o: file format elf64-x86-64
> 26,29c26,29
> < 40: eb 0e jmp 50 <_ZN1k1lEv+0x50>
> < 42: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
> < 49: 00 00 00
> < 4c: 0f 1f 40 00 nopl 0x0(%rax)
> ---
> > 40: e9 0b 00 00 00 jmpq 50 <_ZN1k1lEv+0x50>
> > 45: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
> > 4c: 00 00 00
> > 4f: 90 nop

The newly reproduced issue is due to an assembler optimization in llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
https://reviews.llvm.org/D75203#2491618

-mllvm -x86-pad-for-align=false is a workaround.
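For readers comparing the two listings, the arithmetic below shows that both jump encodings at offset 0x40 reach the same target. It relies only on the standard x86 encodings (EB with an 8-bit displacement is 2 bytes, E9 with a 32-bit displacement is 5 bytes, both relative to the end of the instruction); the function names are made up for this sketch.

```cpp
#include <cstdint>

// Target of a short jump: opcode EB + rel8, 2 bytes in total.
constexpr std::uint64_t shortJmpTarget(std::uint64_t addr, std::int8_t rel8) {
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(addr) + 2 + rel8);
}

// Target of a near jump: opcode E9 + rel32, 5 bytes in total.
constexpr std::uint64_t nearJmpTarget(std::uint64_t addr, std::int32_t rel32) {
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(addr) + 5 + rel32);
}

// dbg.o:  40: eb 0e           jmp  50
static_assert(shortJmpTarget(0x40, 0x0e) == 0x50, "short form reaches 0x50");
// rel.o:  40: e9 0b 00 00 00  jmpq 50
static_assert(nearJmpTarget(0x40, 0x0b) == 0x50, "near form reaches 0x50");
```

The longer encoding takes 3 more bytes, and the nop padding after it is 3 bytes shorter (11 bytes instead of 14), so both builds still place the next block at 0x50. That is consistent with the padding being partly absorbed into the jump encoding in one build, though the listings alone do not show why the two builds chose differently.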
Closing the MC issue in favor of bug 48742.