42138 – Different codegen with/without -g

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	DebugInfo
Version:	trunk
Hardware:	PC Linux

Importance:	P enhancement
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Depends on:
Blocks:	37728

Reported:	2019-06-05 06:16 PDT by Christopher Dawson
Modified:	2021-01-13 13:34 PST
CC List:	10 users

See Also:
Fixed By Commit(s):

Description Christopher Dawson 2019-06-05 06:16:54 PDT

The check-cfc tool ( https://github.com/llvm/llvm-project/tree/master/clang/utils/check_cfc ) reports a codegen difference depending on whether -g is specified.

$ cat PowerParser.ii.cc

template <typename, typename = int> class e;
class allocator {
public:
  ~allocator();
};
template <typename, typename> class e {
public:
  e(char *, allocator = allocator());
};
template <typename b, typename c, typename d> bool operator==(e<c, d>, b);
class f {
public:
  f(int *, int *, int *, int, int, int, int);
  e<char> g();
  void j();
};
int h, i;
class k {
  void l();
  bool m_fn4();
  int m;
  int n;
  int q;
  int fmap;
};
void k::l() {
  e<char> o = "";
  for (;;) {
    int p = 0;
    for (;;) {
      if (m_fn4())
        break;
      f a(&q, &fmap, &m, n, h, i, 0);
      if (a.g() == "")
        a.j();
    }
  }
}

$ ./llvm-project/clang/utils/check_cfc/clang++ PowerParser.ii.cc -w -c -O1 -o tmp.ll

Check CFC, checking: dash_g_no_change
PowerParser.ii.cc Code difference detected with -g
--- /tmp/tmpcdg_LH.o

+++ /tmp/tmpfkaPkW.o

@@ -19,6 +19,6 @@

   28:  4c 8d 73 08             lea    0x8(%rbx),%r14
   2c:  4c 8d 7b 0c             lea    0xc(%rbx),%r15
   30:  4c 8d 64 24 08          lea    0x8(%rsp),%r12
-  35:  66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
-  3c:  00 00 00
-  3f:  90                      nop
+  35:  eb 09                   jmp    40 <_ZN1k1lEv+0x40>
+  37:  66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
+  3e:  00 00
*** Diff truncated ***

Comment 1 Chris Ye 2019-08-08 17:29:44 PDT

I'm not sure whether the research below is helpful.

Comparing the code generated with "-c -O3" and "-c -O3 -g" shows a small difference.
See: https://godbolt.org/z/KeVedB

Comparing the output for line 29:
   e<char> o = "";

the differences are: 
baseline:
 jmp    40 <k::l()+0x40>
 nop    WORD PTR [rax+rax*1+0x0]

with debug:
 nop    WORD PTR cs:[rax+rax*1+0x0]
 nop

I tried to analyze this further, but it is hard for me to find the code responsible for the "-O" and "-g" interaction when comparing the .ll output for "e<char> o = """. I would very much appreciate any suggestions.

Comment 2 Chris Ye 2019-08-09 07:58:42 PDT

More debug info:
Using "-mllvm -opt-bisect-limit=2 -c -O3" shows the difference, so the SROA pass seems to have an issue here. I will continue looking inside the SROA pass.

BISECT: running pass (1) Simplify the CFG on function (_ZN1k1lEv)
BISECT: running pass (2) SROA on function (_ZN1k1lEv)
BISECT: NOT running pass (3) Early CSE on function (_ZN1k1lEv)

######
< a.o:     file format elf64-x86-64
---
> b.o:     file format elf64-x86-64
18,55c18,54
<   28: eb 00                   jmp    2a <_ZN1k1lEv+0x2a>
<   2a: 48 89 df                mov    %rbx,%rdi
<   2d: e8 00 00 00 00          callq  32 <_ZN1k1lEv+0x32>
<   32: a8 01                   test   $0x1,%al
<   34: 0f 85 86 00 00 00       jne    c0 <_ZN1k1lEv+0xc0>
<   3a: eb 15                   jmp    51 <_ZN1k1lEv+0x51>
<   3c: 48 89 c3                mov    %rax,%rbx
<   3f: 48 8d 7c 24 20          lea    0x20(%rsp),%rdi
<   44: e8 00 00 00 00          callq  49 <_ZN1k1lEv+0x49>
<   49: 48 89 df                mov    %rbx,%rdi
<   4c: e8 00 00 00 00          callq  51 <_ZN1k1lEv+0x51>
<   51: 31 c0                   xor    %eax,%eax
<   53: 48 89 de                mov    %rbx,%rsi
<   56: 48 81 c6 08 00 00 00    add    $0x8,%rsi
<   5d: 48 89 da                mov    %rbx,%rdx
<   60: 48 81 c2 0c 00 00 00    add    $0xc,%rdx
<   67: 44 8b 43 04             mov    0x4(%rbx),%r8d
<   6b: 44 8b 0c 25 00 00 00    mov    0x0,%r9d
<   72: 00
<   73: 8b 04 25 00 00 00 00    mov    0x0,%eax
<   7a: 48 8d 7c 24 18          lea    0x18(%rsp),%rdi
<   7f: 48 89 d9                mov    %rbx,%rcx
<   82: 89 04 24                mov    %eax,(%rsp)
<   85: c7 44 24 08 00 00 00    movl   $0x0,0x8(%rsp)
<   8c: 00
<   8d: e8 00 00 00 00          callq  92 <_ZN1k1lEv+0x92>
<   92: 48 8d 7c 24 18          lea    0x18(%rsp),%rdi
<   97: e8 00 00 00 00          callq  9c <_ZN1k1lEv+0x9c>
<   9c: 48 bf 00 00 00 00 00    movabs $0x0,%rdi
<   a3: 00 00 00
<   a6: e8 00 00 00 00          callq  ab <_ZN1k1lEv+0xab>
<   ab: a8 01                   test   $0x1,%al
<   ad: 75 02                   jne    b1 <_ZN1k1lEv+0xb1>
<   af: eb 0a                   jmp    bb <_ZN1k1lEv+0xbb>
<   b1: 48 8d 7c 24 18          lea    0x18(%rsp),%rdi
<   b6: e8 00 00 00 00          callq  bb <_ZN1k1lEv+0xbb>
<   bb: e9 6a ff ff ff          jmpq   2a <_ZN1k1lEv+0x2a>
<   c0: e9 63 ff ff ff          jmpq   28 <_ZN1k1lEv+0x28>
---
>   28: 48 89 df                mov    %rbx,%rdi
>   2b: e8 00 00 00 00          callq  30 <_ZN1k1lEv+0x30>
>   30: a8 01                   test   $0x1,%al
>   32: 0f 85 86 00 00 00       jne    be <_ZN1k1lEv+0xbe>
>   38: eb 15                   jmp    4f <_ZN1k1lEv+0x4f>
>   3a: 48 89 c3                mov    %rax,%rbx
>   3d: 48 8d 7c 24 20          lea    0x20(%rsp),%rdi
>   42: e8 00 00 00 00          callq  47 <_ZN1k1lEv+0x47>
>   47: 48 89 df                mov    %rbx,%rdi
>   4a: e8 00 00 00 00          callq  4f <_ZN1k1lEv+0x4f>
>   4f: 31 c0                   xor    %eax,%eax
>   51: 48 89 de                mov    %rbx,%rsi
>   54: 48 81 c6 08 00 00 00    add    $0x8,%rsi
>   5b: 48 89 da                mov    %rbx,%rdx
>   5e: 48 81 c2 0c 00 00 00    add    $0xc,%rdx
>   65: 44 8b 43 04             mov    0x4(%rbx),%r8d
>   69: 44 8b 0c 25 00 00 00    mov    0x0,%r9d
>   70: 00
>   71: 8b 04 25 00 00 00 00    mov    0x0,%eax
>   78: 48 8d 7c 24 18          lea    0x18(%rsp),%rdi
>   7d: 48 89 d9                mov    %rbx,%rcx
>   80: 89 04 24                mov    %eax,(%rsp)
>   83: c7 44 24 08 00 00 00    movl   $0x0,0x8(%rsp)
>   8a: 00
>   8b: e8 00 00 00 00          callq  90 <_ZN1k1lEv+0x90>
>   90: 48 8d 7c 24 18          lea    0x18(%rsp),%rdi
>   95: e8 00 00 00 00          callq  9a <_ZN1k1lEv+0x9a>
>   9a: 48 bf 00 00 00 00 00    movabs $0x0,%rdi
>   a1: 00 00 00
>   a4: e8 00 00 00 00          callq  a9 <_ZN1k1lEv+0xa9>
>   a9: a8 01                   test   $0x1,%al
>   ab: 75 02                   jne    af <_ZN1k1lEv+0xaf>
>   ad: eb 0a                   jmp    b9 <_ZN1k1lEv+0xb9>
>   af: 48 8d 7c 24 18          lea    0x18(%rsp),%rdi
>   b4: e8 00 00 00 00          callq  b9 <_ZN1k1lEv+0xb9>
>   b9: e9 6a ff ff ff          jmpq   28 <_ZN1k1lEv+0x28>
>   be: e9 65 ff ff ff          jmpq   28 <_ZN1k1lEv+0x28>
######

Comment 3 Greg Bedwell 2019-08-09 08:03:39 PDT

One thing to note is that it is not necessarily the SROA pass that's actually causing the issue. Even though you're telling the compiler to not run any passes after SROA it's still having to do a bunch of work later on in order to actually emit the code.  

It might just be the case that SROA is perfectly validly making some change to the code which happens to allow a code path later on containing the bug to be triggered which otherwise wouldn't have been.

Comment 4 Chris Ye 2019-08-10 01:40:59 PDT

Next, I compared the output for each -opt-bisect-limit=<Num> from max to min and found that the difference first appears at pass 133, "Branch Probability Basic Block Placement on function (_ZN1k1lEv)". With "-g" the corresponding pass number is 137. I'm not sure whether this is the correct debugging method.

clang++ -mllvm  -opt-bisect-limit=133 PowerParser.cc -c -O1 -o a.o
objdump -d a.o > a.obj

clang++ -mllvm -opt-bisect-limit=137 PowerParser.cc -c -O1 -g -o ag.o
objdump -d ag.o > ag.obj

baseline:
  35:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  3c:	00 00 00 
  3f:	90                   	nop
  40:	eb 0e                	jmp    50 <_ZN1k1lEv+0x50>
  42:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  49:	00 00 00 
  4c:	0f 1f 40 00          	nopl   0x0(%rax)

with debug:
  35:	eb 09                	jmp    40 <_ZN1k1lEv+0x40>
  37:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)

But from this assembly code alone, it is still not easy to find the problematic code.
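
The max-to-min search over -opt-bisect-limit values described above can be shortened with a binary search. The following is only a minimal sketch under stated assumptions: `firstDifferingLimit` and the `differs` callback are hypothetical names, and `differs(n)` stands for "build with and without -g at limit n and report whether the objects differ". It also assumes the difference, once it appears, persists for every larger limit. Since the pass numbers differ between the two builds (133 versus 137 here), a real `differs` would have to account for that mapping, and as noted in comment 3 the pass found this way may only expose the bug rather than contain it.

```cpp
#include <functional>

// Returns the smallest limit in [lo, hi] for which differs(limit) is true,
// assuming differs is false below some threshold and true from it upwards.
// Returns hi + 1 (of the original range) if no limit shows a difference.
int firstDifferingLimit(int lo, int hi,
                        const std::function<bool(int)> &differs) {
  int answer = hi + 1; // sentinel meaning "not found"
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    if (differs(mid)) {
      answer = mid; // candidate; look for an earlier one
      hi = mid - 1;
    } else {
      lo = mid + 1; // difference must start later
    }
  }
  return answer;
}
```

With a stand-in predicate such as `[](int n) { return n >= 133; }` over the range 0 to 200, the search needs about eight probes instead of a linear scan.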

Comment 5 Chris Ye 2019-08-19 00:49:53 PDT

The issue seems to be caused by the BranchFolder pass, "Control Flow Optimizer".

line:
------------------------
while (I2 != MBB2->end() && I2->isCFIInstruction()) {
https://github.com/llvm/llvm-project/blob/21599876be328ff6b5c6cf09544ade7e337cb48d/llvm/lib/CodeGen/BranchFolding.cpp#L403
-----------------------

When handling the MBB2 instruction list below, debug instructions should also be skipped after the jump to SkipTopCFIAndReturn. The first instruction of the MBB is a debug instruction and the second is a CFI instruction; both should be skipped in ComputeCommonTailLength. Otherwise, with "-g", the debug instruction affects a later pass (MachineBlockPlacement).


LLVM_DEBUG printf:
---------------------------
MBB2: bb.2 (%ir-block.9):
; predecessors: %bb.1, %bb.2, %bb.6
  successors: %bb.2(0x40000000), %bb.4(0x40000000); %bb.2(50.00%), %bb.4(50.00%)
  liveins: $rbx, $r12, $r14, $r15
  DBG_VALUE 0, $noreg, !"p", !DIExpression(), debug-location !76; PowerParser.cc:29:9 line no:29
  CFI_INSTRUCTION <unserializable cfi directive>, debug-location !77; PowerParser.cc:31:11
  $rdi = COPY renamable $rbx, debug-location !77; PowerParser.cc:31:11
---------------------------

After changing the code as shown below, the issue is fixed:
while (I2 != MBB2->end() && (I2->isCFIInstruction() || I2->isDebugInstr())) {
    ++I2;
}


If the analysis is correct, I would like to submit a patch to fix the issue.
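
To make the effect of the proposed loop condition concrete, here is a small toy model. It is only a sketch: `Kind`, `Instr`, `skipTopCFI` and `skipTopCFIAndDebug` are made-up names for illustration, and nothing here is the real llvm::MachineInstr or BranchFolding.cpp code. It mimics a block that starts with a DBG_VALUE followed by a CFI_INSTRUCTION, as in the LLVM_DEBUG output above.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a machine instruction; NOT llvm::MachineInstr.
enum class Kind { Real, Debug, CFI };

struct Instr {
  Kind kind;
  bool isDebugInstr() const { return kind == Kind::Debug; }
  bool isCFIInstruction() const { return kind == Kind::CFI; }
};

// Old condition: skip leading CFI instructions only.
// A leading debug instruction stops the skip immediately.
std::size_t skipTopCFI(const std::vector<Instr> &block) {
  std::size_t i = 0;
  while (i != block.size() && block[i].isCFIInstruction())
    ++i;
  return i;
}

// Proposed condition: skip leading CFI and debug instructions alike.
std::size_t skipTopCFIAndDebug(const std::vector<Instr> &block) {
  std::size_t i = 0;
  while (i != block.size() &&
         (block[i].isCFIInstruction() || block[i].isDebugInstr()))
    ++i;
  return i;
}
```

For a block {CFI, Real} (no -g) and a block {Debug, CFI, Real} (with -g), the old loop leaves one and three instructions respectively, while the proposed loop leaves one instruction in both cases, which is the property a debug-invariant pass needs.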

Comment 6 Jeremy Morse 2019-08-19 01:34:25 PDT

Thanks for diagnosing -- that code definitely looks suspicious, the comment from line 378 even explains how problems could occur!

It's a little odd that most of ComputeCommonTailLength uses the nearby "countsAsInstruction" helper to skip over debug instructions, but the last two loops don't. It might be worth looking at the history / git blame a little, just to see if there's some other justification; but if your patch fixes the reproducer in this test, it's definitely worth submitting.

Comment 7 Chris Ye 2019-08-20 03:59:56 PDT

A patch has been submitted to fix this issue:
https://reviews.llvm.org/D66467

Comment 8 Chris Ye 2019-08-21 03:43:09 PDT

While writing a MIR test for this patch, I hit an "unnamed alloca" error. I tried compiling the code without running any passes, and the issue still exists. It looks like the sample code itself may not be correct in how it handles allocator.

Steps to compile code without pass:

-----------------------------------------
clang++ -g -w -O1 -S -emit-llvm PowerParser.ii.cc -mllvm -opt-bisect-limit=0 -o test.ll
llc -stop-before=branch-folder test.ll -opt-bisect-limit=0 -o test.mir
llc -o - test.mir -mtriple=x86_64-- -run-pass=branch-folder

error: test.mir:298:20: alloca instruction named '<unnamed alloca>' isn't defined in the function '_ZN1k1lEv'
  - { id: 0, name: '<unnamed alloca>', type: default, offset: -48, size: 8,
-----------------------------------------

After running the test above, there are many "unnamed alloca" entries in test.mir. Could this be an issue with the sample code rather than with codegen?

Comment 9 Jeremy Morse 2019-08-21 03:55:01 PDT

Chris wrote:
> After running the test above, there are many "unnamed alloca" entries in test.mir. Could this be an issue with the sample code rather than with codegen?

I've experienced this in the past, I think it's something weird / broken with the MIR representation -- I've never gotten to the bottom of it.

Previously I've just fiddled with my test cases until they don't generate any allocas at all. Note that you don't necessarily need a MIR test input that comes straight from a C input: you can delete and modify the MIR until it stimulates the code path you're trying to test. That means you could delete anything to do with un-named allocas in the MIR output, alternately you could copy-and-edit an existing MIR test until it represents the behaviour in branch-folder that you're trying to fix.

Comment 10 Chris Ye 2019-10-29 18:33:55 PDT

Closed by commit https://reviews.llvm.org/rGec32dff0b075055b30140c543e9f2bef608adc14

Comment 11 Greg Bedwell 2019-10-30 00:58:26 PDT

Thanks for working on this! Really happy to have you on board the LLVM project :)

Comment 12 Chris Ye 2019-10-30 03:36:46 PDT

@Greg Bedwell
Hi Greg, I just feel the LLVM community is very nice, and many senior contributors take the time to kindly help a beginner, especially you. Thanks so much :)

Comment 13 Zhiwei Chen 2021-01-08 03:05:45 PST

It can still be reproduced with PowerParser.ii.cc using clang 7809fa20400000fd40b4a4b56696c7fbcd0f0fa9 (committed on 2021-01-06), so I have decided to reopen it.

> clang -w -O1 -c PowerParser.ii.cc -o dbg.o -g
> clang -w -O1 -c PowerParser.ii.cc -o rel.o

Then

> objdump -d dbg.o > dbg_objdump
> objdump -d rel.o > rel_objdump
> diff dbg_objdump rel_objdump

2c2
< dbg.o:     file format elf64-x86-64
---
> rel.o:     file format elf64-x86-64
26,29c26,29
<   40: eb 0e                   jmp    50 <_ZN1k1lEv+0x50>
<   42: 66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
<   49: 00 00 00
<   4c: 0f 1f 40 00             nopl   0x0(%rax)
---
>   40: e9 0b 00 00 00          jmpq   50 <_ZN1k1lEv+0x50>
>   45: 66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
>   4c: 00 00 00
>   4f: 90                      nop
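
The `2c2` hunk in the diff above comes only from the object file name that objdump prints in its "file format" header, not from the code. When comparing listings by hand, that line can be filtered out first. The following is a minimal sketch of such a filter; `codeLines` and `sameCode` are hypothetical helper names, and this is not how check_cfc itself is implemented.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a disassembly listing into lines, dropping the objdump header
// line that embeds the object file name ("<name>: file format ...").
std::vector<std::string> codeLines(const std::string &listing) {
  std::vector<std::string> lines;
  std::istringstream in(listing);
  std::string line;
  while (std::getline(in, line)) {
    if (line.find("file format") != std::string::npos)
      continue; // file name noise, not codegen
    lines.push_back(line);
  }
  return lines;
}

// True when two listings contain the same lines once headers are ignored.
bool sameCode(const std::string &a, const std::string &b) {
  return codeLines(a) == codeLines(b);
}
```

Applied to the two objdump outputs above, `sameCode` would still return false, because the instruction bytes at offsets 40 to 4f really do differ.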

Comment 14 Zhiwei Chen 2021-01-10 00:32:27 PST

Both the debug and release builds produce the same LLVM IR once debug info is excluded.

It seems like an assembler bug, since passing '-fno-integrated-as' to clang results in no difference in the machine code.

Comment 15 Fangrui Song 2021-01-12 09:44:59 PST

(In reply to Zhiwei Chen from comment #13)

The newly reproduced issue is due to an assembler optimization in llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp

https://reviews.llvm.org/D75203#2491618

-mllvm -x86-pad-for-align=false is a workaround.
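
As a sanity check on the diff from comment #13, the two jump encodings reach the same target. An x86 relative jump's displacement is counted from the address of the next instruction, so `eb 0e` (2 bytes) and `e9 0b 00 00 00` (5 bytes) at offset 0x40 both land on 0x50. The sketch below is plain arithmetic, not code from X86AsmBackend.cpp, and `jumpTarget` is a hypothetical helper name.

```cpp
#include <cstdint>

// Target of an x86 relative jump: the displacement is relative to the
// address of the instruction that follows the jump.
constexpr std::uint64_t jumpTarget(std::uint64_t address,
                                   std::uint64_t length,
                                   std::int64_t displacement) {
  return address + length + static_cast<std::uint64_t>(displacement);
}

// -g build:  40: eb 0e           (2-byte short jump)
static_assert(jumpTarget(0x40, 2, 0x0e) == 0x50, "short jmp reaches 0x50");
// no -g:     40: e9 0b 00 00 00  (5-byte near jump)
static_assert(jumpTarget(0x40, 5, 0x0b) == 0x50, "near jmp reaches 0x50");
```

Both sequences also fill the same 16 bytes up to 0x50: 2 + 10 + 4 bytes in the -g build versus 5 + 10 + 1 bytes without -g. That is consistent with the longer jump encoding taking the place of some NOP padding, so the builds differ in bytes but not in behavior.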

Comment 16 Fangrui Song 2021-01-13 13:34:11 PST

Closing the MC issue in favor of bug 48742.