43964 – BranchFolder not debug invariant

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 43964 - BranchFolder not debug invariant

Summary: BranchFolder not debug invariant

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Common Code Generator Code (show other bugs)
Version:	trunk
Hardware:	PC Windows NT

Importance:	P enhancement
Assignee:	bjorn.a.pettersson

URL:
Keywords:

Depends on:
Blocks:	37728
	Show dependency tree

Reported:	2019-11-11 09:13 PST by bjorn.a.pettersson
Modified:	2019-11-21 10:26 PST (History)
CC List:	7 users (show)

See Also:
Fixed By Commit(s):	rG898de302919b

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description bjorn.a.pettersson 2019-11-11 09:13:48 PST

Unfortunately I see a lot more test cases failing due to codegen not being debug invariant after the fix of https://bugs.llvm.org/show_bug.cgi?id=42138 (see https://reviews.llvm.org/D66467).

Main reason is that the loops after SkipTopCFIAndReturn: now undo the work done just before the SkipTopCFIAndReturn label. In the past we have moved the iterators backwards, past any DBG_VALUE instructions. Now we do that, but directly afterward we move the iterators forward again, past both DBG_VALUE and CFI directives. Notice that main purpose with ComputeCommonTailLength is to count number of instructions that are common in the tails, and that number is used by ProfitableToMerge to answer the question if we should attempt to do a tail merge or not.
The answer given by ProfitableToMerge depends on the result of comparing the iterators with MBB->begin(). To make those checks debug invariant we need to move the iterators backward past any debug instructions! (or we would need to modify the checks in some way)

While investigating this, and when using some more or less hand-crafted MIR-tests, it turns out that this is a rather complicated problem. And afaict BranchFolding isn't that good at being debug invariant. And when it is, it isn't that smart when it comes to dealing with any debug instructions or CFI directives that isn't really part of the tail.

One thing that I'm not 100% sure about is if CFI-directives should be allowed to impact codegen or not. It seems like CFI directives, at least sometimes, are generated even without -g. If CFI directives shouldn't impact codegen, then what should the rules be for passes like BranchFolding when merging tails with different CFI directives in the involved basic blocks. CFI directive could both follow an earlier instruction, or it can be put in the beginning of a BB.

I'm afraid that if we simply move the iterators backward until just before a "real" instruction that isn't common (or until we reach beginning of BB). Then at least the transform done by BranchFolding is likely to be debug invariant. But the cost might be that we currently drop lots of DBG/CFI instructions, which at least when it comes to CFI instructions could be bad.

Maybe BranchFolder should execute before PrologEpilogInserter (assuming CFI directives are first introduced by PEI?). Or is BranchFolder typicallly used to merge epilogues, so that is why it is executed after PEI.


Here is a test case showing at least one problem with BranchFolding. The test1 and test2 functions only differ by DBG_VALUE and CFI_INSTRUCTION instructions. So I guess we want to see the same real instructions being generated for both functions.

#################################################################################

# RUN: llc -mtriple=x86_64-- -run-pass branch-folder -O3 -o - %s | FileCheck %s

---
name:            test1
body:             |
  bb.0:
    TEST8rr killed renamable $al, renamable $al, implicit-def $eflags
    JCC_1 %bb.2, 5, implicit killed $eflags

  bb.1:
    MOV8mi $r12, 1, $noreg, 0, $noreg, 0
    MOV8mi $r13, 1, $noreg, 0, $noreg, 0
    RET 0

  bb.2:
    MOV8mi $r13, 1, $noreg, 0, $noreg, 0
    RET 0
...

# CHECK-LABEL: name:            test1
# CHECK: bb.0:
# CHECK:   TEST8rr
# CHECK:   JCC_1
# CHECK: bb.1:
# CHECK:   successors: %bb.2
# CHECK:   MOV8mi $r12
# CHECK: bb.2:
# CHECK:   MOV8mi $r13
# CHECK:   RET 0

---
name:            test2
body:             |
  bb.0:
    TEST8rr killed renamable $al, renamable $al, implicit-def $eflags
    JCC_1 %bb.2, 5, implicit killed $eflags

  bb.1:
    MOV8mi $r12, 1, $noreg, 0, $noreg, 0
    MOV8mi $r13, 1, $noreg, 0, $noreg, 0
    RET 0

  bb.2:
    DBG_VALUE
    DBG_VALUE
    CFI_INSTRUCTION def_cfa_offset 8
    MOV8mi $r13, 1, $noreg, 0, $noreg, 0
    RET 0
...

# CHECK-LABEL: name:            test2
# CHECK: bb.0:
# CHECK:   TEST8rr
# CHECK:   JCC_1
# CHECK: bb.1:
# CHECK:   successors: %bb.2
# CHECK:   MOV8mi $r12
# CHECK: bb.2:
# CHECK:   MOV8mi $r13
# CHECK:   RET 0

#################################################################################


The result is two different function, since we do not :

####################################################################

name:            test1
  bb.0:
    successors: %bb.2(0x40000000), %bb.1(0x40000000)
    TEST8rr killed renamable $al, renamable $al, implicit-def $eflags
    JCC_1 %bb.2, 5, implicit $eflags
  bb.1:
    successors: %bb.2(0x80000000)
    MOV8mi $r12, 1, $noreg, 0, $noreg, 0
  bb.2:
    MOV8mi $r13, 1, $noreg, 0, $noreg, 0
    RET 0


name:            test2
  bb.0:
    successors: %bb.2(0x40000000), %bb.1(0x40000000)
    TEST8rr killed renamable $al, renamable $al, implicit-def $eflags
    JCC_1 %bb.2, 5, implicit killed $eflags
  bb.1:
    MOV8mi $r12, 1, $noreg, 0, $noreg, 0
    MOV8mi $r13, 1, $noreg, 0, $noreg, 0
    RET 0
  bb.2:
    DBG_VALUE
    DBG_VALUE
    CFI_INSTRUCTION def_cfa_offset 8
    MOV8mi $r13, 1, $noreg, 0, $noreg, 0
    RET 0

####################################################################

Comment 1 Chris Ye 2019-11-12 02:37:28 PST

This is a good capture. The issue occurred when the beginning instruction of BB is CFI_INSTRUCTION, the case could simplify as below:

########################################
---
name:            test2
body:             |
  ...
  bb.2:
    CFI_INSTRUCTION def_cfa_offset 8
    MOV8mi $r13, 1, $noreg, 0, $noreg, 0
    RET 0
...
########################################

The SkipTopCFIAndReturn will move the iterators forward (implement ++I2) when meet CFIInstruction. If the begin() is exactly CFIInstruction, the I2 will be assigned as next "real" instruction and return by  ComputeCommonTailLength(), while handle ProfitableToMerge outside, the problem happen to deal with "if(I2 == MBB2->begin())".


With and without CFI_INSTRUCTION at the beginning will really make difference after BranchFolder optimization, is this normal behavior, or it is a bug? 

Is there any definition describe that both CFI and DEBUG should not impact CodeGen result?

Comment 2 Paul Robinson 2019-11-12 10:55:49 PST

>  It seems like CFI directives, at least sometimes, are generated even without -g. 

CFI (call frame information) is emitted whenever unwinding is a possibility.
That can be with -g, it can also be with -fexceptions (and other similar
codegen options).

So, if you're looking for bugs in debug-invariant code generation, I'd say
you should always add -fexceptions to avoid CFI-related differences.

> One thing that I'm not 100% sure about is if CFI-directives should be allowed to impact codegen or not.

I know they act as scheduling barriers, because they describe rather precisely
what is needed to unwind at any given instruction.  I haven't thought about
whether they should influence other optimizations, but conservatively they
should be required to exactly match if you're going to do tail merging or
other kinds of instruction-sequence merging.  Otherwise, unwinding through a
tail-merged sequence will sometimes do the wrong thing.

Comment 3 bjorn.a.pettersson 2019-11-15 08:53:06 PST

It seems like BlockFolder (BlockFolding.cpp) isn't caring that much about CFI instruction inside the tail, but it does make sure it isn't chopping of CFI instructions in the beginning of the block.

I'm not sure if that causes any problems (not aware of any trouble reports related to that), so I've tried to just fix the debug invariant problem I found here: https://reviews.llvm.org/D70091


One theory is that CFI instructions in the beginning of a basic block for example could be def_cfa_offset directives that kind of restores the state after a terminator in the previous block, and we can't just ignore those. 
On the other hand, CFI instructions that comes after an instruction usually describes the effect of that instruction (e.g. a push/pop that moves the SP). So if we fold away the tail of a BB and jump to some other BB with the same instruction sequence, then it does not really matter if we have compared the CFI instructions or not. The CFI instructions present in the BB we jump to should be correct for that position of the code.

Although, I'm not that familiar with how the CFI instructions are handled. So the theory above might be flaky. I've just tried to understand the logic in how CFI instructions has been dealt with in ComputeCommonTailLength (since  https://reviews.llvm.org/D42848 which is the patch that started to ignore CFI instructions when searching for a common tail).

Comment 4 Jeremy Morse 2019-11-20 05:56:37 PST

Hi,

Bjorn wrote:
> It seems like BlockFolder (BlockFolding.cpp) isn't caring that much about 
> CFI instruction inside the tail, but it does make sure it isn't chopping of 
> CFI instructions in the beginning of the block.

I get the impression that this might be an artefact of its "advance then rewind" approach, rather than something deliberate. One of the comments in the large block you delete in D70091 illustrates a CFI_INSTRUCTION inside a tail that shouldn't be "chopped off".

Then again, I know pretty much nothing about CFI anyway.

Paul wrote:
> CFI (call frame information) is emitted whenever unwinding is a possibility.
> That can be with -g, it can also be with -fexceptions (and other similar
> codegen options).

This sounds unpleasant -- if I understand correctly, then if only '-g' was specified then we want to prioritise not changing codegen; but if -fexceptions was specified then codegen _should_ change to make exceptions operate correctly? That could then (potentially) mean two different ways of treating CFI_INSTRUCTION.

Comment 5 bjorn.a.pettersson 2019-11-21 09:50:36 PST

The -g invariance introduced by https://reviews.llvm.org/D66467 has now been fixed in rG898de302919b (https://reviews.llvm.org/D70091).

The fix restores the old behavior to say that a found common tail cover the whole basic block even if debug instructions are found before the tail.

As the earlier comments talk a lot about CFI directives I want to highlight that the fix does not change anything related to CFI directives. If such instructions are found before the common tail, then we still need to split the BB when doing a tail merge.

Comment 6 Paul Robinson 2019-11-21 10:26:48 PST

(In reply to Jeremy Morse from comment #4)
> Paul wrote:
> > CFI (call frame information) is emitted whenever unwinding is a possibility.
> > That can be with -g, it can also be with -fexceptions (and other similar
> > codegen options).
> 
> This sounds unpleasant -- if I understand correctly, then if only '-g' was
> specified then we want to prioritise not changing codegen; but if
> -fexceptions was specified then codegen _should_ change to make exceptions
> operate correctly? That could then (potentially) mean two different ways of
> treating CFI_INSTRUCTION.

No no no...  Both -fexceptions and -g want to be told how to correctly unwind,
and they use exactly the same information to do that.  If you say -fexceptions
without -g, the data is the same, but we name the section .eh_frame instead of
.debug_frame.  That is the *only* difference.

This is a correctness issue!  The unwind info for -g must be correct, in order
for the debugger to walk up the stack without lying to you.

My position is that -g should not introduce any codegen differences when
compared to -fexceptions.