Skip to content

Better backtraces for lazy values#9469

Merged
nojb merged 3 commits intoocaml:trunkfrom
lpw25:fix-lazy-backtraces
Jun 14, 2020
Merged

Better backtraces for lazy values#9469
nojb merged 3 commits intoocaml:trunkfrom
lpw25:fix-lazy-backtraces

Conversation

@lpw25
Copy link
Copy Markdown
Contributor

@lpw25 lpw25 commented Apr 18, 2020

Currently code like:

let l1 : unit lazy_t = lazy (raise Not_found)

let test1 () =
  let () = Lazy.force l1 in ()

let () = test1 ()

produces a backtrace that does not include any location in the test1 function. Similarly,

let l2 : unit lazy_t = lazy (raise Not_found)

let test2 () =
  let (lazy ()) = l2 in ()

let () = test2 ()

produces a backtrace that does not include any location in the test2 function.

This PR fixes that by making sure that the code generated for forcing lazy values is given the location of the call or pattern that produced it.

Copy link
Copy Markdown
Contributor

@nojb nojb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a potentially useful change, and the patch is very small and simple. Not really familiar with Matching, but I am approving on the basis that the patch just replaces a Location.none by an actual location, and the supplied test.

@lpw25 could you rebase the PR please, and add Changes entry? Thanks! In the meantime maybe @gasche or @trefis (which are working on cleaning up Matching) can give a second approval for good measure.

@Octachron
Copy link
Copy Markdown
Member

The same location improvement in matching has already been merged in #9520 , so this part is indeed fine.

@xavierleroy
Copy link
Copy Markdown
Contributor

There are conflicts with #9520 (already merged). Could you please rebase, solve the conflicts, and add a Changes entry?

@lpw25 lpw25 force-pushed the fix-lazy-backtraces branch from 8f5a44b to 022ca03 Compare June 3, 2020 13:51
@lpw25
Copy link
Copy Markdown
Contributor Author

lpw25 commented Jun 3, 2020

Rebased and Changes entry added.

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Jun 3, 2020

The test is failing: https://travis-ci.org/github/ocaml/ocaml/jobs/694263670#L5446-L5463
Looks like stack backtraces are not being recorded...

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Jun 4, 2020

It is due to a bug in ocamltest (see #9633). As a fix, you can just call Printexc.record_backtrace true explicitly in the test (I tried to push directly to your branch, but did not have the required permissions).

@lpw25 lpw25 force-pushed the fix-lazy-backtraces branch from 022ca03 to 6029221 Compare June 5, 2020 09:19
@lpw25
Copy link
Copy Markdown
Contributor Author

lpw25 commented Jun 5, 2020

I've pushed the change to avoid the ocamltest bug.

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Jun 5, 2020

I've pushed the change to avoid the ocamltest bug.

Thank you! Will merge once CI passes.

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Jun 5, 2020

There is another error (only in some Travis configs):

 ... testing 'lazy.ml' with 1 (native) => failed (Files
/home/travis/build/ocaml/ocaml/testsuite/_ocamltest/tests/backtrace/lazy/ocamlopt.byte/lazy.opt 
and 
/home/travis/build/ocaml/ocaml/testsuite/_ocamltest/tests/backtrace/lazy/ocamlopt.opt/lazy.opt 
are different)

@lpw25
Copy link
Copy Markdown
Contributor Author

lpw25 commented Jun 14, 2020

The tests are failing because the executables produced by ocamlopt.opt and ocamlopt.byte differ. I don't know why or how we test that, but I can't tell which differences matter for that check. The disassembled code is identical.

@lpw25 lpw25 force-pushed the fix-lazy-backtraces branch from 6029221 to faa56c1 Compare June 14, 2020 06:57
@lpw25
Copy link
Copy Markdown
Contributor Author

lpw25 commented Jun 14, 2020

I've added compare_programs="false" to the new test. Some of the other tests in the backtrace directory have this incantation, so hopefully that is fine. I swear that every bugfix I write for OCaml most of the effort goes into making the accompanying test work.

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Jun 14, 2020

I've added compare_programs="false" to the new test. Some of the other tests in the backtrace directory have this incantation, so hopefully that is fine. I swear that every bugfix I write for OCaml most of the effort goes into making the accompanying test work.

I think this should be fine. Thanks. (As this is done in other tests in that directory, I think this can be merged once CI passes.)

Out of curiosity, I looked a bit into the problem. It appears there is a tiny difference (1 byte offset?) between the (C compiler's) debug info generated for the two versions:

nojebar@walras$ readelf -a lazy.opt.opt > lazy.opt.opt.elf
nojebar@walras$ readelf -a lazy.byte.opt > lazy.byte.opt.elf
nojebar@walras$ git diff --no-index lazy.byte.opt.elf lazy.opt.opt.elf
diff --git a/lazy.byte.opt.elf b/lazy.opt.opt.elf
index ae60d72..848217b 100644
--- a/lazy.byte.opt.elf
+++ b/lazy.opt.opt.elf
@@ -79,14 +79,14 @@ Section Headers:
   [27] .debug_aranges    PROGBITS         0000000000000000  00093c30
        0000000000006900  0000000000000000           0     0     16
   [28] .debug_info       PROGBITS         0000000000000000  0009a530
-       000000000004698b  0000000000000000           0     0     1
-  [29] .debug_abbrev     PROGBITS         0000000000000000  000e0ebb
+       000000000004698a  0000000000000000           0     0     1
+  [29] .debug_abbrev     PROGBITS         0000000000000000  000e0eba
        000000000000a3fd  0000000000000000           0     0     1
-  [30] .debug_line       PROGBITS         0000000000000000  000eb2b8
+  [30] .debug_line       PROGBITS         0000000000000000  000eb2b7
        0000000000017f14  0000000000000000           0     0     1
-  [31] .debug_str        PROGBITS         0000000000000000  001031cc
+  [31] .debug_str        PROGBITS         0000000000000000  001031cb
        00000000000089ff  0000000000000001  MS       0     0     1
-  [32] .debug_loc        PROGBITS         0000000000000000  0010bbcb
+  [32] .debug_loc        PROGBITS         0000000000000000  0010bbca
        00000000000450fc  0000000000000000           0     0     1
   [33] .debug_ranges     PROGBITS         0000000000000000  00150cd0
        000000000000d330  0000000000000000           0     0     16
@@ -6243,4 +6243,4 @@ Displaying notes found at file offset 0x00000254 with length 0x00000020:
 Displaying notes found at file offset 0x00000274 with length 0x00000024:
   Owner                 Data size      Description
   GNU                  0x00000014      NT_GNU_BUILD_ID (unique build ID bitstring)
-    Build ID: d4c417d02ecbf81e33029a65a3ec786bc0394b02
+    Build ID: 4d222207f8523892ac9439b2c2ffca8137e89e13

Maybe something to consider is to just drop the idea of comparing native binaries and restrict comparison to pure bytecode executables, where the output should be deterministic.

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Jun 14, 2020

OK, it turns out the difference is that the (C compiler's) debug information stores the path to the compilation directory which is different for both versions: _ocamltest/lazy/ocamlopt.byte/ vs _ocamltest/lazy/ocamlopt.opt/ (note that the difference in length is exactly one byte). See the difference in debug info (can be read using readelf --debug-dump=info ...):

diff --git a/lazy.byte.elf.debug b/lazy.opt.elf.debug
index 491613e..04673d2 100644
--- a/lazy.byte.elf.debug
+++ b/lazy.opt.elf.debug
@@ -14,7 +14,7 @@ Contents of the .debug_info section:
     <47>   DW_AT_producer    : GNU AS 2.24     
     <53>   DW_AT_language    : 32769   (MIPS assembler)
   Compilation Unit @ offset 0x55:
-   Length:        0x7e (32-bit)
+   Length:        0x7d (32-bit)
    Version:       2
    Abbrev Offset: 0x14
    Pointer Size:  8
@@ -23,120334 +23,120334 @@ Contents of the .debug_info section:
     <65>   DW_AT_low_pc      : 0x4201c0        
     <6d>   DW_AT_high_pc     : 0x4203c6        
     <75>   DW_AT_name        : lazy.ml 
-    <7d>   DW_AT_comp_dir    : /home/nojebar/ocaml/testsuite/tests/backtrace/_ocamltest/lazy/ocamlopt.byte     
-    <c9>   DW_AT_producer    : GNU AS 2.24     
-    <d5>   DW_AT_language    : 32769   (MIPS assembler)
-  Compilation Unit @ offset 0xd7:
+    <7d>   DW_AT_comp_dir    : /home/nojebar/ocaml/testsuite/tests/backtrace/_ocamltest/lazy/ocamlopt.opt      
+    <c8>   DW_AT_producer    : GNU AS 2.24     
+    <d4>   DW_AT_language    : 32769   (MIPS assembler)
+  Compilation Unit @ offset 0xd6:
    Length:        0x55 (32-bit)
    Version:       2
    Abbrev Offset: 0x28
    Pointer Size:  8

This means that comparing native programs is broken when passing -g in all cases. I will submit a PR to disable the comparison of programs in this case automatically.

@nojb nojb merged commit 33416d1 into ocaml:trunk Jun 14, 2020
@nojb
Copy link
Copy Markdown
Contributor

nojb commented Jun 14, 2020

CI green, merged. Thank you for your efforts!

@xavierleroy
Copy link
Copy Markdown
Contributor

Thanks for the detective work. I would suggest to remove the comparison between ocamlopt.byte- and ocamlopt.opt-generated executables in all cases. We don't control the C linker and there are various reasons why it can generate different but functionally-equivalent executables.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 10, 2021

According to my testing, native programs now have better backtrace, but bytecode programs don't. In fact looking at the PR again, I see that the backtrace test is restricted to native compilation only. I looked at the code and I don't understand why the backtrace would not be improved on bytecode (the compilation path is different, but both paths generate a function call that should now have the correct location). Can someone explain the difference?

test.ml

let test =
  1 + 
  Lazy.force
   (lazy
     (raise Exit))

4.12+dev native backtrace:

Raised at Test.test in file "test.ml", line 5, characters 5-17
Called from CamlinternalLazy.force_lazy_block in file "camlinternalLazy.ml", line 31, characters 17-27
Re-raised at CamlinternalLazy.force_lazy_block in file "camlinternalLazy.ml", line 36, characters 4-11
Called from Test.test in file "test.ml", line 3, characters 2-40

4.12+dev bytecode backtrace:

Raised at Test.test in file "test.ml", line 5, characters 5-17
Called from CamlinternalLazy.force_lazy_block in file "camlinternalLazy.ml", line 31, characters 17-27
Re-raised at CamlinternalLazy.force_lazy_block in file "camlinternalLazy.ml", line 36, characters 4-11
Called from unknown location

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants