Skip to content

Allowing displaying multi-line backtrace locations in the same way as errors#10111

Merged
dra27 merged 7 commits intoocaml:trunkfrom
dra27:enhanced-debug-locs
Jun 8, 2023
Merged

Allowing displaying multi-line backtrace locations in the same way as errors#10111
dra27 merged 7 commits intoocaml:trunkfrom
dra27:enhanced-debug-locs

Conversation

@dra27
Copy link
Copy Markdown
Member

@dra27 dra27 commented Jan 1, 2021

This PR paves the way for changing how backtrace locations are printed to be the same as #8541, so with locations spanning multiple lines reporting the line and column of the end location, rather than simply the number of characters offset from the start line. The current format is fine for machines, but a nuisance for the occasions when humans have to look at the output!

As it happens, though, my primary motivation for this is a series of (smaller) PRs which eliminate the special handling needed for CRLF-checkouts of the compiler's codebase. I've quietly maintained the .gitattributes entries for these for a few years (we all have our foibles) with the intention of one day sorting it out properly. That one day arrived in September, because there are recent changes to our codebase which mean that it's not just some testsuite files which need forcing to use LF-endings but some of the compiler's sources too, it's just taken me a few months to get to tidy up the code.

There are three parts:

  1. Extending struct ev_info to record the extra fields in bytecode (simple change)
  2. Altering the encoding of debuginfo to record the extra data in native code (less simple - more below)
  3. Threading this through to Printexc and adding two new fields to Printexc.location

Actually displaying this information will be in a future PR.

At present, the 64-bit words for debug info allow 20 bits for the line number, 8 bits for the first character location and 10 bits for the end character location. In our tree, that at present leads to just over 1000 locations which overflow the representation (not that they're necessarily ever displayed). Now I steal 1 bit from the line number (so we technically can no longer display locations from lines 524288-1048575 in a file) which is used to indicate two possible formats for the 64-bit word. The first adds no further overhead by using 12 bits for the line number, 6 bits for the character offset of the start, 3 bits to encode the offset to the end-line (so 0 when the location is on the same line), 7 bits to encode the character offset of the end (relative to the end-line) and - in order to keep the information we had before - 9 bits to store the end character offset as before. This encoding captures 92% of the locations in a build of the compiler tree. When that overflows, the representation instead stores an extra 64bits in the name_info struct containing the defname (thus adding 8 bytes overhead), allowing 16bits each for the start and end character offset (the other int is used for the original end character offset relative to the start). I haven't shared defname strings as when I checked the tree there were only 50 locations where the strings were being duplicated, so the overhead (at least for the compiler sources) is small (indeed, the extra 4 byte offset in the other locations would dwarf the saving on the strings).

With these two representations, no locations in the compiler's tree are now unrepresentable.

Having stored that information, it clearly makes sense to be able to access it in Printexc - in this version I've simply added two fields to the end of the Printexc.location. The alternative would be an additional type with a different retrieval function.

@dra27 dra27 force-pushed the enhanced-debug-locs branch 2 times, most recently from 2fc86ed to e718e31 Compare January 4, 2021 09:39
@stedolan
Copy link
Copy Markdown
Contributor

stedolan commented Jan 4, 2021

This sounds good! I've definitely grown tired of seeing -1023 in memtrace files: the end_char offset can overflow quite quickly when it's referring to a whole multiline function.

Is the optimisation of partially packed debuginfos important? Naively I would have thought that since most debuginfos are fully packed, we don't lose much by not bothering to do any bitpacking in the overflow case.

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks beneficial but technical work. It needs a reviewer. Do we have a volunteer? Would one of the people investing their time in debugging/tracing tools be willing to look at this?

(Naive question from just reading the PR description: have you kept some space in the format for eventual future extension? What do we do for locations, possibly in generated code, that actually have huge line numbers, or large character positions? Should we have a "safe/robust" mode where everything is 64bits, instead of making various assumptions?)

@dra27
Copy link
Copy Markdown
Member Author

dra27 commented Apr 14, 2021

This is also partly waiting for me to update the code to address @stedolan's comment!

@dra27 dra27 force-pushed the enhanced-debug-locs branch 2 times, most recently from 285992a to bc71853 Compare April 14, 2021 20:16
@dra27
Copy link
Copy Markdown
Member Author

dra27 commented Apr 14, 2021

Rebased to get rid of the conflict and also updated to use a pair of uint16_ts instead of a manually packed uint32_t.

@stedolan and I had an offlineonline (but not on GitHub) discussion about this in January. This PR seeks maximum compatibility with the what was there before by maintaining the old end of character range field to be computed. The intention is that the next PR would do away with this compatibility (at the cost of a visible change in Printexc) where virtually all locations would be encoded in the packed format and we could indeed simplify the code slightly by doing away with the "partially packed" bit format because there would be so few locations overflowing the packed format that we wouldn't care at the slight increase in size of the struct.

@dra27 dra27 force-pushed the enhanced-debug-locs branch from bc71853 to a598655 Compare April 21, 2021 11:12
@dra27 dra27 force-pushed the enhanced-debug-locs branch from a598655 to 9fa9e96 Compare May 12, 2021 09:24
@dra27 dra27 force-pushed the enhanced-debug-locs branch from 9fa9e96 to e1b6ee5 Compare May 27, 2021 04:45
@dra27 dra27 force-pushed the enhanced-debug-locs branch from e1b6ee5 to 1b259ac Compare June 28, 2021 07:32
@dra27 dra27 force-pushed the enhanced-debug-locs branch 2 times, most recently from d20ad0b to f35155f Compare July 10, 2021 15:46
@stedolan stedolan self-assigned this Jul 14, 2021
@dra27 dra27 force-pushed the enhanced-debug-locs branch from f35155f to 1a4665f Compare July 27, 2021 08:59
@dra27 dra27 force-pushed the enhanced-debug-locs branch from 1a4665f to c9dbffb Compare August 30, 2021 14:13
@dra27 dra27 force-pushed the enhanced-debug-locs branch from c9dbffb to 5dea248 Compare September 22, 2021 12:36
@dra27 dra27 force-pushed the enhanced-debug-locs branch from 5dea248 to 6abc613 Compare October 24, 2021 12:56
@dra27 dra27 force-pushed the enhanced-debug-locs branch from 6abc613 to 98935bd Compare October 14, 2022 14:52
Copy link
Copy Markdown
Contributor

@abbysmal abbysmal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look at the review, I am not extremely familiar with this part of the codebase, but it does look correct to me.
@stedolan did you get a chance to take another look at it?

@dra27 dra27 force-pushed the enhanced-debug-locs branch 2 times, most recently from da02bb2 to 85141ec Compare January 6, 2023 15:27
@dra27
Copy link
Copy Markdown
Member Author

dra27 commented Jan 6, 2023

Thanks, @Engil! I updated the @since markings, and also put this through precheck#812

@dra27 dra27 force-pushed the enhanced-debug-locs branch from 85141ec to 9a24195 Compare February 3, 2023 21:16
@dra27 dra27 force-pushed the enhanced-debug-locs branch from 9a24195 to bb47354 Compare March 3, 2023 14:54
@dra27 dra27 force-pushed the enhanced-debug-locs branch from bb47354 to 07edcd7 Compare June 8, 2023 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants