fix(debuginfo): correct language detection for LTO-compiled binaries #961

Merged
loewenheim merged 4 commits into getsentry:master from DataDog:nsavoire/dwarf_lto_language_fix
Mar 23, 2026

Conversation

@nsavoire
Contributor

LTO can produce compilation units whose DW_AT_language does not reflect the true source language of the functions they contain. Two cases arise:

  1. Artificial LTO CUs (e.g. an artificial CU with a C++ language tag that contains C functions): a top-level subprogram in such a CU carries a cross-unit DW_AT_abstract_origin pointing to the real CU. We now follow that reference in resolve_function_language to pick up the origin CU's language, which is then used for the symbol-table name, DWARF name, and fallback name of the function.

  2. Cross-language LTO inlinees (e.g. a C function inlined into Rust): the inlinee's DW_AT_abstract_origin references the C CU directly. resolve_function_name now reads the referenced CU's language via UnitRef::language() whenever it follows an abstract_origin across a unit boundary, overriding the language supplied by the caller.
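
The rule shared by both cases can be sketched on a toy model. This is only an illustration under stated assumptions: `Language`, `Unit`, `AbstractOrigin`, and `resolve_language` are hypothetical names invented here, and the sketch does not reproduce symbolic's or gimli's actual types or signatures. It shows only the decision itself: a cross-unit origin overrides the caller's language, anything else keeps it.

```rust
/// Toy stand-ins for DWARF data; none of these types come from symbolic or gimli.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Language {
    C,
    Cpp,
    Rust,
}

/// A compilation unit, reduced to its DW_AT_language value.
pub struct Unit {
    pub language: Language,
}

/// Where a DIE's DW_AT_abstract_origin points, if it has one.
#[derive(Clone, Copy)]
pub enum AbstractOrigin {
    /// The DIE has no DW_AT_abstract_origin attribute.
    None,
    /// The reference stays inside the DIE's own unit.
    SameUnit,
    /// The reference crosses into another unit, identified by its index.
    CrossUnit(usize),
}

/// Pick the language for a function DIE.
///
/// `fallback` is the language the caller would otherwise use (the enclosing
/// unit's or enclosing function's language). Only a cross-unit origin that
/// resolves to a known unit overrides it.
pub fn resolve_language(units: &[Unit], origin: AbstractOrigin, fallback: Language) -> Language {
    match origin {
        AbstractOrigin::CrossUnit(index) => units
            .get(index)
            .map(|unit| unit.language)
            .unwrap_or(fallback),
        AbstractOrigin::None | AbstractOrigin::SameUnit => fallback,
    }
}
```

In this model, a subprogram in a unit tagged C++ whose origin is `CrossUnit` pointing at a C unit resolves to C, while a `SameUnit` origin leaves the caller's language untouched.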

To propagate the correctly-resolved language to all inlinees of a top-level subprogram, parse_function passes it down through parse_function_children and parse_inlinee. Same-unit abstract_origin references (LTO partial units without a further cross-unit link) keep the enclosing function's language as a fallback, which is correct for the common case where all code in an LTO CU shares the same language.
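
The propagation described above can be pictured as a recursive walk that hands each function's resolved language down to its inlinees. The following is a minimal sketch, assuming a simplified tree of function entries; `FunctionDie`, `cross_unit_origin_language`, and `collect_languages` are hypothetical names and are not the signatures of parse_function, parse_function_children, or parse_inlinee.

```rust
/// Hypothetical simplified model; not symbolic's real data structures.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Language {
    C,
    Cpp,
    Rust,
}

/// A function DIE together with the functions inlined into it.
pub struct FunctionDie {
    pub name: &'static str,
    /// Language of the CU reached through a cross-unit DW_AT_abstract_origin,
    /// or `None` when there is no such reference (including same-unit origins).
    pub cross_unit_origin_language: Option<Language>,
    pub inlinees: Vec<FunctionDie>,
}

/// Walk a function and its inlinees, recording (name, language) pairs.
///
/// `inherited` is the language resolved for the enclosing function, or the
/// unit's DW_AT_language when called on a top-level subprogram.
pub fn collect_languages(
    die: &FunctionDie,
    inherited: Language,
    out: &mut Vec<(&'static str, Language)>,
) {
    // A cross-unit origin wins; otherwise keep what the parent resolved.
    let language = die.cross_unit_origin_language.unwrap_or(inherited);
    out.push((die.name, language));
    for inlinee in &die.inlinees {
        collect_languages(inlinee, language, out);
    }
}
```

Passing the resolved language down, rather than the unit's DW_AT_language, is what lets inlinees with only same-unit origins inherit the corrected value from their top-level subprogram.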

Member

@jjbayer jjbayer left a comment

@nsavoire thank you for the contribution! Do you have any links to docs that describe the two cases you mention?

@nsavoire
Contributor Author

> @nsavoire thank you for the contribution! Do you have any links to docs that describe the two cases you mention?

I included two binaries that cover these cases:

Before this change, looking up address 0x5d740 in libjemalloc.so.debug with symcache_debug returned an incorrect language (C++):

> symcache_debug -d ../libjemalloc.so.debug --lookup 0x5d740
malloc_mutex_trylock_final (C++)
  at /build/jemalloc-zywF3d/jemalloc-5.3.0/include/jemalloc/internal/mutex.h line 157
malloc_mutex_lock (C++)
  at /build/jemalloc-zywF3d/jemalloc-5.3.0/include/jemalloc/internal/mutex.h line 216
je_tcache_arena_associate (C++)
  at /build/jemalloc-zywF3d/jemalloc-5.3.0/src/tcache.c line 588

With this change, it would return the correct language (C):

malloc_mutex_trylock_final (C)
  at /build/jemalloc-zywF3d/jemalloc-5.3.0/include/jemalloc/internal/mutex.h line 157
malloc_mutex_lock (C)
  at /build/jemalloc-zywF3d/jemalloc-5.3.0/include/jemalloc/internal/mutex.h line 216
je_tcache_arena_associate (C)
  at /build/jemalloc-zywF3d/jemalloc-5.3.0/src/tcache.c line 588

  • cross_language_lto.debug is the debug symbol file of a simple Rust binary that calls into a C function, built from this repo.

The C function my_add is inlined by LTO into compute_sum. Before this change, looking up address 0x55470 with symcache_debug returned an incorrect language for my_add (Rust):

my_add (Rust)
  at /home/bits/dd/lto/c_src/math.c line 2
compute_sum (Rust)
  at /home/bits/go/src/github.com/DataDog/lto/src/main.rs line 22

With this change, it would return C:

my_add (C)
  at /home/bits/dd/lto/c_src/math.c line 2
compute_sum (Rust)
  at /home/bits/go/src/github.com/DataDog/lto/src/main.rs line 22

@nsavoire nsavoire force-pushed the nsavoire/dwarf_lto_language_fix branch from a49b0ee to bf2dfe9 Compare March 18, 2026 17:33
@jjbayer jjbayer requested a review from loewenheim March 23, 2026 08:01
Contributor

@loewenheim loewenheim left a comment

Looks great, thank you for the contribution! Can you add a changelog entry as well under Fixes? Something like

- DWARF: Correctly detect languages in LTO-compiled binaries ([#961](https://github.com/getsentry/symbolic/pull/961))

@nsavoire
Contributor Author

> Looks great, thank you for the contribution! Can you add a changelog entry as well under Fixes? Something like
>
> - DWARF: Correctly detect languages in LTO-compiled binaries ([#961](https://github.com/getsentry/symbolic/pull/961))

Done!

@loewenheim loewenheim merged commit 7b096cb into getsentry:master Mar 23, 2026
16 of 18 checks passed