fix(debuginfo): correct language detection for LTO-compiled binaries #961
LTO can produce compilation units whose DW_AT_language does not reflect the true source language of the functions they contain. Two cases arise:

1. Artificial LTO CUs (e.g. artificial CU with C++ language tag that contains C functions): a top-level subprogram in such a CU carries a cross-unit DW_AT_abstract_origin pointing to the real CU. We now follow that reference in resolve_function_language to pick up the origin CU's language, which is then used for the symbol-table name, DWARF name, and fallback name of the function.
2. Cross-language LTO inlinees (e.g. a C function inlined into Rust): the inlinee's DW_AT_abstract_origin references the C CU directly. resolve_function_name now reads the referenced CU's language via UnitRef::language() whenever it follows an abstract_origin across a unit boundary, overriding the language supplied by the caller.

To propagate the correctly-resolved language to all inlinees of a top-level subprogram, parse_function passes it down through parse_function_children and parse_inlinee. Same-unit abstract_origin references (LTO partial units without a further cross-unit link) keep the enclosing function's language as a fallback, which is correct for the common case where all code in an LTO CU shares the same language.
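The resolution rule described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the code in this PR: `Language`, `DieRef`, `Die`, `Unit`, `resolve_language`, `collect_languages`, and `MAX_ORIGIN_DEPTH` are all hypothetical names, and the real implementation works on DWARF entries rather than on in-memory vectors. The sketch assumes that a same-unit origin chain is followed until it either crosses a unit boundary or ends.

```rust
/// Toy subset of the languages a unit's DW_AT_language could name.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Language {
    C,
    Cpp,
    Rust,
}

/// Hypothetical stand-in for a DW_AT_abstract_origin reference:
/// the index of the target unit and of the entry inside it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DieRef {
    pub unit: usize,
    pub index: usize,
}

/// Toy subprogram or inlined-subroutine entry.
#[derive(Debug, Default)]
pub struct Die {
    pub abstract_origin: Option<DieRef>,
    pub inlinees: Vec<Die>,
}

/// Toy compilation unit with its declared language.
#[derive(Debug)]
pub struct Unit {
    pub language: Language,
    pub dies: Vec<Die>,
}

/// Arbitrary bound (an assumption) so a malformed origin chain cannot loop forever.
const MAX_ORIGIN_DEPTH: usize = 16;

/// Returns the language of the first unit reached through a cross-unit
/// abstract origin. If the chain stays inside `unit_index`, ends, or points
/// at a missing entry, `fallback` (the enclosing language) is returned.
pub fn resolve_language(units: &[Unit], unit_index: usize, die: &Die, fallback: Language) -> Language {
    let mut current = die;
    for _ in 0..MAX_ORIGIN_DEPTH {
        let origin = match current.abstract_origin {
            Some(origin) => origin,
            None => break,
        };
        let target_unit = match units.get(origin.unit) {
            Some(unit) => unit,
            None => break,
        };
        if origin.unit != unit_index {
            // Crossed a unit boundary: the origin unit's language wins.
            return target_unit.language;
        }
        match target_unit.dies.get(origin.index) {
            Some(next) => current = next,
            None => break,
        }
    }
    fallback
}

/// Walks a function and its inlinees in pre-order, passing each resolved
/// language down as the fallback for the entries nested below it.
pub fn collect_languages(
    units: &[Unit],
    unit_index: usize,
    die: &Die,
    inherited: Language,
    out: &mut Vec<Language>,
) {
    let language = resolve_language(units, unit_index, die, inherited);
    out.push(language);
    for inlinee in &die.inlinees {
        collect_languages(units, unit_index, inlinee, language, out);
    }
}
```

In this model, a top-level function in a C++-tagged artificial unit whose origin points into a C unit resolves to C, and an inlinee of that function with no cross-unit origin inherits C rather than C++. Passing the resolved language down as the fallback is what keeps nested inlinees consistent with their parent.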
I included two binaries that cover these cases:
Doing a lookup of address With this change, it would return the correct language (C):
C function With this change, it would return C:
loewenheim left a comment:
Looks great, thank you for the contribution! Can you add a changelog entry as well under Fixes? Something like
- DWARF: Correctly detect languages in LTO-compiled binaries ([#961](https://github.com/getsentry/symbolic/pull/961))
Done!