[lld][WebAssembly] Drop llvm_cov sections when llvm_prf segments are discarded#172023
[lld][WebAssembly] Drop llvm_cov sections when llvm_prf segments are discarded#172023Spxg wants to merge 3 commits into
Conversation
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
@llvm/pr-subscribers-llvm-binary-utilities @llvm/pr-subscribers-backend-webassembly Author: None (Spxg) ChangesWhile enabling coverage instrumentation for a Rust program, I encountered llvm-cov exiting with an error: "malformed instrumentation profile data: function name is empty". I found that the Rust linker defaults to using Every covfun record holds a hash of its symbol name, and However, WASM stores This flag records whether it is an Full diff: https://github.com/llvm/llvm-project/pull/172023.diff 4 Files Affected:
diff --git a/lld/wasm/InputChunks.h b/lld/wasm/InputChunks.h
index 1fe78d76631f1..7cc49da1064fc 100644
--- a/lld/wasm/InputChunks.h
+++ b/lld/wasm/InputChunks.h
@@ -83,6 +83,9 @@ class InputChunk {
bool isTLS() const { return flags & llvm::wasm::WASM_SEG_FLAG_TLS; }
bool isRetained() const { return flags & llvm::wasm::WASM_SEG_FLAG_RETAIN; }
+ bool isInstrProfSegment() const {
+ return flags & llvm::wasm::WASM_SEG_FLAG_PRF;
+ }
ObjFile *file;
OutputSection *outputSec = nullptr;
diff --git a/lld/wasm/InputFiles.cpp b/lld/wasm/InputFiles.cpp
index 387b5eb10ba2f..fedaa4b737e07 100644
--- a/lld/wasm/InputFiles.cpp
+++ b/lld/wasm/InputFiles.cpp
@@ -577,6 +577,23 @@ void ObjFile::parse(bool ignoreComdats) {
if (!seg->isTLS() &&
(seg->name.starts_with(".tdata") || seg->name.starts_with(".tbss")))
seg->flags |= WASM_SEG_FLAG_TLS;
+
+ // Every covfun record holds a hash of its symbol name, and `llvm-cov`
+ // will exit fatally if it can't resolve that hash back to an entry in
+ // the binary's `__llvm_prf_names` linker section.
+ //
+ // However, WASM stores `__llvm_covfun` in `CustomSection`, while
+ // `__llvm_prf_names` is stored in the `DATA` section. The former cannot be
+ // GC, whereas the latter may be GC, causing `llvm-cov` execution to fail.
+ //
+ // This flag records whether it is an `InstrProfSegment` so that it can be
+ // enqueued during GC.
+ if (seg->name == getInstrProfSectionName(IPSK_name, Triple::Wasm) ||
+ seg->name == getInstrProfSectionName(IPSK_cnts, Triple::Wasm) ||
+ seg->name == getInstrProfSectionName(IPSK_data, Triple::Wasm)) {
+ seg->flags |= WASM_SEG_FLAG_PRF;
+ }
+
segments.emplace_back(seg);
}
setRelocs(segments, dataSection);
diff --git a/lld/wasm/MarkLive.cpp b/lld/wasm/MarkLive.cpp
index 2b2cf19f14b30..c51214b6920cf 100644
--- a/lld/wasm/MarkLive.cpp
+++ b/lld/wasm/MarkLive.cpp
@@ -43,6 +43,7 @@ class MarkLive {
void enqueue(InputChunk *chunk);
void enqueueInitFunctions(const ObjFile *sym);
void enqueueRetainedSegments(const ObjFile *file);
+ void enqueueInstrProfSegments(const ObjFile *file);
void mark();
bool isCallCtorsLive();
@@ -104,6 +105,13 @@ void MarkLive::enqueueRetainedSegments(const ObjFile *file) {
enqueue(chunk);
}
+// Mark instr profile segments.
+void MarkLive::enqueueInstrProfSegments(const ObjFile *file) {
+ for (InputChunk *chunk : file->segments)
+ if (chunk->isInstrProfSegment())
+ enqueue(chunk);
+}
+
void MarkLive::run() {
// Add GC root symbols.
if (!ctx.arg.entry.empty())
@@ -117,7 +125,9 @@ void MarkLive::run() {
if (ctx.sym.callDtors)
enqueue(ctx.sym.callDtors);
- for (const ObjFile *obj : ctx.objectFiles)
+ for (const ObjFile *obj : ctx.objectFiles) {
+ // Enqueue instr profile segments
+ enqueueInstrProfSegments(obj);
if (obj->isLive()) {
// Enqueue constructors in objects explicitly live from the command-line.
enqueueInitFunctions(obj);
@@ -125,6 +135,7 @@ void MarkLive::run() {
// command-line.
enqueueRetainedSegments(obj);
}
+ }
mark();
diff --git a/llvm/include/llvm/BinaryFormat/Wasm.h b/llvm/include/llvm/BinaryFormat/Wasm.h
index cf90a43d0d7e8..1f76330a29efb 100644
--- a/llvm/include/llvm/BinaryFormat/Wasm.h
+++ b/llvm/include/llvm/BinaryFormat/Wasm.h
@@ -229,6 +229,7 @@ enum WasmSegmentFlag : unsigned {
WASM_SEG_FLAG_STRINGS = 0x1,
WASM_SEG_FLAG_TLS = 0x2,
WASM_SEG_FLAG_RETAIN = 0x4,
+ WASM_SEG_FLAG_PRF = 0x8,
};
// Kinds of tag attributes.
|
|
@llvm/pr-subscribers-lld-wasm Author: None (Spxg) ChangesWhile enabling coverage instrumentation for a Rust program, I encountered llvm-cov exiting with an error: "malformed instrumentation profile data: function name is empty". I found that the Rust linker defaults to using Every covfun record holds a hash of its symbol name, and However, WASM stores This flag records whether it is an Full diff: https://github.com/llvm/llvm-project/pull/172023.diff 4 Files Affected:
diff --git a/lld/wasm/InputChunks.h b/lld/wasm/InputChunks.h
index 1fe78d76631f1..7cc49da1064fc 100644
--- a/lld/wasm/InputChunks.h
+++ b/lld/wasm/InputChunks.h
@@ -83,6 +83,9 @@ class InputChunk {
bool isTLS() const { return flags & llvm::wasm::WASM_SEG_FLAG_TLS; }
bool isRetained() const { return flags & llvm::wasm::WASM_SEG_FLAG_RETAIN; }
+ bool isInstrProfSegment() const {
+ return flags & llvm::wasm::WASM_SEG_FLAG_PRF;
+ }
ObjFile *file;
OutputSection *outputSec = nullptr;
diff --git a/lld/wasm/InputFiles.cpp b/lld/wasm/InputFiles.cpp
index 387b5eb10ba2f..fedaa4b737e07 100644
--- a/lld/wasm/InputFiles.cpp
+++ b/lld/wasm/InputFiles.cpp
@@ -577,6 +577,23 @@ void ObjFile::parse(bool ignoreComdats) {
if (!seg->isTLS() &&
(seg->name.starts_with(".tdata") || seg->name.starts_with(".tbss")))
seg->flags |= WASM_SEG_FLAG_TLS;
+
+ // Every covfun record holds a hash of its symbol name, and `llvm-cov`
+ // will exit fatally if it can't resolve that hash back to an entry in
+ // the binary's `__llvm_prf_names` linker section.
+ //
+ // However, WASM stores `__llvm_covfun` in `CustomSection`, while
+ // `__llvm_prf_names` is stored in the `DATA` section. The former cannot be
+ // GC, whereas the latter may be GC, causing `llvm-cov` execution to fail.
+ //
+ // This flag records whether it is an `InstrProfSegment` so that it can be
+ // enqueued during GC.
+ if (seg->name == getInstrProfSectionName(IPSK_name, Triple::Wasm) ||
+ seg->name == getInstrProfSectionName(IPSK_cnts, Triple::Wasm) ||
+ seg->name == getInstrProfSectionName(IPSK_data, Triple::Wasm)) {
+ seg->flags |= WASM_SEG_FLAG_PRF;
+ }
+
segments.emplace_back(seg);
}
setRelocs(segments, dataSection);
diff --git a/lld/wasm/MarkLive.cpp b/lld/wasm/MarkLive.cpp
index 2b2cf19f14b30..c51214b6920cf 100644
--- a/lld/wasm/MarkLive.cpp
+++ b/lld/wasm/MarkLive.cpp
@@ -43,6 +43,7 @@ class MarkLive {
void enqueue(InputChunk *chunk);
void enqueueInitFunctions(const ObjFile *sym);
void enqueueRetainedSegments(const ObjFile *file);
+ void enqueueInstrProfSegments(const ObjFile *file);
void mark();
bool isCallCtorsLive();
@@ -104,6 +105,13 @@ void MarkLive::enqueueRetainedSegments(const ObjFile *file) {
enqueue(chunk);
}
+// Mark instr profile segments.
+void MarkLive::enqueueInstrProfSegments(const ObjFile *file) {
+ for (InputChunk *chunk : file->segments)
+ if (chunk->isInstrProfSegment())
+ enqueue(chunk);
+}
+
void MarkLive::run() {
// Add GC root symbols.
if (!ctx.arg.entry.empty())
@@ -117,7 +125,9 @@ void MarkLive::run() {
if (ctx.sym.callDtors)
enqueue(ctx.sym.callDtors);
- for (const ObjFile *obj : ctx.objectFiles)
+ for (const ObjFile *obj : ctx.objectFiles) {
+ // Enqueue instr profile segments
+ enqueueInstrProfSegments(obj);
if (obj->isLive()) {
// Enqueue constructors in objects explicitly live from the command-line.
enqueueInitFunctions(obj);
@@ -125,6 +135,7 @@ void MarkLive::run() {
// command-line.
enqueueRetainedSegments(obj);
}
+ }
mark();
diff --git a/llvm/include/llvm/BinaryFormat/Wasm.h b/llvm/include/llvm/BinaryFormat/Wasm.h
index cf90a43d0d7e8..1f76330a29efb 100644
--- a/llvm/include/llvm/BinaryFormat/Wasm.h
+++ b/llvm/include/llvm/BinaryFormat/Wasm.h
@@ -229,6 +229,7 @@ enum WasmSegmentFlag : unsigned {
WASM_SEG_FLAG_STRINGS = 0x1,
WASM_SEG_FLAG_TLS = 0x2,
WASM_SEG_FLAG_RETAIN = 0x4,
+ WASM_SEG_FLAG_PRF = 0x8,
};
// Kinds of tag attributes.
|
5ca22ad to
dc84939
Compare
c94ec45 to
cba9c10
Compare
|
OK, this change looks good now. Can we add a test though perhaps? |
d65c4a8 to
5127a43
Compare
Done! minimal example: // instr-prof.c
int __llvm_profile_runtime = 0;// malformed-prf1.c
extern void bar();
void foo() { bar(); }// malformed-prf2.c
void bar() {}clang -fprofile-instr-generate -fcoverage-mapping instr-prof.c malformed-prf1.c malformed-prf2.c -c -target wasm32-unknown-unknown
wasm-ld instr-prof.o malformed-prf1.o --start-lib malformed-prf2.o --end-lib --gc-sections --no-entry -o x.wasm
llvm-cov export --object x.wasm --empty-profile
# error: failed to load coverage: 'x.wasm': malformed instrumentation profile data: function name is empty
# error: could not load coverage information |
__llvm_covfun and __llvm_covmap if __llvm_prf_names was discarded
258ab97 to
3112c24
Compare
__llvm_covfun and __llvm_covmap if __llvm_prf_names was discarded
🐧 Linux x64 Test Results
Failed Tests(click on a test name to see its output) lldlld.wasm/malformed-prf.llIf these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the |
🪟 Windows x64 Test Results
Failed Tests(click on a test name to see its output) lldlld.wasm/malformed-prf.llIf these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the |
sbc100
left a comment
There was a problem hiding this comment.
It seems a little unfortunate to have to hack wasm-ld to support these llvm-specifics.
I wonder if we can find a way to avoid this in long term? For now it seems like this is needed to fix the bug in question though, so lgtm.
|
|
||
| !0 = !{i32 2, !"EnableValueProfiling", i32 0} | ||
| !1 = !{i32 1, !"wchar_size", i32 4} | ||
| !2 = !{!"clang version 21.1.6"} |
There was a problem hiding this comment.
Can we minify these .ll files by removing all the superfluous attributes?
There was a problem hiding this comment.
I generated it via clang -S -emit-llvm, I think we shouldn't make any changes, what makes me curious is why people don't use .c for testing, it's smaller and clearer.
There was a problem hiding this comment.
The reason is that we want to keep the system-under-test as small as possible. If we used clang -S -emit-llvm then changed to clang could effect lld tests, which would make them way less stable, and make test a failure way more involved to debug.
Even using llvm ir here is kind sub-optimal because it means the tests depends on large parts of llvm includes instruction selelction. Tests written in .s format depend only the MC layer and are far more stable/simple.
There was a problem hiding this comment.
Thanks for your answers! Could you trigger the CI task to run again? It seems the last run failed due to the missing llvm-cov command.
|
Special-casing LLVM code coverage sections feels like the wrong direction. Instead, we should use section metadata to explicitly encode dependency relationships (and propose adding this to the wasm binary format if it's currently lacking). For a solid reference, I recommend looking into Mach-O live_support, which handles similar linker garbage collection challenges. https://maskray.me/blog/2021-02-28-linker-garbage-collection |
That sounds like a long-term solution. It’s probably not this PR can address. @sbc100, what do you think? |
…stom sections wasm-ld silently drops user-defined wasm custom sections (wasm-bindgen descriptors, wit-bindgen component metadata, coverage / sanitizer sections, __llvm_prf_*, etc.) from archive members that have no externally-referenced symbol. Custom sections have no symbol-table entries by construction, so symbol-driven archive extraction cannot reach them: the archive member stays lazy and the section is silently dropped along with it. This affects builds with or without --gc-sections, since the bug is in archive lazy-extraction, not GC. The fix teaches ObjFile::parseLazy to scan top-level sections for WASM_SEC_CUSTOM entries, ignoring custom sections that the linker synthesizes or owns (linking, name, producers, target_features, reloc.*, .llvmbc, .llvmcmd, .debug_*, dylink*). If a user custom section is present, the member is eagerly extracted (lazy = false; markLive(); addFile(this)). This generalizes the case-specific approach in llvm#172023 (which added WASM_SEG_FLAG_PRF for __llvm_prf_* sections), addressing the comment by @sbc100 on that PR asking for a less special-cased solution. The PRF special case could later be retired in favor of this general mechanism. Tested end-to-end with the wasm-bindgen raytrace-parallel example, which previously failed with `error: import of __wbg_get_* doesn't have an adapter listed` and now builds successfully. Fixes llvm#200083. Note: this PR was authored by an LLM (Claude/OpenCode) under human direction, with human review of the diagnosis, implementation, and tests before submission.
While enabling coverage instrumentation for a Rust program, I encountered llvm-cov exiting with an error: "malformed instrumentation profile data: function name is empty". I found that the Rust linker defaults to using
--gc-sections. The issue is resolved by setting--no-gc-sections.Every covfun record holds a hash of its symbol name, and
llvm-covwill exit fatally if it can't resolve that hash back to an entry in the binary's__llvm_prf_nameslinker section.However, WASM stores
__llvm_covfuninCustomSection, while__llvm_prf_namesis stored in theDATAsection. The former not be GC, whereas the latter may be GC, causingllvm-covexecution to fail.This PR makes
__llvm_covfunand__llvm_covmapwill be discarded if the object is not live.resolve #192833