Avoid dlclose on glibc 2.34-2.36 #7125

Merged
andrewlock merged 9 commits into master from andrew/glibc-check on Jun 24, 2025

@andrewlock andrewlock commented Jun 20, 2025

Member

Summary of changes

Reason for change

We want to make sure we only load the tracer/profiler libraries after we have run guardrails checks. However, repeated attempts to do this were causing crashes in our smoke tests on Fedora 35, on arm64 only. After an (extensive) investigation, we finally tracked this down to a bug in glibc itself. The full explanation is given below, but the upshot is that calling dlclose on this buggy glibc version can cause crashes at some later point.

Implementation details

  • Includes the "original" implementation that moves the tracer/profiler after guardrails checks
  • Before calling dlclose, check to see if we're on a buggy version of glibc
    • This is complicated by the fact we run the same binary on glibc and musl, so we have to make the glibc call dynamically, instead of linking directly against the gnu_get_libc_version method.
    • We avoid trying to load libc if we detect that we're on Alpine by checking for /etc/alpine-release. This is slightly annoying, but required in the native loader code.
    • If we do manage to call the glibc version, we check against the blocklist. If the version is on it, we avoid ever calling dlclose.
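The checks above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names (`LooksLikeAlpine`, `TryGetGlibcVersion`, `IsBuggyGlibcVersion`) and the `libc.so.6` soname are assumptions. Only `gnu_get_libc_version`, `/etc/alpine-release`, and the 2.34-2.36 range come from the description.

```cpp
#include <dlfcn.h>
#include <unistd.h>

#include <cstdio>
#include <string>

// Hypothetical helper: treat the presence of /etc/alpine-release as "musl".
bool LooksLikeAlpine()
{
    return access("/etc/alpine-release", F_OK) == 0;
}

// Hypothetical helper: look up gnu_get_libc_version at runtime so that the
// same binary still loads on musl, where the symbol does not exist.
// Returns an empty string when the version cannot be determined.
std::string TryGetGlibcVersion()
{
    if (LooksLikeAlpine())
    {
        return {};
    }

    // "libc.so.6" is the usual glibc soname (an assumption of this sketch).
    void* handle = dlopen("libc.so.6", RTLD_LAZY);
    if (handle == nullptr)
    {
        return {};
    }

    using VersionFn = const char* (*)();
    auto get_version = reinterpret_cast<VersionFn>(dlsym(handle, "gnu_get_libc_version"));
    const char* raw = (get_version != nullptr) ? get_version() : nullptr;

    // The handle is deliberately left open: this sketch never calls dlclose.
    return (raw != nullptr) ? std::string(raw) : std::string();
}

// Hypothetical blocklist check for the 2.34-2.36 range named in this PR.
bool IsBuggyGlibcVersion(const std::string& version)
{
    int major = 0;
    int minor = 0;
    if (std::sscanf(version.c_str(), "%d.%d", &major, &minor) != 2)
    {
        return false; // unknown or non-glibc version string
    }
    return major == 2 && minor >= 34 && minor <= 36;
}
```

A caller would evaluate `IsBuggyGlibcVersion(TryGetGlibcVersion())` before deciding whether it is safe to call `dlclose`.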

Test coverage

We have repeatedly rerun the previously crashing tests, and this change has resolved the issue. Nevertheless, our recommendation for customers should definitely be to upgrade to a stable version of glibc wherever possible.

Other details

Overview of the bug:
In certain versions of glibc, there is a TLS-reuse bug that can cause crashes when unloading shared libraries. The bug was introduced in 2.34, fixed in 2.36 on x86-64, and fixed in 2.37 on aarch64.
See the bug here or the explanation of the bug on Fedora (which is where we saw the crashing issue).

glibc 2.34 shipped with a regression: after a dlclose() of a library that carried dynamic-TLS, the loader could reuse the same "module-ID" for a different library without first clearing the associated DTV (Dynamic Thread Vector) entry. The next time any code accessed that TLS slot it could read or write an unmapped address, which causes a SIGSEGV.
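For context, a library "carries dynamic TLS" as soon as it defines thread-local data. The sketch below is a hypothetical example (the names are invented, not taken from the profiler or libddwaf). When code like this is compiled into a shared library and loaded with `dlopen`, an access to `call_count` is typically resolved at run time using the library's TLS module ID and the calling thread's DTV, which is the lookup a stale entry can break.

```cpp
// Hypothetical library code: any thread_local (or __thread) variable gives
// the containing shared library a TLS segment and therefore a TLS module ID.
thread_local int call_count = 0;

// Each thread sees its own copy of call_count.
int RecordCall()
{
    return ++call_count;
}
```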

This manifested as a crash in the WAF when we called ddwaf_context_init on arm64. It specifically happens on arm64 when we unload the continuous profiler shortly after loading it (which is normal because the profiler is not supported there).

It manifests in this scenario because ddwaf_context_init starts like this:

+128  bl   __tls_get_addr     ; ask glibc for the TLS slot for libddwaf
+136  mrs  x11, TPIDR_EL0     ; TLS base for this thread
+140  ldrb w9, [x11, x0]      ; <–– boom if x0 points to a stale DTV entry

When we unload the continuous profiler and call dlclose, it causes the glibc loader to hand out a recycled module-ID to libddwaf. When ddwaf_context_init tries to access TLS in the ldrb instruction, x11 + x0 is outside every mapped region, and so it crashes.

Note that although calling dlclose with the continuous profiler may trigger the issue (the actual crash is flaky depending on load/unload timing and address layout), unloading any library that is built with __thread/thread_local data could trigger the crash. To minimize the risk of hitting this issue, we avoid calling dlclose entirely on the affected glibc versions.
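The resulting guard might look something like the sketch below. The function name and signature are hypothetical and the log text is abbreviated. The point is only that on an affected glibc the handle is intentionally leaked instead of being passed to `dlclose`.

```cpp
#include <dlfcn.h>

#include <iostream>
#include <string>

// Hypothetical wrapper around dlclose. `is_buggy` would come from a glibc
// version check performed elsewhere. Returns true only if the library was
// actually unloaded (or there was nothing to unload).
bool UnloadLibrary(void* handle, bool is_buggy, const std::string& glibc_version)
{
    if (handle == nullptr)
    {
        return true;
    }

    if (is_buggy)
    {
        // Leak the handle on purpose: keeping the library mapped is safer
        // than risking a stale DTV entry later.
        std::cerr << "Skipping dlclose, found GLIBC version: " << glibc_version << '\n';
        return false;
    }

    return dlclose(handle) == 0;
}
```

Returning `false` in the skipped case mirrors the "did not unload" outcome discussed in the review thread, where the caller currently ignores the return value.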

For more details see doc

Affected distros:

| Distribution | Version / Release | glibc Version |
| --- | --- | --- |
| Fedora | 35 | 2.34 |
| Fedora | 37 | 2.36 |
| Ubuntu | 21.10 ("Impish Indri") | 2.34 |
| Ubuntu | 22.04 LTS ("Jammy Jellyfish") | 2.35 |
| Debian | 12 ("Bookworm") | 2.36 |
| RHEL | 9 | 2.34 |
| CentOS | 9 | 2.34 |
| Amazon Linux 2023 | default | 2.34 |

"' due to buggy dlclose implementation on this system.",
"GLIBC version 2.34-2.36 has a TLS-reuse bug that can cause crashes when unloading"
" shared libraries. Consider updating the installed version of glibc. Found GLIBC version: ", glibc_version);
return false;

Member

what's the behaviour of the loader when we return false (failure) here?

Member Author

AFAICT, we currently only call Unload() here:

And we ignore the return value there so it's fine 🤷‍♂️

Member

should we at least start logging this value, or we just don't care?

Member Author

🤷‍♂️ I don't know tbh, I don't care 😂 As it's an existing issue, I'm inclined to punt on that question for a separate PR

@tonyredondo tonyredondo left a comment

Member

Great, it looks correct. One thing missing is reporting the DD_INTERNAL_GLIBC_VERSION value to the telemetry or similar.

Another thing is, I guess we behave just fine in cases where this is used: https://github.com/Stantheman/gcompat
If not, I guess it doesn't matter because we actually release something for musl, so we can just say we don't support that kind of stuff and use the musl version of the library.

Comment on lines +371 to +378
if (!is_buggy) {
    // Not buggy, so we can close the handle
    dlclose(handle);
    return std::make_tuple(false, ::shared::ToWSTRING(version));
}

Member

hahaha

datadog-datadog-prod-us1 Bot commented Jun 20, 2025


Datadog Report

All test runs 4ae1189 🔗

2 Total Test Services: 0 Failed, 2 Passed

Test Services
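| Service Name | Failed | Known Flaky | New Flaky | Passed | Skipped | Total Time | Test Service View |
| --- | --- | --- | --- | --- | --- | --- | --- |
| dd-trace-dotnet | 0 | 0 | 0 | 262230 | 2926 | 39h 38m 15.1s | Link |
| exploration_tests | 0 | 0 | 0 | 22085 | 3 | 9m 57.61s | Link |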

pr-commenter Bot commented Jun 20, 2025


Benchmarks

Benchmarks Report for benchmark platform 🐌

Benchmarks for #7125 compared to master:

  • 2 benchmarks are slower, with geometric mean 1.272
  • 44 benchmarks have fewer allocations
  • 6 benchmarks have more allocations

The following thresholds were used for comparing the benchmark speeds:

  • Mann–Whitney U test with statistical test for significance of 5%
  • Only results indicating a difference greater than 10% and 0.3 ns are considered.

Allocation changes below 0.5% are ignored.

Benchmark details

Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7125

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.ActivityBenchmark.StartStopWithChild‑net472 6.09 KB 6.06 KB -32 B -0.53%
Benchmarks.Trace.ActivityBenchmark.StartStopWithChild‑netcoreapp3.1 5.75 KB 5.69 KB -68 B -1.18%
Benchmarks.Trace.ActivityBenchmark.StartStopWithChild‑net6.0 5.58 KB 5.51 KB -69 B -1.24%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartStopWithChild net6.0 11.2μs 59.7ns 322ns 0 0 0 5.58 KB
master StartStopWithChild netcoreapp3.1 14.1μs 67.5ns 286ns 0 0 0 5.75 KB
master StartStopWithChild net472 22.1μs 119ns 651ns 1.03 0.411 0.103 6.09 KB
#7125 StartStopWithChild net6.0 10.4μs 54.8ns 279ns 0 0 0 5.51 KB
#7125 StartStopWithChild netcoreapp3.1 14.7μs 60.9ns 236ns 0 0 0 5.69 KB
#7125 StartStopWithChild net472 22μs 102ns 421ns 0.926 0.347 0 6.06 KB
Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7125

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces‑net472 3.33 KB 3.31 KB -23 B -0.69%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 927μs 26.6ns 103ns 0 0 0 2.71 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 1.04ms 404ns 1.56μs 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces net472 1.25ms 357ns 1.38μs 0 0 0 3.33 KB
#7125 WriteAndFlushEnrichedTraces net6.0 941μs 1.01μs 3.9μs 0 0 0 2.71 KB
#7125 WriteAndFlushEnrichedTraces netcoreapp3.1 1.02ms 89.8ns 324ns 0 0 0 2.7 KB
#7125 WriteAndFlushEnrichedTraces net472 1.2ms 300ns 1.16μs 0 0 0 3.31 KB
Benchmarks.Trace.Asm.AppSecBodyBenchmark - Same speed ✔️ More allocations ⚠️

More allocations ⚠️ in #7125

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody‑net472 236.35 KB 239.64 KB 3.28 KB 1.39%
Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleMoreComplexBody‑net472 239.87 KB 243.15 KB 3.28 KB 1.37%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master AllCycleSimpleBody net6.0 331μs 1.76μs 8.42μs 0 0 0 197.06 KB
master AllCycleSimpleBody netcoreapp3.1 510μs 1.45μs 5.61μs 0 0 0 204.77 KB
master AllCycleSimpleBody net472 436μs 119ns 460ns 36.6 2.16 0 236.35 KB
master AllCycleMoreComplexBody net6.0 338μs 1.76μs 8.8μs 0 0 0 200.56 KB
master AllCycleMoreComplexBody netcoreapp3.1 495μs 987ns 3.56μs 0 0 0 208.18 KB
master AllCycleMoreComplexBody net472 446μs 106ns 412ns 36.6 2.16 0 239.87 KB
master ObjectExtractorSimpleBody net6.0 311ns 1.77ns 12.3ns 0 0 0 280 B
master ObjectExtractorSimpleBody netcoreapp3.1 409ns 1.93ns 8.2ns 0 0 0 272 B
master ObjectExtractorSimpleBody net472 303ns 0.175ns 0.676ns 0.0442 0 0 281 B
master ObjectExtractorMoreComplexBody net6.0 6.52μs 29.5ns 110ns 0 0 0 3.78 KB
master ObjectExtractorMoreComplexBody netcoreapp3.1 7.76μs 36.2ns 140ns 0 0 0 3.69 KB
master ObjectExtractorMoreComplexBody net472 6.66μs 0.89ns 3.33ns 0.599 0 0 3.8 KB
#7125 AllCycleSimpleBody net6.0 329μs 454ns 1.76μs 0 0 0 197.6 KB
#7125 AllCycleSimpleBody netcoreapp3.1 472μs 1.6μs 6.2μs 0 0 0 205.35 KB
#7125 AllCycleSimpleBody net472 444μs 180ns 697ns 36.6 2.16 0 239.64 KB
#7125 AllCycleMoreComplexBody net6.0 341μs 416ns 1.55μs 0 0 0 201.1 KB
#7125 AllCycleMoreComplexBody netcoreapp3.1 522μs 2.51μs 10μs 0 0 0 208.77 KB
#7125 AllCycleMoreComplexBody net472 455μs 340ns 1.32μs 37.9 2.23 0 243.15 KB
#7125 ObjectExtractorSimpleBody net6.0 329ns 1.63ns 6.71ns 0 0 0 280 B
#7125 ObjectExtractorSimpleBody netcoreapp3.1 398ns 2.21ns 13.5ns 0 0 0 272 B
#7125 ObjectExtractorSimpleBody net472 305ns 0.0174ns 0.0627ns 0.0444 0 0 281 B
#7125 ObjectExtractorMoreComplexBody net6.0 6.59μs 3.05ns 11.8ns 0 0 0 3.78 KB
#7125 ObjectExtractorMoreComplexBody netcoreapp3.1 7.59μs 33.3ns 129ns 0 0 0 3.69 KB
#7125 ObjectExtractorMoreComplexBody net472 6.94μs 6.57ns 25.5ns 0.591 0 0 3.8 KB
Benchmarks.Trace.Asm.AppSecEncoderBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7125

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs‑net6.0 2.16 KB 2.14 KB -12 B -0.56%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EncodeArgs net6.0 73.3μs 263ns 984ns 0 0 0 32.41 KB
master EncodeArgs netcoreapp3.1 95.7μs 36.2ns 130ns 0 0 0 32.4 KB
master EncodeArgs net472 107μs 17.6ns 65.8ns 4.82 0 0 32.51 KB
master EncodeLegacyArgs net6.0 143μs 122ns 472ns 0 0 0 2.16 KB
master EncodeLegacyArgs netcoreapp3.1 197μs 42.5ns 147ns 0 0 0 2.14 KB
master EncodeLegacyArgs net472 261μs 66.8ns 250ns 0 0 0 2.16 KB
#7125 EncodeArgs net6.0 72.8μs 351ns 1.36μs 0 0 0 32.4 KB
#7125 EncodeArgs netcoreapp3.1 95.2μs 33.2ns 128ns 0 0 0 32.4 KB
#7125 EncodeArgs net472 106μs 19.2ns 74.4ns 4.75 0 0 32.51 KB
#7125 EncodeLegacyArgs net6.0 146μs 289ns 1.12μs 0 0 0 2.14 KB
#7125 EncodeLegacyArgs netcoreapp3.1 198μs 176ns 636ns 0 0 0 2.14 KB
#7125 EncodeLegacyArgs net472 261μs 56.1ns 217ns 0 0 0 2.15 KB
Benchmarks.Trace.Asm.AppSecWafBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master RunWafRealisticBenchmark net6.0 271μs 133ns 499ns 0 0 0 4.55 KB
master RunWafRealisticBenchmark netcoreapp3.1 294μs 264ns 989ns 0 0 0 4.48 KB
master RunWafRealisticBenchmark net472 307μs 38.1ns 147ns 0 0 0 4.66 KB
master RunWafRealisticBenchmarkWithAttack net6.0 181μs 80.1ns 300ns 0 0 0 2.24 KB
master RunWafRealisticBenchmarkWithAttack netcoreapp3.1 198μs 104ns 404ns 0 0 0 2.22 KB
master RunWafRealisticBenchmarkWithAttack net472 207μs 45.3ns 169ns 0 0 0 2.28 KB
#7125 RunWafRealisticBenchmark net6.0 273μs 63.2ns 245ns 0 0 0 4.55 KB
#7125 RunWafRealisticBenchmark netcoreapp3.1 292μs 58.6ns 227ns 0 0 0 4.48 KB
#7125 RunWafRealisticBenchmark net472 309μs 54.1ns 195ns 0 0 0 4.66 KB
#7125 RunWafRealisticBenchmarkWithAttack net6.0 182μs 74.1ns 267ns 0 0 0 2.24 KB
#7125 RunWafRealisticBenchmarkWithAttack netcoreapp3.1 194μs 26.3ns 91.2ns 0 0 0 2.22 KB
#7125 RunWafRealisticBenchmarkWithAttack net472 207μs 33.2ns 129ns 0 0 0 2.28 KB
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendRequest net6.0 61.1μs 29ns 104ns 0 0 0 14.53 KB
master SendRequest netcoreapp3.1 69.9μs 118ns 440ns 0 0 0 17.42 KB
master SendRequest net472 0.0171ns 0.00147ns 0.00551ns 0 0 0 0 b
#7125 SendRequest net6.0 62.1μs 169ns 653ns 0 0 0 14.52 KB
#7125 SendRequest netcoreapp3.1 70.6μs 64ns 248ns 0 0 0 17.42 KB
#7125 SendRequest net472 0.0144ns 0.00233ns 0.00901ns 0 0 0 0 b
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Slower ⚠️ More allocations ⚠️

Slower ⚠️ in #7125

Benchmark diff/base Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces‑netcoreapp3.1 1.316 631,437.50 830,730.88

More allocations ⚠️ in #7125

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces‑net472 55.75 KB 56.46 KB 704 B 1.26%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 645μs 512ns 1.98μs 0 0 0 41.73 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 631μs 1.67μs 6.67μs 0 0 0 41.89 KB
master WriteAndFlushEnrichedTraces net472 923μs 1.93μs 7.2μs 4.46 0 0 55.75 KB
#7125 WriteAndFlushEnrichedTraces net6.0 630μs 441ns 1.76μs 0 0 0 41.74 KB
#7125 WriteAndFlushEnrichedTraces netcoreapp3.1 813μs 6.01μs 59.8μs 0 0 0 42.07 KB
#7125 WriteAndFlushEnrichedTraces net472 837μs 2.59μs 10μs 8.33 0 0 56.46 KB
Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7125

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.DbCommandBenchmark.ExecuteNonQuery‑net6.0 1.03 KB 1.02 KB -8 B -0.78%
Benchmarks.Trace.DbCommandBenchmark.ExecuteNonQuery‑netcoreapp3.1 1.02 KB 1.02 KB -8 B -0.78%
Benchmarks.Trace.DbCommandBenchmark.ExecuteNonQuery‑net472 995 B 987 B -8 B -0.80%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteNonQuery net6.0 1.99μs 10.2ns 46.8ns 0 0 0 1.03 KB
master ExecuteNonQuery netcoreapp3.1 2.53μs 3.43ns 13.3ns 0 0 0 1.02 KB
master ExecuteNonQuery net472 2.7μs 2.41ns 9.34ns 0.147 0.0134 0 995 B
#7125 ExecuteNonQuery net6.0 1.99μs 9.25ns 37ns 0 0 0 1.02 KB
#7125 ExecuteNonQuery netcoreapp3.1 2.62μs 8.82ns 34.2ns 0 0 0 1.02 KB
#7125 ExecuteNonQuery net472 2.89μs 6.41ns 24.8ns 0.146 0.0146 0 987 B
Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7125

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync‑net472 1.11 KB 1.1 KB -8 B -0.72%
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync‑netcoreapp3.1 1.09 KB 1.08 KB -8 B -0.74%
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch‑net472 1.05 KB 1.04 KB -8 B -0.76%
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch‑net6.0 1.04 KB 1.03 KB -8 B -0.77%
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch‑netcoreapp3.1 1.04 KB 1.03 KB -8 B -0.77%
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync‑net6.0 1.02 KB 1.01 KB -8 B -0.79%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master CallElasticsearch net6.0 1.83μs 5.85ns 22.7ns 0 0 0 1.04 KB
master CallElasticsearch netcoreapp3.1 2.3μs 11.8ns 56.6ns 0 0 0 1.04 KB
master CallElasticsearch net472 3.55μs 3.05ns 11.8ns 0.159 0 0 1.05 KB
master CallElasticsearchAsync net6.0 1.81μs 3.89ns 14.5ns 0 0 0 1.02 KB
master CallElasticsearchAsync netcoreapp3.1 2.35μs 7.26ns 28.1ns 0 0 0 1.09 KB
master CallElasticsearchAsync net472 3.84μs 3.19ns 12.4ns 0.169 0 0 1.11 KB
#7125 CallElasticsearch net6.0 1.78μs 8.09ns 31.3ns 0 0 0 1.03 KB
#7125 CallElasticsearch netcoreapp3.1 2.3μs 10.8ns 41.9ns 0 0 0 1.03 KB
#7125 CallElasticsearch net472 3.67μs 4.78ns 17.9ns 0.163 0 0 1.04 KB
#7125 CallElasticsearchAsync net6.0 1.82μs 2.84ns 10.6ns 0 0 0 1.01 KB
#7125 CallElasticsearchAsync netcoreapp3.1 2.49μs 12.1ns 50.1ns 0 0 0 1.08 KB
#7125 CallElasticsearchAsync net472 3.72μs 6.03ns 23.3ns 0.167 0 0 1.1 KB
Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7125

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync‑net6.0 960 B 952 B -8 B -0.83%
Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync‑netcoreapp3.1 960 B 952 B -8 B -0.83%
Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync‑net472 923 B 915 B -8 B -0.87%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteAsync net6.0 1.78μs 9.27ns 43.5ns 0 0 0 960 B
master ExecuteAsync netcoreapp3.1 2.29μs 7.45ns 25.8ns 0 0 0 960 B
master ExecuteAsync net472 2.58μs 1.78ns 6.88ns 0.143 0 0 923 B
#7125 ExecuteAsync net6.0 1.89μs 2.15ns 8.05ns 0 0 0 952 B
#7125 ExecuteAsync netcoreapp3.1 2.33μs 6.32ns 24.5ns 0 0 0 952 B
#7125 ExecuteAsync net472 2.69μs 2.86ns 11.1ns 0.135 0 0 915 B
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendAsync net6.0 7.1μs 10.4ns 39ns 0 0 0 2.37 KB
master SendAsync netcoreapp3.1 8.69μs 14.1ns 54.6ns 0 0 0 2.9 KB
master SendAsync net472 12.5μs 10.3ns 38.5ns 0.498 0 0 3.19 KB
#7125 SendAsync net6.0 6.96μs 5.94ns 22.2ns 0 0 0 2.36 KB
#7125 SendAsync netcoreapp3.1 8.49μs 32.9ns 128ns 0 0 0 2.9 KB
#7125 SendAsync net472 12.6μs 5.84ns 21.9ns 0.503 0 0 3.18 KB
Benchmarks.Trace.Iast.StringAspectsBenchmark - Slower ⚠️ More allocations ⚠️

Slower ⚠️ in #7125

Benchmark diff/base Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark‑netcoreapp3.1 1.230 413,700.00 509,000.00

More allocations ⚠️ in #7125

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark‑net472 57.34 KB 65.54 KB 8.19 KB 14.29%
Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark‑net6.0 259.96 KB 275.06 KB 15.1 KB 5.81%
Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark‑netcoreapp3.1 42.64 KB 42.87 KB 232 B 0.54%

Fewer allocations 🎉 in #7125

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark‑net6.0 43.83 KB 43.33 KB -504 B -1.15%
Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark‑net472 286.72 KB 280.64 KB -6.08 KB -2.12%
Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark‑netcoreapp3.1 274.93 KB 255.76 KB -19.17 KB -6.97%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StringConcatBenchmark net6.0 44.6μs 212ns 1.6μs 0 0 0 43.83 KB
master StringConcatBenchmark netcoreapp3.1 47.2μs 224ns 838ns 0 0 0 42.64 KB
master StringConcatBenchmark net472 56.8μs 259ns 968ns 0 0 0 57.34 KB
master StringConcatAspectBenchmark net6.0 458μs 1.08μs 3.89μs 0 0 0 259.96 KB
master StringConcatAspectBenchmark netcoreapp3.1 447μs 6.44μs 63.7μs 0 0 0 274.93 KB
master StringConcatAspectBenchmark net472 410μs 2.07μs 9.27μs 0 0 0 286.72 KB
#7125 StringConcatBenchmark net6.0 41.8μs 164ns 568ns 0 0 0 43.33 KB
#7125 StringConcatBenchmark netcoreapp3.1 51.2μs 382ns 3.63μs 0 0 0 42.87 KB
#7125 StringConcatBenchmark net472 58.4μs 179ns 672ns 0 0 0 65.54 KB
#7125 StringConcatAspectBenchmark net6.0 507μs 1.25μs 5.73μs 0 0 0 275.06 KB
#7125 StringConcatAspectBenchmark netcoreapp3.1 508μs 2.18μs 10.9μs 0 0 0 255.76 KB
#7125 StringConcatAspectBenchmark net472 409μs 2.25μs 13.3μs 0 0 0 280.64 KB
Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7125

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.ILoggerBenchmark.EnrichedLog‑net6.0 1.76 KB 1.7 KB -56 B -3.18%
Benchmarks.Trace.ILoggerBenchmark.EnrichedLog‑netcoreapp3.1 1.76 KB 1.7 KB -56 B -3.18%
Benchmarks.Trace.ILoggerBenchmark.EnrichedLog‑net472 1.69 KB 1.64 KB -56 B -3.31%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.63μs 2.6ns 10.1ns 0 0 0 1.76 KB
master EnrichedLog netcoreapp3.1 3.44μs 4.38ns 17ns 0 0 0 1.76 KB
master EnrichedLog net472 4.05μs 4.83ns 18.7ns 0.265 0 0 1.69 KB
#7125 EnrichedLog net6.0 2.57μs 0.944ns 3.4ns 0 0 0 1.7 KB
#7125 EnrichedLog netcoreapp3.1 3.59μs 18ns 84.6ns 0 0 0 1.7 KB
#7125 EnrichedLog net472 3.88μs 3.31ns 12.8ns 0.251 0 0 1.64 KB
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7125

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.Log4netBenchmark.EnrichedLog‑net472 4.57 KB 4.52 KB -55 B -1.20%
Benchmarks.Trace.Log4netBenchmark.EnrichedLog‑net6.0 4.37 KB 4.31 KB -56 B -1.28%
Benchmarks.Trace.Log4netBenchmark.EnrichedLog‑netcoreapp3.1 4.37 KB 4.31 KB -56 B -1.28%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 123μs 145ns 563ns 0 0 0 4.37 KB
master EnrichedLog netcoreapp3.1 126μs 330ns 1.24μs 0 0 0 4.37 KB
master EnrichedLog net472 167μs 161ns 603ns 0 0 0 4.57 KB
#7125 EnrichedLog net6.0 124μs 80.4ns 301ns 0 0 0 4.31 KB
#7125 EnrichedLog netcoreapp3.1 126μs 74.4ns 278ns 0 0 0 4.31 KB
#7125 EnrichedLog net472 169μs 37.3ns 134ns 0 0 0 4.52 KB
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7125

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.NLogBenchmark.EnrichedLog‑net6.0 2.32 KB 2.26 KB -56 B -2.41%
Benchmarks.Trace.NLogBenchmark.EnrichedLog‑netcoreapp3.1 2.32 KB 2.26 KB -56 B -2.41%
Benchmarks.Trace.NLogBenchmark.EnrichedLog‑net472 2.14 KB 2.08 KB -56 B -2.62%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 4.94μs 19.3ns 74.6ns 0 0 0 2.32 KB
master EnrichedLog netcoreapp3.1 6.77μs 22.2ns 79.9ns 0 0 0 2.32 KB
master EnrichedLog net472 7.45μs 8.04ns 31.1ns 0.335 0 0 2.14 KB
#7125 EnrichedLog net6.0 4.9μs 23.5ns 91.1ns 0 0 0 2.26 KB
#7125 EnrichedLog netcoreapp3.1 6.77μs 19.1ns 68.8ns 0 0 0 2.26 KB
#7125 EnrichedLog net472 7.6μs 5.12ns 19.2ns 0.305 0 0 2.08 KB
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7125

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.RedisBenchmark.SendReceive‑net472 1.21 KB 1.2 KB -8 B -0.66%
Benchmarks.Trace.RedisBenchmark.SendReceive‑net6.0 1.21 KB 1.2 KB -8 B -0.66%
Benchmarks.Trace.RedisBenchmark.SendReceive‑netcoreapp3.1 1.21 KB 1.2 KB -8 B -0.66%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendReceive net6.0 2.04μs 10.5ns 49.4ns 0 0 0 1.21 KB
master SendReceive netcoreapp3.1 2.53μs 11.8ns 47.4ns 0 0 0 1.21 KB
master SendReceive net472 3.28μs 2.71ns 10.5ns 0.178 0 0 1.21 KB
#7125 SendReceive net6.0 2.1μs 10.7ns 48.9ns 0 0 0 1.2 KB
#7125 SendReceive netcoreapp3.1 2.56μs 13.4ns 62.8ns 0 0 0 1.2 KB
#7125 SendReceive net472 3.13μs 4.57ns 17.7ns 0.187 0 0 1.2 KB
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7125

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.SerilogBenchmark.EnrichedLog‑net472 2.08 KB 2.03 KB -56 B -2.69%
Benchmarks.Trace.SerilogBenchmark.EnrichedLog‑netcoreapp3.1 1.69 KB 1.63 KB -56 B -3.32%
Benchmarks.Trace.SerilogBenchmark.EnrichedLog‑net6.0 1.64 KB 1.58 KB -56 B -3.41%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 4.18μs 0.92ns 3.44ns 0 0 0 1.64 KB
master EnrichedLog netcoreapp3.1 5.64μs 23ns 89.3ns 0 0 0 1.69 KB
master EnrichedLog net472 6.67μs 7.28ns 27.3ns 0.298 0 0 2.08 KB
#7125 EnrichedLog net6.0 4.08μs 7.05ns 24.4ns 0 0 0 1.58 KB
#7125 EnrichedLog netcoreapp3.1 5.52μs 20.3ns 78.7ns 0 0 0 1.63 KB
#7125 EnrichedLog net472 6.44μs 6.32ns 23.6ns 0.292 0 0 2.03 KB
Benchmarks.Trace.SpanBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7125

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.SpanBenchmark.StartFinishScope‑net6.0 704 B 696 B -8 B -1.14%
Benchmarks.Trace.SpanBenchmark.StartFinishScope‑netcoreapp3.1 704 B 696 B -8 B -1.14%
Benchmarks.Trace.SpanBenchmark.StartFinishScope‑net472 666 B 658 B -8 B -1.20%
Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net472 586 B 578 B -8 B -1.37%
Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net6.0 584 B 576 B -8 B -1.37%
Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑netcoreapp3.1 584 B 576 B -8 B -1.37%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartFinishSpan net6.0 751ns 3.97ns 22.4ns 0 0 0 584 B
master StartFinishSpan netcoreapp3.1 953ns 4.42ns 17.7ns 0 0 0 584 B
master StartFinishSpan net472 914ns 0.809ns 3.13ns 0.0912 0 0 586 B
master StartFinishScope net6.0 918ns 0.479ns 1.73ns 0 0 0 704 B
master StartFinishScope netcoreapp3.1 1.15μs 6.24ns 34.2ns 0 0 0 704 B
master StartFinishScope net472 1.09μs 0.174ns 0.652ns 0.104 0 0 666 B
#7125 StartFinishSpan net6.0 750ns 0.495ns 1.92ns 0 0 0 576 B
#7125 StartFinishSpan netcoreapp3.1 939ns 0.575ns 2.23ns 0 0 0 576 B
#7125 StartFinishSpan net472 907ns 0.433ns 1.56ns 0.0908 0 0 578 B
#7125 StartFinishScope net6.0 890ns 4.9ns 28.1ns 0 0 0 696 B
#7125 StartFinishScope netcoreapp3.1 1.14μs 6.03ns 28.9ns 0 0 0 696 B
#7125 StartFinishScope net472 1.11μs 0.176ns 0.66ns 0.1 0 0 658 B
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7125

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin‑net6.0 704 B 697 B -7 B -0.99%
Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin‑netcoreapp3.1 704 B 696 B -8 B -1.14%
Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin‑net472 666 B 658 B -8 B -1.20%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master RunOnMethodBegin net6.0 1.02μs 5.36ns 26.8ns 0 0 0 704 B
master RunOnMethodBegin netcoreapp3.1 1.38μs 2.28ns 8.82ns 0 0 0 704 B
master RunOnMethodBegin net472 1.36μs 0.141ns 0.544ns 0.102 0 0 666 B
#7125 RunOnMethodBegin net6.0 1.05μs 5.48ns 26.9ns 0 0 0 697 B
#7125 RunOnMethodBegin netcoreapp3.1 1.38μs 6.9ns 28.4ns 0 0 0 696 B
#7125 RunOnMethodBegin net472 1.34μs 0.245ns 0.947ns 0.102 0 0 658 B

@andrewlock

Member Author

> One thing missing is reporting the DD_INTERNAL_GLIBC_VERSION value to the telemetry or similar.

Currently, we only do the work to grab the glibc_version if we need to call dlclose. That means the variable might only be set at some arbitrary point, which would make reporting it in telemetry unreliable.

Do you think it's worth us doing the work upfront as part of the native loader's load path to make sure this variable is set, so that we get reliable reporting? Is the overhead worth it? 🤔
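One way to picture the "do the work upfront" option: resolve the version once, early in the load path, and publish it so that later telemetry reads a stable value. The sketch below is hypothetical. It assumes `DD_INTERNAL_GLIBC_VERSION` is an environment variable (the thread does not say so explicitly) and takes the detection routine as a parameter rather than showing the real one.

```cpp
#include <stdlib.h>

#include <cstdlib>
#include <string>

// Hypothetical: runs `detect` at most once per process and, if it yields a
// version, exports it (assuming DD_INTERNAL_GLIBC_VERSION is an env var).
const std::string& GlibcVersionForTelemetry(std::string (*detect)())
{
    // Function-local statics are initialised once, thread-safely, in C++11+.
    static const std::string cached = [detect]() {
        std::string version = detect();
        if (!version.empty())
        {
            setenv("DD_INTERNAL_GLIBC_VERSION", version.c_str(), /*overwrite=*/1);
        }
        return version;
    }();
    return cached;
}
```

The trade-off raised in the comment still applies: calling this during load means paying for the detection in every process, including those that never reach `dlclose`.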

> Another thing is, I guess we behave just fine in cases where this is used: https://github.com/Stantheman/gcompat If not, I guess it doesn't matter because we actually release something for musl, so we can just say we don't support that kind of stuff and use the musl version of the library.

Yeah, we essentially just don't support this at the moment. If you're running on musl, we'll load the musl binaries, but then we'll also hit the glibc path in this PR, which means we will incorrectly not close some libraries. I don't think that's a big enough thing to worry about. If we want to change the behaviour, we could potentially move the glibc detection to be part of the IsAlpine() call, but I don't think we need to worry about it for now.

@gleocadie gleocadie left a comment

Collaborator

Overall looks good.
I'll look into the profiler for dlclose.
The ApiWrapper does not call dlclose, it just wraps it.

@andrewlock andrewlock changed the title Avoid dlclose on glibc 2.34-2.36 Avoid dlclose on glibc 2.34-2.36 Jun 23, 2025
@andrewlock andrewlock added the area:native-library Automatic instrumentation native C++ code (Datadog.Trace.ClrProfiler.Native) label Jun 23, 2025
@andrewlock andrewlock force-pushed the andrew/glibc-check branch from 942a772 to 4ae1189 Compare June 23, 2025 12:12
@andrewlock andrewlock changed the base branch from andrew/revert-revert-revert-revert to master June 23, 2025 12:12
@andrewlock andrewlock marked this pull request as ready for review June 23, 2025 12:12
@andrewlock andrewlock requested review from a team as code owners June 23, 2025 12:12

@daniel-romano-DD daniel-romano-DD left a comment

Contributor

Great

@andrewlock andrewlock merged commit 8ae2420 into master Jun 24, 2025
130 of 131 checks passed
@andrewlock andrewlock deleted the andrew/glibc-check branch June 24, 2025 09:59
@github-actions github-actions Bot added this to the vNext-v3 milestone Jun 24, 2025
lucaspimentel pushed a commit that referenced this pull request Jul 1, 2025
## Summary of changes

- Revert "Revert 'Reapply "Revert Load the tracer/profiler after
guardrails checks" (#6959)' (#6986)" #7051
- Avoid calling `dlclose` on glibc 2.34-2.36

## Reason for change

We want to make sure we only load the tracer/profiler libraries _after_
we have run guardrails checks. However, repeated attempts to do this
were causing crashes in our smoke tests on Fedora 35, on arm64 only.
After an (extensive) investigation, we finally tracked this down to a
bug in glibc itself. The full explanation is given below, but the upshot
is that calling `dlclose` on this buggy glibc version can cause crashes
at some later point.

## Implementation details

- Includes the "original" implementation that moves the tracer/profiler
after guardrails checks
- Before calling `dlclose`, check to see if we're on a buggy version of
glibc
  - This is complicated by the fact we run the _same_ binary on glibc and
  musl, so we have to make the glibc call _dynamically_, instead of
  linking directly against the `gnu_get_libc_version` function.
  - We avoid trying to load libc if we detect that we're on Alpine by
  checking for `/etc/alpine-release`. This is slightly annoying, but
  required in the native loader code.
  - If we _do_ manage to retrieve the glibc version, we check it against
  the blocklist. If the version is on it, we avoid ever calling `dlclose`.
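
As an illustration of the dynamic lookup described above, here is a minimal sketch. It is not the code from this PR: the helper name `try_get_glibc_version` is hypothetical, and it assumes glibc is reachable under its usual soname `libc.so.6`.

```cpp
#include <dlfcn.h>
#include <unistd.h>

#include <string>

// Signature of glibc's gnu_get_libc_version(), resolved at runtime so the
// binary has no link-time dependency on glibc and still loads on musl.
using get_libc_version_fn = const char* (*)();

// Hypothetical helper: returns the glibc version string (e.g. "2.35"), or an
// empty string on Alpine/musl or when the symbol cannot be resolved.
std::string try_get_glibc_version()
{
    // Alpine uses musl, so skip the libc lookup entirely there.
    if (access("/etc/alpine-release", F_OK) == 0)
    {
        return {};
    }

    // Assumption: glibc is available under its usual soname.
    void* libc = dlopen("libc.so.6", RTLD_LAZY);
    if (libc == nullptr)
    {
        return {};
    }

    // The handle is deliberately never closed: libc stays mapped for the
    // lifetime of the process anyway, and avoiding dlclose is the point.
    auto fn = reinterpret_cast<get_libc_version_fn>(dlsym(libc, "gnu_get_libc_version"));
    if (fn == nullptr)
    {
        return {};
    }

    const char* version = fn();
    return version != nullptr ? std::string(version) : std::string();
}
```

An empty result means "unknown", which a caller can treat however it prefers (for example, as "not glibc"). On glibc older than 2.34 this needs linking with `-ldl`.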

## Test coverage

We have run repeated tests against the previously crashing tests, and
this has resolved the issue. Nevertheless, our recommendation for
customers should definitely be to upgrade to a stable version of glibc
wherever possible.

## Other details

Overview of the bug: 
In certain versions of glibc, there is a TLS-reuse bug that can cause
crashes when unloading shared libraries. The bug was introduced in 2.34,
fixed in 2.36 on x86-64, and fixed in 2.37 on aarch64.
See [the bug
here](https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3921c5b40f293c57cb326f58713c924b0662ef59)
or [the explanation of the bug on
Fedora](https://bugzilla.redhat.com/show_bug.cgi?id=2251557) (which is
where we saw the crashing issue).
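
For illustration, a version check covering the 2.34-2.36 range named in this PR could look like the sketch below. The function name `is_buggy_glibc` is hypothetical, and the actual blocklist in the loader may be expressed differently.

```cpp
#include <cstdio>

// Hypothetical helper: returns true when `version` (e.g. "2.35") falls in the
// 2.34-2.36 range that this PR treats as unsafe for dlclose.
bool is_buggy_glibc(const char* version)
{
    if (version == nullptr)
    {
        return false;
    }

    int major = 0;
    int minor = 0;
    // gnu_get_libc_version() returns strings such as "2.35". Anything that
    // does not start with "<major>.<minor>" is treated as not on the list.
    if (std::sscanf(version, "%d.%d", &major, &minor) != 2)
    {
        return false;
    }

    return major == 2 && minor >= 34 && minor <= 36;
}
```

The sketch treats the whole range as affected on every architecture, which is the conservative choice given that the fix landed in different releases for x86-64 and aarch64.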

glibc 2.34 shipped with a regression: after a `dlclose()` of a library
that carried dynamic-TLS, the loader could reuse the same "module-ID"
for a different library _without_ first clearing the associated DTV
(Dynamic Thread Vector) entry. The next time any code accessed that TLS
slot it could read or write an unmapped address, which causes a
`SIGSEGV`.

This manifested as a crash in the WAF when we called
`ddwaf_context_init` on arm64. It specifically happens on arm64 when we
unload the continuous profiler shortly after loading it (which is
expected, because the profiler is not supported on arm64).

It manifests in this scenario because `ddwaf_context_init` starts like
this:

```assembly
+128  bl   __tls_get_addr     ; ask glibc for the TLS slot for libddwaf
+136  mrs  x11, TPIDR_EL0     ; TLS base for this thread
+140  ldrb w9, [x11, x0]      ; <–– boom if x0 points to a stale DTV entry
```

When we unload the continuous profiler and call `dlclose`, it causes the
glibc loader to hand out a recycled module-ID to `libddwaf`. When
`ddwaf_context_init` tries to access TLS in the `ldrb` instruction,
`x11 + x0` is outside every mapped region, and so it crashes.

Note that although calling `dlclose` with the continuous profiler _may_
trigger the issue (the actual crash is flaky depending on load/unload
timing and address layout), unloading _any_ library that is built
with `__thread`/`thread_local` data could trigger the crash. To minimize
the risk of hitting this issue, we avoid calling `dlclose` entirely on
the affected glibc versions.
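
A minimal sketch of what "avoid calling `dlclose`" can mean in practice is shown below. The wrapper name `unload_library` and the `glibc_is_buggy` flag are hypothetical, and the flag is assumed to have been computed once at startup.

```cpp
#include <dlfcn.h>

// Hypothetical wrapper: unloads `handle` unless the running glibc is on the
// blocklist. Returns true only if dlclose was actually called and succeeded.
bool unload_library(void* handle, bool glibc_is_buggy)
{
    if (handle == nullptr)
    {
        return false;
    }

    if (glibc_is_buggy)
    {
        // Intentionally leak the handle: the library stays mapped, so its
        // TLS module-ID is never handed out to another library.
        return false;
    }

    return dlclose(handle) == 0;
}
```

The trade-off is that an unused library stays resident for the lifetime of the process, which costs some memory but avoids the later crash.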

For more details [see
doc](https://docs.google.com/document/d/1aptwmprnd83VTZMKxrrTqmBF6eOayjCL36kj9LujCE8/edit?tab=t.0#heading=h.nytiofltvdb5)

Affected distros: 

| Distribution      | Version / Release             | glibc Version |
| ----------------- | ----------------------------- | ------------- |
| Fedora            | 35                            | 2.34          |
| Fedora            | 37                            | 2.36          |
| Ubuntu            | 21.10 ("Impish Indri")        | 2.34          |
| Ubuntu            | 22.04 LTS ("Jammy Jellyfish") | 2.35          |
| Debian            | 12 ("Bookworm")               | 2.36          |
| RHEL              | 9                             | 2.34          |
| CentOS            | 9                             | 2.34          |
| Amazon Linux 2023 | default                       | 2.34          |
andrewlock added a commit that referenced this pull request Jul 4, 2025
## Summary of changes

- Don't try to load the profiler on Linux arm64 at all
- Cherry pick #7153 to see if it resolves the issues

## Reason for change

#7153 ran into what we think is the glibc `dlclose` issue (i.e. #7125).
Given the profiler always bails out on arm64 as it's not yet supported,
this is the easiest way to ensure that we don't load (and therefore,
more importantly, don't _unload_) the profiler on arm64.

## Implementation details

Comment out the profiler from the loader.conf for arm64. That ensures we
don't _try_ to load it on arm64, while making it easy to reenable for
testing etc.

## Test coverage

Cherry picked the #7153 smoke tests, to confirm it resolves the issues

## Other details

Supersedes #7153 given that it enables the tests to verify the fix

---------

Co-authored-by: Flavien Darche <11708575+e-n-0@users.noreply.github.com>
andrewlock added a commit that referenced this pull request Jul 7, 2025
chojomok pushed a commit that referenced this pull request Jul 15, 2025
chojomok pushed a commit that referenced this pull request Jul 15, 2025
Labels

area:native-library Automatic instrumentation native C++ code (Datadog.Trace.ClrProfiler.Native)
