Skip to content

Revert "Revert 'Reapply "Revert Load the tracer/profiler after guardrails checks" (#6959)' (#6986)"#7051

Closed
andrewlock wants to merge 2 commits into
masterfrom
andrew/revert-revert-revert-revert
Closed

Revert "Revert 'Reapply "Revert Load the tracer/profiler after guardrails checks" (#6959)' (#6986)"#7051
andrewlock wants to merge 2 commits into
masterfrom
andrew/revert-revert-revert-revert

Conversation

@andrewlock

Copy link
Copy Markdown
Member

Summary of changes

Reapply the loading the tracer/profiler after guardrails etc checks

Reason for change

We want to make sure we bail out before loading the tracer/profiler dlls to limit memory footprint.

Implementation details

Revert the revert that reverted the revert

Also apply @daniel-romano-DD's "turn it off and back on again" fix from #6982 when on linux arm64. This seems to resolve the issue (that we could only repro on fedora 35 on arm64) where the WAF would crash the installer smoke tests.

Test coverage

I've run both the arm64 installer tests 3 times with this PR, and not seen any crashes. Similarly, we ran the #6982 fix version several times with this fix and it always worked.

Other details

We're still trying to understand the source of the issue. Initial consideration was something tied to glibc or kernel issues, but there's no difference between the faulty and successful images (which makes sense given they're running in docker I think). @gleocadie is investigating further

Distro Glibc Version Kernel version
Fedora 35 2.36-9+deb12u10 5.15.0-1088-azure
Fedora 34 2.36-9+deb12u10 5.15.0-1088-azure
Fedora 40 2.36-9+deb12u10 5.15.0-1088-azure
Debian focal 2.36-9+deb12u10 5.15.0-1088-azure

@andrewlock andrewlock requested review from a team as code owners June 3, 2025 14:19
@andrewlock andrewlock added area:native-library Automatic instrumentation native C++ code (Datadog.Trace.ClrProfiler.Native) area:shared-components labels Jun 3, 2025
@datadog-datadog-prod-us1

datadog-datadog-prod-us1 Bot commented Jun 3, 2025

Copy link
Copy Markdown

Datadog Report

All test runs 2ddabba 🔗

2 Total Test Services: 0 Failed, 2 Passed

Test Services
Service Name Failed Known Flaky New Flaky Passed Skipped Total Time Test Service View
dd-trace-dotnet 0 0 0 249135 2514 18h 0m 54.63s Link
exploration_tests 0 0 0 22085 3 2m 29.18s Link

@andrewlock

andrewlock commented Jun 3, 2025

Copy link
Copy Markdown
Member Author

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program. And are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch test with statistical test for significance of 5%
  • Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.8) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (7051) - mean (68ms)  : 65, 71
     .   : milestone, 68,
    master - mean (68ms)  : 64, 72
     .   : milestone, 68,

    section CallTarget+Inlining+NGEN
    This PR (7051) - mean (1,005ms)  : 975, 1036
     .   : milestone, 1005,
    master - mean (1,004ms)  : 978, 1029
     .   : milestone, 1004,

Loading
gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (7051) - mean (101ms)  : 100, 103
     .   : milestone, 101,
    master - mean (101ms)  : 99, 103
     .   : milestone, 101,

    section CallTarget+Inlining+NGEN
    This PR (7051) - mean (690ms)  : 672, 707
     .   : milestone, 690,
    master - mean (692ms)  : 675, 709
     .   : milestone, 692,

Loading
gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (7051) - mean (90ms)  : 87, 92
     .   : milestone, 90,
    master - mean (89ms)  : 87, 90
     .   : milestone, 89,

    section CallTarget+Inlining+NGEN
    This PR (7051) - mean (660ms)  : 636, 685
     .   : milestone, 660,
    master - mean (658ms)  : 637, 680
     .   : milestone, 658,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.8) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (7051) - mean (189ms)  : 185, 193
     .   : milestone, 189,
    master - mean (188ms)  : 185, 192
     .   : milestone, 188,

    section CallTarget+Inlining+NGEN
    This PR (7051) - mean (1,110ms)  : 1074, 1145
     .   : milestone, 1110,
    master - mean (1,112ms)  : 1079, 1145
     .   : milestone, 1112,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (7051) - mean (268ms)  : 264, 273
     .   : milestone, 268,
    master - mean (268ms)  : 264, 271
     .   : milestone, 268,

    section CallTarget+Inlining+NGEN
    This PR (7051) - mean (884ms)  : 851, 917
     .   : milestone, 884,
    master - mean (881ms)  : 845, 916
     .   : milestone, 881,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (7051) - mean (261ms)  : 258, 263
     .   : milestone, 261,
    master - mean (261ms)  : 257, 266
     .   : milestone, 261,

    section CallTarget+Inlining+NGEN
    This PR (7051) - mean (872ms)  : 841, 904
     .   : milestone, 872,
    master - mean (876ms)  : 842, 910
     .   : milestone, 876,

Loading

@bouwkast

bouwkast commented Jun 3, 2025

Copy link
Copy Markdown
Collaborator

Revert "Revert 'Reapply "Revert Load the tracer/profiler after guardrails checks" (#6959)' (#6986)"

📜

@andrewlock andrewlock force-pushed the andrew/revert-revert-revert-revert branch from 88cd7d0 to d73e750 Compare June 6, 2025 11:05
@pr-commenter

pr-commenter Bot commented Jun 6, 2025

Copy link
Copy Markdown

Benchmarks

Benchmarks Report for benchmark platform 🐌

Benchmarks for #7051 compared to master:

  • 1 benchmarks are slower, with geometric mean 1.197
  • 40 benchmarks have fewer allocations
  • 6 benchmarks have more allocations

The following thresholds were used for comparing the benchmark speeds:

  • Mann–Whitney U test with statistical test for significance of 5%
  • Only results indicating a difference greater than 10% and 0.3 ns are considered.

Allocation changes below 0.5% are ignored.

Benchmark details

Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7051

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.ActivityBenchmark.StartStopWithChild‑net472 6.09 KB 6 KB -84 B -1.38%
Benchmarks.Trace.ActivityBenchmark.StartStopWithChild‑net6.0 5.58 KB 5.48 KB -101 B -1.81%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartStopWithChild net6.0 11.2μs 59.7ns 322ns 0 0 0 5.58 KB
master StartStopWithChild netcoreapp3.1 14.1μs 67.5ns 286ns 0 0 0 5.75 KB
master StartStopWithChild net472 22.1μs 119ns 651ns 1.03 0.411 0.103 6.09 KB
#7051 StartStopWithChild net6.0 10.8μs 59.9ns 354ns 0 0 0 5.48 KB
#7051 StartStopWithChild netcoreapp3.1 13.8μs 67ns 292ns 0 0 0 5.73 KB
#7051 StartStopWithChild net472 21.2μs 102ns 445ns 0.864 0.216 0 6 KB
Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ More allocations ⚠️

More allocations ⚠️ in #7051

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces‑net472 3.33 KB 3.35 KB 23 B 0.69%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 927μs 26.6ns 103ns 0 0 0 2.71 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 1.04ms 404ns 1.56μs 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces net472 1.25ms 357ns 1.38μs 0 0 0 3.33 KB
#7051 WriteAndFlushEnrichedTraces net6.0 944μs 39.2ns 141ns 0 0 0 2.7 KB
#7051 WriteAndFlushEnrichedTraces netcoreapp3.1 1.03ms 101ns 362ns 0 0 0 2.7 KB
#7051 WriteAndFlushEnrichedTraces net472 1.2ms 58.3ns 226ns 0 0 0 3.35 KB
Benchmarks.Trace.Asm.AppSecBodyBenchmark - Same speed ✔️ More allocations ⚠️

More allocations ⚠️ in #7051

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody‑net472 236.35 KB 239.64 KB 3.28 KB 1.39%
Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleMoreComplexBody‑net472 239.87 KB 243.15 KB 3.28 KB 1.37%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master AllCycleSimpleBody net6.0 331μs 1.76μs 8.42μs 0 0 0 197.06 KB
master AllCycleSimpleBody netcoreapp3.1 510μs 1.45μs 5.61μs 0 0 0 204.77 KB
master AllCycleSimpleBody net472 436μs 119ns 460ns 36.6 2.16 0 236.35 KB
master AllCycleMoreComplexBody net6.0 338μs 1.76μs 8.8μs 0 0 0 200.56 KB
master AllCycleMoreComplexBody netcoreapp3.1 495μs 987ns 3.56μs 0 0 0 208.18 KB
master AllCycleMoreComplexBody net472 446μs 106ns 412ns 36.6 2.16 0 239.87 KB
master ObjectExtractorSimpleBody net6.0 311ns 1.77ns 12.3ns 0 0 0 280 B
master ObjectExtractorSimpleBody netcoreapp3.1 409ns 1.93ns 8.2ns 0 0 0 272 B
master ObjectExtractorSimpleBody net472 303ns 0.175ns 0.676ns 0.0442 0 0 281 B
master ObjectExtractorMoreComplexBody net6.0 6.52μs 29.5ns 110ns 0 0 0 3.78 KB
master ObjectExtractorMoreComplexBody netcoreapp3.1 7.76μs 36.2ns 140ns 0 0 0 3.69 KB
master ObjectExtractorMoreComplexBody net472 6.66μs 0.89ns 3.33ns 0.599 0 0 3.8 KB
#7051 AllCycleSimpleBody net6.0 330μs 1.48μs 5.53μs 0 0 0 197.59 KB
#7051 AllCycleSimpleBody netcoreapp3.1 482μs 594ns 2.3μs 0 0 0 205.35 KB
#7051 AllCycleSimpleBody net472 442μs 107ns 413ns 36.6 2.16 0 239.64 KB
#7051 AllCycleMoreComplexBody net6.0 337μs 1.14μs 4.43μs 0 0 0 201.1 KB
#7051 AllCycleMoreComplexBody netcoreapp3.1 502μs 985ns 3.81μs 0 0 0 208.77 KB
#7051 AllCycleMoreComplexBody net472 457μs 329ns 1.28μs 37.9 2.23 0 243.15 KB
#7051 ObjectExtractorSimpleBody net6.0 310ns 1.56ns 6.99ns 0 0 0 280 B
#7051 ObjectExtractorSimpleBody netcoreapp3.1 410ns 1.99ns 8.22ns 0 0 0 272 B
#7051 ObjectExtractorSimpleBody net472 307ns 0.0293ns 0.11ns 0.0442 0 0 281 B
#7051 ObjectExtractorMoreComplexBody net6.0 6.4μs 30.6ns 119ns 0 0 0 3.78 KB
#7051 ObjectExtractorMoreComplexBody netcoreapp3.1 7.75μs 37.4ns 154ns 0 0 0 3.69 KB
#7051 ObjectExtractorMoreComplexBody net472 6.83μs 2.68ns 9.65ns 0.58 0 0 3.8 KB
Benchmarks.Trace.Asm.AppSecEncoderBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7051

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.Asm.AppSecEncoderBenchmark.EncodeLegacyArgs‑net6.0 2.16 KB 2.15 KB -11 B -0.51%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EncodeArgs net6.0 73.3μs 263ns 984ns 0 0 0 32.41 KB
master EncodeArgs netcoreapp3.1 95.7μs 36.2ns 130ns 0 0 0 32.4 KB
master EncodeArgs net472 107μs 17.6ns 65.8ns 4.82 0 0 32.51 KB
master EncodeLegacyArgs net6.0 143μs 122ns 472ns 0 0 0 2.16 KB
master EncodeLegacyArgs netcoreapp3.1 197μs 42.5ns 147ns 0 0 0 2.14 KB
master EncodeLegacyArgs net472 261μs 66.8ns 250ns 0 0 0 2.16 KB
#7051 EncodeArgs net6.0 72.7μs 302ns 1.13μs 0 0 0 32.4 KB
#7051 EncodeArgs netcoreapp3.1 94.2μs 159ns 596ns 0 0 0 32.4 KB
#7051 EncodeArgs net472 105μs 59.5ns 230ns 4.75 0 0 32.51 KB
#7051 EncodeLegacyArgs net6.0 146μs 138ns 534ns 0 0 0 2.15 KB
#7051 EncodeLegacyArgs netcoreapp3.1 195μs 175ns 676ns 0 0 0 2.14 KB
#7051 EncodeLegacyArgs net472 263μs 28.2ns 102ns 0 0 0 2.16 KB
Benchmarks.Trace.Asm.AppSecWafBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master RunWafRealisticBenchmark net6.0 271μs 133ns 499ns 0 0 0 4.55 KB
master RunWafRealisticBenchmark netcoreapp3.1 294μs 264ns 989ns 0 0 0 4.48 KB
master RunWafRealisticBenchmark net472 307μs 38.1ns 147ns 0 0 0 4.66 KB
master RunWafRealisticBenchmarkWithAttack net6.0 181μs 80.1ns 300ns 0 0 0 2.24 KB
master RunWafRealisticBenchmarkWithAttack netcoreapp3.1 198μs 104ns 404ns 0 0 0 2.22 KB
master RunWafRealisticBenchmarkWithAttack net472 207μs 45.3ns 169ns 0 0 0 2.28 KB
#7051 RunWafRealisticBenchmark net6.0 270μs 42.8ns 160ns 0 0 0 4.55 KB
#7051 RunWafRealisticBenchmark netcoreapp3.1 294μs 455ns 1.64μs 0 0 0 4.48 KB
#7051 RunWafRealisticBenchmark net472 308μs 55.4ns 215ns 0 0 0 4.66 KB
#7051 RunWafRealisticBenchmarkWithAttack net6.0 183μs 24.7ns 92.6ns 0 0 0 2.24 KB
#7051 RunWafRealisticBenchmarkWithAttack netcoreapp3.1 195μs 43.7ns 163ns 0 0 0 2.22 KB
#7051 RunWafRealisticBenchmarkWithAttack net472 208μs 36.5ns 132ns 0 0 0 2.29 KB
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendRequest net6.0 61.1μs 29ns 104ns 0 0 0 14.53 KB
master SendRequest netcoreapp3.1 69.9μs 118ns 440ns 0 0 0 17.42 KB
master SendRequest net472 0.0171ns 0.00147ns 0.00551ns 0 0 0 0 b
#7051 SendRequest net6.0 61.8μs 42.3ns 146ns 0 0 0 14.52 KB
#7051 SendRequest netcoreapp3.1 70.8μs 164ns 615ns 0 0 0 17.42 KB
#7051 SendRequest net472 0.0175ns 0.00186ns 0.00719ns 0 0 0 0 b
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ More allocations ⚠️

More allocations ⚠️ in #7051

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces‑net472 55.75 KB 56.25 KB 499 B 0.90%
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces‑netcoreapp3.1 41.89 KB 42.25 KB 359 B 0.86%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 645μs 512ns 1.98μs 0 0 0 41.73 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 631μs 1.67μs 6.67μs 0 0 0 41.89 KB
master WriteAndFlushEnrichedTraces net472 923μs 1.93μs 7.2μs 4.46 0 0 55.75 KB
#7051 WriteAndFlushEnrichedTraces net6.0 680μs 1.19μs 4.44μs 0 0 0 41.79 KB
#7051 WriteAndFlushEnrichedTraces netcoreapp3.1 668μs 1.89μs 7.31μs 0 0 0 42.25 KB
#7051 WriteAndFlushEnrichedTraces net472 877μs 3.53μs 13.7μs 8.33 0 0 56.25 KB
Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7051

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.DbCommandBenchmark.ExecuteNonQuery‑net6.0 1.03 KB 1.02 KB -8 B -0.78%
Benchmarks.Trace.DbCommandBenchmark.ExecuteNonQuery‑netcoreapp3.1 1.02 KB 1.02 KB -8 B -0.78%
Benchmarks.Trace.DbCommandBenchmark.ExecuteNonQuery‑net472 995 B 987 B -8 B -0.80%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteNonQuery net6.0 1.99μs 10.2ns 46.8ns 0 0 0 1.03 KB
master ExecuteNonQuery netcoreapp3.1 2.53μs 3.43ns 13.3ns 0 0 0 1.02 KB
master ExecuteNonQuery net472 2.7μs 2.41ns 9.34ns 0.147 0.0134 0 995 B
#7051 ExecuteNonQuery net6.0 2.03μs 2.21ns 8.26ns 0 0 0 1.02 KB
#7051 ExecuteNonQuery netcoreapp3.1 2.52μs 11.7ns 46.9ns 0 0 0 1.02 KB
#7051 ExecuteNonQuery net472 2.83μs 4.85ns 18.8ns 0.156 0.0142 0 987 B
Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7051

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync‑net472 1.11 KB 1.1 KB -8 B -0.72%
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync‑netcoreapp3.1 1.09 KB 1.08 KB -8 B -0.74%
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch‑net472 1.05 KB 1.04 KB -8 B -0.76%
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch‑net6.0 1.04 KB 1.03 KB -8 B -0.77%
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch‑netcoreapp3.1 1.04 KB 1.03 KB -8 B -0.77%
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync‑net6.0 1.02 KB 1.01 KB -8 B -0.79%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master CallElasticsearch net6.0 1.83μs 5.85ns 22.7ns 0 0 0 1.04 KB
master CallElasticsearch netcoreapp3.1 2.3μs 11.8ns 56.6ns 0 0 0 1.04 KB
master CallElasticsearch net472 3.55μs 3.05ns 11.8ns 0.159 0 0 1.05 KB
master CallElasticsearchAsync net6.0 1.81μs 3.89ns 14.5ns 0 0 0 1.02 KB
master CallElasticsearchAsync netcoreapp3.1 2.35μs 7.26ns 28.1ns 0 0 0 1.09 KB
master CallElasticsearchAsync net472 3.84μs 3.19ns 12.4ns 0.169 0 0 1.11 KB
#7051 CallElasticsearch net6.0 1.83μs 7.04ns 27.2ns 0 0 0 1.03 KB
#7051 CallElasticsearch netcoreapp3.1 2.34μs 11.5ns 45.8ns 0 0 0 1.03 KB
#7051 CallElasticsearch net472 3.56μs 4.14ns 16ns 0.161 0 0 1.04 KB
#7051 CallElasticsearchAsync net6.0 1.84μs 2.07ns 8.01ns 0 0 0 1.01 KB
#7051 CallElasticsearchAsync netcoreapp3.1 2.48μs 11.1ns 42.9ns 0 0 0 1.08 KB
#7051 CallElasticsearchAsync net472 3.82μs 2.09ns 8.08ns 0.172 0 0 1.1 KB
Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7051

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync‑net6.0 960 B 952 B -8 B -0.83%
Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync‑netcoreapp3.1 960 B 952 B -8 B -0.83%
Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync‑net472 923 B 915 B -8 B -0.87%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteAsync net6.0 1.78μs 9.27ns 43.5ns 0 0 0 960 B
master ExecuteAsync netcoreapp3.1 2.29μs 7.45ns 25.8ns 0 0 0 960 B
master ExecuteAsync net472 2.58μs 1.78ns 6.88ns 0.143 0 0 923 B
#7051 ExecuteAsync net6.0 1.95μs 9.31ns 36ns 0 0 0 952 B
#7051 ExecuteAsync netcoreapp3.1 2.33μs 7.01ns 27.2ns 0 0 0 952 B
#7051 ExecuteAsync net472 2.56μs 0.994ns 3.85ns 0.141 0 0 915 B
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendAsync net6.0 7.1μs 10.4ns 39ns 0 0 0 2.37 KB
master SendAsync netcoreapp3.1 8.69μs 14.1ns 54.6ns 0 0 0 2.9 KB
master SendAsync net472 12.5μs 10.3ns 38.5ns 0.498 0 0 3.19 KB
#7051 SendAsync net6.0 6.93μs 3.8ns 14.7ns 0 0 0 2.36 KB
#7051 SendAsync netcoreapp3.1 8.78μs 11.7ns 45.3ns 0 0 0 2.9 KB
#7051 SendAsync net472 12.6μs 11ns 42.5ns 0.504 0 0 3.18 KB
Benchmarks.Trace.Iast.StringAspectsBenchmark - Slower ⚠️ More allocations ⚠️

Slower ⚠️ in #7051

Benchmark diff/base Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark‑netcoreapp3.1 1.197 413,700.00 495,050.00

More allocations ⚠️ in #7051

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark‑net6.0 259.96 KB 275.38 KB 15.42 KB 5.93%

Fewer allocations 🎉 in #7051

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark‑netcoreapp3.1 274.93 KB 257.55 KB -17.38 KB -6.32%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StringConcatBenchmark net6.0 44.6μs 212ns 1.6μs 0 0 0 43.83 KB
master StringConcatBenchmark netcoreapp3.1 47.2μs 224ns 838ns 0 0 0 42.64 KB
master StringConcatBenchmark net472 56.8μs 259ns 968ns 0 0 0 57.34 KB
master StringConcatAspectBenchmark net6.0 458μs 1.08μs 3.89μs 0 0 0 259.96 KB
master StringConcatAspectBenchmark netcoreapp3.1 447μs 6.44μs 63.7μs 0 0 0 274.93 KB
master StringConcatAspectBenchmark net472 410μs 2.07μs 9.27μs 0 0 0 286.72 KB
#7051 StringConcatBenchmark net6.0 48.4μs 267ns 1.53μs 0 0 0 44.05 KB
#7051 StringConcatBenchmark netcoreapp3.1 49μs 244ns 1.66μs 0 0 0 42.8 KB
#7051 StringConcatBenchmark net472 57.1μs 219ns 820ns 0 0 0 57.34 KB
#7051 StringConcatAspectBenchmark net6.0 478μs 1.01μs 3.51μs 0 0 0 275.38 KB
#7051 StringConcatAspectBenchmark netcoreapp3.1 496μs 1.94μs 7.25μs 0 0 0 257.55 KB
#7051 StringConcatAspectBenchmark net472 438μs 2.1μs 13.6μs 0 0 0 286.72 KB
Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7051

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.ILoggerBenchmark.EnrichedLog‑net6.0 1.76 KB 1.7 KB -56 B -3.18%
Benchmarks.Trace.ILoggerBenchmark.EnrichedLog‑netcoreapp3.1 1.76 KB 1.7 KB -56 B -3.18%
Benchmarks.Trace.ILoggerBenchmark.EnrichedLog‑net472 1.69 KB 1.64 KB -56 B -3.31%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.63μs 2.6ns 10.1ns 0 0 0 1.76 KB
master EnrichedLog netcoreapp3.1 3.44μs 4.38ns 17ns 0 0 0 1.76 KB
master EnrichedLog net472 4.05μs 4.83ns 18.7ns 0.265 0 0 1.69 KB
#7051 EnrichedLog net6.0 2.58μs 8.84ns 34.2ns 0 0 0 1.7 KB
#7051 EnrichedLog netcoreapp3.1 3.52μs 15.9ns 61.5ns 0 0 0 1.7 KB
#7051 EnrichedLog net472 3.9μs 5.04ns 19.5ns 0.253 0 0 1.64 KB
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7051

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.Log4netBenchmark.EnrichedLog‑net6.0 4.37 KB 4.31 KB -56 B -1.28%
Benchmarks.Trace.Log4netBenchmark.EnrichedLog‑netcoreapp3.1 4.37 KB 4.31 KB -56 B -1.28%
Benchmarks.Trace.Log4netBenchmark.EnrichedLog‑net472 4.57 KB 4.51 KB -60 B -1.31%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 123μs 145ns 563ns 0 0 0 4.37 KB
master EnrichedLog netcoreapp3.1 126μs 330ns 1.24μs 0 0 0 4.37 KB
master EnrichedLog net472 167μs 161ns 603ns 0 0 0 4.57 KB
#7051 EnrichedLog net6.0 123μs 31ns 112ns 0 0 0 4.31 KB
#7051 EnrichedLog netcoreapp3.1 128μs 58.1ns 201ns 0 0 0 4.31 KB
#7051 EnrichedLog net472 166μs 42.1ns 158ns 0 0 0 4.51 KB
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7051

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.NLogBenchmark.EnrichedLog‑net6.0 2.32 KB 2.26 KB -56 B -2.41%
Benchmarks.Trace.NLogBenchmark.EnrichedLog‑netcoreapp3.1 2.32 KB 2.26 KB -56 B -2.41%
Benchmarks.Trace.NLogBenchmark.EnrichedLog‑net472 2.14 KB 2.08 KB -56 B -2.62%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 4.94μs 19.3ns 74.6ns 0 0 0 2.32 KB
master EnrichedLog netcoreapp3.1 6.77μs 22.2ns 79.9ns 0 0 0 2.32 KB
master EnrichedLog net472 7.45μs 8.04ns 31.1ns 0.335 0 0 2.14 KB
#7051 EnrichedLog net6.0 4.98μs 7.12ns 27.6ns 0 0 0 2.26 KB
#7051 EnrichedLog netcoreapp3.1 6.6μs 24.5ns 94.9ns 0 0 0 2.26 KB
#7051 EnrichedLog net472 7.54μs 7.78ns 30.1ns 0.3 0 0 2.08 KB
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7051

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.RedisBenchmark.SendReceive‑net472 1.21 KB 1.2 KB -8 B -0.66%
Benchmarks.Trace.RedisBenchmark.SendReceive‑net6.0 1.21 KB 1.2 KB -8 B -0.66%
Benchmarks.Trace.RedisBenchmark.SendReceive‑netcoreapp3.1 1.21 KB 1.2 KB -8 B -0.66%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendReceive net6.0 2.04μs 10.5ns 49.4ns 0 0 0 1.21 KB
master SendReceive netcoreapp3.1 2.53μs 11.8ns 47.4ns 0 0 0 1.21 KB
master SendReceive net472 3.28μs 2.71ns 10.5ns 0.178 0 0 1.21 KB
#7051 SendReceive net6.0 1.99μs 7.04ns 27.3ns 0 0 0 1.2 KB
#7051 SendReceive netcoreapp3.1 2.61μs 6.69ns 25.9ns 0 0 0 1.2 KB
#7051 SendReceive net472 3.24μs 2.04ns 7.91ns 0.177 0 0 1.2 KB
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7051

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.SerilogBenchmark.EnrichedLog‑net472 2.08 KB 2.03 KB -56 B -2.69%
Benchmarks.Trace.SerilogBenchmark.EnrichedLog‑netcoreapp3.1 1.69 KB 1.63 KB -56 B -3.32%
Benchmarks.Trace.SerilogBenchmark.EnrichedLog‑net6.0 1.64 KB 1.58 KB -56 B -3.41%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 4.18μs 0.92ns 3.44ns 0 0 0 1.64 KB
master EnrichedLog netcoreapp3.1 5.64μs 23ns 89.3ns 0 0 0 1.69 KB
master EnrichedLog net472 6.67μs 7.28ns 27.3ns 0.298 0 0 2.08 KB
#7051 EnrichedLog net6.0 4.09μs 4.73ns 17.7ns 0 0 0 1.58 KB
#7051 EnrichedLog netcoreapp3.1 5.64μs 15.2ns 59ns 0 0 0 1.63 KB
#7051 EnrichedLog net472 6.67μs 7.16ns 27.7ns 0.3 0 0 2.03 KB
Benchmarks.Trace.SpanBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7051

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.SpanBenchmark.StartFinishScope‑net6.0 704 B 696 B -8 B -1.14%
Benchmarks.Trace.SpanBenchmark.StartFinishScope‑netcoreapp3.1 704 B 696 B -8 B -1.14%
Benchmarks.Trace.SpanBenchmark.StartFinishScope‑net472 666 B 658 B -8 B -1.20%
Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net472 586 B 578 B -8 B -1.37%
Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net6.0 584 B 576 B -8 B -1.37%
Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑netcoreapp3.1 584 B 576 B -8 B -1.37%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartFinishSpan net6.0 751ns 3.97ns 22.4ns 0 0 0 584 B
master StartFinishSpan netcoreapp3.1 953ns 4.42ns 17.7ns 0 0 0 584 B
master StartFinishSpan net472 914ns 0.809ns 3.13ns 0.0912 0 0 586 B
master StartFinishScope net6.0 918ns 0.479ns 1.73ns 0 0 0 704 B
master StartFinishScope netcoreapp3.1 1.15μs 6.24ns 34.2ns 0 0 0 704 B
master StartFinishScope net472 1.09μs 0.174ns 0.652ns 0.104 0 0 666 B
#7051 StartFinishSpan net6.0 745ns 3.96ns 20.2ns 0 0 0 576 B
#7051 StartFinishSpan netcoreapp3.1 943ns 4.75ns 21.2ns 0 0 0 576 B
#7051 StartFinishSpan net472 926ns 0.134ns 0.485ns 0.089 0 0 578 B
#7051 StartFinishScope net6.0 900ns 4.15ns 16.1ns 0 0 0 696 B
#7051 StartFinishScope netcoreapp3.1 1.19μs 5.36ns 20.8ns 0 0 0 696 B
#7051 StartFinishScope net472 1.11μs 0.572ns 2.22ns 0.0991 0 0 658 B
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #7051

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin‑net6.0 704 B 696 B -8 B -1.14%
Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin‑netcoreapp3.1 704 B 696 B -8 B -1.14%
Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin‑net472 666 B 658 B -8 B -1.20%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master RunOnMethodBegin net6.0 1.02μs 5.36ns 26.8ns 0 0 0 704 B
master RunOnMethodBegin netcoreapp3.1 1.38μs 2.28ns 8.82ns 0 0 0 704 B
master RunOnMethodBegin net472 1.36μs 0.141ns 0.544ns 0.102 0 0 666 B
#7051 RunOnMethodBegin net6.0 1.04μs 5.61ns 32.3ns 0 0 0 696 B
#7051 RunOnMethodBegin netcoreapp3.1 1.43μs 1.06ns 4.09ns 0 0 0 696 B
#7051 RunOnMethodBegin net472 1.38μs 0.857ns 3.32ns 0.103 0 0 658 B

@andrewlock andrewlock force-pushed the andrew/revert-revert-revert-revert branch 4 times, most recently from 5dca5b1 to 91973ad Compare June 11, 2025 08:28

@daniel-romano-DD daniel-romano-DD left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙃

@andrewlock andrewlock force-pushed the andrew/revert-revert-revert-revert branch 2 times, most recently from 6f8c028 to 032aed3 Compare June 12, 2025 08:08
@andrewlock andrewlock force-pushed the andrew/revert-revert-revert-revert branch from 032aed3 to 2ddabba Compare June 20, 2025 12:08
@andrewlock

Copy link
Copy Markdown
Member Author

Superseded by

@andrewlock andrewlock closed this Jun 23, 2025
andrewlock added a commit that referenced this pull request Jun 24, 2025
## Summary of changes

- Revert "Revert 'Reapply "Revert Load the tracer/profiler after
guardrails checks" (#6959)' (#6986)" #7051
- Avoid calling `dlclose` on glibc 2.34-2.36

## Reason for change

We want to make sure we only load the tracer/profiler libraries _after_
we have run guardrails checks. However, repeated attempts to do this
were causing crashes in our smoke tests on Fedora 35, on arm64 only.
After an (extensive) investigation, we finally tracked this down to a
bug in glibc itself. The full explanation is given below, but the upshot
is that calling `dlclose` on this buggy glibc version can cause crashes
at some later point.

## Implementation details

- Includes the "original" implementation that moves the tracer/profiler
after guardrails checks
- Before calling `dlclose`, check to see if we're on a buggy version of
glibc
- This is complicated by the fact we run the _same_ binary on glibc and
musl, so we have to make the glibc call _dynamically_, instead of
linking directly against the `gnu_get_libc_version` method.
- We avoid trying to load libc if we detect that we're on Alpine by
checking for `/etc/alpine-release`. This is slightly annoying, but
required in the native loader code.
- If we _do_ manage to call the glibc version, we check against the
blocklist. If the version is on it, we avoid ever calling dlclose.

## Test coverage

We have run repeated tests against the previously crashing tests, and
this has resolved the issue. Nevertheless, our recommendation for
customers should definitely be to upgrade to a stable version of glibc
wherever possible

## Other details

Overview of the bug: 
In certain versions of glibc, there is a TLS-reuse bug that can cause
crashes when unloading shared libraries. The bug was introduced in 2.34,
fixed in 2.36 on x86-64, and fixed in 2.37 on aarch64.
See [the bug
here](https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3921c5b40f293c57cb326f58713c924b0662ef59)
or [the explanation of the bug on
Fedora](https://bugzilla.redhat.com/show_bug.cgi?id=2251557) (which is
where we saw the crashing issue).

glibc 2.34 shipped with a regression: after a `dlclose()` of a library
that carried dynamic-TLS, the loader could reuse the same "module-ID"
for a different library _without_ first clearing the associated DTV
(Dynamic Thread Vector) entry. The next time any code accessed that TLS
slot it could read or write an unmapped address, which cases a
`SIGSEGV`.

This manifested as a crash in the WAF when we called
`ddwaf_context_info` on arm64. It explicitly happens on arm64 when we
unload the continuous profiler shortly after loading it (which is normal
because it's not supported).

It manifests in this scenario because `ddwaf_context_init` starts like
this:

```assembly
+128  bl   __tls_get_addr     ; ask glibc for the TLS slot for libddwaf
+136  mrs  x11, TPIDR_EL0     ; TLS base for this thread
+140  ldrb w9, [x11, x0]      ; <–– boom if x0 points to a stale DTV entry
```

When we unload the continuous profiler and call `dlcose`, it causes the
glibc loader to hand out a recycled module-ID to `libddwaf`. When
`ddwaf_context_init` tries to access TLS in the `ldrb` instruction, `x11
+ x0` is outside every mapped version, and so it crashes.

Note that although calling `dlclose` with the continuous profiler _may_
trigger the issue (the actual crash is flaky depending on load/unload
timing and address layout), unloading _any_ library that is is built
with `__thread`/`thread_local` data could trigger the crash. To minimize
the risk of hitting this issue, we avoid calling `dlclose` entirely on
the flaky glibc versions

For more details [see
doc](https://docs.google.com/document/d/1aptwmprnd83VTZMKxrrTqmBF6eOayjCL36kj9LujCE8/edit?tab=t.0#heading=h.nytiofltvdb5)

Affected distros: 

| Distribution      | Version / Release             | glibc Version |
| ----------------- | ----------------------------- | ------------- |
| Fedora            | 35                            | 2.34          |
| Fedora            | 37                            | 2.36          |
| Ubuntu            | 21.10 ("Impish Indri")        | 2.34          |
| Ubuntu            | 22.04 LTS ("Jammy Jellyfish") | 2.35          |
| Debian            | 12 ("Bookworm")               | 2.36          |
| RHEL              | 9                             | 2.34          |
| CentOS            | 9                             | 2.34          |
| Amazon Linux 2023 | default                       | 2.34          |
lucaspimentel pushed a commit that referenced this pull request Jul 1, 2025
## Summary of changes

- Revert "Revert 'Reapply "Revert Load the tracer/profiler after
guardrails checks" (#6959)' (#6986)" #7051
- Avoid calling `dlclose` on glibc 2.34-2.36

## Reason for change

We want to make sure we only load the tracer/profiler libraries _after_
we have run guardrails checks. However, repeated attempts to do this
were causing crashes in our smoke tests on Fedora 35, on arm64 only.
After an (extensive) investigation, we finally tracked this down to a
bug in glibc itself. The full explanation is given below, but the upshot
is that calling `dlclose` on this buggy glibc version can cause crashes
at some later point.

## Implementation details

- Includes the "original" implementation that moves the tracer/profiler
after guardrails checks
- Before calling `dlclose`, check to see if we're on a buggy version of
glibc
- This is complicated by the fact we run the _same_ binary on glibc and
musl, so we have to make the glibc call _dynamically_, instead of
linking directly against the `gnu_get_libc_version` method.
- We avoid trying to load libc if we detect that we're on Alpine by
checking for `/etc/alpine-release`. This is slightly annoying, but
required in the native loader code.
- If we _do_ manage to call the glibc version, we check against the
blocklist. If the version is on it, we avoid ever calling dlclose.

## Test coverage

We have run repeated tests against the previously crashing tests, and
this has resolved the issue. Nevertheless, our recommendation for
customers should definitely be to upgrade to a stable version of glibc
wherever possible

## Other details

Overview of the bug: 
In certain versions of glibc, there is a TLS-reuse bug that can cause
crashes when unloading shared libraries. The bug was introduced in 2.34,
fixed in 2.36 on x86-64, and fixed in 2.37 on aarch64.
See [the bug
here](https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3921c5b40f293c57cb326f58713c924b0662ef59)
or [the explanation of the bug on
Fedora](https://bugzilla.redhat.com/show_bug.cgi?id=2251557) (which is
where we saw the crashing issue).

glibc 2.34 shipped with a regression: after a `dlclose()` of a library
that carried dynamic-TLS, the loader could reuse the same "module-ID"
for a different library _without_ first clearing the associated DTV
(Dynamic Thread Vector) entry. The next time any code accessed that TLS
slot it could read or write an unmapped address, which cases a
`SIGSEGV`.

This manifested as a crash in the WAF when we called
`ddwaf_context_info` on arm64. It explicitly happens on arm64 when we
unload the continuous profiler shortly after loading it (which is normal
because it's not supported).

It manifests in this scenario because `ddwaf_context_init` starts like
this:

```assembly
+128  bl   __tls_get_addr     ; ask glibc for the TLS slot for libddwaf
+136  mrs  x11, TPIDR_EL0     ; TLS base for this thread
+140  ldrb w9, [x11, x0]      ; <–– boom if x0 points to a stale DTV entry
```

When we unload the continuous profiler and call `dlcose`, it causes the
glibc loader to hand out a recycled module-ID to `libddwaf`. When
`ddwaf_context_init` tries to access TLS in the `ldrb` instruction, `x11
+ x0` is outside every mapped version, and so it crashes.

Note that although calling `dlclose` with the continuous profiler _may_
trigger the issue (the actual crash is flaky depending on load/unload
timing and address layout), unloading _any_ library that is is built
with `__thread`/`thread_local` data could trigger the crash. To minimize
the risk of hitting this issue, we avoid calling `dlclose` entirely on
the flaky glibc versions

For more details [see
doc](https://docs.google.com/document/d/1aptwmprnd83VTZMKxrrTqmBF6eOayjCL36kj9LujCE8/edit?tab=t.0#heading=h.nytiofltvdb5)

Affected distros: 

| Distribution      | Version / Release             | glibc Version |
| ----------------- | ----------------------------- | ------------- |
| Fedora            | 35                            | 2.34          |
| Fedora            | 37                            | 2.36          |
| Ubuntu            | 21.10 ("Impish Indri")        | 2.34          |
| Ubuntu            | 22.04 LTS ("Jammy Jellyfish") | 2.35          |
| Debian            | 12 ("Bookworm")               | 2.36          |
| RHEL              | 9                             | 2.34          |
| CentOS            | 9                             | 2.34          |
| Amazon Linux 2023 | default                       | 2.34          |
chojomok pushed a commit that referenced this pull request Jul 15, 2025
## Summary of changes

- Revert "Revert 'Reapply "Revert Load the tracer/profiler after
guardrails checks" (#6959)' (#6986)" #7051
- Avoid calling `dlclose` on glibc 2.34-2.36

## Reason for change

We want to make sure we only load the tracer/profiler libraries _after_
we have run guardrails checks. However, repeated attempts to do this
were causing crashes in our smoke tests on Fedora 35, on arm64 only.
After an (extensive) investigation, we finally tracked this down to a
bug in glibc itself. The full explanation is given below, but the upshot
is that calling `dlclose` on this buggy glibc version can cause crashes
at some later point.

## Implementation details

- Includes the "original" implementation that moves the tracer/profiler
after guardrails checks
- Before calling `dlclose`, check to see if we're on a buggy version of
glibc
- This is complicated by the fact we run the _same_ binary on glibc and
musl, so we have to make the glibc call _dynamically_, instead of
linking directly against the `gnu_get_libc_version` method.
- We avoid trying to load libc if we detect that we're on Alpine by
checking for `/etc/alpine-release`. This is slightly annoying, but
required in the native loader code.
- If we _do_ manage to call the glibc version, we check against the
blocklist. If the version is on it, we avoid ever calling dlclose.

## Test coverage

We have run repeated tests against the previously crashing tests, and
this has resolved the issue. Nevertheless, our recommendation for
customers should definitely be to upgrade to a stable version of glibc
wherever possible

## Other details

Overview of the bug: 
In certain versions of glibc, there is a TLS-reuse bug that can cause
crashes when unloading shared libraries. The bug was introduced in 2.34,
fixed in 2.36 on x86-64, and fixed in 2.37 on aarch64.
See [the bug
here](https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3921c5b40f293c57cb326f58713c924b0662ef59)
or [the explanation of the bug on
Fedora](https://bugzilla.redhat.com/show_bug.cgi?id=2251557) (which is
where we saw the crashing issue).

glibc 2.34 shipped with a regression: after a `dlclose()` of a library
that carried dynamic-TLS, the loader could reuse the same "module-ID"
for a different library _without_ first clearing the associated DTV
(Dynamic Thread Vector) entry. The next time any code accessed that TLS
slot it could read or write an unmapped address, which cases a
`SIGSEGV`.

This manifested as a crash in the WAF when we called
`ddwaf_context_info` on arm64. It explicitly happens on arm64 when we
unload the continuous profiler shortly after loading it (which is normal
because it's not supported).

It manifests in this scenario because `ddwaf_context_init` starts like
this:

```assembly
+128  bl   __tls_get_addr     ; ask glibc for the TLS slot for libddwaf
+136  mrs  x11, TPIDR_EL0     ; TLS base for this thread
+140  ldrb w9, [x11, x0]      ; <–– boom if x0 points to a stale DTV entry
```

When we unload the continuous profiler and call `dlcose`, it causes the
glibc loader to hand out a recycled module-ID to `libddwaf`. When
`ddwaf_context_init` tries to access TLS in the `ldrb` instruction, `x11
+ x0` is outside every mapped version, and so it crashes.

Note that although calling `dlclose` with the continuous profiler _may_
trigger the issue (the actual crash is flaky depending on load/unload
timing and address layout), unloading _any_ library that is is built
with `__thread`/`thread_local` data could trigger the crash. To minimize
the risk of hitting this issue, we avoid calling `dlclose` entirely on
the flaky glibc versions

For more details [see
doc](https://docs.google.com/document/d/1aptwmprnd83VTZMKxrrTqmBF6eOayjCL36kj9LujCE8/edit?tab=t.0#heading=h.nytiofltvdb5)

Affected distros: 

| Distribution      | Version / Release             | glibc Version |
| ----------------- | ----------------------------- | ------------- |
| Fedora            | 35                            | 2.34          |
| Fedora            | 37                            | 2.36          |
| Ubuntu            | 21.10 ("Impish Indri")        | 2.34          |
| Ubuntu            | 22.04 LTS ("Jammy Jellyfish") | 2.35          |
| Debian            | 12 ("Bookworm")               | 2.36          |
| RHEL              | 9                             | 2.34          |
| CentOS            | 9                             | 2.34          |
| Amazon Linux 2023 | default                       | 2.34          |
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:native-library Automatic instrumentation native C++ code (Datadog.Trace.ClrProfiler.Native) area:shared-components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants