Allow skipping span generation in ProcessStart integration #5280

Merged: andrewlock merged 6 commits into master from andrew/dont-initialize-tracer-in-process-start on Mar 14, 2024
andrewlock (Member) commented Mar 6, 2024

Summary of changes

Adds a workaround to avoid recursive initialisation of the tracer

Reason for change

When running on a Linux host, if we can't get the ContainerId using ContainerMetadata, we call GetCgroupInodeInternal, which tries to get the inode number by calling stat --printf=%i {path} using Process.Start().

Unfortunately, we try to do this during tracer initialization, before it's complete. The Process.Start instrumentation kicks in, and we hit a recursive code path that causes the ContainerMetadata to throw.

Implementation details

The first approach was to add a flag to the Environment collection when starting a process, check for the flag in the integration, and bail out if we find it. This runs into issues because the collection is lazily initialized, and once it has been read, starting the process throws an exception if UseShellExecute == true.

Instead, I've taken a different approach using a [ThreadStatic] flag, which we set just before calling Process.Start() and reset in the integration.
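
For illustration, here is a minimal sketch of the [ThreadStatic] approach, not the code from this PR. The flag name `_doNotTrace` matches the name used in the follow-up commit referenced further down this page; the class `DoNotTraceSketch`, the helpers `RunWithDoNotTrace` and `ShouldTrace`, and the `try`/`finally` safety net are assumptions made for the example.

```csharp
using System;

// Hypothetical sketch: type and method names are illustrative only.
public static class DoNotTraceSketch
{
    // [ThreadStatic] gives every thread its own copy of the flag, so setting it
    // on one thread does not suppress spans for Process.Start() calls elsewhere.
    [ThreadStatic]
    private static bool _doNotTrace;

    // Caller side: set the flag just before invoking the (instrumented) start call.
    public static T RunWithDoNotTrace<T>(Func<T> start)
    {
        _doNotTrace = true;
        try
        {
            return start();
        }
        finally
        {
            // Assumption: also reset here in case the integration never ran
            // (for example, if instrumentation is disabled).
            _doNotTrace = false;
        }
    }

    // Integration side: called at the start of the instrumented method.
    // Returns false when the caller asked to skip the span, and resets the flag.
    public static bool ShouldTrace()
    {
        bool skip = _doNotTrace;
        _doNotTrace = false;
        return !skip;
    }
}
```

With this shape, a call such as `DoNotTraceSketch.RunWithDoNotTrace(() => Process.Start("stat", "--printf=%i " + path))` (where `path` is a placeholder) would reach the integration with the flag set, so the integration can return early without touching the tracer.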

Test coverage

Couldn't repro the actual issue at all, but added tests for the workaround behaviour

  • Unit tests that we set and reset the threadstatic correctly
  • Integration tests that we don't trace a span for which it's set

Other details

andrewlock added the type:bug, area:tracer (The core tracer library: Datadog.Trace, does not include OpenTracing, native code, or integrations), area:automatic-instrumentation (Automatic instrumentation managed C# code: Datadog.Trace.ClrProfiler.Managed), and identified-by:telemetry labels Mar 6, 2024
andrewlock requested a review from a team as a code owner March 6, 2024 17:33
datadog-ddstaging bot commented Mar 6, 2024

Datadog Report

Branch report: andrew/dont-initialize-tracer-in-process-start
Commit report: 7fb4cff
Test service: dd-trace-dotnet

✅ 0 Failed, 330176 Passed, 1564 Skipped, 35m 46.78s Wall Time

lucaspimentel (Member) left a comment

LGTM... if it works, and you add tests 😉

andrewlock (Member, Author) commented Mar 6, 2024

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than the latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch's t-test with a significance level of 5%
  • Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5280) - mean (74ms)  : 64, 83
     .   : milestone, 74,
    master - mean (72ms)  : 65, 78
     .   : milestone, 72,

    section CallTarget+Inlining+NGEN
    This PR (5280) - mean (1,013ms)  : 995, 1030
     .   : milestone, 1013,
    master - mean (1,014ms)  : 995, 1032
     .   : milestone, 1014,

gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5280) - mean (109ms)  : 106, 112
     .   : milestone, 109,
    master - mean (110ms)  : 107, 113
     .   : milestone, 110,

    section CallTarget+Inlining+NGEN
    This PR (5280) - mean (736ms)  : 724, 749
     .   : milestone, 736,
    master - mean (740ms)  : 723, 756
     .   : milestone, 740,

gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5280) - mean (94ms)  : 90, 99
     .   : milestone, 94,
    master - mean (93ms)  : 89, 97
     .   : milestone, 93,

    section CallTarget+Inlining+NGEN
    This PR (5280) - mean (697ms)  : 679, 715
     .   : milestone, 697,
    master - mean (694ms)  : 678, 710
     .   : milestone, 694,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5280) - mean (188ms)  : 184, 191
     .   : milestone, 188,
    master - mean (187ms)  : 184, 189
     .   : milestone, 187,

    section CallTarget+Inlining+NGEN
    This PR (5280) - mean (1,088ms)  : 1062, 1113
     .   : milestone, 1088,
    master - mean (1,084ms)  : 1059, 1110
     .   : milestone, 1084,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5280) - mean (270ms)  : 266, 275
     .   : milestone, 270,
    master - mean (269ms)  : 264, 275
     .   : milestone, 269,

    section CallTarget+Inlining+NGEN
    This PR (5280) - mean (886ms)  : 867, 906
     .   : milestone, 886,
    master - mean (883ms)  : 859, 906
     .   : milestone, 883,

gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (5280) - mean (260ms)  : 256, 263
     .   : milestone, 260,
    master - mean (260ms)  : 255, 264
     .   : milestone, 260,

    section CallTarget+Inlining+NGEN
    This PR (5280) - mean (869ms)  : 846, 893
     .   : milestone, 869,
    master - mean (867ms)  : 839, 895
     .   : milestone, 867,

lucaspimentel requested a review from a team March 6, 2024 20:13

zacharycmontoya (Contributor) left a comment

LGTM. What Lucas said 😆

andrewlock force-pushed the andrew/dont-initialize-tracer-in-process-start branch from ee09414 to 89bbc1b March 7, 2024 17:31
andrewlock force-pushed the andrew/dont-initialize-tracer-in-process-start branch from 89bbc1b to 5d8e138 March 13, 2024 12:04
andrewlock (Member, Author) commented Mar 13, 2024

Benchmarks Report for tracer 🐌

Benchmarks for #5280 compared to master:

  • 1 benchmark is faster, with geometric mean 1.119
  • 2 benchmarks are slower, with geometric mean 1.162
  • All benchmarks have the same allocations

The following thresholds were used for comparing the benchmark speeds:

  • Mann–Whitney U test with a significance level of 5%
  • Only results indicating a difference greater than 10% and 0.3 ns are considered.

Allocation changes below 0.5% are ignored.

Benchmark details

Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartStopWithChild net6.0 9.34μs 49ns 259ns 0.0327 0.00934 0 7.49 KB
master StartStopWithChild netcoreapp3.1 11.5μs 61.4ns 353ns 0.0279 0.0112 0 7.58 KB
master StartStopWithChild net472 21.7μs 387ns 3.87μs 1.38 0.367 0.119 8.14 KB
#5280 StartStopWithChild net6.0 8.63μs 47.9ns 291ns 0.0336 0.0126 0 7.5 KB
#5280 StartStopWithChild netcoreapp3.1 10.8μs 59.2ns 365ns 0.0315 0.0158 0 7.58 KB
#5280 StartStopWithChild net472 17.1μs 65.9ns 246ns 1.38 0.355 0.112 8.12 KB
Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 454μs 254ns 950ns 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 598μs 122ns 458ns 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces net472 811μs 342ns 1.32μs 0.406 0 0 3.3 KB
#5280 WriteAndFlushEnrichedTraces net6.0 442μs 280ns 1.08μs 0 0 0 2.7 KB
#5280 WriteAndFlushEnrichedTraces netcoreapp3.1 616μs 231ns 866ns 0 0 0 2.7 KB
#5280 WriteAndFlushEnrichedTraces net472 806μs 217ns 841ns 0.403 0 0 3.3 KB
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendRequest net6.0 170μs 223ns 864ns 0.17 0 0 18.3 KB
master SendRequest netcoreapp3.1 192μs 233ns 873ns 0.191 0 0 20.46 KB
master SendRequest net472 0.00111ns 0.000348ns 0.00126ns 0 0 0 0 b
#5280 SendRequest net6.0 169μs 223ns 864ns 0.255 0 0 18.3 KB
#5280 SendRequest netcoreapp3.1 190μs 345ns 1.33μs 0.19 0 0 20.46 KB
#5280 SendRequest net472 0.00128ns 0.000388ns 0.0015ns 0 0 0 0 b
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 552μs 828ns 3.1μs 0.563 0 0 41.67 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 675μs 682ns 2.55μs 0.331 0 0 41.86 KB
master WriteAndFlushEnrichedTraces net472 890μs 3.34μs 12.9μs 8.08 2.55 0.425 53.23 KB
#5280 WriteAndFlushEnrichedTraces net6.0 545μs 394ns 1.47μs 0.536 0 0 41.71 KB
#5280 WriteAndFlushEnrichedTraces netcoreapp3.1 642μs 903ns 3.5μs 0.322 0 0 41.72 KB
#5280 WriteAndFlushEnrichedTraces net472 841μs 3.86μs 15μs 8.28 2.48 0.414 53.25 KB
Benchmarks.Trace.DbCommandBenchmark - Faster 🎉 Same allocations ✔️

Faster 🎉 in #5280

Benchmark base/diff Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.DbCommandBenchmark.ExecuteNonQuery‑net472 1.119 1,864.34 1,666.07

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteNonQuery net6.0 1.07μs 0.622ns 2.41ns 0.0109 0 0 776 B
master ExecuteNonQuery netcoreapp3.1 1.53μs 1.07ns 4.13ns 0.0108 0 0 776 B
master ExecuteNonQuery net472 1.87μs 1.79ns 6.69ns 0.117 0 0 738 B
#5280 ExecuteNonQuery net6.0 1.11μs 0.458ns 1.78ns 0.0111 0 0 776 B
#5280 ExecuteNonQuery netcoreapp3.1 1.59μs 4.81ns 18.6ns 0.0102 0 0 776 B
#5280 ExecuteNonQuery net472 1.67μs 2.14ns 8ns 0.117 0 0 738 B
Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master CallElasticsearch net6.0 1.26μs 0.416ns 1.56ns 0.0139 0 0 1 KB
master CallElasticsearch netcoreapp3.1 1.55μs 0.673ns 2.52ns 0.0133 0 0 1 KB
master CallElasticsearch net472 2.47μs 0.862ns 3.22ns 0.16 0 0 1.01 KB
master CallElasticsearchAsync net6.0 1.37μs 1.16ns 4.49ns 0.0137 0 0 976 B
master CallElasticsearchAsync netcoreapp3.1 1.62μs 0.682ns 2.55ns 0.0146 0 0 1.05 KB
master CallElasticsearchAsync net472 2.67μs 0.843ns 2.92ns 0.169 0 0 1.07 KB
#5280 CallElasticsearch net6.0 1.22μs 0.439ns 1.64ns 0.014 0 0 1 KB
#5280 CallElasticsearch netcoreapp3.1 1.57μs 2.31ns 8.64ns 0.0135 0 0 1 KB
#5280 CallElasticsearch net472 2.7μs 1.88ns 6.79ns 0.16 0 0 1.01 KB
#5280 CallElasticsearchAsync net6.0 1.27μs 0.545ns 2.04ns 0.0133 0 0 976 B
#5280 CallElasticsearchAsync netcoreapp3.1 1.69μs 1.19ns 4.47ns 0.0142 0 0 1.05 KB
#5280 CallElasticsearchAsync net472 2.57μs 1.54ns 5.76ns 0.17 0 0 1.07 KB
Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteAsync net6.0 1.27μs 0.912ns 3.53ns 0.0128 0 0 920 B
master ExecuteAsync netcoreapp3.1 1.64μs 0.722ns 2.8ns 0.0123 0 0 920 B
master ExecuteAsync net472 1.93μs 1.02ns 3.96ns 0.14 0 0 883 B
#5280 ExecuteAsync net6.0 1.37μs 1.03ns 3.84ns 0.0131 0 0 920 B
#5280 ExecuteAsync netcoreapp3.1 1.62μs 0.456ns 1.64ns 0.0121 0 0 920 B
#5280 ExecuteAsync net472 1.95μs 1.61ns 6.23ns 0.14 0 0 883 B
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendAsync net6.0 4.24μs 6.29ns 24.4ns 0.0297 0 0 2.15 KB
master SendAsync netcoreapp3.1 5.04μs 1.78ns 6.41ns 0.0379 0 0 2.69 KB
master SendAsync net472 7.77μs 4.32ns 16.2ns 0.532 0.00388 0 3.37 KB
#5280 SendAsync net6.0 4.26μs 5.19ns 20.1ns 0.0297 0 0 2.15 KB
#5280 SendAsync netcoreapp3.1 5.06μs 4.65ns 18ns 0.0353 0 0 2.69 KB
#5280 SendAsync net472 7.82μs 2.91ns 11.3ns 0.533 0 0 3.37 KB
Benchmarks.Trace.ILoggerBenchmark - Slower ⚠️ Same allocations ✔️

Slower ⚠️ in #5280

Benchmark diff/base Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.ILoggerBenchmark.EnrichedLog‑netcoreapp3.1 1.123 2,069.98 2,324.93

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 1.44μs 0.468ns 1.75ns 0.0228 0 0 1.63 KB
master EnrichedLog netcoreapp3.1 2.07μs 1.01ns 3.8ns 0.0217 0 0 1.63 KB
master EnrichedLog net472 2.58μs 1.07ns 4.16ns 0.247 0 0 1.56 KB
#5280 EnrichedLog net6.0 1.42μs 0.717ns 2.68ns 0.0232 0 0 1.63 KB
#5280 EnrichedLog netcoreapp3.1 2.32μs 1.31ns 4.91ns 0.0221 0 0 1.63 KB
#5280 EnrichedLog net472 2.47μs 1.71ns 6.4ns 0.247 0 0 1.56 KB
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 114μs 166ns 643ns 0.057 0 0 4.22 KB
master EnrichedLog netcoreapp3.1 118μs 97.7ns 379ns 0 0 0 4.22 KB
master EnrichedLog net472 146μs 112ns 433ns 0.657 0.219 0 4.4 KB
#5280 EnrichedLog net6.0 112μs 79.3ns 297ns 0 0 0 4.22 KB
#5280 EnrichedLog netcoreapp3.1 118μs 115ns 429ns 0 0 0 4.22 KB
#5280 EnrichedLog net472 146μs 57.3ns 222ns 0.656 0.219 0 4.4 KB
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 3.07μs 0.946ns 3.41ns 0.0307 0 0 2.19 KB
master EnrichedLog netcoreapp3.1 4.36μs 2.21ns 8.26ns 0.0305 0 0 2.19 KB
master EnrichedLog net472 4.93μs 2.56ns 9.93ns 0.318 0 0 2.01 KB
#5280 EnrichedLog net6.0 2.94μs 2.33ns 9.02ns 0.0307 0 0 2.19 KB
#5280 EnrichedLog netcoreapp3.1 4.2μs 2.54ns 9.82ns 0.0293 0 0 2.19 KB
#5280 EnrichedLog net472 4.91μs 15.7ns 60.8ns 0.318 0 0 2.01 KB
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendReceive net6.0 1.36μs 1.21ns 4.53ns 0.0162 0 0 1.17 KB
master SendReceive netcoreapp3.1 1.79μs 1.62ns 6.06ns 0.0161 0 0 1.17 KB
master SendReceive net472 2.34μs 1.66ns 6.45ns 0.186 0 0 1.17 KB
#5280 SendReceive net6.0 1.39μs 0.821ns 2.96ns 0.0165 0 0 1.17 KB
#5280 SendReceive netcoreapp3.1 1.82μs 1.14ns 4.43ns 0.0153 0 0 1.17 KB
#5280 SendReceive net472 2.14μs 3.66ns 13.2ns 0.186 0 0 1.17 KB
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.85μs 1.69ns 6.32ns 0.0213 0 0 1.54 KB
master EnrichedLog netcoreapp3.1 4.02μs 0.845ns 3.27ns 0.02 0 0 1.58 KB
master EnrichedLog net472 4.46μs 3.06ns 11.9ns 0.311 0 0 1.97 KB
#5280 EnrichedLog net6.0 2.62μs 0.639ns 2.39ns 0.0222 0 0 1.54 KB
#5280 EnrichedLog netcoreapp3.1 3.86μs 2.04ns 7.37ns 0.0213 0 0 1.58 KB
#5280 EnrichedLog net472 4.31μs 3.45ns 13.4ns 0.314 0 0 1.97 KB
Benchmarks.Trace.SpanBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartFinishSpan net6.0 462ns 0.147ns 0.551ns 0.00759 0 0 544 B
master StartFinishSpan netcoreapp3.1 725ns 1.39ns 5.4ns 0.0074 0 0 544 B
master StartFinishSpan net472 784ns 1.44ns 5.59ns 0.0864 0 0 546 B
master StartFinishScope net6.0 612ns 0.871ns 3.37ns 0.00916 0 0 664 B
master StartFinishScope netcoreapp3.1 794ns 0.945ns 3.66ns 0.00892 0 0 664 B
master StartFinishScope net472 942ns 1.79ns 6.72ns 0.0991 0 0 626 B
#5280 StartFinishSpan net6.0 511ns 0.842ns 3.26ns 0.00757 0 0 544 B
#5280 StartFinishSpan netcoreapp3.1 664ns 1.02ns 3.8ns 0.00751 0 0 544 B
#5280 StartFinishSpan net472 719ns 1.6ns 6.2ns 0.0865 0 0 546 B
#5280 StartFinishScope net6.0 587ns 0.44ns 1.7ns 0.0094 0 0 664 B
#5280 StartFinishScope netcoreapp3.1 823ns 1.59ns 6.17ns 0.00895 0 0 664 B
#5280 StartFinishScope net472 924ns 4.06ns 15.7ns 0.0992 0 0 626 B
Benchmarks.Trace.TraceAnnotationsBenchmark - Slower ⚠️ Same allocations ✔️

Slower ⚠️ in #5280

Benchmark diff/base Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin‑net6.0 1.201 603.46 724.86

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master RunOnMethodBegin net6.0 603ns 0.253ns 0.98ns 0.00927 0 0 664 B
master RunOnMethodBegin netcoreapp3.1 921ns 0.835ns 3.23ns 0.00879 0 0 664 B
master RunOnMethodBegin net472 1.04μs 1.83ns 7.08ns 0.099 0 0 626 B
#5280 RunOnMethodBegin net6.0 724ns 1.03ns 3.99ns 0.00942 0 0 664 B
#5280 RunOnMethodBegin netcoreapp3.1 902ns 1.08ns 4.05ns 0.00892 0 0 664 B
#5280 RunOnMethodBegin net472 1.07μs 1.75ns 6.78ns 0.099 0 0 626 B

andrewlock (Member, Author) commented Mar 13, 2024

Throughput/Crank Report ⚡

Throughput results for AspNetCoreSimpleController comparing the following branches/commits:

Cases where throughput results for the PR are worse than the latest master results (a drop of 5% or more) are shown in red.

Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards!

gantt
    title Throughput Linux x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (5280) (10.951M)   : 0, 10950843
    master (11.075M)   : 0, 11074769
    benchmarks/2.9.0 (11.247M)   : 0, 11247223

    section Automatic
    This PR (5280) (7.520M)   : 0, 7519659
    master (7.693M)   : 0, 7692695
    benchmarks/2.9.0 (8.075M)   : 0, 8075266

    section Trace stats
    This PR (5280) (7.984M)   : 0, 7984211
    master (8.071M)   : 0, 8071154

    section Manual
    This PR (5280) (9.564M)   : 0, 9564244
    master (9.710M)   : 0, 9709792

    section Manual + Automatic
    This PR (5280) (7.217M)   : 0, 7217252
    master (7.351M)   : 0, 7351473

    section Version Conflict
    This PR (5280) (6.434M)   : 0, 6434399
    master (6.502M)   : 0, 6501874

gantt
    title Throughput Linux arm64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (5280) (9.594M)   : 0, 9593990
    master (9.679M)   : 0, 9678523
    benchmarks/2.9.0 (9.694M)   : 0, 9694479

    section Automatic
    This PR (5280) (6.527M)   : 0, 6526785
    master (6.566M)   : 0, 6566475

    section Trace stats
    This PR (5280) (6.924M)   : 0, 6924274
    master (6.914M)   : 0, 6914281

    section Manual
    This PR (5280) (8.288M)   : 0, 8287968
    master (8.374M)   : 0, 8374285

    section Manual + Automatic
    This PR (5280) (6.319M)   : 0, 6318671
    master (6.237M)   : 0, 6237327

    section Version Conflict
    This PR (5280) (5.802M)   : 0, 5802042
    master (5.709M)   : 0, 5708852

gantt
    title Throughput Windows x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (5280) (9.343M)   : 0, 9343037
    master (9.570M)   : 0, 9570319
    benchmarks/2.9.0 (9.301M)   : 0, 9300867

    section Automatic
    This PR (5280) (6.608M)   : 0, 6607801
    master (6.702M)   : 0, 6701601
    benchmarks/2.9.0 (7.022M)   : 0, 7021879

    section Trace stats
    This PR (5280) (6.939M)   : 0, 6938927
    master (6.986M)   : 0, 6986474

    section Manual
    This PR (5280) (8.172M)   : 0, 8171637
    master (8.284M)   : 0, 8283598

    section Manual + Automatic
    This PR (5280) (6.234M)   : 0, 6234113
    master (6.435M)   : 0, 6434689

    section Version Conflict
    This PR (5280) (5.753M)   : 0, 5753313
    master (5.952M)   : 0, 5952245

andrewlock force-pushed the andrew/dont-initialize-tracer-in-process-start branch from 5d8e138 to 13fc598 March 13, 2024 17:31
andrewlock force-pushed the andrew/dont-initialize-tracer-in-process-start branch from 13fc598 to a018cf6 March 14, 2024 08:22
andrewlock merged commit 23d98c8 into master Mar 14, 2024
andrewlock deleted the andrew/dont-initialize-tracer-in-process-start branch March 14, 2024 14:49
github-actions bot added this to the vNext milestone Mar 14, 2024
andrewlock added a commit that referenced this pull request Nov 11, 2025
## Summary of changes

- Adds a workaround for the version-conflict issue that occurs on app
startup on Linux
- Allow doing call target modification of version conflict dll
- Fix a bug in CallTarget instrumentation where we get the assembly
reference wrong

## Reason for change

As part of app startup, on _some_ linux distros, we shell out to `stat`
to build the container tags/entity ID. We don't want to trace this
`Process.Start()` call, so in
#5280 we added a flag to
skip instrumenting these calls. However, this fix relies on a
`[ThreadStatic]` variable, and in version-conflict scenarios (2.x.x
manual, 3.x.x automatic) we end up still instrumenting this call, which
causes recursion in `Tracer` initialization and
[errors](https://app.datadoghq.com/error-tracking?query=service%3Ainstrumentation-telemetry-data%20source%3Adotnet%20%40tracer_version%3A3.29.0.0%20-%40error.is_crash%3Atrue&et-side=activity&order=total_count&refresh_mode=sliding&source=all&sp=%5B%7B%22p%22%3A%7B%22issueId%22%3A%22cbce5fc2-3adf-11f0-a4be-da7ad0900002%22%7D%2C%22i%22%3A%22error-tracking-issue%22%7D%5D&view=spans&from_ts=1761134402500&to_ts=1762344002500&live=true).

> Note that since #7453
we don't do a process start at all, but that doesn't help in this
situation, because it's the 2.x.x library that's doing the
`Process.Start()`

## Implementation details

- Use standard call target instrumentation on the 2.x.x version of
`Datadog.Trace` (i.e. version conflict only)
- Hook the `ProcessHelpers.StartWithDoNotTrace()` method, and set the
3.x.x `_doNotTrace` variable for the duration of the method call
- Tweak the Rejit handler so that we _do_ rejit/call target the
Datadog.Trace 2.x.x module (but not the 3.x.x module)
- Fix a bug in the module builder which was incorrectly injecting a
reference to the 2.x.x assembly instead of the 3.x.x assembly

## Test coverage

We were already working around this issue in our VersionConflict tests,
so I removed the workaround, confirmed that the test failed, then made
the fix, and confirmed the tests pass again.

## Other details

This will only help for customers using a manual version of 2.49.0+
(when we introduced the `StartWithDoNotTrace()` call). I think that's
good enough support.

---------

Co-authored-by: Lucas Pimentel <lucas.pimentel@datadoghq.com>
