Skip to content

Fix OnSupportedFrameworkInSsi_CallsForwarderWithExpectedTelemetry flakiness#7981

Merged
NachoEchevarria merged 2 commits intomasterfrom
nacho/OnSupportedFrameworkInSsiFlakyness
Dec 26, 2025
Merged

Fix OnSupportedFrameworkInSsi_CallsForwarderWithExpectedTelemetry flakiness#7981
NachoEchevarria merged 2 commits intomasterfrom
nacho/OnSupportedFrameworkInSsiFlakyness

Conversation

@NachoEchevarria
Copy link
Collaborator

@NachoEchevarria NachoEchevarria commented Dec 19, 2025

Summary of changes

Fixes flaky test InstrumentationTests.OnSupportedFrameworkInSsi_CallsForwarderWithExpectedTelemetry by adding a retry mechanism when checking for the telemetry echo log file.

Reason for change

The test was intermittently failing in CI with:

Expected File.Exists(echoLogFileName) to be True, but found False.

The test is already marked as flaky because of this issue. This seems to be a different problem though.

Root cause: Race condition between test assertion and asynchronous telemetry forwarder execution.

The telemetry echo log file is written by a separate process (telemetry_echo.exe), not by the sample application itself. The native loader spawns this forwarder process on a detached thread (single_step_guard_rails.cpp:346) to avoid blocking application startup. Since the forwarder runs in a separate process, the sample application can complete and exit before the forwarder finishes writing the log file. On CI machines with high CPU contention, this timing variance becomes observable, causing the test to check for the file before it exists.

CI Failure: https://dev.azure.com/datadoghq/dd-trace-dotnet/_build/results?buildId=192494&view=logs&j=ca775ed2-8590-5a92-528d-855583aa2d34&t=9804592b-55de-5ba0-6562-0dbf7f7ff3f6

Implementation details

Added a polling loop with 10-second timeout in AssertHasExpectedTelemetry (line 799-807):

  • Polls every 100ms for file existence
  • Allows up to 10 seconds for the detached telemetry thread to complete
  • Provides clear failure message if timeout is exceeded
  • Similar pattern already used elsewhere in the codebase (e.g., Samples.Console/Program.cs:71-74)

The fix acknowledges the production design (non-blocking telemetry on detached thread) while making the test tolerant to timing variance.

Test coverage

This change fixes an existing flaky test. The test itself validates telemetry reporting in Single-Step Instrumentation scenarios and continues to provide the same coverage with improved reliability.

Other details

@github-actions github-actions bot added the area:tests unit tests, integration tests label Dec 19, 2025
@dd-trace-dotnet-ci-bot
Copy link

dd-trace-dotnet-ci-bot bot commented Dec 19, 2025

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing This PR (7981) and master.

✅ No regressions detected - check the details below

Full Metrics Comparison

FakeDbCommand

Metric Master (Mean ± 95% CI) Current (Mean ± 95% CI) Change Status
.NET Framework 4.8 - Baseline
duration68.39 ± (68.36 - 68.63) ms68.57 ± (68.55 - 68.80) ms+0.3%✅⬆️
.NET Framework 4.8 - Bailout
duration72.14 ± (72.10 - 72.38) ms72.32 ± (72.21 - 72.46) ms+0.3%✅⬆️
.NET Framework 4.8 - CallTarget+Inlining+NGEN
duration1000.15 ± (1002.77 - 1008.87) ms1008.25 ± (1010.06 - 1016.76) ms+0.8%✅⬆️
.NET Core 3.1 - Baseline
process.internal_duration_ms21.91 ± (21.88 - 21.95) ms21.95 ± (21.91 - 22.00) ms+0.2%✅⬆️
process.time_to_main_ms78.58 ± (78.44 - 78.72) ms79.00 ± (78.82 - 79.18) ms+0.5%✅⬆️
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed10.91 ± (10.90 - 10.91) MB10.91 ± (10.90 - 10.91) MB+0.0%✅⬆️
runtime.dotnet.threads.count12 ± (12 - 12)12 ± (12 - 12)+0.0%
.NET Core 3.1 - Bailout
process.internal_duration_ms21.82 ± (21.81 - 21.84) ms21.87 ± (21.85 - 21.90) ms+0.2%✅⬆️
process.time_to_main_ms79.92 ± (79.83 - 80.01) ms80.17 ± (80.04 - 80.30) ms+0.3%✅⬆️
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed10.96 ± (10.96 - 10.97) MB10.94 ± (10.94 - 10.95) MB-0.2%
runtime.dotnet.threads.count13 ± (13 - 13)13 ± (13 - 13)+0.0%
.NET Core 3.1 - CallTarget+Inlining+NGEN
process.internal_duration_ms248.95 ± (245.05 - 252.85) ms245.56 ± (241.31 - 249.80) ms-1.4%
process.time_to_main_ms469.67 ± (469.18 - 470.17) ms472.08 ± (471.69 - 472.47) ms+0.5%✅⬆️
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed48.18 ± (48.15 - 48.20) MB48.16 ± (48.14 - 48.18) MB-0.0%
runtime.dotnet.threads.count28 ± (28 - 28)28 ± (28 - 28)-0.9%
.NET 6 - Baseline
process.internal_duration_ms20.56 ± (20.53 - 20.59) ms20.62 ± (20.58 - 20.65) ms+0.3%✅⬆️
process.time_to_main_ms67.97 ± (67.85 - 68.10) ms68.17 ± (68.05 - 68.29) ms+0.3%✅⬆️
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed10.59 ± (10.59 - 10.60) MB10.63 ± (10.63 - 10.64) MB+0.4%✅⬆️
runtime.dotnet.threads.count10 ± (10 - 10)10 ± (10 - 10)+0.0%
.NET 6 - Bailout
process.internal_duration_ms20.57 ± (20.55 - 20.59) ms20.54 ± (20.52 - 20.57) ms-0.1%
process.time_to_main_ms68.98 ± (68.92 - 69.03) ms69.08 ± (69.02 - 69.14) ms+0.2%✅⬆️
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed10.66 ± (10.65 - 10.66) MB10.72 ± (10.72 - 10.73) MB+0.6%✅⬆️
runtime.dotnet.threads.count11 ± (11 - 11)11 ± (11 - 11)+0.0%
.NET 6 - CallTarget+Inlining+NGEN
process.internal_duration_ms241.61 ± (238.90 - 244.31) ms248.95 ± (246.78 - 251.11) ms+3.0%✅⬆️
process.time_to_main_ms438.41 ± (438.04 - 438.79) ms443.85 ± (443.14 - 444.57) ms+1.2%✅⬆️
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed48.65 ± (48.62 - 48.68) MB48.59 ± (48.56 - 48.62) MB-0.1%
runtime.dotnet.threads.count28 ± (28 - 28)28 ± (28 - 28)+0.4%✅⬆️
.NET 8 - Baseline
process.internal_duration_ms18.85 ± (18.82 - 18.88) ms18.80 ± (18.76 - 18.83) ms-0.3%
process.time_to_main_ms67.08 ± (66.98 - 67.17) ms67.39 ± (67.28 - 67.50) ms+0.5%✅⬆️
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed7.67 ± (7.66 - 7.68) MB7.67 ± (7.66 - 7.67) MB-0.1%
runtime.dotnet.threads.count10 ± (10 - 10)10 ± (10 - 10)+0.0%
.NET 8 - Bailout
process.internal_duration_ms18.81 ± (18.79 - 18.83) ms18.81 ± (18.78 - 18.84) ms+0.0%✅⬆️
process.time_to_main_ms68.14 ± (68.08 - 68.19) ms68.56 ± (68.48 - 68.65) ms+0.6%✅⬆️
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed7.74 ± (7.73 - 7.75) MB7.74 ± (7.72 - 7.75) MB-0.0%
runtime.dotnet.threads.count11 ± (11 - 11)11 ± (11 - 11)+0.0%
.NET 8 - CallTarget+Inlining+NGEN
process.internal_duration_ms178.74 ± (177.64 - 179.84) ms179.19 ± (178.28 - 180.11) ms+0.3%✅⬆️
process.time_to_main_ms425.04 ± (424.15 - 425.93) ms425.04 ± (424.40 - 425.69) ms+0.0%✅⬆️
runtime.dotnet.exceptions.count0 ± (0 - 0)0 ± (0 - 0)+0.0%
runtime.dotnet.mem.committed36.31 ± (36.29 - 36.34) MB36.25 ± (36.22 - 36.28) MB-0.2%
runtime.dotnet.threads.count27 ± (27 - 27)27 ± (27 - 27)-0.2%

HttpMessageHandler

Metric Master (Mean ± 95% CI) Current (Mean ± 95% CI) Change Status
.NET Framework 4.8 - Baseline
duration194.31 ± (194.09 - 195.09) ms195.70 ± (194.84 - 195.83) ms+0.7%✅⬆️
.NET Framework 4.8 - Bailout
duration196.73 ± (196.52 - 197.10) ms199.72 ± (198.63 - 199.59) ms+1.5%✅⬆️
.NET Framework 4.8 - CallTarget+Inlining+NGEN
duration1106.57 ± (1110.07 - 1119.33) ms1105.78 ± (1110.45 - 1118.75) ms-0.1%
.NET Core 3.1 - Baseline
process.internal_duration_ms187.30 ± (186.92 - 187.68) ms188.16 ± (187.75 - 188.58) ms+0.5%✅⬆️
process.time_to_main_ms80.59 ± (80.34 - 80.84) ms81.07 ± (80.82 - 81.32) ms+0.6%✅⬆️
runtime.dotnet.exceptions.count3 ± (3 - 3)3 ± (3 - 3)+0.0%
runtime.dotnet.mem.committed16.14 ± (16.11 - 16.18) MB16.07 ± (16.04 - 16.10) MB-0.5%
runtime.dotnet.threads.count20 ± (20 - 20)20 ± (20 - 20)-0.5%
.NET Core 3.1 - Bailout
process.internal_duration_ms187.27 ± (186.93 - 187.60) ms187.83 ± (187.50 - 188.16) ms+0.3%✅⬆️
process.time_to_main_ms82.10 ± (81.93 - 82.26) ms82.08 ± (81.94 - 82.22) ms-0.0%
runtime.dotnet.exceptions.count3 ± (3 - 3)3 ± (3 - 3)+0.0%
runtime.dotnet.mem.committed16.20 ± (16.16 - 16.23) MB16.14 ± (16.11 - 16.16) MB-0.4%
runtime.dotnet.threads.count21 ± (21 - 21)21 ± (21 - 21)-0.1%
.NET Core 3.1 - CallTarget+Inlining+NGEN
process.internal_duration_ms415.45 ± (411.93 - 418.97) ms422.62 ± (418.95 - 426.29) ms+1.7%✅⬆️
process.time_to_main_ms472.96 ± (472.36 - 473.57) ms475.07 ± (474.44 - 475.70) ms+0.4%✅⬆️
runtime.dotnet.exceptions.count3 ± (3 - 3)3 ± (3 - 3)+0.0%
runtime.dotnet.mem.committed58.69 ± (58.57 - 58.81) MB58.67 ± (58.55 - 58.78) MB-0.0%
runtime.dotnet.threads.count29 ± (29 - 30)29 ± (29 - 30)-0.1%
.NET 6 - Baseline
process.internal_duration_ms191.96 ± (191.65 - 192.27) ms192.14 ± (191.82 - 192.45) ms+0.1%✅⬆️
process.time_to_main_ms69.99 ± (69.80 - 70.17) ms70.21 ± (70.06 - 70.35) ms+0.3%✅⬆️
runtime.dotnet.exceptions.count4 ± (4 - 4)4 ± (4 - 4)+0.0%
runtime.dotnet.mem.committed16.17 ± (16.04 - 16.29) MB16.26 ± (16.15 - 16.37) MB+0.6%✅⬆️
runtime.dotnet.threads.count19 ± (18 - 19)19 ± (18 - 19)-0.2%
.NET 6 - Bailout
process.internal_duration_ms191.30 ± (190.99 - 191.62) ms191.30 ± (190.99 - 191.61) ms-0.0%
process.time_to_main_ms70.88 ± (70.77 - 71.00) ms71.00 ± (70.88 - 71.11) ms+0.2%✅⬆️
runtime.dotnet.exceptions.count4 ± (4 - 4)4 ± (4 - 4)+0.0%
runtime.dotnet.mem.committed16.24 ± (16.12 - 16.37) MB16.17 ± (16.03 - 16.30) MB-0.5%
runtime.dotnet.threads.count20 ± (19 - 20)20 ± (20 - 20)+0.5%✅⬆️
.NET 6 - CallTarget+Inlining+NGEN
process.internal_duration_ms455.87 ± (453.99 - 457.75) ms457.29 ± (455.29 - 459.29) ms+0.3%✅⬆️
process.time_to_main_ms444.36 ± (443.89 - 444.82) ms445.81 ± (445.36 - 446.26) ms+0.3%✅⬆️
runtime.dotnet.exceptions.count4 ± (4 - 4)4 ± (4 - 4)+0.0%
runtime.dotnet.mem.committed58.23 ± (58.13 - 58.33) MB58.27 ± (58.16 - 58.38) MB+0.1%✅⬆️
runtime.dotnet.threads.count30 ± (29 - 30)30 ± (29 - 30)+0.0%✅⬆️
.NET 8 - Baseline
process.internal_duration_ms190.16 ± (189.83 - 190.49) ms190.96 ± (190.61 - 191.32) ms+0.4%✅⬆️
process.time_to_main_ms69.45 ± (69.25 - 69.66) ms69.79 ± (69.58 - 69.99) ms+0.5%✅⬆️
runtime.dotnet.exceptions.count4 ± (4 - 4)4 ± (4 - 4)+0.0%
runtime.dotnet.mem.committed11.77 ± (11.73 - 11.80) MB11.72 ± (11.70 - 11.75) MB-0.4%
runtime.dotnet.threads.count18 ± (18 - 18)18 ± (18 - 18)-0.2%
.NET 8 - Bailout
process.internal_duration_ms189.45 ± (189.18 - 189.72) ms189.81 ± (189.55 - 190.07) ms+0.2%✅⬆️
process.time_to_main_ms70.27 ± (70.17 - 70.38) ms70.66 ± (70.54 - 70.78) ms+0.6%✅⬆️
runtime.dotnet.exceptions.count4 ± (4 - 4)4 ± (4 - 4)+0.0%
runtime.dotnet.mem.committed11.82 ± (11.79 - 11.84) MB11.82 ± (11.80 - 11.85) MB+0.1%✅⬆️
runtime.dotnet.threads.count19 ± (19 - 19)19 ± (19 - 19)+0.6%✅⬆️
.NET 8 - CallTarget+Inlining+NGEN
process.internal_duration_ms362.82 ± (361.13 - 364.51) ms363.68 ± (362.04 - 365.31) ms+0.2%✅⬆️
process.time_to_main_ms428.15 ± (427.55 - 428.74) ms430.12 ± (429.47 - 430.77) ms+0.5%✅⬆️
runtime.dotnet.exceptions.count4 ± (4 - 4)4 ± (4 - 4)+0.0%
runtime.dotnet.mem.committed47.91 ± (47.87 - 47.95) MB47.94 ± (47.91 - 47.98) MB+0.1%✅⬆️
runtime.dotnet.threads.count29 ± (29 - 29)29 ± (29 - 29)-0.0%
Comparison explanation

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:

  • Welch test with statistical test for significance of 5%
  • Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

Duration charts
FakeDbCommand (.NET Framework 4.8)
gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7981) - mean (69ms)  : 67, 70
    master - mean (68ms)  : 67, 70

    section Bailout
    This PR (7981) - mean (72ms)  : 71, 73
    master - mean (72ms)  : 71, 74

    section CallTarget+Inlining+NGEN
    This PR (7981) - mean (1,013ms)  : 965, 1062
    master - mean (1,006ms)  : 963, 1049

Loading
FakeDbCommand (.NET Core 3.1)
gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7981) - mean (106ms)  : 103, 109
    master - mean (106ms)  : 103, 108

    section Bailout
    This PR (7981) - mean (107ms)  : 106, 109
    master - mean (107ms)  : 106, 108

    section CallTarget+Inlining+NGEN
    This PR (7981) - mean (743ms)  : 684, 803
    master - mean (738ms)  : 669, 808

Loading
FakeDbCommand (.NET 6)
gantt
    title Execution time (ms) FakeDbCommand (.NET 6)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7981) - mean (94ms)  : 91, 96
    master - mean (93ms)  : 91, 95

    section Bailout
    This PR (7981) - mean (94ms)  : 93, 95
    master - mean (94ms)  : 93, 95

    section CallTarget+Inlining+NGEN
    This PR (7981) - mean (720ms)  : 686, 755
    master - mean (707ms)  : 668, 745

Loading
FakeDbCommand (.NET 8)
gantt
    title Execution time (ms) FakeDbCommand (.NET 8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7981) - mean (92ms)  : 90, 95
    master - mean (92ms)  : 90, 94

    section Bailout
    This PR (7981) - mean (94ms)  : 92, 95
    master - mean (93ms)  : 92, 94

    section CallTarget+Inlining+NGEN
    This PR (7981) - mean (632ms)  : 616, 647
    master - mean (631ms)  : 618, 644

Loading
HttpMessageHandler (.NET Framework 4.8)
gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7981) - mean (195ms)  : 190, 201
    master - mean (195ms)  : 189, 200

    section Bailout
    This PR (7981) - mean (199ms)  : 195, 204
    master - mean (197ms)  : 194, 200

    section CallTarget+Inlining+NGEN
    This PR (7981) - mean (1,115ms)  : 1053, 1176
    master - mean (1,115ms)  : 1048, 1182

Loading
HttpMessageHandler (.NET Core 3.1)
gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7981) - mean (278ms)  : 273, 283
    master - mean (277ms)  : 271, 283

    section Bailout
    This PR (7981) - mean (278ms)  : 274, 283
    master - mean (277ms)  : 272, 283

    section CallTarget+Inlining+NGEN
    This PR (7981) - mean (929ms)  : 884, 975
    master - mean (918ms)  : 858, 977

Loading
HttpMessageHandler (.NET 6)
gantt
    title Execution time (ms) HttpMessageHandler (.NET 6)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7981) - mean (271ms)  : 266, 275
    master - mean (270ms)  : 266, 275

    section Bailout
    This PR (7981) - mean (270ms)  : 266, 275
    master - mean (270ms)  : 266, 274

    section CallTarget+Inlining+NGEN
    This PR (7981) - mean (931ms)  : 894, 968
    master - mean (927ms)  : 897, 958

Loading
HttpMessageHandler (.NET 8)
gantt
    title Execution time (ms) HttpMessageHandler (.NET 8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7981) - mean (270ms)  : 265, 276
    master - mean (269ms)  : 264, 275

    section Bailout
    This PR (7981) - mean (270ms)  : 266, 274
    master - mean (269ms)  : 265, 274

    section CallTarget+Inlining+NGEN
    This PR (7981) - mean (825ms)  : 806, 844
    master - mean (823ms)  : 802, 844

Loading

@NachoEchevarria NachoEchevarria marked this pull request as ready for review December 24, 2025 14:46
@NachoEchevarria NachoEchevarria requested review from a team as code owners December 24, 2025 14:46
Copy link
Member

@andrewlock andrewlock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks makes sense!

var fileWaitStart = DateTime.UtcNow;
while (!File.Exists(echoLogFileName) && (DateTime.UtcNow - fileWaitStart) < fileWaitTimeout)
{
Thread.Sleep(100);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is going to sleep for up to 10s, should we make this an async Task.Delay instead? 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that would mitigate the delay. Done!

@NachoEchevarria NachoEchevarria merged commit fd77731 into master Dec 26, 2025
151 checks passed
@NachoEchevarria NachoEchevarria deleted the nacho/OnSupportedFrameworkInSsiFlakyness branch December 26, 2025 14:23
@github-actions github-actions bot added this to the vNext-v3 milestone Dec 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:tests unit tests, integration tests type:flake-fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants