Fix OnSupportedFrameworkInSsi_CallsForwarderWithExpectedTelemetry flakiness#7981
Conversation
Execution-Time Benchmarks Report ⏱️Execution-time results for samples comparing This PR (7981) and master. ✅ No regressions detected - check the details below Full Metrics ComparisonFakeDbCommand
HttpMessageHandler
Comparison explanationExecution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph). Duration chartsFakeDbCommand (.NET Framework 4.8)gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (7981) - mean (69ms) : 67, 70
master - mean (68ms) : 67, 70
section Bailout
This PR (7981) - mean (72ms) : 71, 73
master - mean (72ms) : 71, 74
section CallTarget+Inlining+NGEN
This PR (7981) - mean (1,013ms) : 965, 1062
master - mean (1,006ms) : 963, 1049
FakeDbCommand (.NET Core 3.1)gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (7981) - mean (106ms) : 103, 109
master - mean (106ms) : 103, 108
section Bailout
This PR (7981) - mean (107ms) : 106, 109
master - mean (107ms) : 106, 108
section CallTarget+Inlining+NGEN
This PR (7981) - mean (743ms) : 684, 803
master - mean (738ms) : 669, 808
FakeDbCommand (.NET 6)gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (7981) - mean (94ms) : 91, 96
master - mean (93ms) : 91, 95
section Bailout
This PR (7981) - mean (94ms) : 93, 95
master - mean (94ms) : 93, 95
section CallTarget+Inlining+NGEN
This PR (7981) - mean (720ms) : 686, 755
master - mean (707ms) : 668, 745
FakeDbCommand (.NET 8)gantt
title Execution time (ms) FakeDbCommand (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (7981) - mean (92ms) : 90, 95
master - mean (92ms) : 90, 94
section Bailout
This PR (7981) - mean (94ms) : 92, 95
master - mean (93ms) : 92, 94
section CallTarget+Inlining+NGEN
This PR (7981) - mean (632ms) : 616, 647
master - mean (631ms) : 618, 644
HttpMessageHandler (.NET Framework 4.8)gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (7981) - mean (195ms) : 190, 201
master - mean (195ms) : 189, 200
section Bailout
This PR (7981) - mean (199ms) : 195, 204
master - mean (197ms) : 194, 200
section CallTarget+Inlining+NGEN
This PR (7981) - mean (1,115ms) : 1053, 1176
master - mean (1,115ms) : 1048, 1182
HttpMessageHandler (.NET Core 3.1)gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (7981) - mean (278ms) : 273, 283
master - mean (277ms) : 271, 283
section Bailout
This PR (7981) - mean (278ms) : 274, 283
master - mean (277ms) : 272, 283
section CallTarget+Inlining+NGEN
This PR (7981) - mean (929ms) : 884, 975
master - mean (918ms) : 858, 977
HttpMessageHandler (.NET 6)gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (7981) - mean (271ms) : 266, 275
master - mean (270ms) : 266, 275
section Bailout
This PR (7981) - mean (270ms) : 266, 275
master - mean (270ms) : 266, 274
section CallTarget+Inlining+NGEN
This PR (7981) - mean (931ms) : 894, 968
master - mean (927ms) : 897, 958
HttpMessageHandler (.NET 8)gantt
title Execution time (ms) HttpMessageHandler (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (7981) - mean (270ms) : 265, 276
master - mean (269ms) : 264, 275
section Bailout
This PR (7981) - mean (270ms) : 266, 274
master - mean (269ms) : 265, 274
section CallTarget+Inlining+NGEN
This PR (7981) - mean (825ms) : 806, 844
master - mean (823ms) : 802, 844
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| var fileWaitStart = DateTime.UtcNow; | ||
| while (!File.Exists(echoLogFileName) && (DateTime.UtcNow - fileWaitStart) < fileWaitTimeout) | ||
| { | ||
| Thread.Sleep(100); |
There was a problem hiding this comment.
Given this is going to sleep for up to 10s, should we make this an async Task.Delay instead? 🤔
There was a problem hiding this comment.
Sure, that would mitigate the delay. Done!
Summary of changes
Fixes flaky test
InstrumentationTests.OnSupportedFrameworkInSsi_CallsForwarderWithExpectedTelemetryby adding a retry mechanism when checking for the telemetry echo log file.Reason for change
The test was intermittently failing in CI with:
The test is already marked as flaky because of this issue. This seems to be a different problem though.
Root cause: Race condition between test assertion and asynchronous telemetry forwarder execution.
The telemetry echo log file is written by a separate process (
telemetry_echo.exe), not by the sample application itself. The native loader spawns this forwarder process on a detached thread (single_step_guard_rails.cpp:346) to avoid blocking application startup. Since the forwarder runs in a separate process, the sample application can complete and exit before the forwarder finishes writing the log file. On CI machines with high CPU contention, this timing variance becomes observable, causing the test to check for the file before it exists.CI Failure: https://dev.azure.com/datadoghq/dd-trace-dotnet/_build/results?buildId=192494&view=logs&j=ca775ed2-8590-5a92-528d-855583aa2d34&t=9804592b-55de-5ba0-6562-0dbf7f7ff3f6
Implementation details
Added a polling loop with 10-second timeout in
AssertHasExpectedTelemetry(line 799-807):Samples.Console/Program.cs:71-74)The fix acknowledges the production design (non-blocking telemetry on detached thread) while making the test tolerant to timing variance.
Test coverage
This change fixes an existing flaky test. The test itself validates telemetry reporting in Single-Step Instrumentation scenarios and continues to provide the same coverage with improved reliability.
Other details