fix(DSM): Fix race condition on DataStreamsWriter disposal #7968

Merged: robcarlan-datadog merged 1 commit into master from rob.carlan/DSMON-1184-flush-race-condition-fix on Dec 18, 2025
@robcarlan-datadog (Contributor) commented Dec 17, 2025

Summary of changes

Fixes a race condition where we attempt to dispose of a semaphore while it is still held by another task.

Reason for change

I think this is the root cause of some unit test flakiness and of some errors seen when disposing DataStreamsWriter.

Implementation details

Two tasks run in the background: flushTask and processTask. DataStreamsWriter.DisposeAsync calls _flushSemaphore.Dispose() after FlushAndCloseAsync() returns.

FlushAndCloseAsync sets an exit flag for the tasks, but only waits for processTask to complete. If flushTask is running and has already acquired the semaphore, DisposeAsync will still dispose of the semaphore even though flushTask is still using it.

The fix is to wait for both tasks to complete. flushTask finishes once processTask finishes, and the existing 1 second fallback timeout still applies.
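As a rough illustration of the ordering described above, here is a minimal sketch. It is not the actual DataStreamsWriter code: `SketchWriter`, its loop bodies, and the `FlushTask` property are hypothetical stand-ins, and only the `Task.WhenAll(_processTask, _flushTask)` call mirrors the line changed in this PR.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for illustration; not the real DataStreamsWriter.
public sealed class SketchWriter
{
    private readonly SemaphoreSlim _flushSemaphore = new SemaphoreSlim(1, 1);
    private readonly TaskCompletionSource<bool> _processExit =
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
    private readonly Task _processTask;
    private readonly Task _flushTask;

    public SketchWriter()
    {
        _processTask = Task.Run(() => ProcessLoopAsync());
        _flushTask = Task.Run(() => FlushLoopAsync());
    }

    // Exposed only so callers can observe how the flush loop ended.
    public Task FlushTask => _flushTask;

    private async Task ProcessLoopAsync()
    {
        // Stand-in for the processing loop: runs until the exit signal is set.
        await _processExit.Task.ConfigureAwait(false);
    }

    private async Task FlushLoopAsync()
    {
        // Keeps flushing until the process loop has finished.
        while (!_processTask.IsCompleted)
        {
            await _flushSemaphore.WaitAsync().ConfigureAwait(false);
            try
            {
                await Task.Delay(10).ConfigureAwait(false); // pretend to flush
            }
            finally
            {
                // Throws ObjectDisposedException if the semaphore was disposed meanwhile.
                _flushSemaphore.Release();
            }
        }
    }

    public async Task DisposeAsync()
    {
        _processExit.TrySetResult(true); // ask the process loop to exit

        // Wait for BOTH loops, not just the process loop, with a 1 second fallback.
        var allTasks = Task.WhenAll(_processTask, _flushTask);
        await Task.WhenAny(allTasks, Task.Delay(TimeSpan.FromSeconds(1))).ConfigureAwait(false);

        // In the normal case neither loop can still be holding the semaphore here.
        _flushSemaphore.Dispose();
    }
}
```

If only `_processTask` were awaited in this sketch, the flush loop could still be inside its `try` block when `Dispose()` runs, which is the situation the PR describes.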

This should also fix the test flake. The unit test calls DisposeAsync immediately after adding the data points:

writer.Add(CreateStatsPoint(timestamp));
writer.AddBacklog(CreateBacklogPoint(timestamp));

await writer.DisposeAsync();

So it seems likely that the semaphore is sometimes disposed before the flush has a chance to run.
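The PR does not show the exact error observed. As a hedged illustration of why early disposal is a problem, the sketch below uses a plain `SemaphoreSlim`, which throws `ObjectDisposedException` from both `Wait()` and `Release()` once it has been disposed. `SemaphoreDisposeDemo` and its method names are hypothetical and exist only for this example.

```csharp
using System;
using System.Threading;

// Hypothetical demo class; not part of the tracer code base.
public static class SemaphoreDisposeDemo
{
    // Models a holder that acquired the semaphore before it was disposed.
    public static bool ReleaseAfterDisposeThrows()
    {
        var semaphore = new SemaphoreSlim(1, 1);
        semaphore.Wait();    // the "flush" side acquires the semaphore
        semaphore.Dispose(); // the "dispose" side runs before the holder releases
        try
        {
            semaphore.Release();
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }

    // Models a flush that only starts after the semaphore was disposed.
    public static bool WaitAfterDisposeThrows()
    {
        var semaphore = new SemaphoreSlim(1, 1);
        semaphore.Dispose();
        try
        {
            semaphore.Wait();
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }
}
```

Either path would surface as an exception inside the flush task, which is consistent with both a flaky test and sporadic disposal errors.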

Test coverage

Other details

@github-actions bot added the area:tracer and area:data-streams-monitoring labels Dec 17, 2025
return;
}

var allTasks = Task.WhenAll(_processTask, _flushTask);
@robcarlan-datadog (Contributor, Author) commented Dec 17, 2025

The flush task returns after processTask returns (which it does because we set processExit above), so nothing will be using the semaphore after this.

Hence, it is safe to dispose the semaphore on L163 after this method returns, because no task will be using it.

@dd-trace-dotnet-ci-bot

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing This PR (7968) and master.

✅ No regressions detected - check the details below

Full Metrics Comparison

FakeDbCommand

| Metric | Master (Mean ± 95% CI) | Current (Mean ± 95% CI) | Change | Status |
| --- | --- | --- | --- | --- |
| **.NET Framework 4.8 - Baseline** | | | | |
| duration | 73.69 ± (73.68 - 73.97) ms | 73.49 ± (73.50 - 73.80) ms | -0.3% | |
| **.NET Framework 4.8 - Bailout** | | | | |
| duration | 77.63 ± (77.45 - 77.77) ms | 77.62 ± (77.58 - 77.91) ms | -0.0% | |
| **.NET Framework 4.8 - CallTarget+Inlining+NGEN** | | | | |
| duration | 1048.41 ± (1051.57 - 1061.13) ms | 1047.22 ± (1048.82 - 1058.18) ms | -0.1% | |
| **.NET Core 3.1 - Baseline** | | | | |
| process.internal_duration_ms | 22.82 ± (22.78 - 22.87) ms | 22.75 ± (22.70 - 22.79) ms | -0.3% | |
| process.time_to_main_ms | 84.63 ± (84.45 - 84.82) ms | 84.00 ± (83.82 - 84.19) ms | -0.7% | |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 10.93 ± (10.93 - 10.94) MB | 10.92 ± (10.91 - 10.92) MB | -0.1% | |
| runtime.dotnet.threads.count | 12 ± (12 - 12) | 12 ± (12 - 12) | +0.0% | |
| **.NET Core 3.1 - Bailout** | | | | |
| process.internal_duration_ms | 22.74 ± (22.70 - 22.79) ms | 22.81 ± (22.77 - 22.85) ms | +0.3% | ✅⬆️ |
| process.time_to_main_ms | 87.23 ± (87.00 - 87.45) ms | 86.53 ± (86.30 - 86.77) ms | -0.8% | |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 10.95 ± (10.94 - 10.95) MB | 10.95 ± (10.95 - 10.96) MB | +0.1% | ✅⬆️ |
| runtime.dotnet.threads.count | 13 ± (13 - 13) | 13 ± (13 - 13) | +0.0% | |
| **.NET Core 3.1 - CallTarget+Inlining+NGEN** | | | | |
| process.internal_duration_ms | 218.87 ± (217.56 - 220.17) ms | 218.37 ± (216.92 - 219.82) ms | -0.2% | |
| process.time_to_main_ms | 492.61 ± (491.87 - 493.35) ms | 496.69 ± (495.92 - 497.46) ms | +0.8% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 48.02 ± (48.00 - 48.04) MB | 48.05 ± (48.03 - 48.07) MB | +0.1% | ✅⬆️ |
| runtime.dotnet.threads.count | 28 ± (28 - 28) | 28 ± (28 - 28) | +0.0% | |
| **.NET 6 - Baseline** | | | | |
| process.internal_duration_ms | 21.32 ± (21.29 - 21.36) ms | 21.50 ± (21.47 - 21.53) ms | +0.8% | ✅⬆️ |
| process.time_to_main_ms | 72.75 ± (72.59 - 72.92) ms | 73.18 ± (73.04 - 73.32) ms | +0.6% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 10.60 ± (10.59 - 10.60) MB | 10.63 ± (10.62 - 10.63) MB | +0.3% | ✅⬆️ |
| runtime.dotnet.threads.count | 10 ± (10 - 10) | 10 ± (10 - 10) | +0.0% | |
| **.NET 6 - Bailout** | | | | |
| process.internal_duration_ms | 21.27 ± (21.23 - 21.31) ms | 21.39 ± (21.34 - 21.43) ms | +0.5% | ✅⬆️ |
| process.time_to_main_ms | 74.13 ± (73.96 - 74.29) ms | 73.71 ± (73.58 - 73.84) ms | -0.6% | |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 10.66 ± (10.66 - 10.66) MB | 10.67 ± (10.67 - 10.68) MB | +0.1% | ✅⬆️ |
| runtime.dotnet.threads.count | 11 ± (11 - 11) | 11 ± (11 - 11) | +0.0% | |
| **.NET 6 - CallTarget+Inlining+NGEN** | | | | |
| process.internal_duration_ms | 206.60 ± (205.55 - 207.65) ms | 205.81 ± (204.73 - 206.90) ms | -0.4% | |
| process.time_to_main_ms | 457.06 ± (456.38 - 457.73) ms | 456.64 ± (455.87 - 457.42) ms | -0.1% | |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 48.57 ± (48.55 - 48.60) MB | 48.56 ± (48.53 - 48.58) MB | -0.0% | |
| runtime.dotnet.threads.count | 28 ± (28 - 28) | 28 ± (28 - 28) | +0.2% | ✅⬆️ |
| **.NET 8 - Baseline** | | | | |
| process.internal_duration_ms | 19.46 ± (19.43 - 19.50) ms | 19.70 ± (19.67 - 19.74) ms | +1.2% | ✅⬆️ |
| process.time_to_main_ms | 72.33 ± (72.14 - 72.51) ms | 72.95 ± (72.80 - 73.11) ms | +0.9% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 7.67 ± (7.66 - 7.68) MB | 7.69 ± (7.68 - 7.69) MB | +0.2% | ✅⬆️ |
| runtime.dotnet.threads.count | 10 ± (10 - 10) | 10 ± (10 - 10) | +0.0% | |
| **.NET 8 - Bailout** | | | | |
| process.internal_duration_ms | 19.54 ± (19.50 - 19.57) ms | 19.63 ± (19.59 - 19.68) ms | +0.5% | ✅⬆️ |
| process.time_to_main_ms | 73.37 ± (73.22 - 73.53) ms | 74.22 ± (74.08 - 74.37) ms | +1.2% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 7.71 ± (7.70 - 7.72) MB | 7.75 ± (7.74 - 7.76) MB | +0.5% | ✅⬆️ |
| runtime.dotnet.threads.count | 11 ± (11 - 11) | 11 ± (11 - 11) | +0.0% | |
| **.NET 8 - CallTarget+Inlining+NGEN** | | | | |
| process.internal_duration_ms | 187.95 ± (187.07 - 188.83) ms | 188.06 ± (187.27 - 188.85) ms | +0.1% | ✅⬆️ |
| process.time_to_main_ms | 444.08 ± (443.42 - 444.74) ms | 443.63 ± (442.99 - 444.28) ms | -0.1% | |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 36.43 ± (36.38 - 36.47) MB | 36.42 ± (36.38 - 36.46) MB | -0.0% | |
| runtime.dotnet.threads.count | 27 ± (27 - 27) | 27 ± (27 - 27) | -0.2% | |

HttpMessageHandler

| Metric | Master (Mean ± 95% CI) | Current (Mean ± 95% CI) | Change | Status |
| --- | --- | --- | --- | --- |
| **.NET Framework 4.8 - Baseline** | | | | |
| duration | 193.62 ± (193.72 - 194.51) ms | 194.06 ± (194.33 - 195.23) ms | +0.2% | ✅⬆️ |
| **.NET Framework 4.8 - Bailout** | | | | |
| duration | 197.30 ± (197.10 - 197.86) ms | 197.36 ± (197.29 - 197.91) ms | +0.0% | ✅⬆️ |
| **.NET Framework 4.8 - CallTarget+Inlining+NGEN** | | | | |
| duration | 1116.57 ± (1121.08 - 1130.19) ms | 1121.38 ± (1127.17 - 1137.06) ms | +0.4% | ✅⬆️ |
| **.NET Core 3.1 - Baseline** | | | | |
| process.internal_duration_ms | 188.09 ± (187.71 - 188.47) ms | 188.28 ± (187.90 - 188.67) ms | +0.1% | ✅⬆️ |
| process.time_to_main_ms | 80.52 ± (80.35 - 80.70) ms | 81.07 ± (80.85 - 81.30) ms | +0.7% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 3 ± (3 - 3) | 3 ± (3 - 3) | +0.0% | |
| runtime.dotnet.mem.committed | 16.04 ± (16.02 - 16.07) MB | 16.09 ± (16.06 - 16.11) MB | +0.3% | ✅⬆️ |
| runtime.dotnet.threads.count | 20 ± (19 - 20) | 20 ± (20 - 20) | +0.3% | ✅⬆️ |
| **.NET Core 3.1 - Bailout** | | | | |
| process.internal_duration_ms | 187.82 ± (187.45 - 188.19) ms | 186.97 ± (186.63 - 187.31) ms | -0.5% | |
| process.time_to_main_ms | 82.24 ± (82.11 - 82.38) ms | 82.03 ± (81.88 - 82.18) ms | -0.3% | |
| runtime.dotnet.exceptions.count | 3 ± (3 - 3) | 3 ± (3 - 3) | +0.0% | |
| runtime.dotnet.mem.committed | 16.08 ± (16.05 - 16.11) MB | 16.12 ± (16.10 - 16.15) MB | +0.3% | ✅⬆️ |
| runtime.dotnet.threads.count | 21 ± (20 - 21) | 21 ± (21 - 21) | +0.4% | ✅⬆️ |
| **.NET Core 3.1 - CallTarget+Inlining+NGEN** | | | | |
| process.internal_duration_ms | 398.20 ± (395.31 - 401.09) ms | 400.11 ± (397.07 - 403.16) ms | +0.5% | ✅⬆️ |
| process.time_to_main_ms | 475.57 ± (474.93 - 476.21) ms | 476.75 ± (475.88 - 477.63) ms | +0.2% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 3 ± (3 - 3) | 3 ± (3 - 3) | +0.0% | |
| runtime.dotnet.mem.committed | 58.47 ± (58.32 - 58.61) MB | 58.70 ± (58.55 - 58.85) MB | +0.4% | ✅⬆️ |
| runtime.dotnet.threads.count | 29 ± (29 - 29) | 29 ± (29 - 30) | +0.0% | ✅⬆️ |
| **.NET 6 - Baseline** | | | | |
| process.internal_duration_ms | 192.57 ± (192.20 - 192.95) ms | 192.06 ± (191.76 - 192.36) ms | -0.3% | |
| process.time_to_main_ms | 70.17 ± (69.97 - 70.37) ms | 70.08 ± (69.92 - 70.25) ms | -0.1% | |
| runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| runtime.dotnet.mem.committed | 16.30 ± (16.24 - 16.37) MB | 16.25 ± (16.16 - 16.34) MB | -0.3% | |
| runtime.dotnet.threads.count | 19 ± (19 - 19) | 19 ± (18 - 19) | -0.9% | |
| **.NET 6 - Bailout** | | | | |
| process.internal_duration_ms | 191.64 ± (191.38 - 191.90) ms | 191.09 ± (190.79 - 191.39) ms | -0.3% | |
| process.time_to_main_ms | 70.81 ± (70.71 - 70.91) ms | 70.88 ± (70.79 - 70.96) ms | +0.1% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| runtime.dotnet.mem.committed | 15.87 ± (15.70 - 16.04) MB | 16.04 ± (15.89 - 16.19) MB | +1.0% | ✅⬆️ |
| runtime.dotnet.threads.count | 19 ± (19 - 19) | 19 ± (19 - 19) | +1.5% | ✅⬆️ |
| **.NET 6 - CallTarget+Inlining+NGEN** | | | | |
| process.internal_duration_ms | 409.64 ± (407.38 - 411.91) ms | 404.86 ± (403.03 - 406.69) ms | -1.2% | |
| process.time_to_main_ms | 444.84 ± (444.24 - 445.45) ms | 444.69 ± (444.06 - 445.31) ms | -0.0% | |
| runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| runtime.dotnet.mem.committed | 58.88 ± (58.73 - 59.03) MB | 58.98 ± (58.83 - 59.13) MB | +0.2% | ✅⬆️ |
| runtime.dotnet.threads.count | 30 ± (30 - 30) | 30 ± (30 - 30) | +0.0% | ✅⬆️ |
| **.NET 8 - Baseline** | | | | |
| process.internal_duration_ms | 191.47 ± (191.06 - 191.89) ms | 190.08 ± (189.68 - 190.48) ms | -0.7% | |
| process.time_to_main_ms | 69.86 ± (69.67 - 70.04) ms | 69.52 ± (69.34 - 69.71) ms | -0.5% | |
| runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| runtime.dotnet.mem.committed | 11.68 ± (11.66 - 11.70) MB | 11.74 ± (11.71 - 11.77) MB | +0.5% | ✅⬆️ |
| runtime.dotnet.threads.count | 18 ± (18 - 18) | 18 ± (18 - 18) | -0.3% | |
| **.NET 8 - Bailout** | | | | |
| process.internal_duration_ms | 189.68 ± (189.38 - 189.98) ms | 189.65 ± (189.43 - 189.88) ms | -0.0% | |
| process.time_to_main_ms | 70.73 ± (70.61 - 70.85) ms | 70.45 ± (70.36 - 70.54) ms | -0.4% | |
| runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| runtime.dotnet.mem.committed | 11.78 ± (11.75 - 11.81) MB | 11.78 ± (11.76 - 11.81) MB | +0.1% | ✅⬆️ |
| runtime.dotnet.threads.count | 19 ± (19 - 19) | 19 ± (19 - 19) | -0.0% | |
| **.NET 8 - CallTarget+Inlining+NGEN** | | | | |
| process.internal_duration_ms | 368.36 ± (366.77 - 369.95) ms | 363.99 ± (362.61 - 365.37) ms | -1.2% | |
| process.time_to_main_ms | 430.18 ± (429.55 - 430.82) ms | 428.79 ± (427.98 - 429.59) ms | -0.3% | |
| runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| runtime.dotnet.mem.committed | 47.93 ± (47.90 - 47.97) MB | 47.92 ± (47.88 - 47.95) MB | -0.0% | |
| runtime.dotnet.threads.count | 29 ± (29 - 29) | 29 ± (29 - 29) | +0.6% | ✅⬆️ |

Comparison explanation

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:

  • Welch's t-test with a significance level of 5%
  • Only results indicating a difference greater than 5% and 5 ms are considered.
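
The bot's implementation is not shown here. As a reminder of what a Welch test computes, and assuming it is applied to the per-run samples of each branch with sample means $\bar{x}$, sample variances $s^2$, and sample counts $n$, the statistic and its approximate degrees of freedom are:

```latex
t = \frac{\bar{x}_{\mathrm{PR}} - \bar{x}_{\mathrm{master}}}
         {\sqrt{\dfrac{s_{\mathrm{PR}}^2}{n_{\mathrm{PR}}} + \dfrac{s_{\mathrm{master}}^2}{n_{\mathrm{master}}}}},
\qquad
\nu \approx
\frac{\left(\dfrac{s_{\mathrm{PR}}^2}{n_{\mathrm{PR}}} + \dfrac{s_{\mathrm{master}}^2}{n_{\mathrm{master}}}\right)^2}
     {\dfrac{\left(s_{\mathrm{PR}}^2 / n_{\mathrm{PR}}\right)^2}{n_{\mathrm{PR}} - 1}
      + \dfrac{\left(s_{\mathrm{master}}^2 / n_{\mathrm{master}}\right)^2}{n_{\mathrm{master}} - 1}}
```

Under the thresholds listed above, a difference would presumably be flagged only when the resulting p-value is below 0.05 and the change in means exceeds both 5% and 5 ms.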

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

Duration charts
FakeDbCommand (.NET Framework 4.8)
gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7968) - mean (74ms)  : 71, 76
    master - mean (74ms)  : 72, 76

    section Bailout
    This PR (7968) - mean (78ms)  : 76, 79
    master - mean (78ms)  : 76, 79

    section CallTarget+Inlining+NGEN
    This PR (7968) - mean (1,054ms)  : 987, 1120
    master - mean (1,056ms)  : 988, 1125

FakeDbCommand (.NET Core 3.1)
gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7968) - mean (114ms)  : 110, 117
    master - mean (114ms)  : 111, 118

    section Bailout
    This PR (7968) - mean (116ms)  : 113, 119
    master - mean (117ms)  : 113, 121

    section CallTarget+Inlining+NGEN
    This PR (7968) - mean (752ms)  : 704, 799
    master - mean (744ms)  : 715, 772

FakeDbCommand (.NET 6)
gantt
    title Execution time (ms) FakeDbCommand (.NET 6)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7968) - mean (101ms)  : 98, 104
    master - mean (100ms)  : 97, 104

    section Bailout
    This PR (7968) - mean (101ms)  : 100, 103
    master - mean (102ms)  : 99, 104

    section CallTarget+Inlining+NGEN
    This PR (7968) - mean (691ms)  : 671, 711
    master - mean (692ms)  : 669, 714

FakeDbCommand (.NET 8)
gantt
    title Execution time (ms) FakeDbCommand (.NET 8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7968) - mean (101ms)  : 98, 103
    master - mean (100ms)  : 96, 103

    section Bailout
    This PR (7968) - mean (102ms)  : 99, 105
    master - mean (100ms)  : 98, 103

    section CallTarget+Inlining+NGEN
    This PR (7968) - mean (660ms)  : 637, 683
    master - mean (661ms)  : 636, 685

HttpMessageHandler (.NET Framework 4.8)
gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7968) - mean (195ms)  : 190, 199
    master - mean (194ms)  : 189, 199

    section Bailout
    This PR (7968) - mean (198ms)  : 194, 201
    master - mean (197ms)  : 194, 201

    section CallTarget+Inlining+NGEN
    This PR (7968) - mean (1,132ms)  : 1055, 1209
    master - mean (1,126ms)  : 1058, 1194

HttpMessageHandler (.NET Core 3.1)
gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7968) - mean (278ms)  : 272, 284
    master - mean (277ms)  : 271, 283

    section Bailout
    This PR (7968) - mean (277ms)  : 273, 282
    master - mean (278ms)  : 273, 284

    section CallTarget+Inlining+NGEN
    This PR (7968) - mean (909ms)  : 856, 962
    master - mean (908ms)  : 860, 956

HttpMessageHandler (.NET 6)
gantt
    title Execution time (ms) HttpMessageHandler (.NET 6)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7968) - mean (270ms)  : 266, 275
    master - mean (271ms)  : 266, 276

    section Bailout
    This PR (7968) - mean (270ms)  : 266, 273
    master - mean (270ms)  : 267, 274

    section CallTarget+Inlining+NGEN
    This PR (7968) - mean (881ms)  : 847, 914
    master - mean (887ms)  : 841, 934

HttpMessageHandler (.NET 8)
gantt
    title Execution time (ms) HttpMessageHandler (.NET 8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (7968) - mean (269ms)  : 263, 276
    master - mean (271ms)  : 262, 280

    section Bailout
    This PR (7968) - mean (270ms)  : 267, 273
    master - mean (270ms)  : 266, 274

    section CallTarget+Inlining+NGEN
    This PR (7968) - mean (825ms)  : 806, 843
    master - mean (830ms)  : 809, 851


@robcarlan-datadog marked this pull request as ready for review December 17, 2025 21:07
@robcarlan-datadog requested review from a team as code owners December 17, 2025 21:07
@andrewlock (Member) left a comment

Good catch, thanks!

@NachoEchevarria (Collaborator) left a comment

Thanks!

@robcarlan-datadog merged commit 3045580 into master Dec 18, 2025
152 checks passed
@robcarlan-datadog deleted the rob.carlan/DSMON-1184-flush-race-condition-fix branch December 18, 2025 16:50
@github-actions bot added this to the vNext-v3 milestone Dec 18, 2025

Labels: area:data-streams-monitoring, area:tracer, identified-by:telemetry, type:bug
