Fix multiple termination signals on .NET 10+#8374
Conversation
- Add DD_LIFETIME_SHUTDOWN_DELAY_MS env var to sample app so shutdown hook takes longer than the 100ms interval between signals - Unskip SigtermTriggersShutdownOnce_WhenRepeated test - Widen test from NET10_0_OR_GREATER to NET8_0_OR_GREATER - Add net8.0 and net9.0 TFMs to sample project The test now reliably fails, confirming the bug: the 2nd SIGTERM kills the process before shutdown hooks complete. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three fixes to prevent shutdown hooks from being interrupted by duplicate termination signals: 1. Don't dispose signal registrations during RunShutdownTasks. They must stay alive so duplicate signals arriving mid-shutdown are handled by our handler (which cancels them), not by the OS default (which kills). 2. Make RunShutdownTasks wait for in-progress shutdown instead of short-circuiting. When ProcessExit fires after the signal handler already started shutdown, it now blocks until hooks complete rather than returning immediately (which let the runtime tear down early). 3. Rework TerminationSignalHandler: first signal runs shutdown tasks synchronously without canceling (so the OS terminates the process after hooks complete, giving exit code 143). Subsequent signals cancel and wait, keeping the process alive until shutdown finishes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d004f02890
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| context.Cancel = true; // Keep the process alive long enough to take the managed shutdown path. | ||
| } |
There was a problem hiding this comment.
Stop swallowing follow-up SIGTERM after shutdown starts
In TerminationSignalHandler, duplicate signals now always set context.Cancel = true, and RunShutdownTasks no longer disposes the signal registrations, so this handler remains active after shutdown completes. If another registered handler cancels the first SIGTERM (for graceful host shutdown), _terminationExitInitiated stays set and every subsequent SIGTERM is canceled by this branch, which prevents expected "second SIGTERM forces termination" behavior and can leave the process only killable via SIGKILL.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
That... seems reasonable 😅 I think we can just get rid of the context.Cancel=true entirely 🤔 I don't see a reason we need it tbh.
FWIW, the other AI (which was adamant about the code previously) now agrees with this one 🙄
Yes, that analysis is correct. If another handler (e.g., the ASP.NET host) cancels the first SIGTERM to do its own graceful shutdown, then every subsequent SIGTERM hits the duplicate branch, gets canceled, and the process becomes unkillable except via SIGKILL. That's worse than the original behavior.
Removing context.Cancel entirely from TerminationSignalHandler is the right call. Our handler's job is just to run shutdown hooks — it shouldn't be in the business of deciding whether the process lives or dies. Let the OS default (or other handlers) handle termination semantics.
WDYT @tonyredondo, go with removing it entirely?
There was a problem hiding this comment.
as we talked, we don't need to set the Cancel property at all.
BenchmarksBenchmark execution time: 2026-03-26 10:30:00 Comparing candidate commit 51c3ca4 in PR branch Found 7 performance improvements and 8 performance regressions! Performance is the same for 258 metrics, 15 unstable metrics.
|
Execution-Time Benchmarks Report ⏱️Execution-time results for samples comparing This PR (8374) and master. ✅ No regressions detected - check the details below Full Metrics ComparisonFakeDbCommand
HttpMessageHandler
Comparison explanationExecution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph). Duration chartsFakeDbCommand (.NET Framework 4.8)gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8374) - mean (73ms) : 70, 75
master - mean (72ms) : 70, 75
section Bailout
This PR (8374) - mean (77ms) : 76, 79
master - mean (76ms) : 75, 77
section CallTarget+Inlining+NGEN
This PR (8374) - mean (1,081ms) : 1033, 1129
master - mean (1,077ms) : 1028, 1127
FakeDbCommand (.NET Core 3.1)gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8374) - mean (113ms) : 109, 117
master - mean (113ms) : 108, 117
section Bailout
This PR (8374) - mean (115ms) : 113, 118
master - mean (114ms) : 111, 117
section CallTarget+Inlining+NGEN
This PR (8374) - mean (799ms) : 779, 818
master - mean (796ms) : 778, 813
FakeDbCommand (.NET 6)gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8374) - mean (100ms) : 97, 103
master - mean (99ms) : 95, 103
section Bailout
This PR (8374) - mean (100ms) : 98, 103
master - mean (100ms) : 97, 102
section CallTarget+Inlining+NGEN
This PR (8374) - mean (947ms) : crit, 909, 985
master - mean (786ms) : 764, 808
FakeDbCommand (.NET 8)gantt
title Execution time (ms) FakeDbCommand (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8374) - mean (99ms) : 96, 101
master - mean (99ms) : 94, 103
section Bailout
This PR (8374) - mean (99ms) : 97, 102
master - mean (99ms) : 97, 101
section CallTarget+Inlining+NGEN
This PR (8374) - mean (829ms) : crit, 799, 859
master - mean (690ms) : 670, 710
HttpMessageHandler (.NET Framework 4.8)gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8374) - mean (194ms) : 189, 199
master - mean (194ms) : 190, 198
section Bailout
This PR (8374) - mean (197ms) : 195, 200
master - mean (197ms) : 195, 199
section CallTarget+Inlining+NGEN
This PR (8374) - mean (1,167ms) : 1115, 1220
master - mean (1,160ms) : 1104, 1215
HttpMessageHandler (.NET Core 3.1)gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8374) - mean (278ms) : 272, 283
master - mean (277ms) : 272, 281
section Bailout
This PR (8374) - mean (279ms) : 275, 284
master - mean (278ms) : 273, 283
section CallTarget+Inlining+NGEN
This PR (8374) - mean (955ms) : 927, 984
master - mean (949ms) : 924, 975
HttpMessageHandler (.NET 6)gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8374) - mean (272ms) : 264, 279
master - mean (272ms) : 267, 277
section Bailout
This PR (8374) - mean (270ms) : 267, 273
master - mean (272ms) : 267, 277
section CallTarget+Inlining+NGEN
This PR (8374) - mean (1,156ms) : crit, 1110, 1202
master - mean (973ms) : 949, 996
HttpMessageHandler (.NET 8)gantt
title Execution time (ms) HttpMessageHandler (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8374) - mean (269ms) : 265, 274
master - mean (271ms) : 266, 276
section Bailout
This PR (8374) - mean (270ms) : 267, 273
master - mean (269ms) : 265, 274
section CallTarget+Inlining+NGEN
This PR (8374) - mean (1,040ms) : crit, 994, 1087
master - mean (856ms) : 824, 887
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Summary of changes
TerminationSignalTestsTerminationSignalTestson more TFMsReason for change
As part of the recent runtime metrics work (#8265), we skipped the
TerminationSignalTests, as we found they were flaky with the new changes on .NET 10 (and failed outright if we properlyawait-ed for dogstatsd to flush on exit). Ultimately, this comes down to essentially a race condition in the new termination behaviour on .NET 10.Previously, we had this for .NET 10:
This doesn't affect <.NET 10, because POSIX is handled by the runtime, and they queue handling of the subsequent signals while the first one is ongoing.
Implementation details
The fix is essentially two changes:
This only affects the .NET 10 path, as the POSIX handlers only fire explicitly on those paths.
As an aside, this allows enabling the statsd flush on shutdown.
Test coverage
I unskipped the currently-skipped test, and expanded the
TerminationSignalTeststo cover .NET 8+, instead of just .NET 10, so that we know we're actually getting the same behaviour in both cases. (.NET 8 was a somewhat arbitrary choice, we could expand it further if we wanted to, but doesn't necessarily seem worth it to me).I then tested that in CI and saw it fail, before making the fix.
Other details
There's one "interesting" change of behaviour in terms of
Cancel. 🤖 is adamant we Shouldn't cancel the default signal handling on the first handler, because we want the "default signal handling" to kick in after we've run our shutdown hooks. I'm not entirely sure if that's true or not tbh, but it doesn't seem to affect the tests one way or another so I'm guessing, meh? 🤷♂️