Skip to content

Conversation

@VSadov
Copy link
Member

@VSadov VSadov commented Dec 9, 2025

Fixes: #121608
Backport of #121887

The ThreadPool in rare cases allows a scenario when an enqueued workitem is not guaranteed to be executed unless more workitems are enqueued. In some scenarios execution of a particular workitem may be necessary before more work is enqueued, thus leading to a deadlock.
This is a subtle regression introduced by a change in enqueuer/worker handshake algorithm.

The same pattern is used in 2 other ThreadPool-like internal features in addition to the ThreadPool.

Customer Impact

  • Customer reported
  • Found internally

Regression

  • Yes
  • No

Testing

Standard test pass for deterministic regressions.
A targeted stress application that demonstrates the issue.
(no fix: hangs within 1-2 minutes, with the fix: runs for 20+ min. on the same system)

Risk

Low. This is a revert to the preexisting algorithm for the enqueuer/worker handshake. (in all 3 places)

@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

VSadov and others added 3 commits December 8, 2025 17:05
@VSadov
Copy link
Member Author

VSadov commented Dec 9, 2025

/backport to release/9.0-staging

@github-actions
Copy link
Contributor

github-actions bot commented Dec 9, 2025

Started backporting to release/9.0-staging (link to workflow run)

@github-actions
Copy link
Contributor

github-actions bot commented Dec 9, 2025

@VSadov backporting to release/9.0-staging failed, the patch most likely resulted in conflicts. Please backport manually!

git am output
$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: Reliability fixes for ThreadPool
Using index info to reconstruct a base tree...
M	src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs
Falling back to patching base and 3-way merge...
Auto-merging src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs
CONFLICT (content): Merge conflict in src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0001 Reliability fixes for ThreadPool
Error: The process '/usr/bin/git' failed with exit code 128

Link to workflow output

@VSadov VSadov requested a review from stephentoub December 9, 2025 01:39
@VSadov VSadov marked this pull request as ready for review December 9, 2025 01:39
Copilot AI review requested due to automatic review settings December 9, 2025 01:39
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This backport fixes a subtle but critical reliability regression in the ThreadPool's enqueuer/worker handshake algorithm that could lead to deadlocks in rare scenarios. The fix reverts from a complex three-state machine (NotScheduled/Determining/Scheduled) to a simpler binary flag-based approach (_hasOutstandingThreadRequest) across three ThreadPool-like components.

Key changes:

  • Replaces state machine with a simpler flag that guarantees at least one worker will check work queues after being set
  • Ensures workers always request another worker when dequeuing items to prevent deadlock scenarios where work items depend on other queued work
  • Applies the same pattern consistently to ThreadPoolWorkQueue, ThreadPoolTypedWorkItemQueue, and SocketAsyncEngine

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs Reverts ThreadPoolWorkQueue and ThreadPoolTypedWorkItemQueue from complex state machine to simpler flag-based approach; simplifies Dispatch logic to unconditionally request workers when processing items
src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs Applies same flag-based worker request pattern to SocketAsyncEngine event processing to maintain consistency with ThreadPool reliability fixes

…adPoolWorkQueue.cs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@agocke agocke added the Servicing-consider Issue for next servicing release review label Dec 9, 2025
@agocke agocke self-requested a review December 10, 2025 02:13
@agocke agocke added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Dec 10, 2025
@rbhanda rbhanda added this to the 10.0.2 milestone Dec 10, 2025
Copy link
Member

@agocke agocke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

holding until we verify the correctness

Copy link
Member

@agocke agocke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks!

@agocke
Copy link
Member

agocke commented Dec 10, 2025

/ba-g ios failure is unrelated

@agocke agocke enabled auto-merge (squash) December 10, 2025 18:22
@agocke agocke merged commit 0fe11e3 into dotnet:release/10.0 Dec 10, 2025
142 of 144 checks passed
@VSadov
Copy link
Member Author

VSadov commented Dec 10, 2025

Thanks!

@VSadov VSadov deleted the port121887to10 branch December 10, 2025 18:39
@github-actions github-actions bot locked and limited conversation to collaborators Jan 10, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.Threading Servicing-approved Approved for servicing release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants