Skip to content

Fix TOCTOU race in AppDomain::LoadAssembly fast-path#125408

Merged
AaronRobinsonMSFT merged 1 commit intodotnet:mainfrom
AaronRobinsonMSFT:fix/toctou-loadassembly-race
Mar 11, 2026
Merged

Fix TOCTOU race in AppDomain::LoadAssembly fast-path#125408
AaronRobinsonMSFT merged 1 commit intodotnet:mainfrom
AaronRobinsonMSFT:fix/toctou-loadassembly-race

Conversation

@AaronRobinsonMSFT
Copy link
Member

PR #120515 deferred Assembly object creation to fix duplicate ICorProfiler callbacks, making FileLoadLock::m_pAssembly start as nullptr and get populated lazily by the winning thread. However, the fast-path in AppDomain::LoadAssembly was not updated to account for this mutability.

The Bug

The fast-path cached pAssembly = pLock->GetAssembly() (reading nullptr) before checking pLock->GetLoadLevel() >= targetLevel. If another thread completed the load between these two reads, the level check would pass but the code would dereference the stale nullptr, causing an Access Violation (0xC0000005) inside Assembly::ThrowIfError.

This is a classic TOCTOU (Time-of-Check to Time-of-Use) race. It was reported as a rare fatal crash (< 1/100,000) under heavy CPU load when lazy-loaded assemblies (e.g., System.Threading.ThreadPool) are resolved concurrently on multiple threadpool threads.

The Fix

Move the GetAssembly() read inside the fast-path if block, after confirming the load level is sufficient. This mirrors the pattern already used at the end of the slow path (line 2619), where pAssembly is re-read from the lock before use.

PR dotnet#120515 deferred Assembly creation (lazy init), making
FileLoadLock::m_pAssembly mutable. The fast-path in LoadAssembly
cached pAssembly before checking GetLoadLevel(), so a thread could
read nullptr, get preempted while another thread completed the load,
then pass the level check and dereference the stale nullptr.

Re-read pAssembly from the FileLoadLock inside the fast-path block
after the level check passes, ensuring we use the pointer that
corresponds to the observed load level.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @agocke, @elinor-fung
See info in area-owners.md if you want to be subscribed.

@AaronRobinsonMSFT AaronRobinsonMSFT added this to the 11.0.0 milestone Mar 10, 2026
@AaronRobinsonMSFT
Copy link
Member Author

This will need to be ported back to .NET 10 as well.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a TOCTOU race in the AppDomain::LoadAssembly fast-path that could dereference a stale nullptr Assembly* when another thread finishes populating FileLoadLock::m_pAssembly between separate reads of the assembly pointer and the lock’s load level.

Changes:

  • Moves the pLock->GetAssembly() read inside the “already loaded” fast-path block, after confirming GetLoadLevel() >= targetLevel.
  • Re-reads pAssembly from the lock at the end of the slow-path using a fresh local declaration (no longer depending on an outer-scoped cached pointer).

@AaronRobinsonMSFT
Copy link
Member Author

/backport to release/10.0

@github-actions
Copy link
Contributor

Started backporting to release/10.0 (link to workflow run)

@AaronRobinsonMSFT AaronRobinsonMSFT merged commit eb47fb6 into dotnet:main Mar 11, 2026
111 checks passed
@AaronRobinsonMSFT AaronRobinsonMSFT deleted the fix/toctou-loadassembly-race branch March 11, 2026 19:14
@github-project-automation github-project-automation bot moved this to Done in AppModel Mar 11, 2026
JulieLeeMSFT pushed a commit that referenced this pull request Mar 11, 2026
…125424)

Backport of #125408 to release/10.0

/cc @AaronRobinsonMSFT

## Customer Impact

- [x] Customer reported
- [ ] Found internally

The reproduction rate in a real application environment is extremely low
(< 1/100,000) as it requires simultaneous EOF on stdout/stderr, an
unloaded target assembly, and exact OS thread preemption. This is a
reliability issue tht impacted a high value enterprise customer and can
impact other users.

## Regression

- [x] Yes
- [ ] No

Yes in PR #120515.

## Testing

The change here is an obvious TOCTOU issue. The source was updated in a
clearer way to avoid creating locals that are populated long before use.

## Risk

Low. The source change is narrowing the use of a local behind a lock in
a manner that is more local to use. This reflects better engineering
practices and can easily be audited for correctness.

Co-authored-by: Aaron R Robinson <arobins@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants