Skip to content

Fix race conditions in AssemblyTaskFactory class.#13422

Merged
AR-May merged 2 commits intodotnet:mainfrom
AR-May:make-telemetry-thread-safe-2
Mar 24, 2026
Merged

Fix race conditions in AssemblyTaskFactory class.#13422
AR-May merged 2 commits intodotnet:mainfrom
AR-May:make-telemetry-thread-safe-2

Conversation

@AR-May
Copy link
Copy Markdown
Member

@AR-May AR-May commented Mar 20, 2026

Fixes #12867

Context

Building large repos (e.g., Roslyn) in multi-threaded mode crashes intermittently:

error MSB4061: The "CombineTargetFrameworkInfoProperties" task could not be instantiated...
Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state.

Root Cause
AssemblyTaskFactory instances are shared across thread nodes — they're cached in TaskRegistry._taskFactoryWrapperInstance. TaskRegistry itself is shared across thread nodes by design. In mt builds, multiple projects call CreateTaskInstance() on the same AssemblyTaskFactory instance concurrently from different threads. This exposed two thread-safety issues:

  1. _taskLoggingContext field race (crash-causing)

Race condition example: When Thread A (Project A) and Thread B (Project B) concurrently call CreateTaskInstance():

  1. Thread A sets _taskLoggingContext = contextA
  2. Thread B overwrites _taskLoggingContext = contextB
  3. Thread A reads _taskLoggingContext for TrackTaskSubclassing → gets contextB → writes to Project B's ProjectTelemetry
  4. Thread B also writes to Project B's ProjectTelemetry
  5. Two threads mutate the same Dictionary<string, int> → crash

The ProjectTelemetry in turn is not supposed to be accessed from multiple threads - only one thread node should build a project. It does not need to be thread-safe.

_taskLoggingContext field existed so that ErrorLoggingDelegate (a named method passed as a callback to TaskLoader.CreateTask()) could access the logging context, but it is not really required. For mt mode support, there should not be mutable fields in AssemblyTaskFactory that relate to a specific task.

  1. _tasksAndAppDomains dictionary (in .NET Framework)

Dictionary<ITask, AppDomain> is not safe for concurrent structural mutations. While each thread operates on different keys (unique ITask instances), concurrent add/remove on different keys still corrupts a plain Dictionary.

Changes Made

  • Removed _taskLoggingContext field entirely.
  • Changed _tasksAndAppDomains from Dictionary<ITask, AppDomain> to ConcurrentDictionary<ITask, AppDomain>
    Note that lock contention should be negligible, as each thread would add and clean-up its own set of keys.

Testing

Existing unit tests

@AR-May
Copy link
Copy Markdown
Member Author

AR-May commented Mar 20, 2026

It seems the original issue it is not reproducible for us, most probably because of the small number of enlightened tasks compared to the prototype where it was detected, which had more enlightened tasks.

@AR-May AR-May marked this pull request as ready for review March 20, 2026 14:43
Copilot AI review requested due to automatic review settings March 20, 2026 14:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Addresses intermittent multi-threaded build crashes by making AssemblyTaskFactory safe to use concurrently when the same factory instance is shared across nodes/projects.

Changes:

  • Removed per-task mutable state (_taskLoggingContext) and switched telemetry logging to use the per-call taskLoggingContext parameter.
  • Replaced the .NET Framework AppDomain tracking map with a ConcurrentDictionary and updated cleanup to use TryRemove.

@AR-May AR-May self-assigned this Mar 20, 2026
Copy link
Copy Markdown
Member

@JanProvaznik JanProvaznik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like necessary fixes for 2 clear issues in this class now that it's shared across thread worker nodes

@AR-May AR-May changed the title Fix race conditions in AssemblyTaskFactor class. Fix race conditions in AssemblyTaskFactory class. Mar 24, 2026
@AR-May AR-May merged commit 1b923e2 into dotnet:main Mar 24, 2026
14 checks passed
dfederm pushed a commit to dfederm/msbuild that referenced this pull request Apr 9, 2026
Fixes dotnet#12867

### Context
Building large repos (e.g., Roslyn) in multi-threaded mode crashes
intermittently:

error MSB4061: The "CombineTargetFrameworkInfoProperties" task could not
be instantiated...
Operations that change non-concurrent collections must have exclusive
access. A concurrent update was performed on this collection and
corrupted its state.

Root Cause
`AssemblyTaskFactory` instances are shared across thread nodes — they're
cached in `TaskRegistry._taskFactoryWrapperInstance`. `TaskRegistry`
itself is shared across thread nodes by design. In mt builds, multiple
projects call `CreateTaskInstance()` on the same `AssemblyTaskFactory`
instance concurrently from different threads. This exposed two
thread-safety issues:

 1. `_taskLoggingContext` field race (crash-causing)

Race condition example: When Thread A (Project A) and Thread B (Project
B) concurrently call `CreateTaskInstance()`:
   1. Thread A sets `_taskLoggingContext = contextA`
   2. Thread B overwrites `_taskLoggingContext = contextB`
3. Thread A reads `_taskLoggingContext` for `TrackTaskSubclassing` →
gets `contextB` → writes to Project B's ProjectTelemetry
   4. Thread B also writes to Project B's ProjectTelemetry
   5. Two threads mutate the same Dictionary<string, int> → crash

The `ProjectTelemetry` in turn is not supposed to be accessed from
multiple threads - only one thread node should build a project. It does
not need to be thread-safe.

`_taskLoggingContext` field existed so that `ErrorLoggingDelegate` (a
named method passed as a callback to `TaskLoader.CreateTask()`) could
access the logging context, but it is not really required. For mt mode
support, there should not be mutable fields in `AssemblyTaskFactory`
that relate to a specific task.

  2. `_tasksAndAppDomains` dictionary (in .NET Framework)

`Dictionary<ITask, AppDomain>` is not safe for concurrent structural
mutations. While each thread operates on different keys (unique ITask
instances), concurrent add/remove on different keys still corrupts a
plain Dictionary.

### Changes Made
- Removed `_taskLoggingContext` field entirely. 
- Changed `_tasksAndAppDomains` from `Dictionary<ITask, AppDomain>` to
`ConcurrentDictionary<ITask, AppDomain>`
Note that lock contention should be negligible, as each thread would add
and clean-up its own set of keys.

### Testing
Existing unit tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ProjectTelemetry subclass tracking is not thread safe

3 participants