Fix TraceEngine file contention deadlock in multithreaded mode#13446
Merged
JanProvaznik merged 7 commits intodotnet:mainfrom Apr 1, 2026
Merged
Fix TraceEngine file contention deadlock in multithreaded mode#13446JanProvaznik merged 7 commits intodotnet:mainfrom
JanProvaznik merged 7 commits intodotnet:mainfrom
Conversation
In multithreaded (-mt) mode, multiple in-proc node engines run in the
same process. Each engine's TraceEngine() method wrote to the same
EngineTrace_{PID}.txt file with FileShare.Read. When two engines traced
concurrently, the second got an IOException, which propagated through
QueueAction's error handler and fatally crashed the ActionBlock work
queue. All subsequent Post() calls were silently rejected, causing
request completions to be dropped and the scheduler to deadlock.
Fix:
- Use per-node trace files: EngineTrace_{PID}_{NodeId}.txt
- Catch IOException in TraceEngine so trace failures can never crash
the build engine
Added regression test that enables MSBUILDDEBUGSCHEDULER=1 during a
multithreaded build with parallel child projects.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In -mt mode, multiple in-proc BuildRequestEngine instances share one process.
Each engine's TraceEngine() used lock(this), which only serialized within that
instance. Concurrent FileStream opens to EngineTrace_{PID}.txt with
FileShare.Read caused IOException, which fatally crashed the ActionBlock work
queue and caused a scheduler deadlock.
Fix:
- Replace lock(this) with a static lock (s_traceLock) so all engines in the
same process serialize trace writes to a single file.
- Catch non-critical exceptions in TraceEngine, matching the defensive pattern
already used by Scheduler.TraceScheduler.
- Add regression test with Timeout=30s to fail fast on deadlock.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes a multithreaded (-mt) in-proc deadlock path caused by concurrent TraceEngine() writes to a shared per-process trace file, which could fault the ActionBlock work queue and stall the scheduler.
Changes:
- Serialize
BuildRequestEngine.TraceEngine()file writes across all in-proc engine instances via a static lock. - Make
TraceEngine()resilient by swallowing non-critical exceptions during trace emission. - Add a regression unit test that runs an
-mtbuild with debug scheduler tracing enabled to ensure the build completes.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/Build/BackEnd/Components/BuildRequestEngine/BuildRequestEngine.cs | Uses a process-wide static lock for engine trace file writes and prevents trace IO failures from crashing the engine. |
| src/Build.UnitTests/BackEnd/BuildManager_Tests.cs | Adds a regression test targeting the previous -mt trace-file contention/deadlock scenario. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
AR-May
approved these changes
Mar 31, 2026
FrameworkDebugUtils.SetDebugPath() is parameterless; it reads from the MSBUILDDEBUGPATH environment variable which is already set on the previous line. The incorrect SetDebugPath(debugPath) call caused CS1501 across all CI legs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When a non-critical exception occurs writing the engine trace file, log the failure message and stack trace at Low importance through _nodeLoggingContext (if available) so it appears in diagnostic-level build output. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
xUnit only supports Timeout on async tests. Wrap the test body in Task.Run and return async Task. Use fully-qualified Task type to avoid ambiguity with Microsoft.Build.Utilities.Task. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
AR-May
reviewed
Mar 31, 2026
src/Build/BackEnd/Components/BuildRequestEngine/BuildRequestEngine.cs
Outdated
Show resolved
Hide resolved
…gine.cs Co-authored-by: AR-May <67507805+AR-May@users.noreply.github.com>
dfederm
pushed a commit
to dfederm/msbuild
that referenced
this pull request
Apr 9, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
In multithreaded (
-mt) mode, multiple in-procBuildRequestEngineinstances run in the same process. Each engine'sTraceEngine()method wrote to the sameEngineTrace_{PID}.txtfile usinglock(this), which only serialized within that single instance. When two engines traced concurrently, the secondFileStreamopen (withFileShare.Read) threwIOException, which propagated throughQueueAction's error handler and fatally crashed theActionBlockwork queue. All subsequentPost()calls were silently rejected, causing request completions to be dropped and the scheduler to deadlock.Root Cause
lock(this)only serializes within ONE engine instance. Cross-instance contention on the shared file was unprotected.Fix
lock(this)with a static lock (s_traceLock) so all engine instances in the same process serialize trace writes to a single file. This keeps one unified trace file (better debugging UX than per-node files).TraceEngine()usingcatch (Exception e) when (!ExceptionHandling.IsCriticalException(e)), matching the defensive pattern already used byScheduler.TraceScheduler(Scheduler.cs:2683). Trace file failures must never crash the build engine.Testing
Added regression test
MultiThreadedBuild_WithDebugSchedulerTracing_DoesNotDeadlock:MSBUILDDEBUGSCHEDULER=1during a multithreaded build with 4 parallel child projects[Fact(Timeout = 30_000)]so a deadlock regression fails fast instead of hanging CIRed-green validated locally: without the fix, the test aborts with
IOException: The process cannot access the file 'EngineTrace_{PID}.txt' because it is being used by another processfollowed byInternalErrorException: Failure during engine shutdown. With the fix, the test passes in ~400ms.