Issue Description
When building with MSBuild's -mt (multithreaded) mode enabled, builds intermittently crash with an internal MSBuild error indicating that results for a project configuration were not transferred between nodes correctly. This appears to be a race condition in the scheduler when a project is moved from one node to another.
The issue is intermittent: it does not reproduce consistently on local machines, but it failed on VMR CI.
Steps to Reproduce
- Patch the build scripts of the dotnet/dotnet VMR (or the F# repo) so that they build with -mt mode enabled.
- Run the build:
```
build.cmd -pack -noVisualStudio -ci -configuration Release -bl
```
Note: The issue is intermittent and may require multiple runs or a high-parallelism environment to reproduce. CI machines with many cores reproduce it more often than local developer machines.
binlog: Build.Microsoft.FSharp.Compiler.sln.binlog.zip
Actual Behavior
Multiple projects fail simultaneously with unhandled exceptions:
```
##[error]This is an unhandled exception in MSBuild -- PLEASE UPVOTE AN EXISTING ISSUE OR FILE A NEW ONE AT https://aka.ms/msbuild/unhandled
Microsoft.Build.Framework.InternalErrorException: MSB0001: Internal MSBuild Error: Results for configuration 27 were not retrieved from node 8
   at Microsoft.Build.Shared.ErrorUtilities.ThrowInternalError(String message, Object[] args)
   at Microsoft.Build.Shared.ErrorUtilities.VerifyThrow(Boolean condition, String unformattedMessage, Int32 arg0, Int32 arg1)
   at Microsoft.Build.BackEnd.RequestBuilder.BuildProject()
   at Microsoft.Build.BackEnd.RequestBuilder.RequestThreadProc(Boolean setThreadParameters)
```
The same error is reported for several configurations (25, 26, and 27), all referencing node 8. This suggests a systemic problem with result transfers from that node, for example when it becomes unavailable or slow to respond.
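To confirm that every failure points at the same node, the error messages can be grouped by node ID. This is a hypothetical triage helper, not an MSBuild API: `GroupFailuresByNode` and the sample lines are illustrative, the messages for configurations 25 and 26 are reconstructed from the description above rather than copied from the binlog, and the regex assumes the message format shown in the stack trace.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: maps node ID -> configuration IDs whose results were "not retrieved" from it.
Dictionary<int, List<int>> GroupFailuresByNode(IEnumerable<string> logLines)
{
    // Assumes the MSB0001 message text shown in the stack trace above
    var pattern = new Regex(@"Results for configuration (\d+) were not retrieved from node (\d+)");
    var byNode = new Dictionary<int, List<int>>();
    foreach (var line in logLines)
    {
        var m = pattern.Match(line);
        if (!m.Success) continue; // ignore unrelated log lines
        int config = int.Parse(m.Groups[1].Value);
        int node = int.Parse(m.Groups[2].Value);
        if (!byNode.TryGetValue(node, out var configs))
        {
            configs = new List<int>();
            byNode[node] = configs;
        }
        configs.Add(config);
    }
    return byNode;
}

// Illustrative input reconstructed from the issue text, not real binlog output
var sample = new[]
{
    "MSB0001: Internal MSBuild Error: Results for configuration 25 were not retrieved from node 8",
    "MSB0001: Internal MSBuild Error: Results for configuration 26 were not retrieved from node 8",
    "MSB0001: Internal MSBuild Error: Results for configuration 27 were not retrieved from node 8",
};

foreach (var (node, configs) in GroupFailuresByNode(sample))
{
    Console.WriteLine($"node {node}: configurations {string.Join(", ", configs.OrderBy(c => c))}");
}
```

A single key in the result (node 8 here) is what points to a per-node problem rather than independent per-project failures.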
Analysis
The failure occurs at RequestBuilder.cs lines 1197-1208:
```csharp
if ((_requestEntry.RequestConfiguration.ResultsNodeId != Scheduler.InvalidNodeId) &&
    (_requestEntry.RequestConfiguration.ResultsNodeId != _componentHost.BuildParameters.NodeId))
{
    // Block waiting for results transfer
    await BlockOnTargetInProgress(Microsoft.Build.BackEnd.BuildRequest.InvalidGlobalRequestId, null);

    // Assertion fails here: ResultsNodeId was not updated
    ErrorUtilities.VerifyThrow(
        _requestEntry.RequestConfiguration.ResultsNodeId == _componentHost.BuildParameters.NodeId,
        "Results for configuration {0} were not retrieved from node {1}",
        _requestEntry.RequestConfiguration.ConfigurationId,
        _requestEntry.RequestConfiguration.ResultsNodeId);
}
```
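A stripped-down model of this guard makes the failing state explicit. It is a sketch, not MSBuild code: `MustWaitForTransfer` and `VerifyResultsArrived` are hypothetical names, `InvalidNodeId = -1` is an assumed stand-in for `Scheduler.InvalidNodeId`, and nodes 3 and 8 are example values. Because the message formats the current `ResultsNodeId`, "node 8" in the error is the node still recorded in `ResultsNodeId` at the moment the check ran.

```csharp
using System;

// Assumed stand-in for Scheduler.InvalidNodeId
const int InvalidNodeId = -1;

// Mirrors the outer condition: results are recorded on some node, and it is not this one.
bool MustWaitForTransfer(int resultsNodeId, int currentNodeId) =>
    resultsNodeId != InvalidNodeId && resultsNodeId != currentNodeId;

// Mirrors the VerifyThrow post-condition; the message reports the (stale) resultsNodeId.
void VerifyResultsArrived(int configurationId, int resultsNodeId, int currentNodeId)
{
    if (resultsNodeId != currentNodeId)
    {
        throw new InvalidOperationException(
            $"Results for configuration {configurationId} were not retrieved from node {resultsNodeId}");
    }
}

// Example values: the request now runs on node 3, results were last recorded on node 8.
int current = 3, resultsOn = 8;
if (MustWaitForTransfer(resultsOn, current))
{
    // BlockOnTargetInProgress would run here; in the failing case resultsOn is never updated.
    try
    {
        VerifyResultsArrived(27, resultsOn, current);
    }
    catch (InvalidOperationException ex)
    {
        Console.WriteLine(ex.Message);
    }
}
```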
Hypothesis:
- A project configuration was originally built on node 8.
- The scheduler moves it to run on a different node.
- BlockOnTargetInProgress is called to wait for the results transfer.
- The await completes, but ResultsNodeId has not been updated to the current node.
- The assertion fails.

This suggests one of the following:
- BlockOnTargetInProgress returns before the transfer is actually complete.
- The results transfer message is lost or not processed.
- The ResultsNodeId update races with the completion signal.
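The third possibility (the update racing with the completion signal) can be illustrated with a deterministic model. This hypothetical sketch uses no MSBuild types: a `TaskCompletionSource` stands in for whatever completes the BlockOnTargetInProgress wait, a captured local stands in for `ResultsNodeId`, and a second `TaskCompletionSource` forces the waiter to run its check before the update lands, so the outcome is reproducible instead of timing-dependent. Configuration 27 and nodes 3 and 8 are example values.

```csharp
using System;
using System.Threading.Tasks;

const int CurrentNode = 3; // example: node now running the request
const int OldNode = 8;     // example: node that originally held the results

async Task<string> SimulateAsync(bool signalBeforeUpdate)
{
    int resultsNodeId = OldNode; // stand-in for RequestConfiguration.ResultsNodeId
    var transferSignal = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
    var checkFinished = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

    // Stand-in for the code that awaits the transfer and then verifies ResultsNodeId
    async Task<string> WaiterAsync()
    {
        await transferSignal.Task;
        string outcome = resultsNodeId == CurrentNode
            ? "ok"
            : $"Results for configuration 27 were not retrieved from node {resultsNodeId}";
        checkFinished.SetResult(true);
        return outcome;
    }

    Task<string> waiter = WaiterAsync();

    if (signalBeforeUpdate)
    {
        // Suspected ordering: wake the waiter first, then publish the new node ID.
        transferSignal.SetResult(true);
        await checkFinished.Task; // force the check to happen before the update
        resultsNodeId = CurrentNode;
    }
    else
    {
        // Safe ordering: publish the new node ID, then wake the waiter.
        resultsNodeId = CurrentNode;
        transferSignal.SetResult(true);
    }

    return await waiter;
}

Console.WriteLine(await SimulateAsync(signalBeforeUpdate: true));
Console.WriteLine(await SimulateAsync(signalBeforeUpdate: false));
```

If this ordering is the cause, the general remedy is the `signalBeforeUpdate: false` path: publish the new `ResultsNodeId` before signalling the waiter. Whether MSBuild's actual transfer path has this shape has not been confirmed.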
Versions & Configurations
- MSBuild: 18.4 netsdk11.0.100-p1 (via dotnet/dotnet VMR)
- OS: Windows Server (Azure DevOps hosted agents)
- Architecture: x64
- Build tool: dotnet CLI with the -mt flag enabled
- Parallelism: High (CI machine with many cores)
Local reproduction attempted: built the F# repo locally with -mt twice, and both builds succeeded. The issue appears to be timing- or load-dependent.