[Unhandled Exception]: -mt mode intermittent crash "Results for configuration X were not retrieved from node Y" #13188

@JanProvaznik

Issue Description

When building with MSBuild's -mt (multithreaded) mode enabled, builds intermittently crash with an internal MSBuild error indicating that results for a project configuration were not transferred between nodes correctly. This appears to be a race condition in the scheduler when a project is moved from one node to another.

The issue is intermittent: it does not reproduce consistently on local machines, but it failed on VMR CI.

Steps to Reproduce

  1. Build the dotnet/dotnet VMR (or the F# repo) with -mt mode enabled

  2. Patch the build scripts to use -mt mode

  3. Run build:

build.cmd -pack -noVisualStudio -ci -configuration Release -bl

Note: The issue is intermittent and may require multiple runs or a high-parallelism environment to reproduce. CI machines with many cores reproduce it more frequently than local developer machines.

binlog:
Build.Microsoft.FSharp.Compiler.sln.binlog.zip

Actual Behavior

Multiple projects fail simultaneously with unhandled exceptions:

##[error]This is an unhandled exception in MSBuild -- PLEASE UPVOTE AN EXISTING ISSUE OR FILE A NEW ONE AT https://aka.ms/msbuild/unhandled
    Microsoft.Build.Framework.InternalErrorException: MSB0001: Internal MSBuild Error: Results for configuration 27 were not retrieved from node 8
   at Microsoft.Build.Shared.ErrorUtilities.ThrowInternalError(String message, Object[] args)
   at Microsoft.Build.Shared.ErrorUtilities.VerifyThrow(Boolean condition, String unformattedMessage, Int32 arg0, Int32 arg1)
   at Microsoft.Build.BackEnd.RequestBuilder.BuildProject()
   at Microsoft.Build.BackEnd.RequestBuilder.RequestThreadProc(Boolean setThreadParameters)

Multiple projects fail with similar errors (configurations 25, 26, 27 all referencing node 8), suggesting a systemic issue when that node becomes unavailable or slow to respond.
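To check whether failures cluster on a single node, as they do here, the error text can be tallied per node. The following is a minimal sketch, not part of MSBuild: the class name `NodeFailureTally` and its `Tally` method are hypothetical, and the sample lines for configurations 25 and 26 are modeled on the configuration 27 message quoted above rather than copied from the log.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not an MSBuild API): groups failing configuration ids by node id.
public static class NodeFailureTally
{
    // Matches the message text quoted in this issue.
    private static readonly Regex Pattern = new Regex(
        @"Results for configuration (\d+) were not retrieved from node (\d+)");

    // Returns node id -> sorted set of configuration ids that failed against that node.
    public static SortedDictionary<int, SortedSet<int>> Tally(IEnumerable<string> logLines)
    {
        var byNode = new SortedDictionary<int, SortedSet<int>>();
        foreach (string line in logLines)
        {
            Match m = Pattern.Match(line);
            if (!m.Success)
            {
                continue; // unrelated log line
            }

            int configurationId = int.Parse(m.Groups[1].Value);
            int nodeId = int.Parse(m.Groups[2].Value);
            if (!byNode.TryGetValue(nodeId, out SortedSet<int> configurations))
            {
                configurations = new SortedSet<int>();
                byNode[nodeId] = configurations;
            }
            configurations.Add(configurationId);
        }
        return byNode;
    }

    public static void Main()
    {
        // Illustrative input: only the configuration 27 line is quoted from the issue.
        string[] sample =
        {
            "MSB0001: Internal MSBuild Error: Results for configuration 25 were not retrieved from node 8",
            "MSB0001: Internal MSBuild Error: Results for configuration 26 were not retrieved from node 8",
            "MSB0001: Internal MSBuild Error: Results for configuration 27 were not retrieved from node 8",
            "some unrelated log line",
        };

        foreach (KeyValuePair<int, SortedSet<int>> entry in Tally(sample))
        {
            // Prints: node 8: configurations 25, 26, 27
            Console.WriteLine($"node {entry.Key}: configurations {string.Join(", ", entry.Value)}");
        }
    }
}
```

If every failing configuration maps to one node, that supports looking at what that node was doing at the time rather than at the individual projects.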

Analysis

The failure occurs at RequestBuilder.cs lines 1197-1208:

if ((_requestEntry.RequestConfiguration.ResultsNodeId != Scheduler.InvalidNodeId) &&
    (_requestEntry.RequestConfiguration.ResultsNodeId != _componentHost.BuildParameters.NodeId))
{
    // Block waiting for results transfer
    await BlockOnTargetInProgress(Microsoft.Build.BackEnd.BuildRequest.InvalidGlobalRequestId, null);

    // Assertion fails here - ResultsNodeId was not updated
    ErrorUtilities.VerifyThrow(
        _requestEntry.RequestConfiguration.ResultsNodeId == _componentHost.BuildParameters.NodeId,
        "Results for configuration {0} were not retrieved from node {1}",
        _requestEntry.RequestConfiguration.ConfigurationId,
        _requestEntry.RequestConfiguration.ResultsNodeId);
}

Hypothesis:

  • A project configuration was originally built on node 8
  • The scheduler moves it to run on a different node
  • BlockOnTargetInProgress is called to wait for results transfer
  • The await completes but ResultsNodeId was not updated to the current node
  • The assertion fails

This suggests either:

  • BlockOnTargetInProgress returns before the transfer is actually complete
  • The results transfer message is lost or not processed
  • ResultsNodeId update has a race with the completion signal
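The third possibility can be illustrated with a toy model. This is a sketch under stated assumptions, not MSBuild's implementation: `ToyConfiguration`, `WaitThenVerify`, and `Run` are hypothetical names, a `TaskCompletionSource` stands in for whatever actually wakes `BlockOnTargetInProgress`, and the node ids 8 and 3 are arbitrary. The `ManualResetEventSlim` exists only to force the problematic interleaving so the demonstration is deterministic.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Toy model of "completion signal races with the ResultsNodeId update". Not MSBuild code.
public static class ResultsTransferRaceSketch
{
    // Hypothetical stand-in for the request configuration; only the field under discussion.
    public sealed class ToyConfiguration
    {
        public volatile int ResultsNodeId;
    }

    // Waiter: resumes on the completion signal, then performs the same comparison
    // as the VerifyThrow in the snippet above.
    public static async Task<bool> WaitThenVerify(
        ToyConfiguration config, Task transferDone, int localNodeId, ManualResetEventSlim waiterChecked)
    {
        await transferDone.ConfigureAwait(false);
        bool ok = config.ResultsNodeId == localNodeId;
        waiterChecked.Set(); // tell the demo that the check has been evaluated
        return ok;
    }

    // Returns whether the waiter's check passed for the chosen ordering.
    public static bool Run(bool signalBeforeUpdate)
    {
        const int remoteNodeId = 8; // arbitrary illustrative ids
        const int localNodeId = 3;

        var config = new ToyConfiguration { ResultsNodeId = remoteNodeId };
        var transferDone = new TaskCompletionSource<bool>();
        using var waiterChecked = new ManualResetEventSlim(false);

        Task<bool> waiter = WaitThenVerify(config, transferDone.Task, localNodeId, waiterChecked);

        if (signalBeforeUpdate)
        {
            // Hypothesized bad order: the waiter is released before the id is updated.
            transferDone.SetResult(true);
            waiterChecked.Wait(); // force the bad interleaving for the demo
            config.ResultsNodeId = localNodeId;
        }
        else
        {
            // Safe order: publish the new id first, then release the waiter.
            config.ResultsNodeId = localNodeId;
            transferDone.SetResult(true);
        }

        return waiter.GetAwaiter().GetResult();
    }

    public static void Main()
    {
        Console.WriteLine($"signal before update: check passed = {Run(true)}");  // False
        Console.WriteLine($"update before signal: check passed = {Run(false)}"); // True
    }
}
```

In the toy model the check fails only when the signal precedes the update. Whether the real scheduler and `BlockOnTargetInProgress` can interleave this way under -mt is exactly what remains to be confirmed.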

Versions & Configurations

  • MSBuild: 18.4 netsdk11.0.100-p1 (via dotnet/dotnet VMR)
  • OS: Windows Server (Azure DevOps hosted agents)
  • Architecture: x64
  • Build tool: dotnet CLI with -mt flag enabled
  • Parallelism: High (CI machine with many cores)

Local reproduction attempted: Built the F# repo locally with -mt twice; both builds succeeded. The issue appears to be timing- or load-dependent.

Labels: Priority:2 (Work that is important, but not critical for the release)