Worker Nodes should attempt to detect and correct over-provisioning #13218
Closed
Labels
Area: Application - Issues with `msbuild.exe` or the CLI experience.
Area: Engine - Issues impacting the core execution of targets and tasks.
Area: IPC - Issues concerning how MSBuild communicates between processes, including serialization.
Area: Server
needs-design - Requires discussion with the dev team before attempting a fix.
Summary
When multiple instances of MSBuild are run concurrently from the command line, each instance tries to create and claim a number of worker nodes up to the `/m` limit (which is NUMPROCS for `dotnet` by default). This can lead to far too many nodes existing on the host machine, lingering for the configured timeout, which is ~15m by default.
These nodes consume machine resources and look bad. The engine should work to reduce the number of active nodes once it exceeds an expected threshold, to preserve system resources.
Background and Motivation
As more tools and users (DevKit, LLM agents, and humans) delegate their build and inner-loop experiences to the `dotnet` CLI, it becomes more and more likely that collisions like the above will occur. I was looking at node trace logs this morning where more than 25 worker nodes were lingering on the machine. Managing these nodes is a pain for users of all kinds: you either have to carefully sequence your build operations, apply `/nodereuse:false` (which slows down subsequent builds), or dangerously and likely incompletely terminate processes.
Proposed Feature
Instead, the nodes should clean themselves up.
When a build completes and an out-of-proc node marks itself idle, it should attempt to handshake with its siblings. If the number of active nodes of the same type has reached a threshold, the idling node should terminate.
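The idle-time check described above could look roughly like the following. This is only a sketch under stated assumptions, not MSBuild's implementation: `NodeInfo`, `should_terminate`, the `node_type` strings, and the `threshold` parameter are all hypothetical names, the handshake is modeled as an already-collected list of sibling responses, and the count is assumed to exclude the idling node itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    """Hypothetical view of one out-of-proc node, as learned from a handshake."""
    pid: int
    node_type: str  # e.g. "worker"; the type names here are assumptions
    alive: bool     # True if the sibling answered the handshake


def should_terminate(me: NodeInfo, siblings: list[NodeInfo], threshold: int) -> bool:
    """Return True if the idling node `me` should exit.

    Counts siblings that answered the handshake and share `me`'s node type.
    Assumption: `me` is not counted, and "equal to a threshold" is read as
    "at least the threshold".
    """
    same_type_active = sum(
        1
        for s in siblings
        if s.alive and s.node_type == me.node_type and s.pid != me.pid
    )
    return same_type_active >= threshold


# Example: two live worker siblings, one dead worker, one node of another type.
me = NodeInfo(pid=100, node_type="worker", alive=True)
siblings = [
    NodeInfo(pid=101, node_type="worker", alive=True),
    NodeInfo(pid=102, node_type="worker", alive=True),
    NodeInfo(pid=103, node_type="worker", alive=False),
    NodeInfo(pid=104, node_type="task_host", alive=True),
]
print(should_terminate(me, siblings, threshold=2))  # True
print(should_terminate(me, siblings, threshold=3))  # False
```

One design question this sketch leaves open is what happens when several nodes go idle at the same moment: each could see the others as active and all of them could exit, so a real design would likely need a tie-breaker or some ordering between siblings.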
Proposed thresholds:
Alternative Designs
No response