
Add diagnostic data to crash/hang telemetry and move null-Project check after RetrieveFromCache #13332

Merged
YuliiaKovalova merged 7 commits into main from dev/crash-telemetry-diagnostics-project-null
Mar 6, 2026

@YuliiaKovalova YuliiaKovalova commented Mar 5, 2026

Root cause:

In VS scenarios with project cache plugins, HandleBuildResultAsync in ProjectCacheService crashes with InternalErrorException('Project unexpectedly null') when a build result arrives from an out-of-proc node for a configuration whose ProjectInstance was never loaded locally.

The VerifyThrowInternalNull assertion fires before RetrieveFromCache is called, so configurations that were cached to disk never get a chance to restore their ProjectInstance. This causes false build failures and can lead to EndBuild hangs when _scheduler.ReportResult is skipped due to the crash.
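The ordering problem can be illustrated with a minimal sketch. This is a Python model, not the MSBuild C# code: `Configuration`, `retrieve_from_cache`, `verify_not_null`, and both handler functions are hypothetical stand-ins, and the cache is simulated with a plain attribute.

```python
class InternalError(Exception):
    """Hypothetical stand-in for MSBuild's InternalErrorException."""


class Configuration:
    """Hypothetical stand-in for a build request configuration."""

    def __init__(self, project=None, cached_project=None):
        self.project = project                  # in-memory project, may be None
        self._cached_project = cached_project   # what a disk cache would restore

    def retrieve_from_cache(self):
        # Restore the project from the simulated disk cache if it is missing.
        if self.project is None and self._cached_project is not None:
            self.project = self._cached_project


def verify_not_null(value, name):
    # Models an internal assertion that throws on a null value.
    if value is None:
        raise InternalError(f"{name} unexpectedly null")


def handle_result_before_fix(config):
    # Assertion runs first, so a configuration cached to disk never gets restored.
    verify_not_null(config.project, "Project")
    config.retrieve_from_cache()
    return config.project


def handle_result_after_fix(config):
    # Give the cache a chance to restore the project, then assert.
    config.retrieve_from_cache()
    verify_not_null(config.project, "Project")
    return config.project
```

In this model, a configuration whose project exists only in the cache raises in `handle_result_before_fix` but succeeds in `handle_result_after_fix`. A configuration with nothing to restore still raises in both, so the assertion keeps its purpose.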

Changes:

  • ProjectCacheService: Move VerifyThrowInternalNull after RetrieveFromCache
  • CrashTelemetry: Add IsStandaloneExecution, MaxNodeCount, ActiveNodeCount, SubmissionCount properties for crash/hang diagnostics
  • CrashTelemetryRecorder: Pass new diagnostic properties through all paths
  • BuildManager: Report build state (node count, submission count, standalone flag) in crash telemetry; include submission:config ID mapping in hang diagnostic dump file
  • XMake: Pass isStandaloneExecution=true for CLI crash telemetry
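As a rough illustration of the new diagnostic properties, the sketch below models a telemetry record in Python. Only the four property names come from this PR; the class name, the `exception_type` field, the field types, and the `to_properties` helper are assumptions made for the example and do not describe the actual CrashTelemetry API.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class CrashTelemetrySketch:
    """Hypothetical model of a crash/hang telemetry record."""

    exception_type: str                              # assumed pre-existing field
    is_standalone_execution: Optional[bool] = None   # IsStandaloneExecution
    max_node_count: Optional[int] = None             # MaxNodeCount
    active_node_count: Optional[int] = None          # ActiveNodeCount
    submission_count: Optional[int] = None           # SubmissionCount

    def to_properties(self):
        # Emit only the properties that were actually populated.
        return {k: v for k, v in asdict(self).items() if v is not None}


# A CLI crash would set the standalone flag; other values are illustrative.
record = CrashTelemetrySketch(
    exception_type="InternalErrorException",
    is_standalone_execution=True,
    max_node_count=8,
    active_node_count=3,
    submission_count=2,
)
properties = record.to_properties()
```

Leaving unset fields out of the emitted properties is one possible design choice; it keeps a record from a code path that lacks build state from reporting misleading zeros.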

…ck after RetrieveFromCache

Fix for StackHash 2B72B3348620C2E9CF25384A59F497EE4B2D821600C3868D04BD9080EDBB714C

Telemetry: four customers hit this crash in VS 18.5 to 18.6 while building the VS repo, all of them internal Microsoft developers. The StackHash is the same across VS versions.

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@YuliiaKovalova YuliiaKovalova force-pushed the dev/crash-telemetry-diagnostics-project-null branch from 8437060 to 9f8d2a7 Compare March 5, 2026 19:04
@YuliiaKovalova YuliiaKovalova enabled auto-merge (squash) March 6, 2026 14:24
@YuliiaKovalova YuliiaKovalova merged commit 6500dd2 into main Mar 6, 2026
10 checks passed
@YuliiaKovalova YuliiaKovalova deleted the dev/crash-telemetry-diagnostics-project-null branch March 6, 2026 16:49