Add crash/failure telemetry to MSBuild#13270

Merged
YuliiaKovalova merged 2 commits into main from
dev/ykovalova/add_telemetry_to_catch_msbuild_exceptions
Feb 20, 2026

@YuliiaKovalova
Member

Add CrashTelemetry class that captures rich exception information including exception type, inner exception type, stack trace hash (SHA-256 for bucketing without PII), top stack frame, HResult, exit type classification, criticality flag, MSBuild version, framework name, and host environment.

Crash telemetry is emitted in two places:

  • XMake.Execute() catch blocks for all handled exception types
  • ExceptionHandling.UnhandledExceptionHandler for truly unhandled exceptions

The telemetry is recorded via KnownTelemetry.CrashTelemetry and flushed in the finally block of XMake.Execute() using TelemetryManager and the existing IActivity/ActivitySource infrastructure. All telemetry code is best-effort with catch-all guards to prevent secondary failures during crash handling.
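For readers unfamiliar with the bucketing idea, here is a minimal sketch of hashing a stack trace with SHA-256 so that identical crashes share a key without the raw trace leaving the machine. The `StackHashSketch` class and `ComputeStackHash` method are hypothetical names for illustration; this is not the PR's `CrashTelemetry` code, and the real implementation may normalize the trace differently before hashing.

```csharp
#nullable enable
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration only; not taken from the PR.
public static class StackHashSketch
{
    // Returns an uppercase hex SHA-256 digest of the stack trace text,
    // or null when there is no stack trace to hash.
    public static string? ComputeStackHash(string? stackTrace)
    {
        if (string.IsNullOrEmpty(stackTrace))
        {
            return null;
        }

        using SHA256 sha = SHA256.Create();
        byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(stackTrace));

        // BitConverter yields "AB-CD-..."; drop the dashes to get 64 hex characters.
        return BitConverter.ToString(digest).Replace("-", string.Empty);
    }
}
```

Two crashes with byte-identical stack traces produce the same 64-character key, which is what makes the hash usable for grouping.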

@YuliiaKovalova YuliiaKovalova force-pushed the dev/ykovalova/add_telemetry_to_catch_msbuild_exceptions branch from e5aaf87 to f205d92 Compare February 19, 2026 15:09
Add CrashTelemetry data class and CrashTelemetryRecorder helper that capture
rich exception information: exception type, inner exception type, stack trace
hash (SHA-256 for bucketing without PII), top stack frame, HResult, exit type
classification, criticality flag, MSBuild version, framework name, and host.

CrashTelemetryRecorder centralizes recording and flushing logic used by all
three crash telemetry emission points:

1. MSBuild.exe (XMake.Execute):
   - All catch blocks record crash telemetry via RecordCrashTelemetry
   - FlushCrashTelemetry in the finally block emits via TelemetryManager

2. API mode (BuildManager.EndBuild):
   - Catch block records crash telemetry for shutdown exceptions
   - _threadException (node crashes) is recorded before re-throwing
   - FlushCrashTelemetry emits via TelemetryManager

3. Unhandled exceptions (ExceptionHandling.UnhandledExceptionHandler):
   - RecordAndFlushCrashTelemetry immediately emits since process is dying

All telemetry code is best-effort with catch-all guards to prevent
secondary failures during crash handling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
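The record-then-flush split and the catch-all guards described in this commit message can be sketched roughly as follows. `BestEffortRecorderSketch`, its `Record` and `Flush` methods, and the `emit` callback are hypothetical stand-ins; the PR's `CrashTelemetryRecorder` and `TelemetryManager` are not reproduced here, and their signatures may differ.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for illustration; not the PR's CrashTelemetryRecorder.
public static class BestEffortRecorderSketch
{
    // Holds at most one pending crash record until it is flushed.
    private static (string ExceptionType, string ExitType)? s_pending;

    // Called from a catch block: capture what we can, never throw.
    public static void Record(Exception exception, string exitType)
    {
        try
        {
            s_pending = (exception.GetType().FullName ?? "Unknown", exitType);
        }
        catch
        {
            // Best effort: telemetry must never mask the original failure.
        }
    }

    // Called from a finally block: emit the pending record once, never throw.
    public static void Flush(Action<string, string> emit)
    {
        try
        {
            var pending = s_pending;
            s_pending = null;
            if (pending.HasValue)
            {
                emit(pending.Value.ExceptionType, pending.Value.ExitType);
            }
        }
        catch
        {
            // Best effort: a failing telemetry sink is ignored.
        }
    }
}
```

In this sketch a caller would invoke `Record` inside each `catch` and `Flush` inside `finally`, so a failure in either step cannot replace the exception that is actually being handled.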
@YuliiaKovalova YuliiaKovalova force-pushed the dev/ykovalova/add_telemetry_to_catch_msbuild_exceptions branch from f205d92 to 5c4a3f1 Compare February 19, 2026 17:00
@YuliiaKovalova YuliiaKovalova marked this pull request as ready for review February 19, 2026 17:09
Copilot AI review requested due to automatic review settings February 19, 2026 17:09
Contributor

Copilot AI left a comment

Pull request overview

This PR adds comprehensive crash and failure telemetry to MSBuild to improve diagnostics and error tracking. The implementation introduces a new CrashTelemetry class that captures rich exception information including exception types, stack trace hashes (SHA-256 for PII-free bucketing), top stack frames, HResult codes, exit type classifications, criticality flags, MSBuild version, framework name, and host environment.

Changes:

  • Adds CrashTelemetry and CrashTelemetryRecorder classes to capture and emit crash telemetry
  • Integrates crash telemetry recording into all exception catch blocks in XMake.Execute() and BuildManager.EndBuild()
  • Adds unhandled exception handler support via ExceptionHandling.UnhandledExceptionHandler

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/Framework/Telemetry/CrashTelemetry.cs | Core telemetry data class with exception information, stack hashing, and property serialization |
| src/Framework/Telemetry/CrashTelemetryRecorder.cs | Centralized helper for recording and flushing crash telemetry with best-effort error handling |
| src/Framework/Telemetry/TelemetryConstants.cs | Adds "Crash" constant for crash activity naming |
| src/Framework/Telemetry/KnownTelemetry.cs | Adds static CrashTelemetry property for crash telemetry storage |
| src/MSBuild/XMake.cs | Integrates crash telemetry recording in all exception catch blocks and finally block flush |
| src/Build/BackEnd/BuildManager/BuildManager.cs | Adds crash telemetry recording for BuildManager exceptions and thread exceptions |
| src/Shared/ExceptionHandling.cs | Adds crash telemetry recording for truly unhandled exceptions |

…ts, fix exitType

- Extract GetHostName() as shared method in XMake.cs and BuildManager.cs
  to eliminate duplicated host detection logic
- Sanitize StackTop to redact file paths that may contain PII (usernames)
  while preserving method names and line numbers
- Use consistent exitType 'EndBuildFailure' in BuildManager instead of
  exception.GetType().Name
- Add CrashTelemetry_Tests with 8 tests covering PopulateFromException,
  GetProperties, GetActivityProperties, PII redaction, and null handling
- Add comment clarifying why Initialize is needed in RecordAndFlushCrashTelemetry

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
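As a rough illustration of the `StackTop` sanitization mentioned in this commit, the sketch below redacts the file path from a single stack frame while keeping the method name and line number. It assumes the English .NET frame format `at Method(...) in <path>:line N`; the `StackFrameRedactionSketch` class, the `RedactPath` method, and the `<redacted>` placeholder are hypothetical, and the PR's actual sanitizer may work differently.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not the PR's sanitization code.
public static class StackFrameRedactionSketch
{
    // Matches " in <anything>:line <digits>" and captures the ":line N" suffix.
    private static readonly Regex s_pathPattern = new Regex(@" in .+(:line \d+)");

    // Replaces the file path of a stack frame with a placeholder.
    // Frames without file information are returned unchanged.
    public static string RedactPath(string frame)
    {
        return s_pathPattern.Replace(frame, " in <redacted>$1");
    }
}
```

For example, a frame ending in `in C:\Users\alice\src\File.cs:line 42` would come back ending in `in <redacted>:line 42`, so the user name in the path never reaches telemetry.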
@YuliiaKovalova YuliiaKovalova merged commit 893c824 into main Feb 20, 2026
10 checks passed
@YuliiaKovalova YuliiaKovalova deleted the dev/ykovalova/add_telemetry_to_catch_msbuild_exceptions branch February 20, 2026 09:13
Member

@JanProvaznik JanProvaznik left a comment

I think this is the kind of change that would be better with a spec that has feedback from the team.

ShowHelpPrompt();

exitType = ExitType.SwitchError;
RecordCrashTelemetry(e, exitType);
Member

Is it really interesting to collect the instances where users have typos in switches?

Member Author

Yes, because it helps us understand whether the documentation should be updated or other switch aliases added.

Exception ex = (Exception)e.ExceptionObject;
DumpExceptionToFile(ex);
#if !CLR2COMPATIBILITY && !MICROSOFT_BUILD_ENGINE_OM_UNITTESTS
RecordCrashTelemetryForUnhandledException(ex);
Member

will this really avoid sending telemetry in our tests?

Member Author

It should, but if our tests crash with unhandled exceptions, that's not a good sign.

/// <summary>
/// Timestamp when the crash occurred.
/// </summary>
public DateTime? CrashTimestamp { get; set; }
Member

why? isn't this already included in the data field in the database on ingestion?

/// <summary>
/// The exit type / category of the crash (e.g., "LoggerFailure", "Unexpected", "UnhandledException").
/// </summary>
public string? ExitType { get; set; }
Member

typing this string seems a bit too liberal?

Member Author

I don't know!
This is version 1. Before it we had no insight into the crashes; once the data is available and analyzed by the team, the corresponding adjustments can be made.

@YuliiaKovalova
Member Author

YuliiaKovalova commented Feb 20, 2026

I think this is the kind of change that would be better with a spec that has feedback from the team.

It's yet another telemetry event; I don't see why it can't be iterative and addressed based on feedback later.

YuliiaKovalova added a commit that referenced this pull request Feb 24, 2026
…detection (#13289)

## Summary

Improves the crash telemetry infrastructure to produce higher-quality,
more actionable data and enable Prism dashboard visibility
(continuation of #13270, based on the first results).

## Changes

### Convert ExitType and CrashOrigin from strings to enums

### Remove noise from crash telemetry
- Removed `SwitchError` and `InitializationError` from crash telemetry
recording — these accounted for ~99.8% of crash events but represent
expected user errors (bad CLI args, missing toolsets), not actual
crashes

### Remove redundant `CrashTimestamp`
- The database ingestion timestamp already captures this; the
client-side timestamp added no value

### Deduplicate `GetHostName()`
- Consolidated 3 copies of host-detection logic (VS / VSCode /
`MSBUILD_HOST_NAME`) into a single `BuildEnvironmentState.GetHostName()`
method

### Harden `DebugUtils` static constructor
- Wrapped in try/catch to prevent `TypeInitializationException` crashes
when environment variable access fails

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
JanProvaznik pushed a commit to JanProvaznik/msbuild that referenced this pull request Feb 25, 2026
JanProvaznik pushed a commit to JanProvaznik/msbuild that referenced this pull request Feb 25, 2026
…detection (dotnet#13289)