Skip to content

fix: Write reports on concurrent crashes#7340

Merged
denrase merged 22 commits intomainfrom
repro/cocoa-14-concurrent-fatal-crash
Feb 17, 2026
Merged

fix: Write reports on concurrent crashes#7340
denrase merged 22 commits intomainfrom
repro/cocoa-14-concurrent-fatal-crash

Conversation

@denrase
Copy link
Copy Markdown
Collaborator

@denrase denrase commented Feb 2, 2026

📜 Description

  • Track which thread is handling the crash via g_crashingThread
  • Same thread re-entering = recrash (allowed)
  • Different thread = block for 2s, then continue
  • Use C11 atomics (<stdatomic.h>)
  • Rename sentrycrashcm_notifyFatalExceptionCaptured to sentrycrashcm_notifyFatalException to move it closer to KSCrash naming.
  • Merged notify + suspend into sentrycrashcm_notifyFatalException to prevent deadlock (concurrency check must run before any threads are suspended)

This eliminates the ENOENT race by ensuring only one thread ever enters crash handling at a time.

Aligned with KSCrash upstream (KSCrashMonitor.c):

💡 Motivation and Context

When multiple threads crash simultaneously, the second crash was incorrectly treated as a "recrash" (crash during crash handling) because g_handlingFatalException had no thread awareness.

Fixes #3296

💚 How did you test it?

📝 Checklist

You have to check all boxes before merging:

  • I added tests to verify the changes.
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • I updated the docs if needed.
  • I updated the wizard if needed.
  • Review from the native team if needed.
  • No breaking change or entry added to the changelog.
  • No breaking change for hybrid SDKs or communicated to hybrid SDKs.

Concurrent crashes from different threads were incorrectly detected as
recrash, causing ENOENT errors when the recrash logic tried to read a
crash report that hadn't been written yet.

Add thread-aware atomic guard to ensure only one thread handles a crash.
Same-thread recrash is still detected correctly; concurrent crashes from
other threads are discarded.

Fixes #3296
@linear
Copy link
Copy Markdown

linear Bot commented Feb 2, 2026

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Feb 2, 2026

Semver Impact of This PR

🟢 Patch (bug fixes)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


New Features ✨

  • (visionOS) Enable MetricKit Integration for visionOS by denrase in #7466

Bug Fixes 🐛

  • Write reports on concurrent crashes by denrase in #7340
  • Revert TelemetryScopeApplier gating of user attributes on sendDefaultPII by philprime in #7437

Internal Changes 🔧

Deps

  • Bump actions/checkout from 6.0.1 to 6.0.2 by dependabot in #7463
  • Bump getsentry/craft from 2.21.2 to 2.21.4 by dependabot in #7462
  • Bump getsentry/craft/.github/workflows/changelog-preview.yml from 2.21.2 to 2.21.4 by dependabot in #7461
  • Bump fastlane from 2.232.0 to 2.232.1 by dependabot in #7460
  • Bump fastlane-plugin-sentry from 2.0.0 to 2.1.0 by dependabot in #7459
  • Bump faraday from 1.10.4 to 1.10.5 by dependabot in #7446

🤖 This preview updates automatically when you update the PR.

@denrase
Copy link
Copy Markdown
Collaborator Author

denrase commented Feb 2, 2026

Please look closely at the monitor changes (SentryCrashMonitor_MachException.c, SentryCrashMonitor_Signal.c, SentryCrashMonitor_CPPException.cpp, SentryCrashMonitor_NSException.m). This is all sensitive async-signal-safe code which i'm not familiar with, as we also don't seem to have extensive test coverage here. So please give me feedback if bailing out in this concurrent crash case is the correct thing to do here.

@denrase denrase changed the title fix: Distinguish concurrent crashes from recrash fix: Write reports on concurrent crashes (first wins) Feb 2, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 2, 2026

Codecov Report

❌ Patch coverage is 89.65517% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.324%. Comparing base (c899804) to head (366847f).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...ording/Monitors/SentryCrashMonitor_MachException.c 0.000% 1 Missing ⚠️
...ecording/Monitors/SentryCrashMonitor_NSException.m 0.000% 1 Missing ⚠️
...ash/Recording/Monitors/SentryCrashMonitor_Signal.c 0.000% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@              Coverage Diff              @@
##              main     #7340       +/-   ##
=============================================
- Coverage   85.324%   85.324%   -0.001%     
=============================================
  Files          480       480               
  Lines        28620     28632       +12     
  Branches     12371     12398       +27     
=============================================
+ Hits         24420     24430       +10     
- Misses        4150      4154        +4     
+ Partials        50        48        -2     
Files with missing lines Coverage Δ
...entryCrash/Recording/Monitors/SentryCrashMonitor.c 84.000% <100.000%> (+3.672%) ⬆️
...rding/Monitors/SentryCrashMonitor_CPPException.cpp 77.876% <100.000%> (-0.195%) ⬇️
Sources/SentryCrash/Recording/SentryCrashC.c 77.777% <100.000%> (+0.277%) ⬆️
...ording/Monitors/SentryCrashMonitor_MachException.c 36.286% <0.000%> (+0.152%) ⬆️
...ecording/Monitors/SentryCrashMonitor_NSException.m 31.914% <0.000%> (+0.664%) ⬆️
...ash/Recording/Monitors/SentryCrashMonitor_Signal.c 61.261% <0.000%> (+0.546%) ⬆️

... and 5 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c899804...366847f. Read the comment docs.

@denrase
Copy link
Copy Markdown
Collaborator Author

denrase commented Feb 2, 2026

Reproduced:

@IBAction func asyncCrash(_ sender: Any) {
    for i in 1...6 {
        DispatchQueue.global().async {
            fatalError("Concurrent crash #\(i)")
        }
    }
}

Capture:
https://denrase.sentry.io/issues/7236198917/?project=4508007036551168&query=is%3Aunresolved&referrer=issue-stream

@philipphofmann
Copy link
Copy Markdown
Member

Sorry that it took so long. It just took me 2 minutes to give input 🤦

So first, I would double-check KSCrash. They did plenty of fixes over the past, and they might have a similar fix. And second, we need a review from a native dev. We could ping Mischan, but I think he might be sick this week.

Thanks for looking into this @denrase 😃

@denrase denrase changed the title fix: Write reports on concurrent crashes (first wins) fix: Write reports on concurrent crashes Feb 9, 2026
@denrase
Copy link
Copy Markdown
Collaborator Author

denrase commented Feb 9, 2026

@philipphofmann Ok, I aligned this with the KSCrash solution, where they suspend for 2 seconds when a concurrent thread also crashed. Also updated naming. This fixes our issue with incorrectly detecting a re-crash and also aligns us closer with KSCrash. Of course we still need a native team review here. 🙇

@denrase denrase added the ready-to-merge Use this label to trigger all PR workflows label Feb 9, 2026
@denrase denrase marked this pull request as ready for review February 9, 2026 15:30
Comment thread Sources/SentryCrash/Recording/Monitors/SentryCrashMonitor.c
Comment thread Sources/SentryCrash/Recording/Monitors/SentryCrashMonitor.c
Comment thread Sources/SentryCrash/Recording/Monitors/SentryCrashMonitor_MachException.c Outdated
Copy link
Copy Markdown
Contributor

@itaybre itaybre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some small comments

Comment thread Sources/SentryCrash/Recording/Monitors/SentryCrashMonitor_MachException.c Outdated
Comment thread Sources/SentryCrash/Recording/Monitors/SentryCrashMonitor_MachException.c Outdated
Comment thread CHANGELOG.md Outdated
Comment thread Sources/SentryCrash/Recording/Monitors/SentryCrashMonitor_MachException.c Outdated
@itaybre
Copy link
Copy Markdown
Contributor

itaybre commented Feb 9, 2026

(sorry for the duplicated comments, github was having issues)

Comment thread Sources/Sentry/include/SentryCrashMonitor.h
Comment thread Sources/SentryCrash/Recording/Monitors/SentryCrashMonitor.c
Comment thread Sources/SentryCrash/Recording/Monitors/SentryCrashMonitor.c
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Comment thread Sources/SentryCrash/Recording/Monitors/SentryCrashMonitor_MachException.c Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Feb 10, 2026

Performance metrics 🚀

  Plain With Sentry Diff
Startup time 1225.62 ms 1258.84 ms 33.22 ms
Size 24.14 KiB 1.11 MiB 1.09 MiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
50e7b3e 1221.54 ms 1250.81 ms 29.27 ms
ffe0649 1213.35 ms 1248.64 ms 35.29 ms
a1a9260 1192.15 ms 1229.80 ms 37.64 ms
58a9225 1211.40 ms 1238.88 ms 27.48 ms
e8cc4e7 1216.94 ms 1246.62 ms 29.68 ms
0e4b033 1218.59 ms 1248.51 ms 29.92 ms
41b4993 1215.15 ms 1248.14 ms 32.99 ms
88c11f3 1226.18 ms 1262.62 ms 36.44 ms
5ca545a 1219.06 ms 1244.59 ms 25.53 ms
e92ab66 1228.86 ms 1258.43 ms 29.57 ms

App size

Revision Plain With Sentry Diff
50e7b3e 24.14 KiB 1.04 MiB 1.02 MiB
ffe0649 24.14 KiB 1.06 MiB 1.04 MiB
a1a9260 24.14 KiB 1.08 MiB 1.06 MiB
58a9225 24.14 KiB 1.06 MiB 1.04 MiB
e8cc4e7 24.14 KiB 1.11 MiB 1.09 MiB
0e4b033 24.14 KiB 1.11 MiB 1.09 MiB
41b4993 24.14 KiB 1.06 MiB 1.04 MiB
88c11f3 24.14 KiB 1.11 MiB 1.09 MiB
5ca545a 24.14 KiB 1.06 MiB 1.04 MiB
e92ab66 24.14 KiB 1.10 MiB 1.08 MiB

Previous results on branch: repro/cocoa-14-concurrent-fatal-crash

Startup times

Revision Plain With Sentry Diff
a4250c1 1224.16 ms 1253.79 ms 29.62 ms

App size

Revision Plain With Sentry Diff
a4250c1 24.14 KiB 1.10 MiB 1.08 MiB

@denrase denrase requested a review from philprime February 16, 2026 13:34
@denrase
Copy link
Copy Markdown
Collaborator Author

denrase commented Feb 16, 2026

@supervacuus I am once again asking for your review. 🙇

Copy link
Copy Markdown
Collaborator

@supervacuus supervacuus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. The thing that surprises me the most, though, is that all monitors that invoke sentrycrashcm_notifyFatalException() have very different behavior:

  • fatal signal handlers preempt the thread that crashed (and can preempt themselves), so they can reenter not only from other threads but from the same
  • Mach exceptions are handled on a port listener thread and would only re-enter from another thread if the handler dishes out the retrieved exception to worker threads
  • The cxa_throw hook only reenters from other threads (edge-case is throwing from cxa_throw, of course, which is certainly less defined than than raising a signal from a signal handler)

(I don't know enough about the NSException mechanism to categorize)

So, while guarding against reentrancy and async-signal-safety is certainly not a bad idea in all of the above, the exposure to certain issues is very different. This is primarily something that I would inline document. For instance, wasHandlingFatalException would never appear in Mach exception handling (unless that handler defers to worker threads), and an assigned thread wouldn't be the "crashing thread" but the crash handler thread. The code, as written in sentrycrashcm_notifyFatalException(), appears to be tuned to POSIX signal handling.

Comment thread Sources/SentryCrash/Recording/Monitors/SentryCrashMonitor.c
Comment thread Sources/SentryCrash/Recording/Monitors/SentryCrashMonitor.c
Comment thread Sources/SentryCrash/Recording/Monitors/SentryCrashMonitor.c
@denrase
Copy link
Copy Markdown
Collaborator Author

denrase commented Feb 17, 2026

@supervacuus I have added additional comments derived from your feedback, do you think this is good enough?

@denrase denrase requested a review from supervacuus February 17, 2026 15:03
@supervacuus
Copy link
Copy Markdown
Collaborator

@supervacuus I have added additional comments derived from your feedback, do you think this is good enough?

I think so. It is now clear that this is primarily a first preventative measure, using the worst-case monitor (the signal handler) as a baseline in the face of concurrent crashes, and is much better than how it was handled before. I know inline docs can get old quickly, but it certainly helps to orient whoever works on this next.

@denrase denrase merged commit bcabca0 into main Feb 17, 2026
206 of 209 checks passed
@denrase denrase deleted the repro/cocoa-14-concurrent-fatal-crash branch February 17, 2026 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-to-merge Use this label to trigger all PR workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sentry fails to write report to disk with concurrent fatal crashes

5 participants