Skip to content

ci: enable tsan + fix: rework breadcrumb ringbuffer#1309

Merged
JoshuaMoelans merged 32 commits intomasterfrom
test/enable_tsan
Jul 16, 2025
Merged

ci: enable tsan + fix: rework breadcrumb ringbuffer#1309
JoshuaMoelans merged 32 commits intomasterfrom
test/enable_tsan

Conversation

@supervacuus
Copy link
Copy Markdown
Collaborator

@supervacuus supervacuus commented Jul 10, 2025

related to: #962 #974

Quick summary:

  • Added TSAN CI configs to the matrix
  • Adapt test runner to configure TSAN
  • Added basic lock handling description to the contributions docs
  • Extracted the ring buffer from the current functional setup to a more stateful data structure in a separate module. This was on my agenda from shortly after you implemented this @JoshuaMoelans, but the trigger here was unnecessary options access inside scope locks for max_breadcrumbs (which is an init constant anyway)
  • Fixed a lot of tiny sync errors, not all of which might be overly problematic, but they help establish ground rules
  • Fixed DSC sharing issue.

This will typically not hit because the inversion is relevant with the lock acquisition in `sentry_init()` and these two are typically not called from two different threads at the same time.

However, consistent lock-order is such a basic sanity step and will guard us from future functions that also require scope- and options-locks.
this allows us to keep state close and remove the need to retrieve max_breadcrumbs from the options after sentry_init(). Which in turn removes the need for nested locks.
@jpnurmi
Copy link
Copy Markdown
Collaborator

jpnurmi commented Jul 11, 2025

WARNING: ThreadSanitizer: signal-unsafe call inside of a signal (pid=9728)
    #0 malloc <null> (sentry_example+0x63624) (BuildId: a2734971b663a1840dd0fdfd3f868f60bfaefa8f)

Nice! 🤩

@supervacuus
Copy link
Copy Markdown
Collaborator Author

WARNING: ThreadSanitizer: signal-unsafe call inside of a signal (pid=9728)
    #0 malloc <null> (sentry_example+0x63624) (BuildId: a2734971b663a1840dd0fdfd3f868f60bfaefa8f)

Nice! 🤩

Yeah, TSAN will prove very useful. This is particularly interesting because we knew that backtrace is not signal-safe, but it was unclear whether it uses malloc, given that we provide the frame storage.

I falsely assumed an issue in the signal handler itself because the internal check triggered by crashpad_handler forking coincided with DEADLYSIGNAL.

Turns out DEADLYSIGNAL is handled in TSAN like in ASAN, i.e. non-fatal (puh).

The culprit for crashpad is the fork during initialization, which breaks with TSAN invariants.
@supervacuus supervacuus marked this pull request as ready for review July 11, 2025 15:36
@codecov
Copy link
Copy Markdown

codecov Bot commented Jul 15, 2025

Codecov Report

Attention: Patch coverage is 98.19820% with 2 lines in your changes missing coverage. Please review.

Project coverage is 83.84%. Comparing base (68b918d) to head (516cc1b).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1309      +/-   ##
==========================================
+ Coverage   83.65%   83.84%   +0.18%     
==========================================
  Files          54       55       +1     
  Lines        8286     8309      +23     
  Branches     1261     1264       +3     
==========================================
+ Hits         6932     6967      +35     
+ Misses       1235     1225      -10     
+ Partials      119      117       -2     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@JoshuaMoelans JoshuaMoelans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JoshuaMoelans JoshuaMoelans changed the title ci: enable tsan ci: enable tsan + fix: rework breadcrumb ringbuffer Jul 16, 2025
@JoshuaMoelans JoshuaMoelans merged commit 70f38a2 into master Jul 16, 2025
40 checks passed
@JoshuaMoelans JoshuaMoelans deleted the test/enable_tsan branch July 16, 2025 15:21
@vaind
Copy link
Copy Markdown
Contributor

vaind commented Aug 11, 2025

This PR has broken compilation of unit tests on on switch and ps5. Do you know what needs to change out of the top of your head or do I need to investigate this properly? Probably related to SENTRY__MUTEX_INIT not being available on these platforms (they use _DYN).

C:/actions-runner/_work/sentry-switch/sentry-switch/build/nx64/_deps/sentry-native-src/tests/unit/test_concurrency.c:6:44: error: use of undeclared identifier 'SENTRY__MUTEX_INIT'

    6 | static sentry_mutex_t g_test_check_mutex = SENTRY__MUTEX_INIT;

      |                                            ^

1 error generated.

@supervacuus
Copy link
Copy Markdown
Collaborator Author

This PR has broken compilation of unit tests on on switch and ps5. Do you know what needs to change out of the top of your head or do I need to investigate this properly? Probably related to SENTRY__MUTEX_INIT not being available on these platforms (they use _DYN).

C:/actions-runner/_work/sentry-switch/sentry-switch/build/nx64/_deps/sentry-native-src/tests/unit/test_concurrency.c:6:44: error: use of undeclared identifier 'SENTRY__MUTEX_INIT'

    6 | static sentry_mutex_t g_test_check_mutex = SENTRY__MUTEX_INIT;

      |                                            ^

1 error generated.

I am sorry, @vaind, I forgot to add the DYN case in the unit test. Essentially what needs to happen is to apply the same DYN init as in all other places we use SENTRY__MUTEX_INIT if that module could run on a platform that requires dynamic init (i.e. provide a conditional init based on the SENTRY__MUTEX_INIT_DYN def and then do a lazy SENTRY__MUTEX_INIT_DYN_ONCE() wherever the mutex is used). I can fix this in the afternoon, but you can also have a go if you want.

@vaind
Copy link
Copy Markdown
Contributor

vaind commented Aug 11, 2025

Thanks for the info @supervacuus, I'll likely not get to do it today so feel free to make the changes and add me as a reviewer in the PR, I can verify downstream.

BernhardMarconato pushed a commit to elgatosf/sentry-native that referenced this pull request Apr 21, 2026
* adapt python test runner

* fix lock-order inversion in `sentry_start_session()`

This will typically not hit because the inversion is relevant with the lock acquisition in `sentry_init()` and these two are typically not called from two different threads at the same time.

However, consistent lock-order is such a basic sanity step and will guard us from future functions that also require scope- and options-locks.

* properly synchronize `executed_after_shutdown` in `SENTRY_TEST(task_queue)`

* move pre-init in sentry_init outside the options lock

* move set_context outside the scope lock in set_trace so we can prevent unnecessary scope lock recursion

* fix early exit in sentry__scope_get_span_or_transaction (only relevant for tests)

* fix concurrent access to TEST_CHECK macro, which updates an unsynced global in the test-framework

* minor format

* extract the ringbuffer into a separate module

this allows us to keep state close and remove the need to retrieve max_breadcrumbs from the options after sentry_init(). Which in turn removes the need for nested locks.

* add "lock handling" to the contributor docs

* synchronize access to acutest state in the concurrency unit tests

* limit SENTRY_WITH_SCOPE in attachment tests to the lines that actually need the scope

* introduce tsan configs into the matrix

* remove ringbuffer get_len definition

* add tsan.supp + support in test runner

* disable tsan in integration tests that involve DEADLYSIGNALS

* add GCC based TSAN tests as a toolchain cross-check

* clean up

* revert Werror for GCC due to crashpad (not solving this here)

* increase TSAN verbosity for the internal check that fails

* disable all crashpad runs for tsan since it fails an internal check when forking the handler

* format

* re-enable all crashing tests in the http/stdout integration tests

I falsely assumed an issue in the signal handler itself because the internal check triggered by crashpad_handler forking coincided with DEADLYSIGNAL.

Turns out DEADLYSIGNAL is handled in TSAN like in ASAN, i.e. non-fatal (puh).

The culprit for crashpad is the fork during initialization, which breaks with TSAN invariants.

* move ringbuffer tests in separate tu

* missed cmake update

* get rid of another unnecessary lock nesting

* clone the DSC when prepping for the envelope header

* clone the DSC when prepping for the envelope header (also in event path)

* dramatically limit scope lock when prepping envelopes.

Also ensure that the cloned DSC is freed if we don't add it as a header.

* remove unused imports

---------

Co-authored-by: JoshuaMoelans <60878493+JoshuaMoelans@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants