cmake,qa,crimson: misc ASAN usage fixes by Matan-B · Pull Request #66994 · ceph/ceph

Matan-B · 2026-01-20T13:41:32Z

Related fixes that helped with #67020

Contribution Guidelines

To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

Tracker (select at least one)
- References tracker ticket
- Very recent bug; references commit where it was introduced
- New feature (ticket optional)
- Doc update (no ticket needed)
- Code cleanup (no ticket needed)
Component impact
- Affects Dashboard, opened tracker ticket
- Affects Orchestrator, opened tracker ticket
- No impact that needs to be tracked
Documentation (select at least one)
- Updates relevant documentation
- No doc update is appropriate
Tests (select at least one)
- Includes unit test(s)
- Includes integration test(s)
- Includes bug reproducer
- No tests

Show available Jenkins commands

jenkins test classic perf Jenkins Job | Jenkins Job Definition
jenkins test crimson perf Jenkins Job | Jenkins Job Definition
jenkins test signed Jenkins Job | Jenkins Job Definition
jenkins test make check Jenkins Job | Jenkins Job Definition
jenkins test make check arm64 Jenkins Job | Jenkins Job Definition
jenkins test submodules Jenkins Job | Jenkins Job Definition
jenkins test dashboard Jenkins Job | Jenkins Job Definition
jenkins test dashboard cephadm Jenkins Job | Jenkins Job Definition
jenkins test api Jenkins Job | Jenkins Job Definition
jenkins test docs ReadTheDocs | Github Workflow Definition
jenkins test ceph-volume all Jenkins Jobs | Jenkins Jobs Definition
jenkins test windows Jenkins Job | Jenkins Job Definition
jenkins test rook e2e Jenkins Job | Jenkins Job Definition

You must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.

tchaikov

i'd suggest split this pull request into smaller ones. i am fine reverting 1ab0a8c -- i will take a closer look to see how to address it.

@Matan-B do you happen to know how to reproduce this?

cmake/modules/BuildBoost.cmake

src/crimson/osd/main_config_bootstrap_helpers.cc

Matan-B · 2026-01-21T12:22:44Z

i'd suggest split this pull request into smaller ones. i am fine reverting 1ab0a8c -- i will take a closer look to see how to address it.

I'll push the revert as a separate PR and address the comments here.

@Matan-B do you happen to know how to reproduce this?

This should be reproducible on a vstart cluster with --crimson (~~can confirm later~~ See: https://tracker.ceph.com/issues/74481) - the backtrace is not from a vstart cluster. In case in helps, the problematic user was _get_early_config.

cmake/modules/BuildBoost.cmake

tchaikov · 2026-01-26T03:46:38Z

src/crimson/osd/main_config_bootstrap_helpers.cc

+    if (WIFSIGNALED(status)) {
+      int signal = WTERMSIG(status);
+      std::cerr << "get_early_config: child signaled " << strsignal(signal) << std::endl;
+      return tl::unexpected(-signal);


we are using exit(-ret.error()) in get_early_config(). so signal would be return as the status code of the crimson process, can we instead follow the UNIX convention by returning the status instead, so that the toolings like systemd can understand why a process exits?

also, would be better to explain why we need this change, and how it's connected with ASan.

we are using exit(-ret.error()) in get_early_config(). so signal would be return as the status code of the crimson process, can we instead follow the UNIX convention by returning the status instead, so that the toolings like systemd can understand why a process exits?

@Matan-B hi Matan, could you please reconsider this comment?

@tchaikov, please help me understand this case better, iiuc waitpid does not return a plain exit code.
It stores an encoded status value that must be interpreted using the provided macros (e.g WIFSIGNALED).
From the Linux man page:

If status is not NULL, wait() and waitpid() store status information in the int to which it points. This integer can be inspected with the following macros

So returning the raw status value is not meaningful if we didn't decode it.
IIRC, tools like systemd can understand if the program exited normally or was killed by a signal (such as this case).

@Matan-B You're absolutely correct that returning the raw status value wouldn't work - it's encoded and needs the macros to decode.

However, looking at the flow: when the child is killed by signal 11, you return tl::unexpected(11), which becomes the exit status of crimson-osd via main.cc:96. The issue is that exit status 11 doesn't clearly indicate "killed by signal 11" to external tools.

Good news: Ceph already has a standard pattern for this! Check out

ceph/src/common/SubProcess.cc

Line 280 in a76c01c

return 128 + WTERMSIG(status);

and

ceph/src/common/fork_function.h

Line 45 in a76c01c

return 128 + WTERMSIG(status);

This follows the shell convention where signal termination is reported as 128 + signal_number. So for SIGSEGV (11), the exit status would be 139.

Suggestion: Change to return tl::unexpected(128 + signal) to match the existing Ceph convention and make the exit status meaningful to scripts and monitoring tools.

context-impl is moved into the per-module jam file in 1.87 which resulted in the following error when trying to set the context-impl: ``` Boost/tools/build/src/build/feature.jam:327: in validate-feature from module feature error: unknown feature "<context-impl> ``` context-impl setting was brought back in 1.90 boostorg/context@12ac945 context-impl is only set in our cmake files if WITH_ASAN is on. Hence, FTBFS, were not common so far. Signed-off-by: Matan Breizman <mbreizma@redhat.com>

on vstart+asan: ``` ==764060==ERROR: LeakSanitizer: detected memory leaks Direct leak of 11696 byte(s) in 20 object(s) allocated from: #0 0x7f8402cedcab in malloc (/usr/lib64/libasan.so.8+0xedcab) (BuildId: 84eaebdb799a67003ebf64e4287c60bd45cf5b03) #1 0x7f840291ff54 in _PyObject_GC_NewVar (/lib64/libpython3.9.so.1.0+0x11ff54) (BuildId: d5e7f95ca612f1fb71c0a7a1fed9c4c11b007bda) Direct leak of 5892 byte(s) in 8 object(s) allocated from: #0 0x7f8402cedcab in malloc (/usr/lib64/libasan.so.8+0xedcab) (BuildId: 84eaebdb799a67003ebf64e4287c60bd45cf5b03) #1 0x7f840291aa22 in unicode_decode_utf8 (/lib64/libpython3.9.so.1.0+0x11aa22) (BuildId: d5e7f95ca612f1fb71c0a7a1fed9c4c11b007bda) Direct leak of 3872 byte(s) in 7 object(s) allocated from: #0 0x7f8402cedcab in malloc (/usr/lib64/libasan.so.8+0xedcab) (BuildId: 84eaebdb799a67003ebf64e4287c60bd45cf5b03) #1 0x7f84029df4e4 in _PyObject_Malloc.part.0 (/lib64/libpython3.9.so.1.0+0x1df4e4) (BuildId: d5e7f95ca612f1fb71c0a7a1fed9c4c11b007bda) Indirect leak of 3032 byte(s) in 4 object(s) allocated from: #0 0x7f8402cedcab in malloc (/usr/lib64/libasan.so.8+0xedcab) (BuildId: 84eaebdb799a67003ebf64e4287c60bd45cf5b03) #1 0x7f840291ff54 in _PyObject_GC_NewVar (/lib64/libpython3.9.so.1.0+0x11ff54) (BuildId: d5e7f95ca612f1fb71c0a7a1fed9c4c11b007bda) Indirect leak of 1776 byte(s) in 3 object(s) allocated from: #0 0x7f8402cedcab in malloc (/usr/lib64/libasan.so.8+0xedcab) (BuildId: 84eaebdb799a67003ebf64e4287c60bd45cf5b03) #1 0x7f84029df4e4 in _PyObject_Malloc.part.0 (/lib64/libpython3.9.so.1.0+0x1df4e4) (BuildId: d5e7f95ca612f1fb71c0a7a1fed9c4c11b007bda) ``` Signed-off-by: Matan Breizman <mbreizma@redhat.com>

child Without this dedicated check, the bootstrap process would proceed to decoding the buffer from the child proccess - and only then would fail. This would result in a possibly misleading "failed to decode" instead of "child signaled". See the following example where ASAN caught a SIGSEV in the child process: ``` Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: AddressSanitizer:DEADLYSIGNAL Jan 20 09:27:16 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: ================================================================= Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: SUMMARY: AddressSanitizer: stack-overflow (/usr/bin/ceph-osd-crimson+0x46e7a72) (BuildId: 2a86043f51c9be9cb19801e276fb3ee36239556a) in get_global_> Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: ==3==ABORTING Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: get_early_config: parent failed to decode Jan 20 09:27:17 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: do_early_config returned error: -22 Jan 20 09:27:19 ceph-node-0 ceph-e818662e-f5e1-11f0-b263-525400908ba7-osd-1[12300]: --------------------------------------- ``` Signed-off-by: Matan Breizman <mbreizma@redhat.com>

Matan-B requested review from a team as code owners January 20, 2026 13:41

Matan-B requested a review from shraddhaag January 20, 2026 13:41

github-actions bot added build/ops common core crimson mon tests labels Jan 20, 2026

Matan-B requested a review from tchaikov January 20, 2026 13:41

Matan-B force-pushed the wip-matanb-crimson-asan-fixes branch from 6ec88a2 to 03564cb Compare January 20, 2026 13:47

Matan-B requested review from athanatos and ronen-fr January 20, 2026 13:58

Matan-B force-pushed the wip-matanb-crimson-asan-fixes branch from 03564cb to d046a5c Compare January 20, 2026 14:01

Matan-B marked this pull request as draft January 20, 2026 17:13

tchaikov reviewed Jan 21, 2026

View reviewed changes

cmake/modules/BuildBoost.cmake Outdated Show resolved Hide resolved

src/crimson/osd/main_config_bootstrap_helpers.cc Show resolved Hide resolved

Matan-B force-pushed the wip-matanb-crimson-asan-fixes branch from d046a5c to 8db56f6 Compare January 21, 2026 12:38

Matan-B changed the title ~~crimson/osd: Fix stack-overflow on deploy~~ cmake,qa,crimson: misc ASAN usage fixes Jan 21, 2026

Matan-B requested a review from tchaikov January 21, 2026 12:41

Matan-B marked this pull request as ready for review January 21, 2026 12:41

tchaikov reviewed Jan 26, 2026

View reviewed changes

Matan-B added 3 commits January 26, 2026 09:47

Matan-B force-pushed the wip-matanb-crimson-asan-fixes branch from 8db56f6 to ca60f8a Compare January 26, 2026 08:00

Matan-B requested a review from tchaikov January 26, 2026 08:00

tchaikov mentioned this pull request Feb 9, 2026

cmake: disable ASan exception interception in tests #66875

Open

14 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmake,qa,crimson: misc ASAN usage fixes#66994

cmake,qa,crimson: misc ASAN usage fixes#66994
Matan-B wants to merge 3 commits intoceph:mainfrom
Matan-B:wip-matanb-crimson-asan-fixes

Matan-B commented Jan 20, 2026 •

edited

Loading

Uh oh!

tchaikov left a comment

Uh oh!

Uh oh!

Uh oh!

Matan-B commented Jan 21, 2026 •

edited

Loading

Uh oh!

Uh oh!

tchaikov Jan 26, 2026

Uh oh!

tchaikov Jan 26, 2026

Uh oh!

tchaikov Jan 29, 2026

Uh oh!

Matan-B Jan 29, 2026

Uh oh!

tchaikov Jan 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Matan-B commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Contribution Guidelines

Checklist

Uh oh!

tchaikov left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Matan-B commented Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

tchaikov Jan 26, 2026

Choose a reason for hiding this comment

Uh oh!

tchaikov Jan 26, 2026

Choose a reason for hiding this comment

Uh oh!

tchaikov Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

Matan-B Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

tchaikov Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Matan-B commented Jan 20, 2026 •

edited

Loading

Matan-B commented Jan 21, 2026 •

edited

Loading