Skip to content

common: fix backtrace leak in __ceph_abort and friends#63300

Merged
tchaikov merged 1 commit intoceph:mainfrom
tchaikov:assert-fix-leaks
May 18, 2025
Merged

common: fix backtrace leak in __ceph_abort and friends#63300
tchaikov merged 1 commit intoceph:mainfrom
tchaikov:assert-fix-leaks

Conversation

@tchaikov
Copy link
Contributor

@tchaikov tchaikov commented May 15, 2025

Previously, in __ceph_abort and related abort handlers, we allocated ClibBackTrace instances using raw pointers without proper cleanup. Since these handlers terminate execution, the leaks didn't affect production systems but were correctly flagged by ASan during testing:

Direct leak of 288 byte(s) in 1 object(s) allocated from:
    #0 0x55aefe8cb65d in operator new(unsigned long) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_ceph_assert+0x1f465d) (BuildId: a4faeddac80b0d81062bd53ede3388c0c10680bc)
    #1 0x7f3b84da988d in ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...) /home/jenkins-build/build/workspace/ceph-pull-requests/src/common/assert.cc:157:21
    #2 0x55aefe8cf04b in supressed_assertf_line22() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/ceph_assert.cc:22:3
    #3 0x55aefe8ce4e6 in CephAssertDeathTest_ceph_assert_supresssions_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/ceph_assert.cc:31:3
    #4 0x55aefe99135d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2653:10
    #5 0x55aefe94f015 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2689:14
...

This commit resolves the issue by using std::unique_ptr to manage the lifecycle of backtrace objects, ensuring proper cleanup even in non-returning functions. While these leaks had no practical impact in production (as the process terminates anyway), fixing them improves code quality and eliminates false positives in memory analysis tools.

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

Previously, in __ceph_abort and related abort handlers, we allocated
ClibBackTrace instances using raw pointers without proper cleanup. Since
these handlers terminate execution, the leaks didn't affect production
systems but were correctly flagged by ASan during testing:

```
Direct leak of 288 byte(s) in 1 object(s) allocated from:
    #0 0x55aefe8cb65d in operator new(unsigned long) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_ceph_assert+0x1f465d) (BuildId: a4faeddac80b0d81062bd53ede3388c0c10680bc)
    #1 0x7f3b84da988d in ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...) /home/jenkins-build/build/workspace/ceph-pull-requests/src/common/assert.cc:157:21
    #2 0x55aefe8cf04b in supressed_assertf_line22() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/ceph_assert.cc:22:3
    #3 0x55aefe8ce4e6 in CephAssertDeathTest_ceph_assert_supresssions_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/ceph_assert.cc:31:3
    #4 0x55aefe99135d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2653:10
    #5 0x55aefe94f015 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2689:14
...
```

This commit resolves the issue by using std::unique_ptr to manage the
lifecycle of backtrace objects, ensuring proper cleanup even in
non-returning functions. While these leaks had no practical impact in
production (as the process terminates anyway), fixing them improves code
quality and eliminates false positives in memory analysis tools.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
@tchaikov tchaikov mentioned this pull request May 15, 2025
14 tasks
@tchaikov
Copy link
Contributor Author

Building remotely on [54.226.41.118+aws-builder-3](https://jenkins.ceph.com/computer/54.226.41.118+aws-builder-3/) (jammy sepia focal arm64 bionic gigantic huge installed-os-jammy xenial) in workspace /home/jenkins-build/build/workspace/ceph-pull-requests-arm64
...
E: Could not create temporary file for /var/cache/apt/pkgcache.bin - mkstemp (28: No space left on device)
E: The package lists or status file could not be parsed or opened.
dpkg: unrecoverable fatal error, aborting:

@tchaikov
Copy link
Contributor Author

jenkins test make check arm64

@tchaikov tchaikov requested a review from Matan-B May 16, 2025 05:02
@tchaikov
Copy link
Contributor Author

hi @Matan-B , could you please help review this change?

Copy link
Contributor

@Matan-B Matan-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could ClibBackTrace replaced by std::basic_stacktrace in the future?

@tchaikov
Copy link
Contributor Author

yes, yet another change pending on bumping up the clang version for the decent C++23 support.

@tchaikov tchaikov merged commit c88fe7e into ceph:main May 18, 2025
20 of 22 checks passed
@tchaikov tchaikov deleted the assert-fix-leaks branch May 18, 2025 13:49
@Matan-B
Copy link
Contributor

Matan-B commented May 19, 2025

yes, yet another change pending on bumping up the clang version for the decent C++23 support.

I've updated make check to clang to 19 not a while ago btw

@tchaikov
Copy link
Contributor Author

tchaikov commented Jun 8, 2025

yes, yet another change pending on bumping up the clang version for the decent C++23 support.

I've updated make check to clang to 19 not a while ago btw

@Matan-B that's a great achievement! but i am afraid that we still need to bump up the Clang used in Win32 build. see https://jenkins.ceph.com/job/ceph-windows-pull-requests/60580/consoleText, where we were still using clang 16:

Building remotely on 172.21.2.15+braggi15 (buster small jammy gigantic windows bullseye trusty libvirt sepia x86_64 focal bionic braggi vagrant huge installed-os-jammy xenial amd64) in workspace /home/jenkins-build/build/workspace/ceph-windows-pull-requests
...
-- The C compiler identification is Clang 16.0.0
-- The CXX compiler identification is Clang 16.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - failed
-- Check for working C compiler: /home/ubuntu/ceph/build.deps/mingw-llvm/bin/x86_64-w64-mingw32-clang
-- Check for working C compiler: /home/ubuntu/ceph/build.deps/mingw-llvm/bin/x86_64-w64-mingw32-clang - works
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - failed
-- Check for working CXX compiler: /home/ubuntu/ceph/build.deps/mingw-llvm/bin/x86_64-w64-mingw32-clang++
-- Check for working CXX compiler: /home/ubuntu/ceph/build.deps/mingw-llvm/bin/x86_64-w64-mingw32-clang++ - works
-- Detecting CXX compile features
-- Detecting CXX compile features - done

fortunately, https://github.com/mstorsjo/llvm-mingw/releases/tag/20250528 already provides clang 20 built for ubuntu 22.04, so i created #63797 to address this discrepancy. once we have windows build ready, we should be able to switch to C++23.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants