Skip to content

src: Fix memory leaks in generate_test_instance() by returning values instead of pointers#63910

Merged
tchaikov merged 1 commit intoceph:mainfrom
tchaikov:wip-generate-test-instances
Aug 22, 2025
Merged

src: Fix memory leaks in generate_test_instance() by returning values instead of pointers#63910
tchaikov merged 1 commit intoceph:mainfrom
tchaikov:wip-generate-test-instances

Conversation

@tchaikov
Copy link
Contributor

@tchaikov tchaikov commented Jun 12, 2025

Problem:

The current generate_test_instance() function returns std::list<T*>, which creates memory management issues:

  • Inconsistent lifecycle management of test instances
  • Callers don't always properly clean up allocated memory
  • Results in ASan memory leak reports in unit tests and ceph-dencoder

Solution:

Change generate_test_instance() to return std::list<T> instead of std::list<T*>:

Core Changes:

  • Modified all classes with generate_test_instance() to return std::list<T>
  • Use emplace_back() without parameters** to avoid copy/move constructors for classes that don't define them
  • Updated ceph-dencoder to handle the new return type

ceph-dencoder Adaptations:

Since m_list now holds T objects instead of T*, and we can't assume T is copyable/moveable:

  • Keep m_object as a pointer for flexibility
  • Handle two scenarios:
    1. m_object points to an element in m_list
    2. m_object points to a decoded instance (requires manual cleanup)
  • Introduce is_owned flag to track when manual cleanup is needed

Additional Cleanup:

  • Simplify DencoderBase constructor from template to plain function (extra parameters were never used in derived classes)

With this change, object lifecycles are now managed by value semantics instead of raw pointers, eliminating memory leaks.

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

@tchaikov tchaikov requested review from a team as code owners June 12, 2025 14:08
@tchaikov tchaikov force-pushed the wip-generate-test-instances branch from 9fa5b9a to 3365637 Compare June 12, 2025 14:10
@tchaikov tchaikov requested review from Copilot and removed request for a team June 12, 2025 14:13

This comment was marked as resolved.

@tchaikov tchaikov force-pushed the wip-generate-test-instances branch from 3365637 to 9df4967 Compare June 12, 2025 14:24
@tchaikov tchaikov requested a review from Copilot June 12, 2025 14:25

This comment was marked as outdated.

@tchaikov tchaikov force-pushed the wip-generate-test-instances branch 2 times, most recently from fcb0815 to 87f514e Compare June 13, 2025 04:10
@tchaikov tchaikov requested review from cbodley and Copilot June 13, 2025 04:11
@tchaikov
Copy link
Contributor Author

tchaikov commented Jun 13, 2025

changelog:

  • use unique_ptr in DencoderBase as suggested by Casey
  • use emplace_back() in lieu of push_back(T{}) in more places.
  • fix an issue where we generated invalid test instances
  • fix two issues where we failed to decode all fields that are dumped in json.

@tchaikov tchaikov force-pushed the wip-generate-test-instances branch from 8542569 to da338a3 Compare June 13, 2025 21:36
@ronen-fr
Copy link
Contributor

ronen-fr commented Jun 15, 2025

A lot of work was invested in this!
LGTM - but there are two 'make check' failures that are possibly related

@tchaikov
Copy link
Contributor Author

tchaikov commented Jun 15, 2025

A lot of work was invested in this! LGTM - but there are two 'make check' failures that are possibly related

Yes, they are related. This change addresses an issue in our existing testing and reveals several problems in our test suite. I've included some fixes in this pull request, and also submitted all of them in separate pull requests to make the review process more manageable.

@tchaikov tchaikov marked this pull request as draft June 15, 2025 06:53
@tchaikov tchaikov force-pushed the wip-generate-test-instances branch from da338a3 to da7437d Compare July 21, 2025 03:22
@tchaikov tchaikov force-pushed the wip-generate-test-instances branch 2 times, most recently from d29aed8 to 34056c7 Compare July 22, 2025 01:11
@tchaikov tchaikov force-pushed the wip-generate-test-instances branch 2 times, most recently from c6204ef to cfc6598 Compare August 1, 2025 08:47
@tchaikov tchaikov force-pushed the wip-generate-test-instances branch from cfc6598 to ab1e089 Compare August 14, 2025 05:56
@tchaikov tchaikov force-pushed the wip-generate-test-instances branch from ab1e089 to f8c1528 Compare August 18, 2025 03:05
… instead of pointers

Problem:

The current `generate_test_instance()` function returns `std::list<T*>`,
which creates memory management issues:

- Inconsistent lifecycle management of test instances
- Callers don't always properly clean up allocated memory
- Results in ASan memory leak reports in unit tests and ceph-dencoder

Solution:

Change `generate_test_instance()` to return `std::list<T>` instead of `std::list<T*>`:

Core Changes:

- Modified all classes with `generate_test_instance()` to return `std::list<T>`
- Use `emplace_back()` without parameters** to avoid copy/move
  constructors for classes that don't define them
- Updated ceph-dencoder to handle the new return type

ceph-dencoder Adaptations:

Since `m_list` now holds `T` objects instead of `T*`, and we can't
assume `T` is copyable/moveable:

- Keep `m_object` as a pointer for flexibility
- Handle two scenarios:
  1. `m_object` points to an element in `m_list`
  2. `m_object` points to a decoded instance (requires manual cleanup)
- Introduce `make_ptr()` as a factory function to create a smart pointer
  to conditionally free the managed pointer.

Additional Cleanup:

- Simplify DencoderBase constructor from template to plain
  function (extra parameters were never used in derived classes)

With this change, object lifecycles are now managed by value semantics
instead of raw pointers, eliminating memory leaks.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
@tchaikov tchaikov force-pushed the wip-generate-test-instances branch from f8c1528 to ed6b712 Compare August 21, 2025 07:19
@tchaikov tchaikov marked this pull request as ready for review August 21, 2025 07:24
@ceph ceph deleted a comment from github-actions bot Aug 21, 2025
@ceph ceph deleted a comment from github-actions bot Aug 21, 2025
@tchaikov
Copy link
Contributor Author

hi @cbodley and @ronen-fr , now that all changes preparing for this pull request have landed, could you please help review this one ? it is in turn a dependency of the more challenging #56537 .

@cbodley
Copy link
Contributor

cbodley commented Aug 21, 2025

315/318 Test #165: readable.sh ............................... Passed 1613.85 sec

this shows that all of the encodings registered with ceph-dencoder are still working 👍 while this is a massive change set, i don't think it necessarily needs to go through all of the qa suites. but if anyone disagrees, we can ask Yuri to coordinate the testing

@tchaikov tchaikov merged commit b1e7da9 into ceph:main Aug 22, 2025
25 checks passed
@tchaikov tchaikov deleted the wip-generate-test-instances branch August 22, 2025 12:57
o.push_back(new cls_rbd_snap{});
static std::list<cls_rbd_snap> generate_test_instances() {
std::list<cls_rbd_snap> o;
o.emplace_back();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason push_back() is used to invoke the default constructor in pretty much all other places (e.g. o.push_back(cls_rbd_parent{}); above)?

Separately, now that these "test instance" objects are no longer expected to be allocated with new, have you considered switching from push_back() to emplace_back() everywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason push_back() is used to invoke the default constructor in pretty much all other places (e.g. o.push_back(cls_rbd_parent{}); above)?

hi @idryomov , No specific reason for the inconsistency. I made these changes in multiple passes - initially taking a conservative approach, then becoming more aggressive in later passes since this is test code where such changes carry lower risk.

Separately, now that these "test instance" objects are no longer expected to be allocated with new, have you considered switching from push_back() to emplace_back() everywhere?

Yes, I did consider this. Given my current bandwidth constraints, I'm prioritizing first completing the remaining changes in this commit's scope. The push_back() to emplace_back() conversion would make a good follow-up improvement.

cls::rbd::UserSnapshotNamespace{}, 543, {}});
o.push_back(cls_rbd_snap{1, "snap", 123456,
RBD_PROTECTION_STATUS_PROTECTED, cls_rbd_parent{}, 31, utime_t{},
cls::rbd::UserSnapshotNamespace{}, 543, std::optional<uint64_t>{0}});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there something that required spelling out utime_t{} and std::optional<uint64_t>{0} for this object but not for the previous or next object in this method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no specific reasons. it was a leftover. should be addressed by #65227

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants