Skip to content

test/osd: Fix unittest_peeringstate message leaks:#67544

Merged
tchaikov merged 1 commit intoceph:mainfrom
tchaikov:wip-fix-unittest-peeringstate-leaks
Mar 18, 2026
Merged

test/osd: Fix unittest_peeringstate message leaks:#67544
tchaikov merged 1 commit intoceph:mainfrom
tchaikov:wip-fix-unittest-peeringstate-leaks

Conversation

@tchaikov
Copy link
Contributor

@tchaikov tchaikov commented Feb 26, 2026

The test creates OSD messages (MOSDPGLease, MOSDPGNotify2, MBackfillReserve, etc.) via ceph::make_message but the mock PGListener accumulates them in a messages map without clearing them at test teardown. This caused ~100 leak reports in CI.

Fix by explicitly clearing undispatched messages in TearDown() before destroying the listeners. This ensures all MessageRef objects are properly released even if not all messages were dispatched during the test.

Part of the leak report:

==77847==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 86400 byte(s) in 48 object(s) allocated from:
    #0 0x61806af7c2bd in operator new(unsigned long) (/ceph/build/bin/unittest_peeringstate+0x9092bd) (BuildId: 9a900986804eedf4e9290ec705e6444f4bf0ba94)
    #1 0x61806b4bc9b9 in boost::intrusive_ptr<MOSDPGInfo2> ceph::make_message<MOSDPGInfo2, spg_t&, pg_info_t const&, unsigned int&, unsigned int&, std::optional<pg_lease_t>&, std::optional<pg_lease_ack_t>&>(spg_t&, pg_info_t const&, unsigned int&, unsigned int&, std::optional<pg_lease_t>&, std::optional<pg_lease_ack_t>&) /ceph/src/msg/Message.h:597:11
    #2 0x61806b3e182a in BufferedRecoveryMessages::send_info(int, spg_t, unsigned int, unsigned int, pg_info_t const&, std::optional<pg_lease_t>, std::optional<pg_lease_ack_t>) /ceph/src/osd/PeeringState.cc:86:5
    #3 0x61806b4c86c9 in PeeringCtxWrapper::send_info(int, spg_t, unsigned int, unsigned int, pg_info_t const&, std::optional<pg_lease_t>, std::optional<pg_lease_ack_t>) /ceph/src/osd/PeeringState.h:269:10
    #4 0x61806b48a56c in PeeringState::ReplicaActive::react(PeeringState::ActivateCommitted const&) /ceph/src/osd/PeeringState.cc:7011:8
    #5 0x61806b5f7cb6 in boost::statechart::detail::reaction_result boost::statechart::custom_reaction<PeeringState::ActivateCommitted>::react<PeeringState::ReplicaActive, boost::statechart::event_base, void const*>(PeeringState::ReplicaActive&, boost::statechart::event_base const&, void const* const&) /opt/ceph/include/boost/statechart/custom_reaction.hpp:42:15
    #6 0x61806b5f7b1d in boost::statechart::detail::reaction_result boost::statechart::simple_state<PeeringState::ReplicaActive, PeeringState::Started, PeeringState::RepNotRecovering, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list10<boost::statechart::custom_reaction<PeeringState::ActivateCommitted>, boost::statechart::custom_reaction<DeferRecovery>, boost::statechart::custom_reaction<DeferBackfill>, boost::statechart::custom_reaction<PeeringState::UnfoundRecovery>, boost::statechart::custom_reaction<PeeringState::UnfoundBackfill>, boost::statechart::custom_reaction<PeeringState::RemoteBackfillPreempted>, boost::statechart::custom_reaction<PeeringState::RemoteRecoveryPreempted>, boost::statechart::custom_reaction<RecoveryDone>, boost::statechart::transition<PeeringState::DeleteStart, PeeringState::ToDelete, boost::statechart::detail::no_context<PeeringState::DeleteStart>, &boost::statechart::detail::no_context<PeeringState::DeleteStart>::no_function(PeeringState::DeleteStart const&)>, boost::statechart::custom_reaction<MLease>>, boost::statechart::simple_state<PeeringState::ReplicaActive, PeeringState::Started, PeeringState::RepNotRecovering, (boost::statechart::history_mode)0>>(boost::statechart::simple_state<PeeringState::ReplicaActive, PeeringState::Started, PeeringState::RepNotRecovering, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*) /opt/ceph/include/boost/statechart/simple_state.hpp:814:11
    #7 0x61806b5f79b4 in boost::statechart::detail::reaction_result boost::statechart::simple_state<PeeringState::ReplicaActive, PeeringState::Started, PeeringState::RepNotRecovering, (boost::statechart::history_mode)0>::local_react<boost::mpl::list10<boost::statechart::custom_reaction<PeeringState::ActivateCommitted>, boost::statechart::custom_reaction<DeferRecovery>, boost::statechart::custom_reaction<DeferBackfill>, boost::statechart::custom_reaction<PeeringState::UnfoundRecovery>, boost::statechart::custom_reaction<PeeringState::UnfoundBackfill>, boost::statechart::custom_reaction<PeeringState::RemoteBackfillPreempted>, boost::statechart::custom_reaction<PeeringState::RemoteRecoveryPreempted>, boost::statechart::custom_reaction<RecoveryDone>, boost::statechart::transition<PeeringState::DeleteStart, PeeringState::ToDelete, boost::statechart::detail::no_context<PeeringState::DeleteStart>, &boost::statechart::detail::no_context<PeeringState::DeleteStart>::no_function(PeeringState::DeleteStart const&)>, boost::statechart::custom_reaction<MLease>>>(boost::statechart::event_base const&, void const*) /opt/ceph/include/boost/statechart/simple_state.hpp:850:14

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

You must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.

@tchaikov
Copy link
Contributor Author

@tchaikov tchaikov force-pushed the wip-fix-unittest-peeringstate-leaks branch from 5dacb0a to d0afe9d Compare February 26, 2026 12:23
The test creates OSD messages (MOSDPGLease, MOSDPGNotify2,
MBackfillReserve, etc.) via ceph::make_message but the mock
PGListener accumulates them in a messages map without clearing them
at test teardown. This caused ~100 leak reports in CI.

Fix by explicitly clearing undispatched messages in TearDown() before
destroying the listeners. This ensures all MessageRef objects are
properly released even if not all messages were dispatched during the
test.

Part of the leak report:
```
==77847==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 86400 byte(s) in 48 object(s) allocated from:
    #0 0x61806af7c2bd in operator new(unsigned long) (/ceph/build/bin/unittest_peeringstate+0x9092bd) (BuildId: 9a900986804eedf4e9290ec705e6444f4bf0ba94)
    #1 0x61806b4bc9b9 in boost::intrusive_ptr<MOSDPGInfo2> ceph::make_message<MOSDPGInfo2, spg_t&, pg_info_t const&, unsigned int&, unsigned int&, std::optional<pg_lease_t>&, std::optional<pg_lease_ack_t>&>(spg_t&, pg_info_t const&, unsigned int&, unsigned int&, std::optional<pg_lease_t>&, std::optional<pg_lease_ack_t>&) /ceph/src/msg/Message.h:597:11
    #2 0x61806b3e182a in BufferedRecoveryMessages::send_info(int, spg_t, unsigned int, unsigned int, pg_info_t const&, std::optional<pg_lease_t>, std::optional<pg_lease_ack_t>) /ceph/src/osd/PeeringState.cc:86:5
    #3 0x61806b4c86c9 in PeeringCtxWrapper::send_info(int, spg_t, unsigned int, unsigned int, pg_info_t const&, std::optional<pg_lease_t>, std::optional<pg_lease_ack_t>) /ceph/src/osd/PeeringState.h:269:10
    #4 0x61806b48a56c in PeeringState::ReplicaActive::react(PeeringState::ActivateCommitted const&) /ceph/src/osd/PeeringState.cc:7011:8
    #5 0x61806b5f7cb6 in boost::statechart::detail::reaction_result boost::statechart::custom_reaction<PeeringState::ActivateCommitted>::react<PeeringState::ReplicaActive, boost::statechart::event_base, void const*>(PeeringState::ReplicaActive&, boost::statechart::event_base const&, void const* const&) /opt/ceph/include/boost/statechart/custom_reaction.hpp:42:15
    #6 0x61806b5f7b1d in boost::statechart::detail::reaction_result boost::statechart::simple_state<PeeringState::ReplicaActive, PeeringState::Started, PeeringState::RepNotRecovering, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list10<boost::statechart::custom_reaction<PeeringState::ActivateCommitted>, boost::statechart::custom_reaction<DeferRecovery>, boost::statechart::custom_reaction<DeferBackfill>, boost::statechart::custom_reaction<PeeringState::UnfoundRecovery>, boost::statechart::custom_reaction<PeeringState::UnfoundBackfill>, boost::statechart::custom_reaction<PeeringState::RemoteBackfillPreempted>, boost::statechart::custom_reaction<PeeringState::RemoteRecoveryPreempted>, boost::statechart::custom_reaction<RecoveryDone>, boost::statechart::transition<PeeringState::DeleteStart, PeeringState::ToDelete, boost::statechart::detail::no_context<PeeringState::DeleteStart>, &boost::statechart::detail::no_context<PeeringState::DeleteStart>::no_function(PeeringState::DeleteStart const&)>, boost::statechart::custom_reaction<MLease>>, boost::statechart::simple_state<PeeringState::ReplicaActive, PeeringState::Started, PeeringState::RepNotRecovering, (boost::statechart::history_mode)0>>(boost::statechart::simple_state<PeeringState::ReplicaActive, PeeringState::Started, PeeringState::RepNotRecovering, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*) /opt/ceph/include/boost/statechart/simple_state.hpp:814:11
    #7 0x61806b5f79b4 in boost::statechart::detail::reaction_result boost::statechart::simple_state<PeeringState::ReplicaActive, PeeringState::Started, PeeringState::RepNotRecovering, (boost::statechart::history_mode)0>::local_react<boost::mpl::list10<boost::statechart::custom_reaction<PeeringState::ActivateCommitted>, boost::statechart::custom_reaction<DeferRecovery>, boost::statechart::custom_reaction<DeferBackfill>, boost::statechart::custom_reaction<PeeringState::UnfoundRecovery>, boost::statechart::custom_reaction<PeeringState::UnfoundBackfill>, boost::statechart::custom_reaction<PeeringState::RemoteBackfillPreempted>, boost::statechart::custom_reaction<PeeringState::RemoteRecoveryPreempted>, boost::statechart::custom_reaction<RecoveryDone>, boost::statechart::transition<PeeringState::DeleteStart, PeeringState::ToDelete, boost::statechart::detail::no_context<PeeringState::DeleteStart>, &boost::statechart::detail::no_context<PeeringState::DeleteStart>::no_function(PeeringState::DeleteStart const&)>, boost::statechart::custom_reaction<MLease>>>(boost::statechart::event_base const&, void const*) /opt/ceph/include/boost/statechart/simple_state.hpp:850:14
```
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
@tchaikov tchaikov force-pushed the wip-fix-unittest-peeringstate-leaks branch from d0afe9d to 5a5dbc7 Compare February 27, 2026 04:39
@tchaikov tchaikov requested a review from bill-scales February 27, 2026 04:41
@tchaikov
Copy link
Contributor Author

@bill-scales Hi Bill, would you be able to review this change? This PR fixes memory leaks in the gating tests as part of a broader effort to enable AddressSanitizer (ASan) in our CI workflow (see #56537) -- catching these issues early is a key step toward getting that enabled.

@tchaikov
Copy link
Contributor Author

@bill-scales hi Bill, ping?

@tchaikov tchaikov merged commit 69a475e into ceph:main Mar 18, 2026
13 checks passed
@tchaikov tchaikov deleted the wip-fix-unittest-peeringstate-leaks branch March 18, 2026 07:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants