mds: dump SnapRealm subvolume_ino #56578

Closed
mchangir wants to merge 1 commit into ceph:main from mchangir:mds-dump-src-and-dest-snaprealms-during-rename
@mchangir mchangir commented Mar 29, 2024

Dump the SnapRealm subvolume_ino for debugging purposes.

Fixes: https://tracker.ceph.com/issues/65224
Signed-off-by: Milind Changire <mchangir@redhat.com>

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@github-actions github-actions bot added the cephfs Ceph File System label Mar 29, 2024
Fixes: https://tracker.ceph.com/issues/65224
Signed-off-by: Milind Changire <mchangir@redhat.com>
@mchangir mchangir force-pushed the mds-dump-src-and-dest-snaprealms-during-rename branch from 3e23d39 to b1c6e44 Compare March 29, 2024 12:48
@mchangir mchangir self-assigned this Mar 29, 2024
vshankar commented Apr 1, 2024

jenkins test api

vshankar commented

This PR is under test in https://tracker.ceph.com/issues/66035.

vshankar commented

This PR is under test in https://tracker.ceph.com/issues/66063.

vshankar commented

This PR is under test in https://tracker.ceph.com/issues/66090.

vshankar commented

It may sound a bit odd, but it seems like this change is causing a test run failure: https://pulpito.ceph.com/vshankar-2024-05-27_05:52:49-fs-wip-vshankar-testing-20240521.063558-debug-testing-default-smithi/7727595/

The actual issue is the standby-replay (s-r) daemon crashing:

 ceph version 19.0.0-3872-g38f9d900 (38f9d900c83c63799fbdbe61acc9a11b0d3554a6) squid (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x127) [0x7f680cab4ba6]
 2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x7f680cab4dd1]
 3: (SnapClient::filter(std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> > const&) const+0x81) [0x5637ddc8023d]
 4: (SnapRealm::build_snap_set() const+0x323) [0x5637ddc6b971]
 5: (SnapRealm::check_cache() const+0x13e) [0x5637ddc69fa0]
 6: (operator<<(std::ostream&, SnapRealm const&)+0x1fd) [0x5637ddc6a651]
 7: (CInode::decode_snap_blob(ceph::buffer::v15_2_0::list const&)+0x271) [0x5637ddbff347]
 8: (EMetaBlob::fullbit::update_inode(MDSRank*, CInode*)+0x22c) [0x5637dd8fb87e]
 9: (EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x2702) [0x5637dd900e94]
 10: (EUpdate::replay(MDSRank*)+0x63) [0x5637dd90b969]
 11: (MDLog::_replay_thread()+0x1f38) [0x5637ddca448c]
 12: (MDLog::ReplayThread::entry()+0x11) [0x5637ddcad9cb]
 13: (Thread::entry_wrapper()+0x43) [0x7f680ca8372d]
 14: (Thread::_entry_func(void*)+0xd) [0x7f680ca83749]
 15: /lib64/libpthread.so.0(+0x81ca) [0x7f680b6271ca]
 16: clone()

I don't see this crash in any of the fs suite runs from the past month.

vshankar commented

This is a sample fs:workload run without this change: https://pulpito.ceph.com/vshankar-2024-05-28_17:28:53-fs:workload-wip-vshankar-testing-20240528.122300-debug-testing-default-smithi/

There are no MDS crashes in that run, so this change is somehow the likely cause of the crash.
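
For illustration only: frames 3 to 7 of the trace suggest that printing a SnapRealm reached cache-building code (check_cache, build_snap_set) and then an assertion in SnapClient::filter while the journal was still being replayed. The sketch below models that general hazard with hypothetical toy types (ToySnapClient, ToySnapRealm, print_with_cache, print_plain); none of these are Ceph classes or APIs, the precondition shown is an assumption, and an exception stands in for the assertion failure.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a snapshot client. filter() is only valid once
// the client has synced; throwing here plays the role of a failed assertion.
struct ToySnapClient {
  bool synced = false;
  std::set<uint64_t> committed;

  std::set<uint64_t> filter(const std::set<uint64_t>& snaps) const {
    if (!synced)
      throw std::logic_error("filter() called before the client is synced");
    std::set<uint64_t> out;
    for (uint64_t s : snaps)
      if (committed.count(s)) out.insert(s);
    return out;
  }
};

// Hypothetical stand-in for a snap realm with plain fields and a derived,
// cache-like view that depends on the client.
struct ToySnapRealm {
  const ToySnapClient* client;
  uint64_t ino;
  uint64_t subvolume_ino;
  std::set<uint64_t> snaps;

  // Derived value: needs the client, so it inherits filter()'s precondition.
  std::set<uint64_t> cached_snaps() const {
    assert(client != nullptr);
    return client->filter(snaps);
  }
};

// Risky debug printer: it consults the derived view, so merely logging the
// realm can trip the precondition (for example during replay).
std::ostream& print_with_cache(std::ostream& os, const ToySnapRealm& r) {
  return os << "snaprealm(" << r.ino << " subvolume_ino=" << r.subvolume_ino
            << " snaps=" << r.cached_snaps().size() << ")";
}

// Safer debug printer: it reads plain fields only and has no precondition.
std::ostream& print_plain(std::ostream& os, const ToySnapRealm& r) {
  return os << "snaprealm(" << r.ino << " subvolume_ino=" << r.subvolume_ino
            << ")";
}
```

In this toy model, calling print_with_cache on a realm whose client is not yet synced throws, while print_plain always succeeds. Whether the real change behaves this way would need to be confirmed against the actual diff.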

@vshankar vshankar left a comment

@mchangir PTAL at the failure.

github-actions commented

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Jul 28, 2024
github-actions commented

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Aug 27, 2024
Labels

cephfs Ceph File System stale
