Bug #70770

open

cephfs: crash when killing a lone session

Added by Abhishek Lekshmanan 12 months ago. Updated 8 months ago.

Status:
Pending Backport
Priority:
Normal
Category:
Correctness/Safety
Target version:
% Done:
0%

Source:
Community (dev)
Backport:
reef,squid,tentacle
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Tags (freeform):
backport_processed
Fixed In:
v20.3.0-420-g1d1b0db41e
Released In:
Upkeep Timestamp:
2025-07-14T20:45:40+00:00

Description

When a lone session is killed during a finisher op, the MDS crashes with an assertion failure like this:

-1 /root/ceph/src/mds/SessionMap.cc: In function 'void SessionMap::hit_session(Session*)' thread 7f1021501640 time 2025-03-31T17:35:23.250461+0200
/root/ceph/src/mds/SessionMap.cc: 1104: FAILED ceph_assert(sessions != 0)

 ceph version 19.3.0-6717-gef0dfeb295f (ef0dfeb295f720a472b36fc95c6787388ead3eb1) squid (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x116) [0x7f102dcd47d7]
 2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x7f102dcd49de]
 3: (SessionMap::hit_session(Session*)+0xe1) [0x5560635b6373]
 4: (Server::reply_client_request(boost::intrusive_ptr<MDRequestImpl> const&, boost::intrusive_ptr<MClientReply> const&)+0xa40) [0x55606326f0d2]
 5: (Server::respond_to_request(boost::intrusive_ptr<MDRequestImpl> const&, int)+0x347) [0x556063271657]
 6: (C_MDS_openc_finish::finish(int)+0x170) [0x5560633159da]
 7: (MDSContext::complete(int)+0x7f) [0x5560635c62bf]
 8: (MDSIOContextBase::complete(int)+0x8ce) [0x5560635c6df2]
 9: (MDSLogContextBase::complete(int)+0x42) [0x5560635c6f20]
 10: (Finisher::finisher_thread_entry()+0x66e) [0x7f102dc5a800]
 11: (Finisher::FinisherThread::entry()+0xd) [0x7f102dc5b841]
 12: (Thread::entry_wrapper()+0x2f) [0x7f102dca1b05]
 13: (Thread::_entry_func(void*)+0x9) [0x7f102dca1b17]
 14: /lib64/libc.so.6(+0x897e2) [0x7f102ca897e2]
 15: /lib64/libc.so.6(+0x10e800) [0x7f102cb0e800]

     0> 2025-03-31T17:35:23.255+0200 7f1021501640 -1 *** Caught signal (Aborted) **
 in thread 7f1021501640 thread_name:mds-rank-fin

 ceph version 19.3.0-6717-gef0dfeb295f (ef0dfeb295f720a472b36fc95c6787388ead3eb1) squid (dev)
 1: /lib64/libc.so.6(+0x3e730) [0x7f102ca3e730]
 2: /lib64/libc.so.6(+0x8b52c) [0x7f102ca8b52c]
 3: raise()
 4: abort()
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x25d) [0x7f102dcd491e]
 6: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x7f102dcd49de]
 7: (SessionMap::hit_session(Session*)+0xe1) [0x5560635b6373]
 8: (Server::reply_client_request(boost::intrusive_ptr<MDRequestImpl> const&, boost::intrusive_ptr<MClientReply> const&)+0xa40) [0x55606326f0d2]
 9: (Server::respond_to_request(boost::intrusive_ptr<MDRequestImpl> const&, int)+0x347) [0x556063271657]
 10: (C_MDS_openc_finish::finish(int)+0x170) [0x5560633159da]
 11: (MDSContext::complete(int)+0x7f) [0x5560635c62bf]
 12: (MDSIOContextBase::complete(int)+0x8ce) [0x5560635c6df2]
 13: (MDSLogContextBase::complete(int)+0x42) [0x5560635c6f20]
 14: (Finisher::finisher_thread_entry()+0x66e) [0x7f102dc5a800]
 15: (Finisher::FinisherThread::entry()+0xd) [0x7f102dc5b841]
 16: (Thread::entry_wrapper()+0x2f) [0x7f102dca1b05]
 17: (Thread::_entry_func(void*)+0x9) [0x7f102dca1b17]
 18: /lib64/libc.so.6(+0x897e2) [0x7f102ca897e2]
 19: /lib64/libc.so.6(+0x10e800) [0x7f102cb0e800]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This is another edge case similar to https://tracker.ceph.com/issues/47833
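
To illustrate the failure mode in the trace above, here is a minimal toy model, not the Ceph code and not the merged fix. All names (`ToySessionMap`, `ToySession`, `State`, `countable`) are hypothetical, and the set of states treated as countable is an assumption made for the sketch. It shows how a completion that calls `hit_session` after the only session has been moved to a killing state finds zero countable sessions, which is the condition that `ceph_assert(sessions != 0)` rejects, and how an early return avoids dividing by zero.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy stand-ins only; these are NOT the real Ceph classes or states.
enum class State { Open, Stale, Closing, Killing, Closed };

struct ToySession {
  State state = State::Open;
  double load = 0.0;
};

class ToySessionMap {
public:
  void add(int id) { sessions_[id] = ToySession{}; }
  void set_state(int id, State s) { sessions_.at(id).state = s; }

  // Number of sessions the load average is computed over.
  // Which states count is an assumption of this sketch.
  std::uint64_t countable() const {
    std::uint64_t n = 0;
    for (const auto& kv : sessions_) {
      const State s = kv.second.state;
      if (s == State::Open || s == State::Stale || s == State::Closing) {
        ++n;
      }
    }
    return n;
  }

  // Records a hit for the session and refreshes the average load.
  // Returns false (and records nothing) when no countable session is
  // left, instead of asserting as the code in the trace does.
  bool hit_session(int id) {
    const std::uint64_t n = countable();
    if (n == 0) {
      return false;  // lone session already killed: nothing to average over
    }
    total_load_ += 1.0;
    sessions_.at(id).load += 1.0;
    avg_load_ = total_load_ / static_cast<double>(n);
    return true;
  }

  double avg_load() const { return avg_load_; }

private:
  std::map<int, ToySession> sessions_;
  double total_load_ = 0.0;
  double avg_load_ = 0.0;
};
```

In this sketch, adding one session and calling `hit_session` succeeds; after `set_state(id, State::Killing)` the same call returns `false` rather than aborting. Whether the real fix guards the call site, counts an additional state, or does something else is not stated in this report.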


Related issues 3 (2 open, 1 closed)

Copied to CephFS - Backport #71379: reef: cephfs: crash when killing a lone session - Resolved - Jos Collin
Copied to CephFS - Backport #71380: tentacle: cephfs: crash when killing a lone session - QA Testing - Rishabh Dave
Copied to CephFS - Backport #71381: squid: cephfs: crash when killing a lone session - QA Testing - Jos Collin
Actions #2

Updated by Venky Shankar 12 months ago

  • Category set to Correctness/Safety
  • Status changed from New to Fix Under Review
  • Assignee set to Abhishek Lekshmanan
  • Target version set to v20.0.0
  • Source set to Community (dev)
  • Backport set to reef,squid
  • Pull request ID set to 62631
Actions #3

Updated by Venky Shankar 10 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from reef,squid to reef,squid,tentacle
Actions #4

Updated by Upkeep Bot 10 months ago

  • Copied to Backport #71379: reef: cephfs: crash when killing a lone session added
Actions #5

Updated by Upkeep Bot 10 months ago

  • Copied to Backport #71380: tentacle: cephfs: crash when killing a lone session added
Actions #6

Updated by Upkeep Bot 10 months ago

  • Copied to Backport #71381: squid: cephfs: crash when killing a lone session added
Actions #7

Updated by Upkeep Bot 10 months ago

  • Tags (freeform) set to backport_processed
Actions #8

Updated by Upkeep Bot 9 months ago

  • Merge Commit set to 1d1b0db41e1a4e0731e1bb25e8485b2e74950839
  • Fixed In set to v20.3.0-420-g1d1b0db41e1
  • Upkeep Timestamp set to 2025-07-08T14:46:58+00:00
Actions #9

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v20.3.0-420-g1d1b0db41e1 to v20.3.0-420-g1d1b0db41e1a
  • Upkeep Timestamp changed from 2025-07-08T14:46:58+00:00 to 2025-07-14T15:21:06+00:00
Actions #10

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v20.3.0-420-g1d1b0db41e1a to v20.3.0-420-g1d1b0db41e
  • Upkeep Timestamp changed from 2025-07-14T15:21:06+00:00 to 2025-07-14T20:45:40+00:00