Bug #54741

closed

crash: MDSTableClient::got_journaled_ack(unsigned long)

Added by Telemetry Bot about 4 years ago. Updated 7 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Venky Shankar
Category:
Correctness/Safety
Target version:
v21.0.0
% Done:

0%

Source:
Telemetry
Backport:
tentacle
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
crash
Pull request ID:
62554
Tags (freeform):
Fixed In:
v20.3.0-1890-g742cb59571
Released In:
Upkeep Timestamp:
2025-08-12T05:09:42+00:00

Description

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=366ab44cb3c1d002359d6d1dbe2b4287e261f13517807620ef5a83625e3c0a2b

Sanitized backtrace:

    MDSTableClient::got_journaled_ack(unsigned long)
    ETableClient::replay(MDSRank*)
    MDLog::_replay_thread()
    MDLog::ReplayThread::entry()

Crash dump sample:
{
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12c20) [0x7f7500e67c20]",
        "(std::_Hashtable<unsigned long, unsigned long, std::allocator<unsigned long>, std::__detail::_Identity, std::equal_to<unsigned long>, std::hash<unsigned long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::_M_erase(std::integral_constant<bool, true>, unsigned long const&)+0x37) [0x55651af0bb97]",
        "(MDSTableClient::got_journaled_ack(unsigned long)+0x179) [0x55651af09159]",
        "(ETableClient::replay(MDSRank*)+0x1b3) [0x55651afd8d03]",
        "(MDLog::_replay_thread()+0xcd1) [0x55651af688e1]",
        "(MDLog::ReplayThread::entry()+0x11) [0x55651ac6a311]",
        "/lib64/libpthread.so.0(+0x817a) [0x7f7500e5d17a]",
        "clone()" 
    ],
    "ceph_version": "16.2.7",
    "crash_id": "2022-03-01T10:31:29.990280Z_af75c1cc-080f-4364-9410-565aa6812923",
    "entity_name": "mds.79d0a07c1449cab02f6cf00fb829f33d1b9f49e6",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mds",
    "stack_sig": "c009c98d5387ea202da11cd36585a1c06b8d48ee740496c707f1171a5332598c",
    "timestamp": "2022-03-01T10:31:29.990280Z",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.0-74-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#83-Ubuntu SMP Sat May 8 02:35:39 UTC 2021" 
}


Related issues 3 (1 open, 2 closed)

Related to CephFS - Bug #62962: mds: standby-replay daemon crashes on replay (Duplicate; assignee: Milind Changire)
Related to CephFS - Bug #59119: mds: segmentation fault during replay of snaptable updates (Pending Backport; assignee: Venky Shankar)
Related to CephFS - Bug #70723: qa: AddressSanitizer reports heap-use-after-free in mds-log-replay thread (Resolved; assignee: Milind Changire)
Actions #1

Updated by Telemetry Bot about 4 years ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)
  • Affected Versions v16.2.4, v16.2.7 added
Actions #2

Updated by Telemetry Bot almost 4 years ago

  • Project changed from RADOS to CephFS
  • Crash signature (v1) updated (diff)
Actions #3

Updated by Milind Changire over 2 years ago

  • Assignee set to Xiubo Li
Actions #4

Updated by Venky Shankar over 2 years ago

  • Assignee changed from Xiubo Li to Venky Shankar
  • Target version set to v19.0.0
  • Backport set to quincy,reef
  • Component(FS) MDS added
  • Labels (FS) crash added

Xiubo, I'm taking this one. This has been hit in a downstream build lately.

Actions #5

Updated by Venky Shankar over 2 years ago

BTW, this crash is seen in the standby-replay MDS daemon.

Actions #6

Updated by Xiubo Li over 2 years ago

Venky Shankar wrote:

BTW, this crash is seen in the standby-replay MDS daemon.

Sure Venky, I haven't gotten a chance to check it yet.

Actions #7

Updated by Venky Shankar over 2 years ago

... and we have a core to inspect. Hopefully things go well.

Actions #8

Updated by Xiubo Li over 2 years ago

Venky Shankar wrote:

... and we have a core to inspect. Hopefully things go well.

Sounds good.

Actions #9

Updated by Venky Shankar over 2 years ago

The LogSegment has an empty `pending_commit_tids` map

(gdb) p pending_commit
$9 = std::map with 1 element = {[174037] = 0x55919f128300}
(gdb) p (LogSegment *)0x55919f128300
$10 = (LogSegment *) 0x55919f128300
(gdb) p *(LogSegment *)0x55919f128300
$11 = {seq = 94081258618881, offset = 94083927409456, end = 0, num_events = 0, dirty_dirfrags = {_head = {_prev = 0x2a0b, _next = 0x0}, item_offset = 94081258618881}, new_dirfrags = {_head = {_prev = 0x55919f1282d0, _next = 0x55919f128300},
    item_offset = 94083927409504}, dirty_inodes = {_head = {_prev = 0x2a0c, _next = 0x0}, item_offset = 94081258618881}, dirty_dentries = {_head = {_prev = 0x55919f128330, _next = 0x0}, item_offset = 0}, open_files = {_head = {_prev = 0x2a0d, _next = 0x0},
    item_offset = 94081258618881}, dirty_parent_inodes = {_head = {_prev = 0x55919f128690, _next = 0x55919f128210}, item_offset = 94083927409936}, dirty_dirfrag_dir = {_head = {_prev = 0x2a0e, _next = 0x0}, item_offset = 94081258618881}, dirty_dirfrag_nest = {_head = {
      _prev = 0x55919f1283f0, _next = 0x0}, item_offset = 0}, dirty_dirfrag_dirfragtree = {_head = {_prev = 0x2a0f, _next = 0x0}, item_offset = 94081258618881},
  truncating_inodes = std::set with 94081258618881 elements<error reading variable: Cannot access memory at address 0x2a28>, purging_inodes = {_size = 94083927409648,
    m = std::map with 94083927409936 elements<error reading variable: Cannot access memory at address 0x18>}, purged_cb = 0x55919f1283f0, pending_commit_tids = std::map with 0 elements,
  uncommitted_leaders = std::set with 94083927409792 elements<error reading variable: Cannot access memory at address 0x400010>, uncommitted_peers = std::set with 0 elements,
  uncommitted_fragments = std::set with 94083927409744 elements<error reading variable: Cannot access memory at address 0x400010>, last_client_tids = std::map with 0 elements,
  touched_sessions = std::set with 94083927409984 elements<error reading variable: Cannot access memory at address 0x400010>, inotablev = 94083927410080, sessionmapv = 10776,
  tablev = std::map with 10777 elements<error reading variable: Cannot access memory at address 0x18>, expiry_waiters = std::vector of length 0, capacity 11760490926242 = {<error reading variable>
(gdb) p ((LogSegment *)0x55919f128300)->pending_commit_tids
$12 = std::map with 0 elements

but `MDSTableClient::got_journaled_ack` tries to remove an element from the map value.

void MDSTableClient::got_journaled_ack(version_t tid)
{
  dout(10) << "got_journaled_ack " << tid << dendl;
  if (pending_commit.count(tid)) {
    pending_commit[tid]->pending_commit_tids[table].erase(tid);  // <-- crashes here
    pending_commit.erase(tid);
  }
}
Actions #10

Updated by Venky Shankar over 2 years ago

  • Related to Bug #62962: mds: standby-replay daemon crashes on replay added
Actions #11

Updated by Venky Shankar over 2 years ago

So, this happens in a standby-replay daemon

(gdb) bt
#0  0x00007f6a667c854c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007f6a6677bd46 in raise () from /lib64/libc.so.6
#2  0x0000558ffa73187a in reraise_fatal (signum=11) at /usr/src/debug/ceph-17.2.6-148.el9cp.x86_64/src/global/signal_handler.cc:88
#3  handle_oneshot_fatal_signal (signum=11) at /usr/src/debug/ceph-17.2.6-148.el9cp.x86_64/src/global/signal_handler.cc:363
#4  <signal handler called>
#5  0x00007f6a66aee0d3 in std::_Rb_tree_decrement(std::_Rb_tree_node_base*) () from /lib64/libstdc++.so.6
#6  0x0000558ffa4078e5 in std::_Rb_tree_iterator<std::pair<int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::operator-- (this=<synthetic pointer>) at /usr/include/c++/11/bits/stl_tree.h:302
#7  std::_Rb_tree<int, std::pair<int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::_Select1st<std::pair<int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::less<int>, std::allocator<std::pair<int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::_M_get_insert_unique_pos (this=<optimized out>, __k=@0x559077906bc0: 1) at /usr/include/c++/11/bits/stl_tree.h:2080
#8  0x0000558ffa7ccea2 in std::_Rb_tree<int, std::pair<int const, std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > >, std::_Select1st<std::pair<int const, std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > > >, std::less<int>, std::allocator<std::pair<int const, std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > > > >::_M_get_insert_hint_unique_pos (__k=@0x559077906bc0: 1, __position={first = 0, second = std::unordered_set with 94081258618881 elements}, this=0x55919f128468) at /usr/include/c++/11/bits/stl_tree.h:2209
#9  std::_Rb_tree<int, std::pair<int const, std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > >, std::_Select1st<std::pair<int const, std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > > >, std::less<int>, std::allocator<std::pair<int const, std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > > > >::_M_emplace_hint_unique<std::piecewise_construct_t const&, std::tuple<int const&>, std::tuple<> >(std::_Rb_tree_const_iterator<std::pair<int const, std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > > >, std::piecewise_construct_t const&, std::tuple<int const&>&&, std::tuple<>&&) [clone .constprop.0] [clone .isra.0] (this=0x55919f128468, __pos={first = 0, second = std::unordered_set with 94081258618881 elements}) at /usr/include/c++/11/bits/stl_tree.h:2435
#10 0x0000558ffa67d9fc in std::map<int, std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> >, std::less<int>, std::allocator<std::pair<int const, std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> > > > >::operator[] (__k=@0x558ffb691450: 1, this=<optimized out>) at /usr/include/c++/11/bits/stl_tree.h:350
#11 MDSTableClient::got_journaled_ack (this=0x558ffb691440, tid=<optimized out>) at /usr/src/debug/ceph-17.2.6-148.el9cp.x86_64/src/mds/MDSTableClient.cc:222
#12 0x0000558ffa6c8923 in MDLog::_replay_thread (this=0x558ffc40c300) at /usr/src/debug/ceph-17.2.6-148.el9cp.x86_64/src/mds/MDLog.cc:1436
#13 0x0000558ffa43ff41 in MDLog::ReplayThread::entry (this=<optimized out>) at /usr/src/debug/ceph-17.2.6-148.el9cp.x86_64/src/mds/MDLog.h:192
#14 0x00007f6a667c6802 in start_thread () from /lib64/libc.so.6
#15 0x00007f6a66766450 in clone3 () from /lib64/libc.so.6
(gdb) f 12
#12 0x0000558ffa6c8923 in MDLog::_replay_thread (this=0x558ffc40c300) at /usr/src/debug/ceph-17.2.6-148.el9cp.x86_64/src/mds/MDLog.cc:1436
1436            le->replay(mds);
(gdb) p mds->state
$36 = MDSMap::STATE_STANDBY_REPLAY

EMetaBlob::table_tids is persisted in the mdlog and holds the commit tids; however, the standby-replay daemon does not appear to have replayed that log segment (perhaps it was trimmed by the active MDS?). If the next log segment contains the `TABLESERVER_OP_ACK` log event, this crash will happen.

Actions #12

Updated by Patrick Donnelly over 2 years ago

  • Related to Bug #59119: mds: segmentation fault during replay of snaptable updates added
Actions #13

Updated by Patrick Donnelly almost 2 years ago

  • Target version changed from v19.0.0 to v20.0.0
Actions #14

Updated by Venky Shankar almost 2 years ago

/a/vshankar-2024-05-10_15:47:15-fs-wip-vshankar-testing-20240509.114851-debug-testing-default-smithi/7701383

The underlying cause is described in https://tracker.ceph.com/issues/59119: the active MDS trims a log segment from the journal, but the standby-replay daemon still thinks it can continue replaying the log segments, which is incorrect and leads to this crash.

Actions #15

Updated by Venky Shankar over 1 year ago

So, the series of log segment replays that leads to this crash is as follows (in the standby-replay daemon):

A journal read is done by the replay thread

2024-05-10T22:03:01.148+0000 7fdf6e200700 20 mds.0.0 _standby_replay_restart_finish: old_read_pos=42766129 trimmed_pos=37748736
2024-05-10T22:03:01.148+0000 7fdf6e200700 10 mds.0.log standby_trim_segments
2024-05-10T22:03:01.148+0000 7fdf6e200700 10 mds.0.log  expire_pos=42319376
2024-05-10T22:03:01.148+0000 7fdf6e200700 10 mds.0.log  maybe trim LogSegment(10205/0x285be10 events=21)
2024-05-10T22:03:01.148+0000 7fdf6e200700 10 mds.0.log  won't remove, not expired!
2024-05-10T22:03:01.148+0000 7fdf6e200700 20 mds.0.log  removed no segments!
2024-05-10T22:03:01.148+0000 7fdf6e200700  2 mds.0.0 Booting: 2: replaying mds log
2024-05-10T22:03:01.148+0000 7fdf6e200700 10 mds.0.log replay start, from 42766129 to 43141352
2024-05-10T22:03:01.149+0000 7fdf6c400700 10 mds.0.log _replay_thread start

This log segment has a mksnap and an rmsnap event, though the sub-events are spread across subsequent log segments.

OP_PREPARE (snap server):

2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.log _replay 42766129~121 / 43141352 2024-05-10T22:03:00.678242+0000: ETableServer snaptable prepare reqid 34 mds.0 tid 81 version 81 mutation=57 bytes
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.journal  ETableServer.replay snaptable prepare event 81 - 1 == table 80
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.snap prepare v81 create snap(22 0x1 'scheduled-2024-05-10-22_03_00_UTC' 2024-05-10T22:03:00.678077+0000)

MKSNAP:

2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.log _replay 42766270~4226 / 43141352 2024-05-10T22:03:00.684379+0000: EUpdate mksnap [metablob table_tids=[^A,81]]
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.journal EUpdate::replay
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.journal EMetaBlob.replay 0 dirlumps by unknown.0
2024-05-10T22:03:01.153+0000 7fdf6c400700 20 mds.0.cache.ino(0x1) decode_snap_blob snaprealm(0x1 seq 22 lc 22 cr 22 cps 1 snaps={16=snap(16 0x1 'scheduled-2024-05-10-21_57_00_UTC' 2024-05-10T21:57:00.411295+0000),18=snap(18 0x1 'scheduled-2024-05-10-21_58_00_UTC' 2024-\
05-10T21:58:00.795387+0000),1a=snap(1a 0x1 'scheduled-2024-05-10-21_59_00_UTC' 2024-05-10T21:59:00.287682+0000),1c=snap(1c 0x1 'scheduled-2024-05-10-22_00_00_UTC' 2024-05-10T22:00:00.625804+0000),1e=snap(1e 0x1 'scheduled-2024-05-10-22_01_00_UTC' 2024-05-10T22:01:00.81\
2167+0000),20=snap(20 0x1 'scheduled-2024-05-10-22_02_00_UTC' 2024-05-10T22:02:00.140519+0000),22=snap(22 0x1 'scheduled-2024-05-10-22_03_00_UTC' 2024-05-10T22:03:00.477972+0000)} last_modified 2024-05-10T22:03:00.477972+0000 change_attr 33 0x557a01ab2000)
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.journal EMetaBlob.replay  updated root [inode 0x1 [...22,head] / auth v170 snaprealm=0x557a01ab2000 f(v0 m2024-05-10T21:40:01.354241+0000 1=0+1) n(v4 rc2024-05-10T22:03:00.477972+0000 b1286564962 rs7 66878=63038+3840)/\
n(v0 rc2024-05-10T21:38:23.018285+0000 1=0+1) old_inodes=6 | dirfrag=1 dirty=1 0x557a01aa7080]
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.journal EMetaBlob.replay noting snaptable transaction 81
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.tableclient(snaptable) got_journaled_agree 81
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.snapclient notify_commit tid 81
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.journal EMetaBlob.replay request client.15360:236 trim_to 235

OP_COMMIT (snap server):

2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.log _replay 42770516~64 / 43141352 2024-05-10T22:03:00.689268+0000: ETableServer snaptable commit tid 81 version 82
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.journal  ETableServer.replay snaptable commit event 82 - 1 == table 81
2024-05-10T22:03:01.153+0000 7fdf6c400700  7 mds.0.snap commit 81 create snap(22 0x1 'scheduled-2024-05-10-22_03_00_UTC' 2024-05-10T22:03:00.678077+0000)

OP_ACK (snap client):

2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.log _replay 42770704~40 / 43141352 2024-05-10T22:03:00.888532+0000: ETableClient snaptable ack tid 81
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.journal  ETableClient.replay snaptable op ack tid 81
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.tableclient(snaptable) got_journaled_ack 81

This is fine. Now, let's see the rmsnap event.

OP_PREPARE (snap server):

2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.log _replay 42770600~84 / 43141352 2024-05-10T22:03:00.884034+0000: ETableServer snaptable prepare reqid 35 mds.0 tid 83 version 83 mutation=20 bytes
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.journal  ETableServer.replay snaptable prepare event 83 - 1 == table 82
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.snap prepare v83 destroy 16 seq 23

RMSNAP:

2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.log _replay 42770764~4143 / 43141352 2024-05-10T22:03:00.889489+0000: EUpdate rmsnap [metablob table_tids=[^A,83]]
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.journal EUpdate::replay
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.journal EMetaBlob.replay 0 dirlumps by unknown.0
2024-05-10T22:03:01.153+0000 7fdf6c400700 20 mds.0.cache.ino(0x1) decode_snap_blob snaprealm(0x1 seq 23 lc 22 cr 22 cps 1 snaps={18=snap(18 0x1 'scheduled-2024-05-10-21_58_00_UTC' 2024-05-10T21:58:00.795387+0000),1a=snap(1a 0x1 'scheduled-2024-05-10-21_59_00_UTC' 2024-\
05-10T21:59:00.287682+0000),1c=snap(1c 0x1 'scheduled-2024-05-10-22_00_00_UTC' 2024-05-10T22:00:00.625804+0000),1e=snap(1e 0x1 'scheduled-2024-05-10-22_01_00_UTC' 2024-05-10T22:01:00.812167+0000),20=snap(20 0x1 'scheduled-2024-05-10-22_02_00_UTC' 2024-05-10T22:02:00.14\
0519+0000),22=snap(22 0x1 'scheduled-2024-05-10-22_03_00_UTC' 2024-05-10T22:03:00.477972+0000)} last_modified 2024-05-10T22:03:00.876094+0000 change_attr 34 0x557a01ab2000)
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.journal EMetaBlob.replay  updated root [inode 0x1 [...22,head] / auth v171 snaprealm=0x557a01ab2000 f(v0 m2024-05-10T21:40:01.354241+0000 1=0+1) n(v4 rc2024-05-10T22:03:00.876094+0000 b1286564962 rs6 66878=63038+3840)/\
n(v0 rc2024-05-10T21:38:23.018285+0000 1=0+1) old_inodes=6 | dirfrag=1 dirty=1 0x557a01aa7080]
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.journal EMetaBlob.replay noting snaptable transaction 83
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.tableclient(snaptable) got_journaled_agree 83
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.snapclient notify_commit tid 83
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.journal EMetaBlob.replay request client.15360:248 trim_to 247

OP_COMMIT (snap server):

2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.log _replay 42774927~64 / 43141352 2024-05-10T22:03:00.896226+0000: ETableServer snaptable commit tid 83 version 84
2024-05-10T22:03:01.153+0000 7fdf6c400700 10 mds.0.journal  ETableServer.replay snaptable commit event 84 - 1 == table 83
2024-05-10T22:03:01.153+0000 7fdf6c400700  7 mds.0.snap commit 83 destroy 16 seq 23

That's the end of the log segment (note: no OP_ACK yet). Later, log segment LogSegment(10205/0x285be10 events=21) is trimmed, since the expire position has moved past the end of that segment. From MDLog::standby_trim_segments():

    if (ls->end > expire_pos) {
      dout(10) << " won't remove, not expired!" << dendl;
      break;
    }

    if (segments.size() == 1) {
      dout(10) << " won't remove, last segment!" << dendl;
      break;
    }

    dout(10) << " removing segment" << dendl;
    mds->mdcache->standby_trim_segment(ls);
    remove_oldest_segment();
    if (pre_segments_size > 0) {
      --pre_segments_size;
    }
    removed_segment = true;
  }

So, the log segment is removed and trimmed:

2024-05-10T22:03:03.159+0000 7fdf6e200700 20 mds.0.0 _standby_replay_restart_finish: old_read_pos=43141352 trimmed_pos=41943040
2024-05-10T22:03:03.159+0000 7fdf6e200700 10 mds.0.log standby_trim_segments
2024-05-10T22:03:03.159+0000 7fdf6e200700 10 mds.0.log  expire_pos=42775011
2024-05-10T22:03:03.159+0000 7fdf6e200700 10 mds.0.log  maybe trim LogSegment(10205/0x285be10 events=28)
2024-05-10T22:03:03.159+0000 7fdf6e200700 10 mds.0.log  removing segment

and the journal is read again to replay events:

2024-05-10T22:03:03.159+0000 7fdf6e200700 10 mds.0.log  maybe trim LogSegment(10233/0x28cb1e3 events=1)
2024-05-10T22:03:03.159+0000 7fdf6e200700 10 mds.0.log  won't remove, not expired!
2024-05-10T22:03:03.159+0000 7fdf6e200700 20 mds.0.log  calling mdcache->trim!
2024-05-10T22:03:03.159+0000 7fdf6e200700  5 mds.0.cache trim bytes_used=2MB limit=4GB reservation=0.05% count=0
2024-05-10T22:03:03.159+0000 7fdf6e200700  7 mds.0.cache trim_lru trimming 0 items from LRU size=541 mid=158 pintail=0 pinned=290
2024-05-10T22:03:03.159+0000 7fdf6e200700 20 mds.0.cache bottom_lru: 0 items, 0 top, 0 bot, 0 pintail, 0 pinned
2024-05-10T22:03:03.159+0000 7fdf6e200700 20 mds.0.cache lru: 541 items, 158 top, 383 bot, 0 pintail, 290 pinned
2024-05-10T22:03:03.159+0000 7fdf6e200700  7 mds.0.cache trim_lru trimmed 0 items
2024-05-10T22:03:03.159+0000 7fdf6e200700  2 mds.0.0 Booting: 2: replaying mds log

The OP_ACK event (from RMSNAP) is in this log segment:

2024-05-10T22:03:03.160+0000 7fdf6c400700 10 mds.0.log _replay 43141352~40 / 43141412 2024-05-10T22:03:00.994052+0000: ETableClient snaptable ack tid 83
2024-05-10T22:03:03.160+0000 7fdf6c400700 10 mds.0.journal  ETableClient.replay snaptable op ack tid 83
2024-05-10T22:03:03.160+0000 7fdf6c400700 10 mds.0.tableclient(snaptable) got_journaled_ack 83

However, the log segment which noted the tid (83) via got_journaled_agree has already been removed and trimmed. This should be addressed by the MDLog refactoring change https://github.com/ceph/ceph/pull/53086, which makes LogSegment a shared pointer. Interestingly, that's what I'm reviewing this week :)

Actions #16

Updated by Konstantin Shalygin about 1 year ago

  • Backport changed from quincy,reef to reef
Actions #17

Updated by Patrick Donnelly 9 months ago

  • Target version deleted (v20.0.0)
Actions #18

Updated by Venky Shankar 7 months ago

  • Category set to Correctness/Safety
  • Status changed from New to Resolved
  • Target version set to v21.0.0
  • Backport changed from reef to tentacle
  • Pull request ID set to 62554
Actions #19

Updated by Venky Shankar 7 months ago

  • Related to Bug #70723: qa: AddressSanitizer reports heap-use-after-free in mds-log-replay thread added
Actions #20

Updated by Upkeep Bot 7 months ago

  • Merge Commit set to 742cb595719d5cc4c96edeefdff5d19009b9e506
  • Fixed In set to v20.3.0-1890-g742cb59571
  • Upkeep Timestamp set to 2025-08-12T05:09:42+00:00