Bug #66967

closed

MDS crash with failed ceph_assert(batch_ops.empty())

Added by Enrico Bocchi over 1 year ago. Updated over 1 year ago.

Status: Duplicate
Priority: Normal
Category: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor
Component(FS): MDS
Labels (FS): crash

Description

We observed an MDS crash (looks identical to https://tracker.ceph.com/issues/59819) with a failed ceph assertion on batch_ops.empty():

2024-07-16T07:05:27.920+0200 7faf7c312700 -1 /builddir/build/BUILD/ceph-16.2.15/src/mds/CInode.h: In function 'virtual CInode::~CInode()' thread 7faf7c312700 time 2024-07-16T07:05:27.915826+0200
/builddir/build/BUILD/ceph-16.2.15/src/mds/CInode.h: 417: FAILED ceph_assert(batch_ops.empty())

 ceph version 16.2.15-1 (618f440892089921c3e944a991122ddc44e60516) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7faf88d5d174]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x27a38e) [0x7faf88d5d38e]
 3: (CInode::~CInode()+0x935) [0x55595a35e985]
 4: (MDCache::remove_inode(CInode*)+0x247) [0x55595a2b7bf7]
 5: (MDCache::trim_inode(CDentry*, CInode*, CDir*, std::map<int, boost::intrusive_ptr<MCacheExpire>, std::less<int>, std::allocator<std::pair<int const, boost::intrusive_ptr<MCacheExpire> > > >&)+0x296) [0x55595a2d2186]
 6: (MDCache::trim_dentry(CDentry*, std::map<int, boost::intrusive_ptr<MCacheExpire>, std::less<int>, std::allocator<std::pair<int const, boost::intrusive_ptr<MCacheExpire> > > >&)+0xc9) [0x55595a2d26f9]
 7: (MDCache::trim_lru(unsigned long, std::map<int, boost::intrusive_ptr<MCacheExpire>, std::less<int>, std::allocator<std::pair<int const, boost::intrusive_ptr<MCacheExpire> > > >&)+0xaa0) [0x55595a2d4200]
 8: (MDCache::trim(unsigned long)+0x27d) [0x55595a2d4c5d]
 9: (MDCache::upkeep_main()+0x8aa) [0x55595a309dfa]
 10: /lib64/libstdc++.so.6(+0xc2b23) [0x7faf87170b23]
 11: /lib64/libpthread.so.0(+0x81ca) [0x7faf87d3a1ca]
 12: clone()

The cluster is:
- running pacific 16.2.15
- multi-active MDS (2 active, 2 stand-by)
- manual pinning of several directory trees to mds.1
- no stand-by replay
- no snapshots

`batch_ops` is used by Server::handle_client_getattr.
We suspect the crash is due to a slow operation that had not self-resolved for a long time (~3h 25m at the time of the crash):

2024-07-16T03:34:21.429+0200 7faf7eb17700  0 log_channel(cluster) log [WRN] : slow request 30.814402 seconds old, received at 2024-07-16T03:33:50.616206+0200: client_request(client.352416104:7069668566 getattr Fs #0x1000c04a2be 2024-07-16T03:33:50.614574+0200 caller_uid=NNNN, caller_gid=MMMM) currently failed to rdlock, waiting
...cut...
2024-07-16T07:00:52.451+0200 7faf7eb17700  0 log_channel(cluster) log [WRN] : 1 slow requests, 0 included below; oldest blocked for > 12156.239280 secs

However, it is unclear to us why this crashed mds.0:
- The slow operation concerns an inode in a directory which is pinned to mds.1 (ceph.dir.pin=1).
- The assertion triggers from an inode with outstanding batch_ops being removed from the lru cache
- Why would the cache of mds.0 have an entry for an inode pinned to mds.1?

About the long outstanding rdlock slow operation, we set `mds cap revoke eviction timeout` to 900 seconds but it does not seem to help with this case.
Is there any other configuration option that is worth trying to address slow requests failing to r/w/xlock?
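
For reference, the relevant knob and the introspection commands can be driven from the CLI. A sketch of what we ran (the 900 s value and `mds.0` are from our setup, not recommendations):

```shell
# Set the cap-revoke eviction timeout (900 s in our case) and verify it took:
ceph config set mds mds_cap_revoke_eviction_timeout 900
ceph config get mds.0 mds_cap_revoke_eviction_timeout

# Inspect requests stuck waiting on locks:
ceph tell mds.0 dump_blocked_ops
```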


Also, mds.1 crashed upon mds.0 rejoin with a different signature (which is being addressed in https://tracker.ceph.com/issues/62036):

2024-07-16T07:06:50.604+0200 7f82b4c87700 -1 /builddir/build/BUILD/ceph-16.2.15/src/mds/MDCache.cc: In function 'void MDCache::handle_cache_rejoin_ack(ceph::cref_t<MMDSCacheRejoin>&)' thread 7f82b4c87700 time 2024-07-16T07:06:50.599257+0200
/builddir/build/BUILD/ceph-16.2.15/src/mds/MDCache.cc: 5142: FAILED ceph_assert(isolated_inodes.empty())

 ceph version 16.2.15-1 (618f440892089921c3e944a991122ddc44e60516) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f82bd6ca174]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x27a38e) [0x7f82bd6ca38e]
 3: (MDCache::handle_cache_rejoin_ack(boost::intrusive_ptr<MMDSCacheRejoin const> const&)+0x2390) [0x55ba22c56d40]
 4: (MDCache::handle_cache_rejoin(boost::intrusive_ptr<MMDSCacheRejoin const> const&)+0xdb) [0x55ba22c7b51b]
 5: (MDCache::dispatch(boost::intrusive_ptr<Message const> const&)+0x344) [0x55ba22c7b9e4]
 6: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&)+0x642) [0x55ba22b0a6d2]
 7: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x7bb) [0x55ba22b0d60b]
 8: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x55) [0x55ba22b0dc05]
 9: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x108) [0x55ba22afd678]
 10: (DispatchQueue::entry()+0x126a) [0x7f82bd911faa]
 11: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f82bd9c6531]
 12: /lib64/libpthread.so.0(+0x81ca) [0x7f82bc6a71ca]
 13: clone()

From code (https://github.com/ceph/ceph/blob/v16.2.15/src/mds/MDCache.cc#L5140-L5142):

  // FIXME: This can happen if entire subtree, together with the inode subtree root
  // belongs to, were trimmed between sending cache rejoin and receiving rejoin ack.
  ceph_assert(isolated_inodes.empty());


Related issues (4: 1 open, 3 closed)

  • Related to CephFS - Bug #50821: qa: untar_snap_rm failure during mds thrashing (Fix Under Review, Xiubo Li)
  • Related to CephFS - Backport #67163: quincy: mds: choose a new batch head in request_cleanup() instead (Resolved, Xiubo Li)
  • Related to CephFS - Bug #65536: mds: after the unresponsive client was evicted the blocked slow requests were not successfully cleaned up (Resolved, Xiubo Li)
  • Related to CephFS - Bug #59749: crash: virtual CInode::~CInode(): assert(batch_ops.empty()) (Duplicate, Dhairya Parmar)
Actions #1

Updated by Venky Shankar over 1 year ago

  • Assignee set to Xiubo Li

cc @Xiubo Li

I believe there was an attempt [0] at fixing this, but the approach wasn't satisfactory, and there hasn't been much work done lately on finding an alternative approach.

[0]: https://github.com/ceph/ceph/pull/57124

Actions #2

Updated by Venky Shankar over 1 year ago

Venky Shankar wrote in #note-1:

cc @Xiubo Li

I believe there was an attempt [0] at fixing this, but the approach wasn't satisfactory, and there hasn't been much work done lately on finding an alternative approach.

[0]: https://github.com/ceph/ceph/pull/57124

This is specifically related to the `FAILED ceph_assert(isolated_inodes.empty())` crash and not the crash mentioned in the tracker related to batch ops.

Actions #3

Updated by Venky Shankar over 1 year ago

  • Related to Bug #50821: qa: untar_snap_rm failure during mds thrashing added
Actions #4

Updated by Enrico Bocchi over 1 year ago

Venky Shankar wrote in #note-2:

Venky Shankar wrote in #note-1:

cc @Xiubo Li

I believe there was an attempt [0] at fixing this, but the approach wasn't satisfactory, and there hasn't been much work done lately on finding an alternative approach.

[0]: https://github.com/ceph/ceph/pull/57124

This is specifically related to the `FAILED ceph_assert(isolated_inodes.empty())` crash and not the crash mentioned in the tracker related to batch ops.

Agreed. The only tracker I found related to batch ops is from telemetry: https://tracker.ceph.com/issues/59749 (and similarly for dentry https://tracker.ceph.com/issues/59819)

Actions #5

Updated by Enrico Bocchi over 1 year ago

We had a second crash and investigated a bit further.

The issue can be reproduced with the following setup:
  • Test cluster (16.2.9) with 2 active MDSs
  • 2 clients mounted RW, kernel mount (6.5.5-200.fc38)
  • client A runs a custom program (we call it "multiwriter", kudos to @Abhishek Lekshmanan -- https://gist.github.com/theanalyst/436a541e150810cfb20858c0b6713439) to generate load on the cluster using 2000 write threads
    • client A writes into a test directory of the filesystem (/cephfs/gtest), not pinned to any rank
    • client A (also) writes into a subdirectory manually pinned to mds.1 (/cephfs/volumes/_nogroup/multiwriter) with the same custom program
  • client B uses the same custom program with 2000 read threads, reading from the subdirectory pinned to mds.1 (/cephfs/volumes/_nogroup/multiwriter)
  • client B may run the custom program again, either reading or writing, to put additional pressure on the MDS.

The ceph_assert fires when the CInode gets removed from the LRU cache of the MDS: cache thrashing (and a deliberately low mds_cache_memory_limit) is required to trigger the problem.
The workload described above makes the MDS memory allocation grow [0] well beyond the configured mds_cache_memory_limit (1 GB, in our testbed).

Evicting client A repeatedly (e.g., 10 times in a row, every 10 seconds) produces slow operations.
If a slow operation with flag_point "cleaned up request" appears, it is likely stuck and will not go away until the MDS removes the inode from its LRU, crashing with the batch_ops.empty() assertion.
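
The repeated eviction can be scripted; a sketch (the session id 6440007 is taken from our blocked-op dump and is just an example):

```shell
# Evict the same client 10 times in a row, 10 seconds apart, while the MDS
# is under cache pressure; then look for flag_point "cleaned up request".
for i in $(seq 1 10); do
    ceph tell mds.0 client evict id=6440007
    sleep 10
done
ceph tell mds.0 dump_blocked_ops
```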

Example from `dump_blocked_ops`:

{
    "ops": [
        {
            "description": "client_request(client.6440007:3389928698 getattr Xs #0x2000005ca3e 2024-07-23T16:58:07.545878+0200 caller_uid=0, caller_gid=0{0,})",
            "initiated_at": "2024-07-23T16:58:17.871938+0200",
            "age": 888.87018786199997,
            "duration": 888.87021391400003,
            "type_data": {
                "flag_point": "cleaned up request",
                "reqid": "client.6440007:3389928698",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.6440007",
                    "tid": 3389928698
                },
                "events": [
                    {
                        "time": "2024-07-23T16:58:17.871938+0200",
                        "event": "initiated" 
                    },
                    {
                        "time": "2024-07-23T16:58:17.871939+0200",
                        "event": "throttled" 
                    },
                    {
                        "time": "2024-07-23T16:58:17.871938+0200",
                        "event": "header_read" 
                    },
                    {
                        "time": "2024-07-23T16:58:17.871940+0200",
                        "event": "all_read" 
                    },
                    {
                        "time": "2024-07-23T16:58:17.919792+0200",
                        "event": "dispatched" 
                    },
                    {
                        "time": "2024-07-23T16:58:17.919802+0200",
                        "event": "requesting remote authpins" 
                    },
                    {
                        "time": "2024-07-23T16:59:34.828645+0200",
                        "event": "killing request" 
                    },
                    {
                        "time": "2024-07-23T16:59:34.828649+0200",
                        "event": "cleaned up request" 
                    }
                ]
            }
        },
...

The inode dump is as follows:

{
    "path": "/volumes/_nogroup/multiwriter/test3.out.1303",
    "ino": 2199023635006,
    "rdev": 0,
    "ctime": "2024-07-24T13:29:23.030379+0200",
    "btime": "2024-07-22T15:36:08.831700+0200",
    "mode": 33412,
    "uid": 0,
    "gid": 0,
...

where '/volumes/_nogroup/multiwriter' is manually pinned (ceph.dir.pin) to mds.1.

It is not clear to me why mds.0 would also serve lookup requests for directories pinned to mds.1, but the MDS logs show this to be the case.
This explains why mds.0 crashes even though the inode belongs to a subtree pinned to another rank.

We have tried to get rid of the slow operation in different ways, all unsuccessful:
- MDS: evict the client
- MDS: session kill
- MDS: blocklist the client
- MDS: cache drop
- Client: "echo 2 > /proc/sys/vm/drop_caches"
- Client: "mount -o remount /cephfs"

The original issue occurred on a production cluster running the latest pacific release (16.2.15).
There seem to be no relevant code changes around batch_ops in quincy, so those releases may be exposed to this issue as well.

The coredump of mds.0 crashing on batch_ops.empty() assertion is available at:
- abrtd: c25a781a-7cc5-47b3-afd2-dd1eb29c152f
- systemd: 4e402a60-46cf-4749-8e8d-6ac82fa486c4

[0]: It is not fully understood if the memory allocation is "normal" and driven by the workload.
We have seen the MDS allocate up to 30x the configured mds_cache_memory_limit during our tests.

Actions #6

Updated by Xiubo Li over 1 year ago

  • Status changed from New to In Progress

Sorry for missing this; I was busy with other urgent issues. I will take a look at it today.

Actions #7

Updated by Xiubo Li over 1 year ago

Enrico Bocchi wrote:

We observed an MDS crash (looks identical to https://tracker.ceph.com/issues/59819) with a failed ceph assertion on batch_ops.empty():

[...]

The cluster is:
- running pacific 16.2.15
- multi-active MDS (2 active, 2 stand-by)
- manual pinning of several directory trees to mds.1
- no stand-by replay
- no snapshots

`batch_ops` is used by Server::handle_client_getattr.
We suspect the crash is due to a slow operation that has not self-resolved for long (~3h 25m at the time of the crash):
[...]

However, it is unclear to us why this crashed mds.0:
- The slow operation concerns an inode in a directory which is pinned to mds.1 (ceph.dir.pin=1).

So the inode 0x1000c04a2be was just a child of the pinned directory. That would mean the auth MDS for 0x1000c04a2be is mds.1.

- The assertion triggers from an inode with outstanding batch_ops being removed from the lru cache
- Why would the cache of mds.0 have an entry for an inode pinned to mds.1?

And 0x1000c04a2be could be replicated to mds.0, since getattr client requests won't always be sent to the auth MDS.

About the long outstanding rdlock slow operation, we set `mds cap revoke eviction timeout` to 900 seconds but it does not seem to help with this case.
Is there any other configuration option that is worth trying to address slow requests failing to r/w/xlock?

Did you see MDS logs or warnings about stuck caps revokes? If not, then this getattr request was not a caps revoke bug, so that setting wouldn't help anyway.


Also, mds.1 crashed upon mds.0 rejoin with a different signature (which is being addressed in https://tracker.ceph.com/issues/62036):
[...]

From code (https://github.com/ceph/ceph/blob/v16.2.15/src/mds/MDCache.cc#L5140-L5142):
[...]

Actions #8

Updated by Xiubo Li over 1 year ago

{
    "ops": [
        {
            "description": "client_request(client.6440007:3389928698 getattr Xs #0x2000005ca3e 2024-07-23T16:58:07.545878+0200 caller_uid=0, caller_gid=0{0,})",
            "initiated_at": "2024-07-23T16:58:17.871938+0200",
            "age": 888.87018786199997,
            "duration": 888.87021391400003,
            "type_data": {
                "flag_point": "cleaned up request",
                "reqid": "client.6440007:3389928698",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.6440007",
                    "tid": 3389928698
                },
                "events": [
                    {
                        "time": "2024-07-23T16:58:17.871938+0200",
                        "event": "initiated" 
                    },
                    {
                        "time": "2024-07-23T16:58:17.871939+0200",
                        "event": "throttled" 
                    },
                    {
                        "time": "2024-07-23T16:58:17.871938+0200",
                        "event": "header_read" 
                    },
                    {
                        "time": "2024-07-23T16:58:17.871940+0200",
                        "event": "all_read" 
                    },
                    {
                        "time": "2024-07-23T16:58:17.919792+0200",
                        "event": "dispatched" 
                    },
                    {
                        "time": "2024-07-23T16:58:17.919802+0200",
                        "event": "requesting remote authpins" 
                    },
                    {
                        "time": "2024-07-23T16:59:34.828645+0200",
                        "event": "killing request" 
                    },
                    {
                        "time": "2024-07-23T16:59:34.828649+0200",
                        "event": "cleaned up request" 
                    }
                ]
            }
        },
...

This looks like a similar issue to https://tracker.ceph.com/issues/65536. The difference is that 16.2.15-1 does not contain the buggy commit from https://github.com/ceph/ceph/pull/54581.

Since there are no debug logs, I don't know why this request wasn't retriggered after being killed. I think https://tracker.ceph.com/issues/65536 could resolve the crash, but not the slow request issue.

Actions #9

Updated by Xiubo Li over 1 year ago

  • Status changed from In Progress to Need More Info

Enrico Bocchi wrote in #note-5:

We had a second crash and investigated a bit further.

The issue can be reproduced with the following setup:
  • Test cluster (16.2.9) with 2 active MDSs
  • 2 clients mounted RW, kernel mount (6.5.5-200.fc38)
  • client A runs a custom program (we call it "multiwriter", kudos to @Abhishek Lekshmanan -- https://gist.github.com/theanalyst/436a541e150810cfb20858c0b6713439) to generate load on the cluster using 2000 write threads
    • client A writes into a test directory of the filesystem (/cephfs/gtest), not pinned to any rank
    • client A (also) writes into a subdirectory manually pinned to mds.1 (/cephfs/volumes/_nogroup/multiwriter) with the same custom program
  • client B uses the same custom program with 2000 read threads, reading from the subdirectory pinned to mds.1 (/cephfs/volumes/_nogroup/multiwriter)
  • client B may use again the custom program, either reading or writing, to put additional pressure on the MDS.

The ceph_assert fires when the CInode gets removed from the LRU cache of the MDS: cache thrashing (and a deliberately low mds_cache_memory_limit) is required to trigger the problem.
The workload described above makes the MDS memory allocation grow [0] well beyond the configured mds_cache_memory_limit (1 GB, in our testbed).

Evicting client A repeatedly (e.g., 10 times in a row, every 10 seconds) produces slow operations.
If a slow operation with flag_point "cleaned up request" appears, it is likely stuck and will not go away until the MDS removes the inode from its LRU, crashing with the batch_ops.empty() assertion.

Example from `dump_blocked_ops`:
[...]

The inode dump is as follows:
[...]
where '/volumes/_nogroup/multiwriter' is manually pinned (ceph.dir.pin) to mds.1.

It is not clear to me why mds.0 would also serve lookup requests for directories pinned to mds.1, but the MDS logs show this to be the case.
This explains why mds.0 crashes even though the inode belongs to a subtree pinned to another rank.

We have tried to get rid of the slow operation in different ways, all unsuccessful:
- MDS: evict the client
- MDS: session kill
- MDS: blocklist the client
- MDS: cache drop
- Client: "echo 2 > /proc/sys/vm/drop_caches"
- Client: "mount -o remount /cephfs"

The original issue occurred on a production cluster running the latest pacific release (16.2.15).
There seem to be no relevant code changes around batch_ops in quincy, so those releases may be exposed to this issue as well.

The coredump of mds.0 crashing on batch_ops.empty() assertion is available at:
- abrtd: c25a781a-7cc5-47b3-afd2-dd1eb29c152f
- systemd: 4e402a60-46cf-4749-8e8d-6ac82fa486c4

[0]: It is not fully understood if the memory allocation is "normal" and driven by the workload.
We have seen the MDS allocate up to 30x the configured mds_cache_memory_limit during our tests.

Could you enable the debug logs and try to reproduce it again?

debug_mds = 20
debug_ms = 1

Thanks

Actions #10

Updated by Xiubo Li over 1 year ago

  • Related to Backport #67163: quincy: mds: choose a new batch head in request_cleanup() instead added
Actions #11

Updated by Xiubo Li over 1 year ago

Xiubo Li wrote in #note-8:

[...]

This looks like a similar issue to https://tracker.ceph.com/issues/65536. The difference is that 16.2.15-1 does not contain the buggy commit from https://github.com/ceph/ceph/pull/54581.

Since there are no debug logs, I don't know why this request wasn't retriggered after being killed. I think https://tracker.ceph.com/issues/65536 could resolve the crash, but not the slow request issue.

Backported this to quincy, please see https://tracker.ceph.com/issues/67163

Actions #12

Updated by Enrico Bocchi over 1 year ago

Thanks for the pointer, Xiubo.
Are you still interested in the debug logs from the MDS, or does https://tracker.ceph.com/issues/65536 address the problem for you?

Actions #13

Updated by Abhishek Lekshmanan over 1 year ago

Could you enable the debug logs and try to reproduce it again?

debug_mds = 20
debug_ms = 1

Thanks

Hi Xiubo,
we notice that when we crank debug_mds up to 20 and debug_ms to 1, it usually becomes very difficult to reproduce the issue: the MDS gets much slower in this setup, and somehow the concurrency bugs never appear. This particular case, though, seems like a bug that is very likely fixed by your patch. Based on the uploaded coredump, here is the sequence of events I believe happens:
  • On the server side, client eviction happens:
    • This invokes `journal_close_session` -> `request_kill` -> `request_cleanup`
    • Request cleanup currently doesn't account for the request being a batch op, so the inode's batch_ops still maintain a reference to this request
    • Request cleanup completes; since the request itself is a boost::intrusive_ptr, the reference maintained within the BatchOp remains valid
    • OpTracker will forever complain about a slow op, as the request has been cleaned up but is still around
  • MDCache eviction happens at some point
    • The inode destructor is invoked, which asserts on batch_ops not being empty

Do you think this is correct?

While request_cleanup with your fix should ideally protect against the first case, I wonder if there are any other code paths that might leave the batch ops list holding an MDRequestRef in a different state?
Do you think it makes sense for `MDCache::remove_inode` to also drain the batch ops on the inode so that it doesn't explode on delete?

Here is a simple gdb trace on the core:

(gdb) f 8
#8  0x000055b945cf4ef5 in CInode::~CInode (this=0x55ba5023e580, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/atomic_base.h:319
(gdb) p this->batch_ops
$1 = std::map with 1 element = {[64] = std::unique_ptr<BatchOp> = {get() = 0x55ba604b8480}}
(gdb) p *(Batch_Getattr_Lookup *)0x55ba604b8480
$2 = {<BatchOp> = {_vptr.BatchOp = 0x55b9462d86d8 <vtable for Batch_Getattr_Lookup+16>}, server = 0x55b947cb7520, mdr = {px = 0x55ba88956000},
  batch_reqs = std::vector of length 0, capacity 0, res = 0}
(gdb) p *(MDRequestImpl *)0x55ba88956000
$5 = {<MutationImpl> = {<TrackedOp> = {<boost::intrusive::list_base_hook<>>... events = std::vector of length 8, capacity 20 = {{stamp = {tv = {tv_sec = 1721820000, tv_nsec = 613154198}}, str = "initiated"}, {stamp = {tv = {
              tv_sec = 1721820000, tv_nsec = 613154366}}, str = "throttled"}, {stamp = {tv = {tv_sec = 1721820000, tv_nsec = 613154198}},
          str = "header_read"}, {stamp = {tv = {tv_sec = 1721820000, tv_nsec = 613156052}}, str = "all_read"}, {stamp = {tv = {tv_sec = 1721820000,
              tv_nsec = 619693894}}, str = "dispatched"}, {stamp = {tv = {tv_sec = 1721820000, tv_nsec = 619708631}}, str = "requesting remote authpins"}, {
          stamp = {tv = {tv_sec = 1721820000, tv_nsec = 800392644}}, str = "killing request"}, {stamp = {tv = {tv_sec = 1721820000, tv_nsec = 800399595}},
          str = "cleaned up request"}}, lock = {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0,
              __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, <No data fields>},
      seq = 3736590, warn_interval_multiplier = 1, state = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = 1},
        static is_always_lock_free = true},
      desc_str = "client_request(client.6440007:3395045818 getattr Xs #0x2000005c671 2024-07-24T13:20:00.604441+0200 caller_uid=0, caller_gid=0{0,})",
      desc = 0x55ba8670a900 "client_request(client.6440007:3395045818 getattr Xs #0x2000005c671 2024-07-24T13:20:00.604441+0200 caller_uid=0, caller_gid=0{0,})", want_new_desc = {_M_base = {static _S_alignment = 1, _M_i = false}, static is_always_lock_free = true}, osd_trace = {<No data fields>},
Actions #14

Updated by Xiubo Li over 1 year ago · Edited

Abhishek Lekshmanan wrote in #note-13:

Could you enable the debug logs and try to reproduce it again ?

debug_mds = 20
debug_ms = 1

Thanks

Hi Xiubo,
we notice that when we crank up debug mds to 20 and debug_ms 1 usually it gets very difficult to reproduce the issue

This sounds reasonable.

I just figured out the root cause. After a client request is killed, it clears all the pin references but leaves the request to be retried later, and it is the retry that clears the batch_ops list. If the cache trim is triggered just before the request is retried, we see this crash.

And https://tracker.ceph.com/issues/65536 could resolve this bug.

Thanks

Actions #15

Updated by Venky Shankar over 1 year ago

  • Related to Bug #65536: mds: after the unresponsive client was evicted the blocked slow requests were not successfully cleaned up added
Actions #16

Updated by Xiubo Li over 1 year ago

  • Status changed from Need More Info to In Progress
Actions #17

Updated by Xiubo Li over 1 year ago

  • Status changed from In Progress to Duplicate
Actions #18

Updated by Venky Shankar about 1 year ago

  • Related to Bug #59749: crash: virtual CInode::~CInode(): assert(batch_ops.empty()) added