Bug #63364

closed

MDS_CLIENT_OLDEST_TID: 15 clients failing to advance oldest client/flush tid

Added by Xiubo Li over 2 years ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source:
Backport: quincy,reef,pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client, MDS
Labels (FS):
Pull request ID:
Tags (freeform): backport_processed
Fixed In: v18.0.0-7357-ge23a69c1fa0
Released In: v19.2.0~1221
Upkeep Timestamp: 2025-07-11T21:10:43+00:00

Description

Currently, only client requests include the oldest_client_tid, and the MDS uses the oldest_client_tid to trim the completed request list. If the applications finish their work and then stop sending new client requests, the MDS never receives an updated oldest_client_tid, so the completed request list is not trimmed and can remain large.

We need to update the oldest_client_tid even when no new client requests are being sent.
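
The trimming behaviour described above can be illustrated with a minimal sketch. This is not the Ceph implementation: the `Session` class, the `handle_client_request` and `handle_renew_caps` functions, and the rule "drop every completed tid below oldest_client_tid" are hypothetical names and assumptions chosen only to show why an idle client leaves the list untrimmed, and how carrying oldest_client_tid on a periodic message (such as the session renew caps message mentioned later in this issue) would let the MDS trim it.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Session:
    """Hypothetical stand-in for an MDS client session (not Ceph's real class)."""
    completed_requests: Set[int] = field(default_factory=set)

    def trim_completed(self, oldest_client_tid: int) -> int:
        """Drop completed tids below oldest_client_tid; return how many were dropped."""
        stale = {tid for tid in self.completed_requests if tid < oldest_client_tid}
        self.completed_requests -= stale
        return len(stale)


def handle_client_request(session: Session, tid: int, oldest_client_tid: int) -> None:
    """Assumed request path: trim using the tid carried by the request, then record it."""
    session.trim_completed(oldest_client_tid)
    session.completed_requests.add(tid)


def handle_renew_caps(session: Session, oldest_client_tid: int) -> None:
    """Assumed renew path: the periodic message also carries oldest_client_tid."""
    session.trim_completed(oldest_client_tid)


if __name__ == "__main__":
    session = Session()
    # The client sends five requests while still reporting tid 1 as its oldest.
    for tid in range(1, 6):
        handle_client_request(session, tid, oldest_client_tid=1)
    # The client now goes idle: without another message, nothing is trimmed.
    print(len(session.completed_requests))
    # A renew message reporting that all five requests are done allows the trim.
    handle_renew_caps(session, oldest_client_tid=6)
    print(len(session.completed_requests))
```

In this sketch the list shrinks only when some message delivers a newer oldest_client_tid, which is the gap this issue describes.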


Related issues 4 (0 open, 4 closed)

Related to CephFS - Bug #61947: mds: enforce a limit on the size of a session in the sessionmap (Resolved, Venky Shankar)
Copied to CephFS - Backport #63535: quincy: MDS_CLIENT_OLDEST_TID: 15 clients failing to advance oldest client/flush tid (Duplicate, Xiubo Li)
Copied to CephFS - Backport #63536: reef: MDS_CLIENT_OLDEST_TID: 15 clients failing to advance oldest client/flush tid (Resolved, Xiubo Li)
Copied to CephFS - Backport #63551: pacific: MDS_CLIENT_OLDEST_TID: 15 clients failing to advance oldest client/flush tid (Rejected, Xiubo Li)
Actions #1

Updated by Xiubo Li over 2 years ago

  • Related to Bug #61947: mds: enforce a limit on the size of a session in the sessionmap added
Actions #2

Updated by Xiubo Li over 2 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 54259
Actions #3

Updated by Venky Shankar over 2 years ago

  • Category set to Correctness/Safety
  • Target version set to v19.0.0
  • Backport set to quincy,reef
  • Component(FS) Client, MDS added
Actions #4

Updated by Venky Shankar over 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #5

Updated by Upkeep Bot over 2 years ago

  • Copied to Backport #63535: quincy: MDS_CLIENT_OLDEST_TID: 15 clients failing to advance oldest client/flush tid added
Actions #6

Updated by Upkeep Bot over 2 years ago

  • Copied to Backport #63536: reef: MDS_CLIENT_OLDEST_TID: 15 clients failing to advance oldest client/flush tid added
Actions #8

Updated by Xiubo Li over 2 years ago

  • Backport changed from quincy,reef to quincy,reef,pacific
Actions #9

Updated by Xiubo Li over 2 years ago

  • Backport changed from quincy,reef,pacific to quincy,reef
Actions #10

Updated by Venky Shankar over 2 years ago

  • Backport changed from quincy,reef to quincy,reef,pacific
Actions #11

Updated by Upkeep Bot over 2 years ago

  • Copied to Backport #63551: pacific: MDS_CLIENT_OLDEST_TID: 15 clients failing to advance oldest client/flush tid added
Actions #14

Updated by Niklas Hambuechen about 2 years ago

Potentially related issue:

Actions #15

Updated by Xiubo Li over 1 year ago

squid:

https://pulpito.ceph.com/jcollin-2024-07-12_00:07:25-fs-wip-jcollin-testing-20240711.095637-squid-distro-default-smithi/7798210

The stock kernel was kernel-5.14.0-474.el9.x86_64, which does not yet include a backport of the following kernel commit:

commit 6df89bf220fdac9f40b0d35cd132eef54cf99d4b
Author: Xiubo Li <xiubli@redhat.com>
Date:   Thu Nov 16 10:56:24 2023 +0800

    ceph: send oldest_client_tid when renewing caps

    Update the oldest_client_tid via the session renew caps msg to
    make sure that the MDSs won't pile up the completed request list
    in a very large size.

    [ idryomov: drop inapplicable comment ]

    Link: https://tracker.ceph.com/issues/63364
    Signed-off-by: Xiubo Li <xiubli@redhat.com>
    Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Actions #16

Updated by Upkeep Bot over 1 year ago

  • Tags (freeform) set to backport_processed
Actions #17

Updated by Md Mahamudur Rahaman Sajib 9 months ago · Edited

"Jul 2, 2025 @ 09:01:43.856","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.852+0000 7f2db2ff2700  0 log_channel(cluster) log [WRN] : slow request 34.629514 seconds old, received at 2025-07-02T07:01:09.226348+0000: client_request(client.6280240352:16030718 lookup #0x10000959ca3/{338d246c-49c1-2f60-0ec6-47548d913c2e} 2025-07-02T07:01:09.222549+0000 caller_uid=33, caller_gid=33{33,}) currently dispatched" 
"Jul 2, 2025 @ 09:01:43.856","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.852+0000 7f2db2ff2700  0 log_channel(cluster) log [WRN] : slow request 34.592074 seconds old, received at 2025-07-02T07:01:09.263788+0000: client_request(client.6670238079:15643270 lookup #0x1000060265b/{62c7aeca-511f-73ed-2f80-ac31cf690234} 2025-07-02T07:01:09.262496+0000 caller_uid=33, caller_gid=33{33,}) currently dispatched" 
"Jul 2, 2025 @ 09:01:43.856","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.852+0000 7f2db2ff2700  0 log_channel(cluster) log [WRN] : slow request 34.592038 seconds old, received at 2025-07-02T07:01:09.263824+0000: client_request(client.6670144414:15691893 lookup #0x10000602600/{702a81d3-7c0e-fa74-fcd0-0c5333f6fecd} 2025-07-02T07:01:09.260873+0000 caller_uid=33, caller_gid=33{33,}) currently dispatched" 
"Jul 2, 2025 @ 09:01:43.856","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.852+0000 7f2db2ff2700  0 log_channel(cluster) log [WRN] : slow request 34.662662 seconds old, received at 2025-07-02T07:01:09.193200+0000: client_request(client.6670280724:16409566 lookup #0x1000095a810/{24694c3d-e968-e80e-2456-a0bc40da0866} 2025-07-02T07:01:09.186444+0000 caller_uid=33, caller_gid=33{33,}) currently dispatched" 
"Jul 2, 2025 @ 09:01:43.855","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.852+0000 7f2db2ff2700  0 log_channel(cluster) log [WRN] : slow request 34.662189 seconds old, received at 2025-07-02T07:01:09.193673+0000: client_request(client.6670280724:16409567 lookup #0x1000063ebb7/{5c1ad8f6-8553-4573-a8f8-772e9ebba5b4} 2025-07-02T07:01:09.190444+0000 caller_uid=33, caller_gid=33{33,}) currently dispatched" 
"Jul 2, 2025 @ 09:01:43.855","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.852+0000 7f2db2ff2700  0 log_channel(cluster) log [WRN] : 39 slow requests, 5 included below; oldest blocked for > 37.086957 secs" 
"Jul 2, 2025 @ 09:01:43.853","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.852+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x560763458800 0x56071b6b2c00 crc :-1 s=SESSION_ACCEPTING pgs=170 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.850","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.848+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x56087dff7000 0x5607f9684c00 crc :-1 s=SESSION_ACCEPTING pgs=167 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.845","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.844+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x5607f11b0000 0x560710f63600 crc :-1 s=SESSION_ACCEPTING pgs=164 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.802","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.800+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x560828052800 0x5607f9685180 crc :-1 s=SESSION_ACCEPTING pgs=161 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.799","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.796+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x560b79a05400 0x560bd603f600 crc :-1 s=SESSION_ACCEPTING pgs=158 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.735","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.732+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x56063cdb5c00 0x560bd6040100 crc :-1 s=SESSION_ACCEPTING pgs=155 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.696","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.692+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x560763459000 0x560bd603eb00 crc :-1 s=SESSION_ACCEPTING pgs=152 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.681","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.680+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x560b68b07c00 0x560bd603fb80 crc :-1 s=SESSION_ACCEPTING pgs=149 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.666","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.664+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x5608a5d80400 0x560bd6041180 crc :-1 s=SESSION_ACCEPTING pgs=146 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.651","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.648+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x5608729aac00 0x56071afb1080 crc :-1 s=SESSION_ACCEPTING pgs=143 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.649","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.648+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x56063ccae800 0x560710f63600 crc :-1 s=SESSION_ACCEPTING pgs=140 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.646","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.644+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x560846b65800 0x56071b6b3180 crc :-1 s=SESSION_ACCEPTING pgs=137 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.602","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.600+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x560883796000 0x560bd603f080 crc :-1 s=SESSION_ACCEPTING pgs=134 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.598","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.596+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x56063cad8c00 0x56071094ac00 crc :-1 s=SESSION_ACCEPTING pgs=131 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.596","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.592+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x56083c16bc00 0x56071afb0000 crc :-1 s=SESSION_ACCEPTING pgs=128 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.592","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.588+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x560863cc9800 0x56071afb3700 crc :-1 s=SESSION_ACCEPTING pgs=125 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.590","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.588+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x5608867b0800 0x56071b184b00 crc :-1 s=SESSION_ACCEPTING pgs=122 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.587","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.584+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x56063cbe5800 0x56071b187180 crc :-1 s=SESSION_ACCEPTING pgs=119 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.585","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.584+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x5607f7325800 0x56071b187700 crc :-1 s=SESSION_ACCEPTING pgs=116 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.584","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.580+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x560b5b849c00 0x56071b184000 crc :-1 s=SESSION_ACCEPTING pgs=113 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.582","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.580+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x560b5c0e9c00 0x5606727f2b00 crc :-1 s=SESSION_ACCEPTING pgs=110 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.580","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.576+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x56063d569000 0x560715c61b80 crc :-1 s=SESSION_ACCEPTING pgs=107 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.552","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.552+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x56063cc0b000 0x56071b186100 crc :-1 s=SESSION_ACCEPTING pgs=104 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.521","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.520+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x560884d3c000 0x560afcc3c580 crc :-1 s=SESSION_ACCEPTING pgs=101 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.469","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.468+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x560b48e9dc00 0x560bd69fc100 crc :-1 s=SESSION_ACCEPTING pgs=98 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.446","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.444+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x56085827dc00 0x560a610bbb80 crc :-1 s=SESSION_ACCEPTING pgs=95 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.425","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.424+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x56081c6d6400 0x56071b4fc680 crc :-1 s=SESSION_ACCEPTING pgs=92 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:43.424","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:43.420+0000 7f2db47f5700  1 mds.[REDACTED_HOST] Updating MDS map to version 1826 from mon.1" 
"Jul 2, 2025 @ 09:01:42.479","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:42.476+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x56063ccff800 0x560bd6041700 crc :-1 s=SESSION_ACCEPTING pgs=89 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:42.132","ceph-mds@[REDACTED_HOST2].service","2025-07-02T07:01:42.131+0000 7f64257f5700  1 mds.[REDACTED_HOST2] Updating MDS map to version 1826 from mon.0" 
"Jul 2, 2025 @ 09:01:39.162","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:39.160+0000 7f2dae7e9700  0 log_channel(cluster) log [WRN] : force file system read-only" 
"Jul 2, 2025 @ 09:01:39.161","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:39.160+0000 7f2dae7e9700 -1 mds.0.1821 unhandled write error (90) Message too long, force readonly..." 
"Jul 2, 2025 @ 09:01:39.161","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:39.160+0000 7f2dae7e9700  1 mds.0.cache force file system read-only" 
"Jul 2, 2025 @ 09:01:39.161","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:39.160+0000 7f2dae7e9700 -1 mds.0.1821 unhandled write error (90) Message too long, force readonly..." 
"Jul 2, 2025 @ 09:01:39.141","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:39.140+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x5608798f4400 0x560bd69fd180 crc :-1 s=SESSION_ACCEPTING pgs=86 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:39.078","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:39.076+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x560763459400 0x560bd603e000 crc :-1 s=SESSION_ACCEPTING pgs=83 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:38.857","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:38.856+0000 7f2db6ffa700  0 --2- [v2:[REDACTED_IP_1]:6800/3373947614,v1:[REDACTED_IP_1]:6801/3373947614] >> [REDACTED_IP_2]:0/2175379589 conn(0x56063cabf800 0x560bd603e580 crc :-1 s=SESSION_ACCEPTING pgs=80 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_reconnect no existing connection exists, reseting client" 
"Jul 2, 2025 @ 09:01:38.856","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:38.852+0000 7f2db2ff2700  0 log_channel(cluster) log [WRN] : slow request 32.086972 seconds old, received at 2025-07-02T07:01:06.768779+0000: client_request(client.7520086794:4244339 getattr pAsLsXsFs #0x10005437cd1 2025-07-02T07:01:00.938659+0000 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting" 
"Jul 2, 2025 @ 09:01:38.856","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:38.852+0000 7f2db2ff2700  0 log_channel(cluster) log [WRN] : slow request 32.086686 seconds old, received at 2025-07-02T07:01:06.769065+0000: client_request(client.7520086439:4243074 getattr pAsLsXsFs #0x10000000ad0 2025-07-02T07:00:54.938126+0000 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting" 
"Jul 2, 2025 @ 09:01:38.856","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:38.852+0000 7f2db2ff2700  0 log_channel(cluster) log [WRN] : slow request 32.087093 seconds old, received at 2025-07-02T07:01:06.768658+0000: client_request(client.7520086794:4244336 getattr pAsLsXsFs #0x1000531d487 2025-07-02T07:00:56.903054+0000 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting" 
"Jul 2, 2025 @ 09:01:38.855","ceph-mds@[REDACTED_HOST].service","2025-07-02T07:01:38.852+0000 7f2db2ff2700  0 log_channel(cluster) log [WRN] : 3 slow requests, 3 included below; oldest blocked for > 32.087643 secs" 

As the log shows, we have some MDS slow requests. Instead of blocklisting the client, the MDS is still going into read-only mode.

Actions #18

Updated by Igor Fedotov 9 months ago

  • Status changed from Pending Backport to Resolved
Actions #19

Updated by Upkeep Bot 9 months ago

  • Merge Commit set to e23a69c1fa0e36d61e16a6047289a41f4a815f7f
  • Fixed In set to v18.0.0-7357-ge23a69c1fa0
  • Released In set to v19.2.0~1221
  • Upkeep Timestamp set to 2025-07-11T21:10:43+00:00