mds: QuiesceDbRequest: update the internal encoding of ops #57912

Merged
leonid-s-usov merged 1 commit into main from wip-lusov-qdb-exclude-or-cancel on Jun 8, 2024

@leonid-s-usov
Contributor

Excluding the last root from a set will automatically mark it as QS_CANCELED. Hence, it makes more sense if exclude and cancel share the same op code, rather than exclude and release.

Fixes: https://tracker.ceph.com/issues/66383
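
The pairing described above can be illustrated with a minimal sketch. All names here (`RootsOp`, `SetState`, `QuiesceSet`, `apply`) are hypothetical and chosen for illustration; this is not the actual `QuiesceDbRequest` code, and it assumes that the presence or absence of roots in a request selects between the two meanings of a shared op code.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical names for illustration only; not the Ceph definitions.
// Each op code has two meanings, selected by whether roots are supplied.
enum class RootsOp { IncludeOrQuery, ExcludeOrCancel, ResetOrRelease };
enum class SetState { Active, Canceled, Released };

using Roots = std::set<std::string>;

struct QuiesceSet {
  Roots roots;
  SetState state = SetState::Active;
};

// Apply one request to a set (simplified: no versions, timeouts, or errors).
void apply(QuiesceSet& set, RootsOp op, const Roots& roots) {
  switch (op) {
  case RootsOp::IncludeOrQuery:
    // With roots: include them. Without roots: a pure query, no change.
    set.roots.insert(roots.begin(), roots.end());
    break;
  case RootsOp::ExcludeOrCancel:
    // With roots: exclude them. Without roots: cancel the set.
    for (const auto& r : roots) set.roots.erase(r);
    // Excluding the last root ends in the same state as an explicit cancel,
    // which is why the two share an op code in this sketch.
    if (roots.empty() || set.roots.empty()) set.state = SetState::Canceled;
    break;
  case RootsOp::ResetOrRelease:
    // With roots: replace the membership. Without roots: release the set.
    if (roots.empty()) set.state = SetState::Released;
    else set.roots = roots;
    break;
  }
}

// Returns true when excluding the last root and cancelling explicitly
// leave two identical sets in the same state.
bool last_exclude_matches_cancel() {
  QuiesceSet a, b;
  apply(a, RootsOp::IncludeOrQuery, Roots{"/a"});
  apply(b, RootsOp::IncludeOrQuery, Roots{"/a"});
  apply(a, RootsOp::ExcludeOrCancel, Roots{"/a"}); // exclude the last root
  apply(b, RootsOp::ExcludeOrCancel, Roots{});     // explicit cancel
  return a.state == SetState::Canceled && b.state == SetState::Canceled;
}
```

In this sketch both paths through `ExcludeOrCancel` converge on `SetState::Canceled`, whereas pairing exclude with release would let a single op code lead to two unrelated terminal states.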

Excluding the last root from a set will automatically mark it as QS_CANCELED.
Hence, it makes more sense if `exclude` and `cancel` share the same op code,
rather than `exclude` and `release`.

Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Fixes: https://tracker.ceph.com/issues/66383
@github-actions github-actions bot added cephfs Ceph File System tests labels Jun 6, 2024
@leonid-s-usov leonid-s-usov requested a review from batrick June 6, 2024 12:32
@leonid-s-usov
Contributor Author

Functional tests pass: https://pulpito.ceph.com/leonidus-2024-06-07_08:29:57-fs:functional-wip-lusov-quiesce-distro-default-smithi/

with-quiesce: https://pulpito.ceph.com/leonidus-2024-06-07_08:30:28-fs-wip-lusov-quiesce-distro-default-smithi/

5 failures vs. 25 passed. Two of the failures are teuthology command timeouts with status 124; both are cases of messenger failures:

  1. https://pulpito.ceph.com/leonidus-2024-06-07_08:30:28-fs-wip-lusov-quiesce-distro-default-smithi/7744791/

teuthology.log:

2024-06-07T09:09:07.902 INFO:teuthology.orchestra.run.smithi086.stderr:2024-06-07T09:09:07.906+0000 7f406501c640  1 --2- 172.21.15.86:0/659380233 >> [v2:172.21.15.165:6836/485690691,v1:172.21.15.165:6837/485690691] conn(0x5592b329bc90 0x5592b32a8390 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0)._handle_peer_banner_payload supported=3 required=0
2024-06-07T09:09:07.902 INFO:teuthology.orchestra.run.smithi086.stderr:2024-06-07T09:09:07.906+0000 7f406501c640  1 --2- 172.21.15.86:0/659380233 >> [v2:172.21.15.165:6836/485690691,v1:172.21.15.165:6837/485690691] conn(0x5592b329bc90 0x5592b32a8390 crc :-1 s=READY pgs=585 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).ready entity=mds.0 client_cookie=7d18f3ea6e0949d9 server_cookie=5b7809d75dbd8337 in_seq=0 out_seq=0
2024-06-07T09:09:07.902 INFO:teuthology.orchestra.run.smithi086.stderr:2024-06-07T09:09:07.906+0000 7f403ffff640 10 client.5433 ms_handle_connect on v2:172.21.15.165:6836/485690691
2024-06-07T09:09:07.903 INFO:tasks.ceph.mds.a.smithi086.stderr:2024-06-07T09:09:07.907+0000 7f14dbd60640 -1 quiesce.mds.2 <quiesce_dispatch> failed to decode message of type 1285 v1: End of buffer [buffer:2]

mds.f.log:

2024-06-07T09:09:07.906+0000 7f6c11477640  1 -- [v2:172.21.15.165:6836/485690691,v1:172.21.15.165:6837/485690691] >> [v2:172.21.15.86:6835/636795820,v1:172.21.15.86:6836/636795820] conn(0x55f286684400 msgr2=0x55f286928000 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send send error: (32) Broken pipe
2024-06-07T09:09:07.906+0000 7f6c11477640  1 --2- [v2:172.21.15.165:6836/485690691,v1:172.21.15.165:6837/485690691] >> [v2:172.21.15.86:6835/636795820,v1:172.21.15.86:6836/636795820] conn(0x55f286684400 0x55f286928000 crc :-1 s=READY pgs=58 cs=51 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).write_message error sending 0x55f2874c1800, (32) Broken pipe
2024-06-07T09:09:07.906+0000 7f6c11477640  1 --2- [v2:172.21.15.165:6836/485690691,v1:172.21.15.165:6837/485690691] >> [v2:172.21.15.86:6835/636795820,v1:172.21.15.86:6836/636795820] conn(0x55f286684400 0x55f286928000 crc :-1 s=READY pgs=58 cs=51 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).write_event send msg failed
2024-06-07T09:09:07.906+0000 7f6c11477640  1 --2- [v2:172.21.15.165:6836/485690691,v1:172.21.15.165:6837/485690691] >> [v2:172.21.15.86:6835/636795820,v1:172.21.15.86:6836/636795820] conn(0x55f286684400 0x55f286928000 crc :-1 s=READY pgs=58 cs=51 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).write_event send msg failed

mds.a.log:

2024-06-07T09:09:07.906+0000 7f14de565640  1 -- [v2:172.21.15.86:6835/636795820,v1:172.21.15.86:6836/636795820] >> [v2:172.21.15.165:6836/485690691,v1:172.21.15.165:6837/485690691] conn(0x55cf55595000 msgr2=0x55cf55598c00 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 38
2024-06-07T09:09:07.906+0000 7f14de565640  1 -- [v2:172.21.15.86:6835/636795820,v1:172.21.15.86:6836/636795820] >> [v2:172.21.15.165:6836/485690691,v1:172.21.15.165:6837/485690691] conn(0x55cf55595000 msgr2=0x55cf55598c00 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed

  2. https://pulpito.ceph.com/leonidus-2024-06-07_08:30:28-fs-wip-lusov-quiesce-distro-default-smithi/7744790/

00a8230 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0).connect
2024-06-07T09:35:09.116 INFO:teuthology.orchestra.run.smithi046.stderr:2024-06-07T09:35:09.109+0000 7f502980f740  4 client.6615 mds_command: new command op to 4279 tid=0 multi_id=0 [{"prefix": "get_command_descriptions"}]
2024-06-07T09:35:09.116 INFO:teuthology.orchestra.run.smithi046.stderr:2024-06-07T09:35:09.109+0000 7f502980f740  1 -- 172.21.15.46:0/2460656757 --> [v2:172.21.15.187:6838/2197482988,v1:172.21.15.187:6839/2197482988] -- command(tid 0: {"prefix": "get_command_descriptions"}) -- 0x55bc2ffc7ef0 con 0x55bc30076cb0
2024-06-07T09:35:09.117 INFO:teuthology.orchestra.run.smithi046.stderr:2024-06-07T09:35:09.110+0000 7f5024888640  1 --2- 172.21.15.46:0/2460656757 >> [v2:172.21.15.187:6838/2197482988,v1:172.21.15.187:6839/2197482988] conn(0x55bc30076cb0 0x55bc300a8230 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0)._handle_peer_banner_payload supported=3 required=0
2024-06-07T09:35:09.117 INFO:teuthology.orchestra.run.smithi046.stderr:2024-06-07T09:35:09.110+0000 7f4fff7fe640 10 client.6615 ms_handle_connect on v2:172.21.15.187:6838/2197482988
2024-06-07T09:35:09.117 INFO:teuthology.orchestra.run.smithi046.stderr:2024-06-07T09:35:09.110+0000 7f5024888640  1 --2- 172.21.15.46:0/2460656757 >> [v2:172.21.15.187:6838/2197482988,v1:172.21.15.187:6839/2197482988] conn(0x55bc30076cb0 0x55bc300a8230 crc :-1 s=READY pgs=2424 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).ready entity=mds.0 client_cookie=4d06d1f8c16c30f0 server_cookie=1a9a20458f114187 in_seq=0 out_seq=0
2024-06-07T09:35:09.117 INFO:teuthology.orchestra.run.smithi046.stderr:2024-06-07T09:35:09.111+0000 7f5024888640  1 -- 172.21.15.46:0/2460656757 >> [v2:172.21.15.187:6838/2197482988,v1:172.21.15.187:6839/2197482988] conn(0x55bc30076cb0 msgr2=0x55bc300a8230 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 18
2024-06-07T09:35:09.117 INFO:teuthology.orchestra.run.smithi046.stderr:2024-06-07T09:35:09.111+0000 7f5024888640  1 -- 172.21.15.46:0/2460656757 >> [v2:172.21.15.187:6838/2197482988,v1:172.21.15.187:6839/2197482988] conn(0x55bc30076cb0 msgr2=0x55bc300a8230 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed

This PR is safe to merge

@leonid-s-usov leonid-s-usov merged commit d3b710a into main Jun 8, 2024
@leonid-s-usov leonid-s-usov deleted the wip-lusov-qdb-exclude-or-cancel branch June 8, 2024 07:43
2 participants