Bug #65545

closed

Quiesce may fail randomly with EBADF due to the same root submitted to the MDCache multiple times under the same quiesce request

Added by Leonid Usov almost 2 years ago. Updated almost 2 years ago.

Status:
Closed
Priority:
High
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Backport:
squid
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS, quiesce
Labels (FS):
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

Reported by the QE team at https://bugzilla.redhat.com/show_bug.cgi?id=2275459

2024-04-17T07:26:33.666+0000 7fa0a7c0b640 10 quiesce.mgr.44189 <sanitize_roots> Normalized root '/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626' to 'file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626'
...
2024-04-17T07:26:33.666+0000 7fa0a840c640 10 quiesce.mds.0 <operator()> submit_request: value:file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626
2024-04-17T07:26:33.667+0000 7fa0a840c640 10 quiesce.agt <agent_thread_main> got request handle < mds.0:3431> for 'file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626'
...
2024-04-17T07:26:33.669+0000 7fa0a840c640 10 quiesce.mds.0 <operator()> submit_request: value:file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626
...
2024-04-17T07:26:33.670+0000 7fa0a840c640 10 quiesce.agt <agent_thread_main> got request handle < mds.0:3437> for 'file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626'
...
2024-04-17T07:26:33.674+0000 7fa0a7c0b640  5 quiesce.mgr.44189 <leader_upkeep_set> [cg_test1_p00@106,file:/volumes/_nogroup/sv_def_6/c37e6c79-a83a-4b1d-96e8-16584f440626] reported by at least one peer as: QS_FAILED (6)

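The failure is caused by a race condition that occurs when multiple db updates are posted to the agent in quick succession.
Roots that have begun processing but have not yet been added to the currently tracked set leave a window in which the next update carrying the same roots treats them as new and submits them again. The log above shows this: the same root is passed to submit_request twice and receives two request handles (mds.0:3431 and mds.0:3437).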
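
The window described above can be illustrated with a minimal, self-contained sketch. This is a simplified model, not the Ceph quiesce agent code: the `Agent` class, its `current` and `pending` sets, and the `dedupe_pending` switch are all hypothetical names invented for this illustration, and the root path is a placeholder. The sketch only shows how a root that is in flight but not yet tracked can be submitted twice, and one possible way to close that window by also checking in-flight roots. It makes no claim about how the actual fix is implemented.

```python
class Agent:
    """Toy model of an agent that receives db updates listing quiesce roots."""

    def __init__(self, dedupe_pending):
        # Hypothetical switch: when True, in-flight roots are also treated as known.
        self.dedupe_pending = dedupe_pending
        self.current = set()    # roots that made it into the tracked set
        self.pending = set()    # roots submitted but not yet tracked
        self.submissions = []   # every root submitted, in order

    def db_update(self, roots):
        # Decide which roots are "new" relative to what the agent already knows.
        if self.dedupe_pending:
            known = self.current | self.pending
        else:
            known = set(self.current)
        for root in sorted(set(roots) - known):
            self.submissions.append(root)  # stands in for submitting the root
            self.pending.add(root)

    def finish_processing(self):
        # In-flight roots finally become part of the tracked set.
        self.current |= self.pending
        self.pending.clear()


def run(dedupe_pending):
    agent = Agent(dedupe_pending)
    root = "file:/volumes/_nogroup/example"  # placeholder root
    agent.db_update([root])
    agent.db_update([root])      # second update arrives inside the window
    agent.finish_processing()
    return agent.submissions.count(root)


if __name__ == "__main__":
    print("submissions without pending check:", run(False))
    print("submissions with pending check:", run(True))
```

In this model, the second update sees an empty tracked set when in-flight roots are ignored, so the root is submitted a second time under the same request; including the in-flight set in the comparison results in a single submission.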


Related issues 1 (0 open, 1 closed)

Copied to CephFS - Backport #65570: squid: Quiesce may fail randomly with EBADF due to the same root submitted to the MDCache multiple times under the same quiesce request (Resolved, Leonid Usov)
Actions #1

Updated by Leonid Usov almost 2 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 56956
Actions #2

Updated by Leonid Usov almost 2 years ago

  • Description updated (diff)
Actions #3

Updated by Leonid Usov almost 2 years ago

  • Component(FS) quiesce added
Actions #4

Updated by Leonid Usov almost 2 years ago

  • Description updated (diff)
Actions #5

Updated by Leonid Usov almost 2 years ago

  • Backport set to squid
Actions #6

Updated by Leonid Usov almost 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #7

Updated by Upkeep Bot almost 2 years ago

  • Copied to Backport #65570: squid: Quiesce may fail randomly with EBADF due to the same root submitted to the MDCache multiple times under the same quiesce request added
Actions #9

Updated by Leonid Usov almost 2 years ago

  • Component(FS) MDS added
Actions #10

Updated by Leonid Usov almost 2 years ago

  • Status changed from Pending Backport to Closed