Skip to content

mds: do remove the cap when seqs equal or larger than last issue#56828

Merged
batrick merged 1 commit intoceph:mainfrom
lxbsz:wip-wip-64977
Jun 23, 2024
Merged

mds: do remove the cap when seqs equal or larger than last issue#56828
batrick merged 1 commit intoceph:mainfrom
lxbsz:wip-wip-64977

Conversation

@lxbsz
Copy link
Member

@lxbsz lxbsz commented Apr 11, 2024

There is a race in case of:

   MDS                            rw Client
- Issue the 'Asx' caps to
  rw client
                             - Adds the cap, then removes it
			       later by queuing it to the cap
			       release list. But the cap->seq
			       may have been updated by previous
			       cap grant requests.
			       And the cap grant request won't
			       increase the 'last_issue' seq in
			       MDS.
- ro client's lookup
  request comes and the
  MDS sends a 'Ax' caps
  revoke request to rw
  client by increasing
  the 'seq'.
                             - The revoke request just finds
			       that the cap doesn't exist, then
			       queues a new cap release
			       immediately with the new 'seq'.
			       Then trigger to flush the pending
			       cap releases to MDS.
- Just receives the cap
  release request but the
  'seq' > cap's 'last_issue',
  then MDS will skip
  removing the cap. And
  then the _do_cap_release()
  will issue the 'Ax' caps
  back to rw client.

  Then wakes up the ro
  client's lookup request,
  while the lookup request
  will try to revoke the
  'Ax' caps again from the
  rw client.

This will cause an infinite revoke-issue loop on the mds side, preventing the RO client from progressing.

Fixes: https://tracker.ceph.com/issues/64977

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@github-actions github-actions bot added the cephfs Ceph File System label Apr 11, 2024
@chrisphoffman chrisphoffman requested a review from a team May 17, 2024 18:48
}
if (seq != cap->get_last_issue()) {
if (seq < cap->get_last_issue()) {
dout(7) << " issue_seq " << seq << " != " << cap->get_last_issue() << dendl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's update this dout message, too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks @leonid-s-usov

There is a race in case of:

   MDS                            rw Client
- Issue the 'Asx' caps to
  rw client
                             - Adds the cap, then removes it
			       later by queuing it to the cap
			       release list. But the cap->seq
			       may have been updated by previous
			       cap grant requests.
			       And the cap grant request won't
			       increase the 'last_issue' seq in
			       MDS.
- ro client's lookup
  request comes and the
  MDS sends a 'Ax' caps
  revoke request to rw
  client by increasing
  the 'seq'.
                             - The revoke request just finds
			       that the cap doesn't exist, then
			       queues a new cap release
			       immediately with the new 'seq'.
			       Then trigger to flush the pending
			       cap releases to MDS.
- Just receives the cap
  release request but the
  'seq' > cap's 'last_issue',
  then MDS will skip
  removing the cap. And
  then the _do_cap_release()
  will issue the 'Ax' caps
  back to rw client.

  Then wakes up the ro
  client's lookup request,
  while the lookup request
  will try to revoke the
  'Ax' caps again from the
  rw client.

This will cause a spinlock infinitely in mds side.

Fixes: https://tracker.ceph.com/issues/64977
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Copy link
Contributor

@leonid-s-usov leonid-s-usov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but this should be qa'd

@batrick
Copy link
Member

batrick commented Jun 22, 2024

jenkins test api

@batrick
Copy link
Member

batrick commented Jun 22, 2024

This PR is under test in https://tracker.ceph.com/issues/66609.

@batrick
Copy link
Member

batrick commented Jun 23, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants