
client: issue a cap release immediately if no cap exists#51500

Merged
vshankar merged 4 commits into ceph:main from lxbsz:wip-61148 on Jun 20, 2023

Conversation

@lxbsz
Member

@lxbsz lxbsz commented May 16, 2023

In case:

           mds                             client
                                - Releases cap and put Inode
  - Increase cap->seq and sends
    revokes req to the client
  - Receives release req and    - Receives & drops the revoke req
    skip removing the cap and
    then eval the CInode and
    issue or revoke caps again.
                                - Receives & drops the caps update
                                  or revoke req
  - Health warning for client
    isn't responding to
    mclientcaps(revoke)

All IMPORT/REVOKE/GRANT cap ops increase the session seq on the MDS side, so
the client needs to issue a cap release to let the MDS remove the
corresponding cap and unblock any possible waiters.

Fixes: https://tracker.ceph.com/issues/57244
Fixes: https://tracker.ceph.com/issues/61148
Signed-off-by: Xiubo Li xiubli@redhat.com


@lxbsz lxbsz requested a review from a team May 16, 2023 02:20
@github-actions github-actions bot added the cephfs Ceph File System label May 16, 2023
@lxbsz lxbsz force-pushed the wip-61148 branch 3 times, most recently from 66b083f to 717d62b Compare May 16, 2023 03:06
Contributor

@vshankar vshankar left a comment

@lxbsz Could you explain how this change fixed the issue mentioned in https://tracker.ceph.com/issues/61148#note-7

@lxbsz
Member Author

lxbsz commented May 17, 2023

@lxbsz Could you explain how this change fixed the issue mentioned in https://tracker.ceph.com/issues/61148#note-7

Commit 9fbde6d removes the cap entity directly on the MDS side even when the seqs mismatch, due to the race between the MDS and the client.

This is okay for the caps revoke/grant/import cases, because the clients will drop the cap metadata directly too.
But for the case of https://tracker.ceph.com/issues/61148#note-7 the client won't drop the cap metadata and will add the cap instead, so the cap_id and seq become stale.

If a new client request then comes to the MDS, encoding the reply message will add a new cap entity on the MDS side with a fresh cap_id and a new seq.

This is why we can see the call traces and warning logs in dmesg.

Since only the caps revoke/grant/import paths have the issue we were trying to fix in 9fbde6d, we can just issue a release immediately on the client side right before dropping them.

@vshankar
Contributor

@lxbsz Could you explain how this change fixed the issue mentioned in https://tracker.ceph.com/issues/61148#note-7

Commit 9fbde6d removes the cap entity directly on the MDS side even when the seqs mismatch, due to the race between the MDS and the client.

For the record - commit 9fbde6d removes the part where a cap release from the client removes the cap from the revocation list (but does not remove the cap) if the MDS had sent a revoke. Without this, the "client failed to respond to cap revoke" warning would show up, since the MDS is tracking the cap in the revokes list but the client has already dropped it.

This is okay for the caps revoke/grant/import cases, because the clients will drop the cap metadata directly too. But for the case of https://tracker.ceph.com/issues/61148#note-7 the client won't drop the cap metadata and will add the cap instead, so the cap_id and seq become stale.

Got it. So with 9fbde6d the MDS lost the cap metadata but the client keeps a note of it during open(), when the open() call in the MDS races with the delayed cap release (in the MDS).

If a new client request then comes to the MDS, encoding the reply message will add a new cap entity on the MDS side with a fresh cap_id and a new seq.

Looks fine so far.

This is why we can see the call traces and warning logs in dmesg.

Since only the caps revoke/grant/import paths have the issue we were trying to fix in 9fbde6d, we can just issue a release immediately on the client side right before dropping them.

So, we add the logic back where the MDS would remove the cap from the revokes list and not remove the cap from the inode, but the client is not tracking the relevant caps. I need to closely check the code you added to send cap releases...

@lxbsz
Member Author

lxbsz commented May 18, 2023

This is why we can see the call traces and warning logs in dmesg.
Since only the caps revoke/grant/import paths have the issue we were trying to fix in 9fbde6d, we can just issue a release immediately on the client side right before dropping them.

So, we add the logic back where the MDS would remove the cap from the revokes list and not remove the cap from the inode, but the client is not tracking the relevant caps. I need to closely check the code you added to send cap releases...

This is what the import case already does when it can't find the corresponding cap in the client. When sending the cap release to the MDS in this case, it uses the increased seq, so the MDS can remove the cap since the seqs will match this time.

I just added the revoke/grant cases. Please note that only the import/revoke/grant cases increase the seq on the MDS side.

m->get_mseq(),
cap_epoch_barrier);
} else {
ldout(cct, 5) << __func__ << " don't have vino " << vino << ", dropping" << dendl;
Contributor

The else case flushes the queued cap releases. This behaviour is modified by this change - do you see any issues due to that?

Member Author

No, the else case only prints this debug log; the flush is just outside the else case.

Contributor

Right, but the current code will always flush queued cap releases if vino isn't found.

Member Author

IMO that should be okay; the flush of cap releases will be triggered per tick.

The reason it flushes here is that the MDS may be waiting for the cap release in the IMPORT case. Only when the MDS is waiting for it does it make sense to trigger the flush immediately here; otherwise the MDS may be stuck for 1 second at most.

Contributor

ACK. Fair enough.

Contributor

@vshankar vshankar left a comment

LGTM

@vshankar
Contributor

vshankar commented May 22, 2023

@lxbsz In the interest of time (to get this PR tested and merged), should we revert PR #47752 for the time being?

EDIT: I already ack'd this and plan to put it to test; however, I might be on PTO for some time. I'll try my best to get this tested asap; however, the main branch runs have the cap_id mismatch asserts.

@lxbsz
Member Author

lxbsz commented May 22, 2023

@lxbsz In the interest of time (to get this PR tested and merged), should we revert PR #47752 for the time being?

EDIT: I already ack'd this and plan to put it to test; however, I might be on PTO for some time. I'll try my best to get this tested asap; however, the main branch runs have the cap_id mismatch asserts.

Sure. Later I will generate a new PR together with #51500.

@lxbsz
Member Author

lxbsz commented May 22, 2023

@lxbsz In the interest of time (to get this PR tested and merged), should we revert PR #47752 for the time being?

EDIT: I already ack'd this and plan to put it to test; however, I might be on PTO for some time. I'll try my best to get this tested asap; however, the main branch runs have the cap_id mismatch asserts.

IMO, reverting only the 9fbde6d commit should be enough.

@vshankar
Contributor

@lxbsz In the interest of time (to get this PR tested and merged), should we revert PR #47752 for the time being?
EDIT: I already ack'd this and plan to put it to test; however, I might be on PTO for some time. I'll try my best to get this tested asap; however, the main branch runs have the cap_id mismatch asserts.

IMO, reverting only the 9fbde6d commit should be enough.

True, however, I would prefer a PR revert (since it reverts the whole change) and then pick the other commits into this PR. WDYT?

@lxbsz
Member Author

lxbsz commented May 22, 2023

@lxbsz In the interest of time (to get this PR tested and merged), should we revert PR #47752 for the time being?
EDIT: I already ack'd this and plan to put it to test; however, I might be on PTO for some time. I'll try my best to get this tested asap; however, the main branch runs have the cap_id mismatch asserts.

IMO, reverting only the 9fbde6d commit should be enough.

True, however, I would prefer a PR revert (since it reverts the whole change) and then pick the other commits into this PR. WDYT?

Sure, np.

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

lxbsz added 4 commits May 23, 2023 07:53
Will add the debug logs later in confirm_receipt().

Fixes: https://tracker.ceph.com/issues/57244
Signed-off-by: Xiubo Li <xiubli@redhat.com>
When revoking caps from clients, the clients could release some of the
cap references and still send a cap update request back to the MDS,
while confirm_receipt() will clear the _revokes list anyway.

But this cap will still be kept in the revoking_caps list.

At the same time, add one debug log for when revocation is not totally
finished.

Fixes: https://tracker.ceph.com/issues/57244
Signed-off-by: Xiubo Li <xiubli@redhat.com>
In case:

           mds                             client
                                - Releases cap and put Inode
  - Increase cap->seq and sends
    revokes req to the client
  - Receives release req and    - Receives & drops the revoke req
    skip removing the cap and
    then eval the CInode and
    issue or revoke caps again.
                                - Receives & drops the caps update
                                  or revoke req
  - Health warning for client
    isn't responding to
    mclientcaps(revoke)

All IMPORT/REVOKE/GRANT cap ops increase the session seq on the MDS side, so
the client needs to issue a cap release to let the MDS remove the
corresponding cap and unblock any possible waiters.

Fixes: https://tracker.ceph.com/issues/57244
Fixes: https://tracker.ceph.com/issues/61148
Signed-off-by: Xiubo Li <xiubli@redhat.com>
When writing to a file and the max_size is approaching, the client will
trigger a call to check_caps() and flush the caps to the MDS. But in case
the MDS is revoking the Fsxrw caps, since the client keeps writing and
holding the Fw cap, it may only release part of the caps but not the Fw.

Fixes: https://tracker.ceph.com/issues/57244
Signed-off-by: Xiubo Li <xiubli@redhat.com>
@lxbsz
Member Author

lxbsz commented May 22, 2023

Pushed the reverted commits back from #47752 to this new PR and removed the problematic one.

Contributor

@mchangir mchangir left a comment

LGTM

@vshankar
Contributor

Pkgs building now -- will update with the run link.


@vshankar
Contributor

https://pulpito.ceph.com/vshankar-2023-06-15_04:58:28-fs-wip-vshankar-testing-20230614.124123-testing-default-smithi/

Test run looks fine. I'll update the run wiki and merge this today. Nice work @lxbsz

@lxbsz
Member Author

lxbsz commented Jun 19, 2023

https://pulpito.ceph.com/vshankar-2023-06-15_04:58:28-fs-wip-vshankar-testing-20230614.124123-testing-default-smithi/

Test run looks fine. I'll update the run wiki and merge this today. Nice work @lxbsz

Cool, thanks @vshankar!

@vshankar
Contributor

session->enqueue_cap_release(
m->get_ino(),
m->get_cap_id(),
m->get_seq(),
Member

This is problematic (change also in the kernel driver), causing:

https://tracker.ceph.com/issues/64977

It should be the last issue_seq but at this point the client does not have a valid issue_seq because the cap is already removed. Furthermore, the MDS does not share its view of the issue_seq associated with the grant/revoke; sadly only clients actually set the issue_seq in an MClientCaps message header.

For the CEPH_CAP_OP_IMPORT case, it was okay to use m->get_seq() because that's implicitly the issue_seq as well but this is not okay for the CEPH_CAP_OP_GRANT or CEPH_CAP_OP_REVOKE cases.

I'm confused by how the original issue occurs with the stated sequence in this PR's description:

https://tracker.ceph.com/issues/57244

specifically the second step on the MDS side "Receives release req and...". The MDS does not increase issue_seq when sending revokes. The client's release should be processed successfully. This is an expected sequence of events which was accounted for as long ago as b180b6c. I think there was something else going on here that should be investigated.

With that said, we're now obligated to support this new scenario where the client sends a cap release with issue_seq == (last revoke/grant) seq. This only happens if the client is basically saying: "I don't acknowledge that I have this cap; please release it.". The MDS should be able to handle this gracefully and that's basically done in #56828.

I'm in the middle of some cleanup relating to issue_seq but wanted to leave this note here for anyone looking at this in the future. Cleanup tracked in https://tracker.ceph.com/issues/66704.

Contributor

For the CEPH_CAP_OP_IMPORT case, it was okay to use m->get_seq() because that's implicitly the issue_seq as well but this is not okay for the CEPH_CAP_OP_GRANT or CEPH_CAP_OP_REVOKE cases.

Agreed. For the wider audience, the last_sent/last_issue/mseq machinery has a deep history of solving exactly this release/revoke race, which unfortunately broke with this commit.

specifically the second step on the MDS side "Receives release req and...". The MDS does not increase issue_seq when sending revokes. The client's release should be processed successfully. This is an expected sequence of events which was accounted for as long ago as b180b6c. I think there was something else going on here that should be investigated.

... and this wasn't caught by fs suite runs, which worries me a lot. Yes, it's a concurrent release/revoke case, but we really aren't testing to the extent where such cases are covered (esp. since we have now seen a couple of reports of this bug, one without much scale involved).

Member Author

This is problematic (change also in the kernel driver), causing:

https://tracker.ceph.com/issues/64977

It should be the last issue_seq but at this point the client does not have a valid issue_seq because the cap is already removed.

How this could happen is what I eagerly want to figure out; after that, I think we can resolve all the confusion about the revoke-stuck issues.

Furthermore, the MDS does not share its view of the issue_seq associated with the grant/revoke; sadly only clients actually set the issue_seq in an MClientCaps message header.

The issue_seq in the client is updated only together with the client request reply messages.

For the CEPH_CAP_OP_IMPORT case, it was okay to use m->get_seq() because that's implicitly the issue_seq as well but this is not okay for the CEPH_CAP_OP_GRANT or CEPH_CAP_OP_REVOKE cases.

I'm confused by how the original issue occurs with the stated sequence in this PR's description:

https://tracker.ceph.com/issues/57244

specifically the second step on the MDS side "Receives release req and...". The MDS does not increase issue_seq when sending revokes. The client's release should be processed successfully. This is an expected sequence of events which was accounted for as long ago as b180b6c. I think there was something else going on here that should be investigated.

With that said, we're now obligated to support this new scenario where the client sends a cap release with issue_seq == (last revoke/grant) seq. This only happens if the client is basically saying: "I don't acknowledge that I have this cap; please release it.". The MDS should be able to handle this gracefully and that's basically done in #56828.

I'm in the middle of some cleanup relating to issue_seq but wanted to leave this note here for anyone looking at this in the future. Cleanup tracked in https://tracker.ceph.com/issues/66704.


Labels

cephfs Ceph File System tests
