Skip to content

mds,client: update the oldest_client_tid via the renew caps#54259

Merged
vshankar merged 2 commits intoceph:mainfrom
lxbsz:wip-63364
Nov 15, 2023
Merged

mds,client: update the oldest_client_tid via the renew caps#54259
vshankar merged 2 commits intoceph:mainfrom
lxbsz:wip-63364

Conversation

@lxbsz
Copy link
Member

@lxbsz lxbsz commented Oct 31, 2023

Update the oldest_client_tid via the session renew caps msg to
make sure that the MDSs won't pile up the completed request list
in a very large size.

Fixes: https://tracker.ceph.com/issues/63364

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@lxbsz lxbsz requested review from a team and mchangir October 31, 2023 03:52
@github-actions github-actions bot added the cephfs Ceph File System label Oct 31, 2023
mds->locker->resume_stale_caps(session);
mds->sessionmap.touch_session(session);
}
trim_completed_request_list(m->oldest_client_tid, session);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about the FLUSH_ACK that is usually sent back for a FLUSH req ?
isn't that necessary in this situation ?
maybe we should send back the oldest_tid in the renew caps ack msg and act on the oldest tid on the client side as if a FLUSH_ACK was was received

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flush ack sounds not reliable, we couldn't guarantee that this will always work. But for the session's renew caps msg we can be sure that this will be sent per tick. This will also be used to keep the client alive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please note the WRITE requests do not mean the real write contents to Rados, the WRITE means the request ops have the bit of #define CEPH_MDS_OP_WRITE 0x001000. Such as create/setattr,etc.

So if all the WRITE requests are sync create ops, it seems there has no cap flush or flush ack between MDS and clients.

lxbsz added 2 commits October 31, 2023 12:27
Update the oldest_client_tid via the session renew caps msg to
make sure that the MDSs won't pile up the completed request list
in a very large size.

Fixes: https://tracker.ceph.com/issues/63364
Signed-off-by: Xiubo Li <xiubli@redhat.com>
@vshankar
Copy link
Contributor

Can this be worked out without the clients' intervention by maintaining the last tid the MDS acknowledged to a client (during safe reply) and then in trimming completed_requests in tick()?

@lxbsz
Copy link
Member Author

lxbsz commented Nov 1, 2023

Can this be worked out without the clients' intervention by maintaining the last tid the MDS acknowledged to a client (during safe reply) and then in trimming completed_requests in tick()?

I am afraid this won't work. From my understanding the completed_requests list just means to make sure that the clients correctly receive the client request replies and the reqs are well handled. If we just skip clients' intervention it will change the purpose of completed_requests.

@lxbsz
Copy link
Member Author

lxbsz commented Nov 1, 2023

jenkins retest this please

@mchangir
Copy link
Contributor

mchangir commented Nov 2, 2023

Can this be worked out without the clients' intervention by maintaining the last tid the MDS acknowledged to a client (during safe reply) and then in trimming completed_requests in tick()?

I am afraid this won't work. From my understanding the completed_requests list just means to make sure that the clients correctly receive the client request replies and the reqs are well handled. If we just skip clients' intervention it will change the purpose of completed_requests.

pardon my ignorance ... but what is the use of maintaining completed_requests ?
(maybe avoiding retries and achieving idempotence ?)
how does the client use this info once the request has been executed and acked by the mds ?
anything related to async calls ?

@lxbsz
Copy link
Member Author

lxbsz commented Nov 2, 2023

Can this be worked out without the clients' intervention by maintaining the last tid the MDS acknowledged to a client (during safe reply) and then in trimming completed_requests in tick()?

I am afraid this won't work. From my understanding the completed_requests list just means to make sure that the clients correctly receive the client request replies and the reqs are well handled. If we just skip clients' intervention it will change the purpose of completed_requests.

pardon my ignorance ... but what is the use of maintaining completed_requests ? (maybe avoiding retries and achieving idempotence ?) how does the client use this info once the request has been executed and acked by the mds ? anything related to async calls ?

Such as from https://github.com/ceph/ceph/blob/main/src/mds/Server.cc#L2497-L2528, when the clients are replaying the completed requests, the MDS will just treat them as lookup requests, instead of handling the request twice.

There also some other places doing similar check in mds. I am afraid handling the same request twice will fail, such when creating a new file, the second try will fail with file already existing ?

So we couldn't trim the completed_request list by skipping the clients' intervention IMO.

Copy link
Contributor

@mchangir mchangir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vshankar
Copy link
Contributor

jenkins test api

Copy link
Contributor

@vshankar vshankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cephfs Ceph File System

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants