client: fix file cache cap leak which can stall async read call#55144
client: fix file cache cap leak which can stall async read call#55144
Conversation
ecd4152 to
658b36f
Compare
| TearDown(); | ||
| SetUp(); | ||
|
|
||
| sprintf(filename, "test_llreadvcontiguousllwritevnoncontiguousfile%u", mypid); |
There was a problem hiding this comment.
nit: smaller file names should be fine w/ pid suffixed :)
src/client/Client.cc
Outdated
| // was incremented. So, it is a must to put it down before readahead | ||
| // otherwise it will stall the IO call since it will keep on waiting for | ||
| // the file cache cap to be released. | ||
| if ((in->cap_refs.at(CEPH_CAP_FILE_CACHE)) > 1) { |
There was a problem hiding this comment.
No, this is absolutely wrong. In no case can you condition changing a ref count on the current value of the ref count — that just means you're not reference counting correctly elsewhere.
You need to figure out how the other usage is incorrect, and resolve it in a way that is coherent across the entire use system.
There was a problem hiding this comment.
I do believe the do_readahead() code is buggy, because it always assumes that once here it already has gotten the Frc caps and will call the get_cap_ref() directly instead of get_caps(), which will wait for the caps to be available.
But what if in Fl case ? It also could trigger the do_readahead() from _read_async(). What the kclient will do is that for the readahead it will try to get the needed Frc caps just before doing readehead and if fails to get the caps it will just skip the readahead.
There was a problem hiding this comment.
But what if in Fl case ? It also could trigger the do_readahead() from _read_async(). What the kclient will do is that for the readahead it will try to get the needed Frc caps just before doing readehead and if fails to get the caps it will just skip the readahead.
We get Frc in _read so even if it is Fl we do have the needed cap ref count while doing readahead
2024-01-25T11:29:26.227+0530 7f5fe588c9c0 10 client.4297 inode before calling _read_async 0x10000000000.head(faked_ino=0 nref=12 ll_ref=1 cap_refs={4=0,1024=1,2048=1,4096=0,8192=1} open={3=1} mode=100666 size=14/4194304 nlink=1 btime=2024-01-25T11:29:26.180449+0530 mtime=2024-01-25T11:29:26.228389+0530 ctime=2024-01-25T11:29:26.228389+0530 change_attr=1 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[0x10000000000 ts 0/0 objects 1 dirty_or_tx 14] parents=0x1.head["test_llreadvllwritevfile475879"] 0x7f5fb80060f0)
above is the debug line i had added
There was a problem hiding this comment.
Please see https://github.com/ceph/ceph/blob/main/src/mds/locks.c#L109-L110, it could just issue Frl.
There was a problem hiding this comment.
https://github.com/ceph/ceph/blob/main/src/client/Client.cc#L10827 here it gets the Frc
There was a problem hiding this comment.
@dparmar18 As we discussed, when trying to get caps in kclient it will increase all the
gotcaps, which isneed | want, references.While for the userspace client it will only increase the
needcaps references. This is why in the_read_async()we need to increase theFccaps reference before unlocking theclient_lockor before doing readahead.Maybe we can just fix this in
get_caps()to get the whole references forneed | wantand then it could make the code simpler.@gregsfortytwo @vshankar What do you think ?
If we do this, does this need changes to the way we drop the cap refs? How is it done in the kernel client code?
There was a problem hiding this comment.
so if the cap ref count of want is incremented in get_caps(), then the get_cap_ref() calls for want caps will need to be removed from all the places but this also means that the put_cap_ref() calls are still a thing that needs to be managed manually at respective places; operating get and put at different places looks weird and this might be the reason why cap ref count incrementation of want caps is handled manually. Need to check how kclient does it
There was a problem hiding this comment.
I have two solutions in my mind -
- As @lxbsz mentioned, enable get_caps() to increment cap ref count of
wantcaps too. OR - Increment cap ref count in _read_async() before calling ObjectCacher::file_read and decrement once we're done reading
There was a problem hiding this comment.
@dparmar18 As we discussed, when trying to get caps in kclient it will increase all the
gotcaps, which isneed | want, references.While for the userspace client it will only increase the
needcaps references. This is why in the_read_async()we need to increase theFccaps reference before unlocking theclient_lockor before doing readahead.
That is rather an implementation difference IMO: need caps need to be issued by the MDS and want caps should not be revoked (or under revocation). In the end, the client needs to hold the set of caps that are required for an operation to be performed - so, until the want caps are not released, the functionality is safe, correct? @lxbsz
Maybe we can just fix this in
get_caps()to get the whole references forneed | wantand then it could make the code simpler.
+1 for this.
@gregsfortytwo @vshankar What do you think ?
(going through further comments on this)
There was a problem hiding this comment.
@dparmar18 As we discussed, when trying to get caps in kclient it will increase all the
gotcaps, which isneed | want, references.
When the client does not already have the wanted caps, then only need caps are taken refs on. There is a bit on difference the way ref handling in done b/w the kclient and user-space driver.
|
From my conversations with @lxbsz, we discussed how kclient handles the async/sync ops. The function to get caps in kclient would increment ref count for both ea86729 would increment the
Do let me know if this makes sense @gregsfortytwo @lxbsz |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
0bddbae to
2903934
Compare
|
@vshankar what do you think about the approach? |
|
rationale(copied from the commit message) We need the Therefore provide
|
Couldn't get to reviewing it :/ Will have a look.... |
@dparmar18 I'd prefer to do it the same with kclient. I know there maybe more code need to be refactored but it will make it simple to maintain in future. |
Had a discussion with @lxbsz about this and it seems there will be a lot of refactoring involved here and there are some special cases (like lazyio) that will need to taken care off. I'm in strong favour of aligning userspace client code with kclient's but I'd not be practical to start doing it without having proper context(s), I want to read kclient code first to understand how it's done there before jumping on to doing anything. |
Tracker for the issue https://tracker.ceph.com/issues/64284 |
|
This PR is under test in https://tracker.ceph.com/issues/66753. |
rishabh-d-dave
left a comment
There was a problem hiding this comment.
QA run was unsuccessful - https://pulpito.ceph.com/rishabh-2024-07-04_16:36:07-fs-wip-rishabh-testing-20240701.141150-debug-testing-default-smithi/7787244. Test added in this PR failed.
* refs/pull/55144/head: client: fix file cache cap leak which can stall async read call test/client: test contiguous read for a non-contiguous write Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Xiubo Li <xiubli@redhat.com>
We need the `Fc` cap ref when ObjectCacher::file_read returns zero bytes; then wait for the I/O to finish; then do readahead but this cap ref needs to be put down as well at the right place. The problem with the current implementation is that it is difficult to decide whether to revoke the `Fc` cap ref in Client::C_Read_Async_Finisher::finish or not since the destructor cannot differenciate between the cases as it is fed with the bytes that are read after waiting for the I/O to be finished. Therefore provide `Fc` cap ref to the inode before making a call to ObjectCacher::file_read in Client::_read_async and handle revoking it: - if async read then in Client::C_Read_Async_Finisher::finish - else if sync read then within Client::_read_async right before calling Client::do_readahead. Fixes: https://tracker.ceph.com/issues/63896 Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
I hope this just works fine in teuthology too 🤞 @rishabh-d-dave |
|
jenkins test make check |
|
jenkins test api |
|
jenkins test make check |
Could you please elaborate on what the issue was in the failed job? |
In the previous iteration, cap ref was wrongly being put down which lead to an assertion failure: This has been fixed now and the code has been made simpler. |
| put_cap_ref(in, CEPH_CAP_FILE_CACHE); | ||
| update_read_io_size(bl->length()); | ||
| } else { | ||
| put_cap_ref(in, CEPH_CAP_FILE_CACHE); |
There was a problem hiding this comment.
nit: this could be done in the fall through path.
There was a problem hiding this comment.
doable but I think it's better readable this way, if the num of bytes read is 0, do some tasks, if not, just release the cap ref.
|
This PR is under test in https://tracker.ceph.com/issues/67369. |
* refs/pull/55144/head: client: fix file cache cap leak which can stall async read call test/client: test contiguous read for a non-contiguous write Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e