
client: fix file cache cap leak which can stall async read call #55144

Merged
vshankar merged 2 commits into ceph:main from dparmar18:wip-63896 on Aug 22, 2024

Conversation

@dparmar18 (Contributor) commented Jan 11, 2024

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@github-actions bot added the cephfs (Ceph File System) and tests labels on Jan 11, 2024
@dparmar18 force-pushed the wip-63896 branch 2 times, most recently from ecd4152 to 658b36f, on January 11, 2024 11:37
@dparmar18 dparmar18 requested review from a team and ffilz January 11, 2024 11:38
TearDown();
SetUp();

sprintf(filename, "test_llreadvcontiguousllwritevnoncontiguousfile%u", mypid);
Contributor:

nit: smaller file names should be fine w/ pid suffixed :)

// was incremented. So, it is a must to put it down before readahead
// otherwise it will stall the IO call since it will keep on waiting for
// the file cache cap to be released.
if ((in->cap_refs.at(CEPH_CAP_FILE_CACHE)) > 1) {
Member:

No, this is absolutely wrong. In no case can you condition changing a ref count on the current value of the ref count — that just means you're not reference counting correctly elsewhere.
You need to figure out how the other usage is incorrect, and resolve it in a way that is coherent across the entire use system.
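The invariant behind this review comment can be illustrated with a deliberately minimal, hypothetical counter (this is not Ceph's actual Inode/cap code): every get is paired with exactly one put on the same code path, and a put is never taken or skipped based on what the counter currently happens to read.

```cpp
#include <cassert>
#include <map>

// Hypothetical, simplified cap reference counter -- not Ceph's Inode class.
// Correct usage: each get() has exactly one matching put() on the same
// path. put() asserts rather than checking "is the count big enough?",
// because a put with no matching get is a bug elsewhere, not something
// to paper over locally.
struct CapRefs {
  std::map<int, int> refs;  // cap bit -> outstanding references

  void get(int cap) { ++refs[cap]; }

  void put(int cap) {
    assert(refs[cap] > 0);  // unbalanced put indicates a refcount bug
    --refs[cap];
  }
};
```

Conditioning the put on `refs[cap] > 1`, as in the quoted diff, would silently hide whichever path took (or failed to take) the extra reference.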

Member:

I do believe the do_readahead() code is buggy, because it always assumes that once here it has already gotten the Frc caps, and it calls get_cap_ref() directly instead of get_caps(), which would wait for the caps to be available.

But what about the Fl case? It could also trigger do_readahead() from _read_async(). What the kclient does is that, for the readahead, it will try to get the needed Frc caps just before doing the readahead, and if it fails to get the caps it will just skip the readahead.

Contributor Author:

But what about the Fl case? It could also trigger do_readahead() from _read_async(). What the kclient does is that, for the readahead, it will try to get the needed Frc caps just before doing the readahead, and if it fails to get the caps it will just skip the readahead.

We get Frc in _read, so even in the Fl case we do have the needed cap ref count while doing readahead.

2024-01-25T11:29:26.227+0530 7f5fe588c9c0 10 client.4297 inode before calling _read_async 0x10000000000.head(faked_ino=0 nref=12 ll_ref=1 cap_refs={4=0,1024=1,2048=1,4096=0,8192=1} open={3=1} mode=100666 size=14/4194304 nlink=1 btime=2024-01-25T11:29:26.180449+0530 mtime=2024-01-25T11:29:26.228389+0530 ctime=2024-01-25T11:29:26.228389+0530 change_attr=1 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[0x10000000000 ts 0/0 objects 1 dirty_or_tx 14] parents=0x1.head["test_llreadvllwritevfile475879"] 0x7f5fb80060f0)

above is the debug line i had added

Contributor Author:

@dparmar18 As we discussed, when trying to get caps in kclient it will increase all the got caps, which is need | want, references.

While for the userspace client it will only increase the need caps references. This is why in the _read_async() we need to increase the Fc caps reference before unlocking the client_lock or before doing readahead.

Maybe we can just fix this in get_caps() to get the whole references for need | want and then it could make the code simpler.

@gregsfortytwo @vshankar What do you think ?

If we do this, does this need changes to the way we drop the cap refs? How is it done in the kernel client code?

Contributor Author:

So if the cap ref count of the want caps is incremented in get_caps(), then the get_cap_ref() calls for want caps will need to be removed from all the places. But this also means the put_cap_ref() calls still need to be managed manually at the respective places; doing get and put in different places looks weird, and this might be the reason why incrementing the ref count of want caps is handled manually. Need to check how kclient does it.

Contributor Author:

I have two solutions in mind:

  1. As @lxbsz mentioned, enable get_caps() to increment cap ref count of want caps too. OR
  2. Increment cap ref count in _read_async() before calling ObjectCacher::file_read and decrement once we're done reading

Contributor:

@dparmar18 As we discussed, when trying to get caps in kclient it will increase all the got caps, which is need | want, references.

While for the userspace client it will only increase the need caps references. This is why in the _read_async() we need to increase the Fc caps reference before unlocking the client_lock or before doing readahead.

That is rather an implementation difference IMO: need caps must be issued by the MDS, and want caps should not be revoked (or under revocation). In the end, the client needs to hold the set of caps required for an operation to be performed; so, as long as the want caps are not released, the functionality is safe, correct? @lxbsz

Maybe we can just fix this in get_caps() to get the whole references for need | want and then it could make the code simpler.

+1 for this.

@gregsfortytwo @vshankar What do you think ?

(going through further comments on this)

Contributor:

@dparmar18 As we discussed, when trying to get caps in kclient it will increase all the got caps, which is need | want, references.

When the client does not already have the wanted caps, refs are taken only on the need caps. There is a bit of difference in the way ref handling is done between the kclient and the user-space driver.
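The difference being discussed can be sketched like this; the cap bits and function names below are illustrative stand-ins, not the real CEPH_CAP_* constants or the actual get_caps() implementations.

```cpp
#include <cassert>

// Illustrative cap bits (made-up values, not Ceph's actual constants).
constexpr int FILE_RD    = 1 << 0;  // "Fr": may read
constexpr int FILE_CACHE = 1 << 1;  // "Fc": may cache data (readahead)

struct InodeRefs {
  int fr = 0, fc = 0;
  void get_ref(int caps) {
    if (caps & FILE_RD)    ++fr;
    if (caps & FILE_CACHE) ++fc;
  }
};

// kclient-style: refs are taken on everything granted, need | want.
void get_caps_kclient(InodeRefs &in, int need, int want) {
  in.get_ref(need | want);
}

// userspace-client-style: refs are taken on the need set only; a want cap
// such as Fc must be ref'd separately (e.g. via get_cap_ref) before use.
void get_caps_userspace(InodeRefs &in, int need, int /*want*/) {
  in.get_ref(need);
}
```

Under the userspace semantics, a path like readahead that relies on a want cap (Fc) holds no ref on it unless the caller takes one explicitly, which is the gap this PR is about.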

@dparmar18 (Contributor Author) commented Jan 29, 2024

From my conversations with @lxbsz: we discussed how the kclient handles async/sync ops. The function to get caps in the kclient increments the ref count for both need and want caps, whereas the userspace client does this ONLY for need caps. We could do the same, but it would require refactoring a lot of code, since getting/putting the ref counts of want caps would need to be taken care of. I'm completely in favour of this if everyone agrees; if not, I've tried to mimic what the kclient does, though not entirely the way it does it.

ea86729 would increment the Fc cap ref count just before the ObjectCacher::file_read call in Client::_read_async, while the decrement would happen at the respective place, i.e.:

  • if async call then Client::C_Read_Async_Finisher::finish
  • if sync call then inside Client::_read_async just before calling Client::do_readahead

Do let me know if this makes sense @gregsfortytwo @lxbsz

@github-actions (bot):

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@dparmar18 (Contributor Author):

@vshankar what do you think about the approach?

@dparmar18 (Contributor Author):

Rationale (copied from the commit message):

We need the Fc cap ref when ObjectCacher::file_read returns zero bytes: we then wait for the I/O to finish and do the readahead, but this cap ref also needs to be put down at the right place. The problem with the current implementation is that it is difficult to decide whether or not to revoke the Fc cap ref in Client::C_Read_Async_Finisher::finish, since the destructor cannot differentiate between the cases, as it is fed the number of bytes read after waiting for the I/O to finish.

Therefore provide Fc cap ref to the inode before making a call to
ObjectCacher::file_read in Client::_read_async and handle revoking it:

  • if async read then in Client::C_Read_Async_Finisher::finish
  • else if sync read then within Client::_read_async right before calling
    Client::do_readahead.
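The control flow described above can be sketched roughly as follows. The helper names mirror the real ones, but the bodies are simplified stand-ins under the assumption that a zero return from the cached read means the I/O went async; this is not the actual Client code.

```cpp
#include <cassert>
#include <functional>

struct Inode { int fc_refs = 0; };  // stand-in for the Fc cap ref count

void get_cap_ref(Inode *in) { ++in->fc_refs; }
void put_cap_ref(Inode *in) { assert(in->fc_refs > 0); --in->fc_refs; }

// Simplified shape of the fixed Client::_read_async: the Fc ref is taken
// before the cached read and put down on exactly one of the two paths.
// file_read stands in for ObjectCacher::file_read (0 = I/O pending),
// do_readahead for Client::do_readahead, and on_async_finish for the
// C_Read_Async_Finisher::finish path.
int read_async(Inode *in,
               const std::function<int()> &file_read,
               const std::function<void()> &do_readahead,
               const std::function<void(Inode*)> &on_async_finish) {
  get_cap_ref(in);             // take Fc before ObjectCacher::file_read
  int r = file_read();
  if (r == 0) {
    on_async_finish(in);       // async path: finisher puts the ref later
    return 0;
  }
  put_cap_ref(in);             // sync path: drop Fc before readahead
  do_readahead();
  return r;
}
```

Either way the ref is released exactly once, so the file cache cap can be revoked and the read no longer stalls waiting on a leaked Fc ref.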

@vshankar (Contributor):

@vshankar what do you think about the approach?

Couldn't get to reviewing it :/

Will have a look....

@lxbsz (Member) commented Feb 1, 2024

(quoting @dparmar18's Jan 29 comment above)

@dparmar18 I'd prefer to do it the same way as the kclient. I know more code may need to be refactored, but it will make it simpler to maintain in the future.

@dparmar18 (Contributor Author):

(quoting the exchange above)

Had a discussion with @lxbsz about this, and it seems there will be a lot of refactoring involved, and there are some special cases (like lazyio) that will need to be taken care of. I'm strongly in favour of aligning the userspace client code with the kclient's, but it wouldn't be practical to start without proper context; I want to read the kclient code first to understand how it's done there before doing anything.

@dparmar18 (Contributor Author) commented Feb 1, 2024

Tracker for the issue https://tracker.ceph.com/issues/64284

@rishabh-d-dave (Contributor):

This PR is under test in https://tracker.ceph.com/issues/66753.

@rishabh-d-dave added and then removed the wip-rishabh-testing (Rishabh's testing) label on Jul 4, 2024
@rishabh-d-dave (Contributor) left a review.

@rishabh-d-dave removed the needs-qa and wip-rishabh-testing (Rishabh's testing) labels on Jul 11, 2024
rishabh-d-dave added a commit to rishabh-d-dave/ceph that referenced this pull request Jul 11, 2024
* refs/pull/55144/head:
	client: fix file cache cap leak which can stall async read call
	test/client: test contiguous read for a non-contiguous write

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
We need the `Fc` cap ref when ObjectCacher::file_read returns zero bytes: we then wait for the I/O to finish and do the readahead, but this cap ref also needs to be put down at the right place. The problem with the current implementation is that it is difficult to decide whether or not to revoke the `Fc` cap ref in Client::C_Read_Async_Finisher::finish, since the destructor cannot differentiate between the cases, as it is fed the number of bytes read after waiting for the I/O to finish.

Therefore provide `Fc` cap ref to the inode before making a call to
ObjectCacher::file_read in Client::_read_async and handle revoking it:
- if async read then in Client::C_Read_Async_Finisher::finish
- else if sync read then within Client::_read_async right before calling
  Client::do_readahead.

Fixes: https://tracker.ceph.com/issues/63896
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
@dparmar18 (Contributor Author):

  • ceph_test_client:
[       OK ] TestClient.LlreadvLlwritevInvalidFileHandleSync (130 ms)
[----------] 27 tests from TestClient (7975 ms total)

[----------] Global test environment tear-down
[==========] 27 tests from 1 test suite ran. (7975 ms total)
[  PASSED  ] 27 tests.
  • ceph_test_libcephfs
[----------] Global test environment tear-down
[==========] 116 tests from 4 test suites ran. (147651 ms total)
[  PASSED  ] 116 tests.
  • ceph_test_libcephfs_lazyio
[----------] Global test environment tear-down
[==========] 5 tests from 1 test suite ran. (1517 ms total)
[  PASSED  ] 5 tests.

I hope this just works fine in teuthology too 🤞 @rishabh-d-dave

@dparmar18 (Contributor Author) commented Jul 12, 2024

@dparmar18 (Contributor Author):

jenkins test make check

@dparmar18 (Contributor Author):

jenkins test api

@dparmar18 (Contributor Author):

jenkins test make check

@vshankar (Contributor):

green https://pulpito.ceph.com/dparmar-2024-07-12_07:55:39-fs:libcephfs-wip-63896-distro-default-smithi/

Could you please elaborate on what the issue was in the failed job?

@dparmar18 (Contributor Author):

green https://pulpito.ceph.com/dparmar-2024-07-12_07:55:39-fs:libcephfs-wip-63896-distro-default-smithi/

Could you please elaborate on what the issue was in the failed job?

In the previous iteration, the cap ref was wrongly being put down, which led to an assertion failure:

g9dba1e83/rpm/el9/BUILD/ceph-19.0.0-4667-g9dba1e83/src/client/Inode.cc: In function 'int Inode::put_cap_ref(int)' thread 7f3724c00640 time 2024-07-04T17:30:13.240196+0000
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-4667-g9dba1e83/rpm/el9/BUILD/ceph-19.0.0-4667-g9dba1e83/src/client/Inode.cc: 207: FAILED ceph_assert(cap_refs[c] > 0)
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr:
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: ceph version 19.0.0-4667-g9dba1e83 (9dba1e836b4756bf3a32be2a4936f371dff55008) squid (dev)
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x127) [0x7f3735e49f05]
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 2: /usr/lib64/ceph/libceph-common.so.2(+0x24a112) [0x7f3735e4a112]
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 3: (Inode::put_cap_ref(int)+0x21b) [0x562d8580fdf1]
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 4: (Client::put_cap_ref(Inode*, int)+0x32) [0x562d857683b8]
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 5: (Client::_flushed(Inode*)+0x3f) [0x562d8576893b]
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 6: (Client::flush_set_callback(ObjectCacher::ObjectSet*)+0x4e) [0x562d85768a9c]
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 7: (client_flush_set_callback(void*, ObjectCacher::ObjectSet*)+0xd) [0x562d8573db32]
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 8: (ObjectCacher::bh_write_commit(long, sobject_t, std::vector<std::pair<long, unsigned long>, std::allocator<std::pair<long, unsigned long> > >&, unsigned long, int)+0x1084) [0x562d85865f5c]
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 9: ceph_test_client(+0x209147) [0x562d85866147]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 10: ceph_test_client(+0xb54ad) [0x562d857124ad]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 11: ceph_test_client(+0xca4a2) [0x562d857274a2]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 12: ceph_test_client(+0xb54ad) [0x562d857124ad]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 13: (Finisher::finisher_thread_entry()+0x495) [0x7f3735dd300d]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 14: ceph_test_client(+0xdd29d) [0x562d8573a29d]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 15: (Thread::entry_wrapper()+0x43) [0x7f3735e23c9f]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 16: (Thread::_entry_func(void*)+0xd) [0x7f3735e23cbb]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 17: /lib64/libc.so.6(+0x89c02) [0x7f3735489c02]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 18: /lib64/libc.so.6(+0x10ec40) [0x7f373550ec40]

This has been fixed now and the code has been made simpler.

put_cap_ref(in, CEPH_CAP_FILE_CACHE);
update_read_io_size(bl->length());
} else {
put_cap_ref(in, CEPH_CAP_FILE_CACHE);
Contributor:

nit: this could be done in the fall-through path.

Contributor Author:

Doable, but I think it's more readable this way: if the number of bytes read is 0, do some tasks; if not, just release the cap ref.
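For illustration, the two shapes being weighed here look roughly like this (simplified stand-ins, not the real _read_async code); both release the ref exactly once.

```cpp
#include <cassert>

struct Inode { int fc_refs = 1; };  // one outstanding Fc ref on entry
void put_cap_ref(Inode *in) { assert(in->fc_refs > 0); --in->fc_refs; }
void update_read_io_size(int /*bytes*/) { /* stats update stub */ }

// Shape kept in the PR: each branch releases the ref explicitly, which
// keeps the two cases visually symmetric.
void finish_explicit(Inode *in, int bytes) {
  if (bytes > 0) {
    put_cap_ref(in);
    update_read_io_size(bytes);
  } else {
    put_cap_ref(in);
  }
}

// Shape suggested in the nit: release once on the shared fall-through
// path, trading a line of duplication for a single release site.
void finish_fallthrough(Inode *in, int bytes) {
  if (bytes > 0)
    update_read_io_size(bytes);
  put_cap_ref(in);  // common to both cases
}
```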

@vshankar (Contributor) commented Aug 6, 2024

This PR is under test in https://tracker.ceph.com/issues/67369.

vshankar added a commit to vshankar/ceph that referenced this pull request Aug 20, 2024
* refs/pull/55144/head:
	client: fix file cache cap leak which can stall async read call
	test/client: test contiguous read for a non-contiguous write

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
@vshankar (Contributor) left a review.


Labels

cephfs Ceph File System tests


6 participants