
client: fix file cache cap leak which can stall async read call #55144

Merged
vshankar merged 2 commits into ceph:main from dparmar18:wip-63896 on Aug 22, 2024

Conversation

@dparmar18 (Contributor) commented Jan 11, 2024

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@github-actions bot added the cephfs (Ceph File System) and tests labels on Jan 11, 2024
@dparmar18 force-pushed the wip-63896 branch 2 times, most recently from ecd4152 to 658b36f, on January 11, 2024 11:37
@dparmar18 dparmar18 requested review from a team and ffilz January 11, 2024 11:38
TearDown();
SetUp();

sprintf(filename, "test_llreadvcontiguousllwritevnoncontiguousfile%u", mypid);
Contributor:

nit: smaller file names should be fine w/ pid suffixed :)

// was incremented. So, it is a must to put it down before readahead
// otherwise it will stall the IO call since it will keep on waiting for
// the file cache cap to be released.
if ((in->cap_refs.at(CEPH_CAP_FILE_CACHE)) > 1) {
Member:

No, this is absolutely wrong. In no case can you condition changing a ref count on the current value of the ref count — that just means you're not reference counting correctly elsewhere.
You need to figure out how the other usage is incorrect, and resolve it in a way that is coherent across the entire use system.
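The invariant behind this review comment can be illustrated with a deliberately minimal, hypothetical counter (this is not Ceph's actual Inode/cap code): every get is paired with exactly one put on the same code path, and a put is never taken or skipped based on what the counter currently happens to read.

```cpp
#include <cassert>
#include <map>

// Hypothetical, simplified cap reference counter -- not Ceph's Inode class.
// Correct usage: each get() has exactly one matching put() on the same
// path. put() asserts rather than checking "is the count big enough?",
// because a put with no matching get is a bug elsewhere, not something
// to paper over locally.
struct CapRefs {
  std::map<int, int> refs;  // cap bit -> outstanding references

  void get(int cap) { ++refs[cap]; }

  void put(int cap) {
    assert(refs[cap] > 0);  // unbalanced put indicates a refcount bug
    --refs[cap];
  }
};
```

Conditioning the put on `refs[cap] > 1`, as in the quoted diff, would silently hide whichever path took (or failed to take) the extra reference.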

Member:

I do believe the do_readahead() code is buggy, because it always assumes that once here it has already gotten the Frc caps, and it calls get_cap_ref() directly instead of get_caps(), which would wait for the caps to be available.

But what about the Fl case? It could also trigger do_readahead() from _read_async(). What the kclient does is that, for the readahead, it will try to get the needed Frc caps just before doing the readahead, and if it fails to get the caps it will just skip the readahead.

Contributor Author:

But what about the Fl case? It could also trigger do_readahead() from _read_async(). What the kclient does is that, for the readahead, it will try to get the needed Frc caps just before doing the readahead, and if it fails to get the caps it will just skip the readahead.

We get Frc in _read, so even in the Fl case we do have the needed cap ref count while doing readahead.

2024-01-25T11:29:26.227+0530 7f5fe588c9c0 10 client.4297 inode before calling _read_async 0x10000000000.head(faked_ino=0 nref=12 ll_ref=1 cap_refs={4=0,1024=1,2048=1,4096=0,8192=1} open={3=1} mode=100666 size=14/4194304 nlink=1 btime=2024-01-25T11:29:26.180449+0530 mtime=2024-01-25T11:29:26.228389+0530 ctime=2024-01-25T11:29:26.228389+0530 change_attr=1 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[0x10000000000 ts 0/0 objects 1 dirty_or_tx 14] parents=0x1.head["test_llreadvllwritevfile475879"] 0x7f5fb80060f0)

above is the debug line i had added

Contributor Author:

@dparmar18 As we discussed, when trying to get caps in kclient it will increase all the got caps, which is need | want, references.

While for the userspace client it will only increase the need caps references. This is why in the _read_async() we need to increase the Fc caps reference before unlocking the client_lock or before doing readahead.

Maybe we can just fix this in get_caps() to get the whole references for need | want and then it could make the code simpler.

@gregsfortytwo @vshankar What do you think ?

If we do this, does this need changes to the way we drop the cap refs? How is it done in the kernel client code?

Contributor Author:

So if the cap ref count of the want caps is incremented in get_caps(), then the get_cap_ref() calls for want caps will need to be removed from all the places. But this also means the put_cap_ref() calls still need to be managed manually at the respective places; doing get and put in different places looks weird, and this might be the reason why incrementing the ref count of want caps is handled manually. Need to check how kclient does it.

Contributor Author:

I have two solutions in mind:

  1. As @lxbsz mentioned, enable get_caps() to increment cap ref count of want caps too. OR
  2. Increment cap ref count in _read_async() before calling ObjectCacher::file_read and decrement once we're done reading

Contributor:

@dparmar18 As we discussed, when trying to get caps in kclient it will increase all the got caps, which is need | want, references.

While for the userspace client it will only increase the need caps references. This is why in the _read_async() we need to increase the Fc caps reference before unlocking the client_lock or before doing readahead.

That is rather an implementation difference IMO: need caps must be issued by the MDS, and want caps should not be revoked (or under revocation). In the end, the client needs to hold the set of caps required for an operation to be performed; so, as long as the want caps are not released, the functionality is safe, correct? @lxbsz

Maybe we can just fix this in get_caps() to get the whole references for need | want and then it could make the code simpler.

+1 for this.

@gregsfortytwo @vshankar What do you think ?

(going through further comments on this)

Contributor:

@dparmar18 As we discussed, when trying to get caps in kclient it will increase all the got caps, which is need | want, references.

When the client does not already have the wanted caps, refs are taken only on the need caps. There is a bit of difference in the way ref handling is done between the kclient and the user-space driver.
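The difference being discussed can be sketched like this; the cap bits and function names below are illustrative stand-ins, not the real CEPH_CAP_* constants or the actual get_caps() implementations.

```cpp
#include <cassert>

// Illustrative cap bits (made-up values, not Ceph's actual constants).
constexpr int FILE_RD    = 1 << 0;  // "Fr": may read
constexpr int FILE_CACHE = 1 << 1;  // "Fc": may cache data (readahead)

struct InodeRefs {
  int fr = 0, fc = 0;
  void get_ref(int caps) {
    if (caps & FILE_RD)    ++fr;
    if (caps & FILE_CACHE) ++fc;
  }
};

// kclient-style: refs are taken on everything granted, need | want.
void get_caps_kclient(InodeRefs &in, int need, int want) {
  in.get_ref(need | want);
}

// userspace-client-style: refs are taken on the need set only; a want cap
// such as Fc must be ref'd separately (e.g. via get_cap_ref) before use.
void get_caps_userspace(InodeRefs &in, int need, int /*want*/) {
  in.get_ref(need);
}
```

Under the userspace semantics, a path like readahead that relies on a want cap (Fc) holds no ref on it unless the caller takes one explicitly, which is the gap this PR is about.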

@dparmar18 (Contributor Author) commented Jan 29, 2024

From my conversations with @lxbsz: we discussed how the kclient handles async/sync ops. The function to get caps in the kclient increments the ref count for both need and want caps, whereas the userspace client does this ONLY for need caps. We could do the same, but it would require refactoring a lot of code, since getting/putting the ref counts of want caps would need to be taken care of. I'm completely in favour of this if everyone agrees; if not, I've tried to mimic what the kclient does, though not entirely the way it does it.

ea86729 would increment the Fc cap ref count just before the ObjectCacher::file_read call in Client::_read_async, while the decrement would happen at the respective place, i.e.:

  • if async call then Client::C_Read_Async_Finisher::finish
  • if sync call then inside Client::_read_async just before calling Client::do_readahead

Do let me know if this makes sense @gregsfortytwo @lxbsz

@github-actions (bot):

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@dparmar18 (Contributor Author):

@vshankar what do you think about the approach?

@dparmar18 (Contributor Author):

Rationale (copied from the commit message):

We need the Fc cap ref when ObjectCacher::file_read returns zero bytes: we then wait for the I/O to finish and do the readahead, but this cap ref also needs to be put down at the right place. The problem with the current implementation is that it is difficult to decide whether or not to revoke the Fc cap ref in Client::C_Read_Async_Finisher::finish, since the destructor cannot differentiate between the cases, as it is fed the number of bytes read after waiting for the I/O to finish.

Therefore provide Fc cap ref to the inode before making a call to
ObjectCacher::file_read in Client::_read_async and handle revoking it:

  • if async read then in Client::C_Read_Async_Finisher::finish
  • else if sync read then within Client::_read_async right before calling
    Client::do_readahead.
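The control flow described above can be sketched roughly as follows. The helper names mirror the real ones, but the bodies are simplified stand-ins under the assumption that a zero return from the cached read means the I/O went async; this is not the actual Client code.

```cpp
#include <cassert>
#include <functional>

struct Inode { int fc_refs = 0; };  // stand-in for the Fc cap ref count

void get_cap_ref(Inode *in) { ++in->fc_refs; }
void put_cap_ref(Inode *in) { assert(in->fc_refs > 0); --in->fc_refs; }

// Simplified shape of the fixed Client::_read_async: the Fc ref is taken
// before the cached read and put down on exactly one of the two paths.
// file_read stands in for ObjectCacher::file_read (0 = I/O pending),
// do_readahead for Client::do_readahead, and on_async_finish for the
// C_Read_Async_Finisher::finish path.
int read_async(Inode *in,
               const std::function<int()> &file_read,
               const std::function<void()> &do_readahead,
               const std::function<void(Inode*)> &on_async_finish) {
  get_cap_ref(in);             // take Fc before ObjectCacher::file_read
  int r = file_read();
  if (r == 0) {
    on_async_finish(in);       // async path: finisher puts the ref later
    return 0;
  }
  put_cap_ref(in);             // sync path: drop Fc before readahead
  do_readahead();
  return r;
}
```

Either way the ref is released exactly once, so the file cache cap can be revoked and the read no longer stalls waiting on a leaked Fc ref.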

@vshankar (Contributor):

@vshankar what do you think about the approach?

Couldn't get to reviewing it :/

Will have a look....

@lxbsz (Member) commented Feb 1, 2024

(quoting @dparmar18's Jan 29 comment above)

@dparmar18 I'd prefer to do it the same way as the kclient. I know more code may need to be refactored, but it will make it simpler to maintain in the future.

@dparmar18 (Contributor Author):

(quoting the exchange above)

Had a discussion with @lxbsz about this, and it seems there will be a lot of refactoring involved, and there are some special cases (like lazyio) that will need to be taken care of. I'm strongly in favour of aligning the userspace client code with the kclient's, but it wouldn't be practical to start without proper context; I want to read the kclient code first to understand how it's done there before doing anything.

@dparmar18 (Contributor Author) commented Feb 1, 2024

Tracker for the issue https://tracker.ceph.com/issues/64284

@rishabh-d-dave (Contributor):

This PR is under test in https://tracker.ceph.com/issues/66753.

@rishabh-d-dave added and then removed the wip-rishabh-testing (Rishabh's testing) label on Jul 4, 2024
@rishabh-d-dave (Contributor) left a review.

@rishabh-d-dave removed the needs-qa and wip-rishabh-testing (Rishabh's testing) labels on Jul 11, 2024
rishabh-d-dave added a commit to rishabh-d-dave/ceph that referenced this pull request Jul 11, 2024
* refs/pull/55144/head:
	client: fix file cache cap leak which can stall async read call
	test/client: test contiguous read for a non-contiguous write

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
We need the `Fc` cap ref when ObjectCacher::file_read returns zero bytes: we then wait for the I/O to finish and do the readahead, but this cap ref also needs to be put down at the right place. The problem with the current implementation is that it is difficult to decide whether or not to revoke the `Fc` cap ref in Client::C_Read_Async_Finisher::finish, since the destructor cannot differentiate between the cases, as it is fed the number of bytes read after waiting for the I/O to finish.

Therefore provide `Fc` cap ref to the inode before making a call to
ObjectCacher::file_read in Client::_read_async and handle revoking it:
- if async read then in Client::C_Read_Async_Finisher::finish
- else if sync read then within Client::_read_async right before calling
  Client::do_readahead.

Fixes: https://tracker.ceph.com/issues/63896
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
@dparmar18 (Contributor Author):

  • ceph_test_client:
[       OK ] TestClient.LlreadvLlwritevInvalidFileHandleSync (130 ms)
[----------] 27 tests from TestClient (7975 ms total)

[----------] Global test environment tear-down
[==========] 27 tests from 1 test suite ran. (7975 ms total)
[  PASSED  ] 27 tests.
  • ceph_test_libcephfs
[----------] Global test environment tear-down
[==========] 116 tests from 4 test suites ran. (147651 ms total)
[  PASSED  ] 116 tests.
  • ceph_test_libcephfs_lazyio
[----------] Global test environment tear-down
[==========] 5 tests from 1 test suite ran. (1517 ms total)
[  PASSED  ] 5 tests.

I hope this just works fine in teuthology too 🤞 @rishabh-d-dave

@dparmar18 (Contributor Author) commented Jul 12, 2024

@dparmar18 (Contributor Author):

jenkins test make check

@dparmar18 (Contributor Author):

jenkins test api

@dparmar18 (Contributor Author):

jenkins test make check

@vshankar (Contributor):

green https://pulpito.ceph.com/dparmar-2024-07-12_07:55:39-fs:libcephfs-wip-63896-distro-default-smithi/

Could you please elaborate on what the issue was in the failed job?

@dparmar18 (Contributor Author):

green https://pulpito.ceph.com/dparmar-2024-07-12_07:55:39-fs:libcephfs-wip-63896-distro-default-smithi/

Could you please elaborate on what the issue was in the failed job?

In the previous iteration, the cap ref was wrongly being put down, which led to an assertion failure:

g9dba1e83/rpm/el9/BUILD/ceph-19.0.0-4667-g9dba1e83/src/client/Inode.cc: In function 'int Inode::put_cap_ref(int)' thread 7f3724c00640 time 2024-07-04T17:30:13.240196+0000
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-4667-g9dba1e83/rpm/el9/BUILD/ceph-19.0.0-4667-g9dba1e83/src/client/Inode.cc: 207: FAILED ceph_assert(cap_refs[c] > 0)
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr:
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: ceph version 19.0.0-4667-g9dba1e83 (9dba1e836b4756bf3a32be2a4936f371dff55008) squid (dev)
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x127) [0x7f3735e49f05]
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 2: /usr/lib64/ceph/libceph-common.so.2(+0x24a112) [0x7f3735e4a112]
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 3: (Inode::put_cap_ref(int)+0x21b) [0x562d8580fdf1]
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 4: (Client::put_cap_ref(Inode*, int)+0x32) [0x562d857683b8]
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 5: (Client::_flushed(Inode*)+0x3f) [0x562d8576893b]
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 6: (Client::flush_set_callback(ObjectCacher::ObjectSet*)+0x4e) [0x562d85768a9c]
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 7: (client_flush_set_callback(void*, ObjectCacher::ObjectSet*)+0xd) [0x562d8573db32]
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 8: (ObjectCacher::bh_write_commit(long, sobject_t, std::vector<std::pair<long, unsigned long>, std::allocator<std::pair<long, unsigned long> > >&, unsigned long, int)+0x1084) [0x562d85865f5c]
2024-07-04T17:30:13.286 INFO:tasks.workunit.client.0.smithi062.stderr: 9: ceph_test_client(+0x209147) [0x562d85866147]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 10: ceph_test_client(+0xb54ad) [0x562d857124ad]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 11: ceph_test_client(+0xca4a2) [0x562d857274a2]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 12: ceph_test_client(+0xb54ad) [0x562d857124ad]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 13: (Finisher::finisher_thread_entry()+0x495) [0x7f3735dd300d]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 14: ceph_test_client(+0xdd29d) [0x562d8573a29d]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 15: (Thread::entry_wrapper()+0x43) [0x7f3735e23c9f]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 16: (Thread::_entry_func(void*)+0xd) [0x7f3735e23cbb]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 17: /lib64/libc.so.6(+0x89c02) [0x7f3735489c02]
2024-07-04T17:30:13.287 INFO:tasks.workunit.client.0.smithi062.stderr: 18: /lib64/libc.so.6(+0x10ec40) [0x7f373550ec40]

This has been fixed now and the code has been made simpler.

put_cap_ref(in, CEPH_CAP_FILE_CACHE);
update_read_io_size(bl->length());
} else {
put_cap_ref(in, CEPH_CAP_FILE_CACHE);
Contributor:

nit: this could be done in the fall-through path.

Contributor Author:

Doable, but I think it's more readable this way: if the number of bytes read is 0, do some tasks; if not, just release the cap ref.
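For illustration, the two shapes being weighed here look roughly like this (simplified stand-ins, not the real _read_async code); both release the ref exactly once.

```cpp
#include <cassert>

struct Inode { int fc_refs = 1; };  // one outstanding Fc ref on entry
void put_cap_ref(Inode *in) { assert(in->fc_refs > 0); --in->fc_refs; }
void update_read_io_size(int /*bytes*/) { /* stats update stub */ }

// Shape kept in the PR: each branch releases the ref explicitly, which
// keeps the two cases visually symmetric.
void finish_explicit(Inode *in, int bytes) {
  if (bytes > 0) {
    put_cap_ref(in);
    update_read_io_size(bytes);
  } else {
    put_cap_ref(in);
  }
}

// Shape suggested in the nit: release once on the shared fall-through
// path, trading a line of duplication for a single release site.
void finish_fallthrough(Inode *in, int bytes) {
  if (bytes > 0)
    update_read_io_size(bytes);
  put_cap_ref(in);  // common to both cases
}
```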

@vshankar (Contributor) commented Aug 6, 2024

This PR is under test in https://tracker.ceph.com/issues/67369.

vshankar added a commit to vshankar/ceph that referenced this pull request Aug 20, 2024
* refs/pull/55144/head:
	client: fix file cache cap leak which can stall async read call
	test/client: test contiguous read for a non-contiguous write

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
@vshankar (Contributor) left a review.


Labels

cephfs Ceph File System tests


6 participants