WIP - libcephfs: add async I/O capability #44991
Conversation
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved.
AIUI, the only reliable way to detect EOF in Ceph is to keep track of the inode size and clamp your reads there. Because we're reading from an array of objects, a short read doesn't necessarily indicate an EOF condition; it may just be that the object has a sparse region at the end. Ditto for a non-existent OSD object. That said, the cephfs read handler should fix up the result of the read from the OSD to look like a short read if you're at EOF.
This patch is enormous. It'd be very nice to break it up into smaller changes, as it's a little hard for me to follow what your general approach is. That will also help if this causes a regression and we have to bisect to find it.
That's odd. For an existing object, I'd expect the object size to be truncated to the size defined in its file layout to avoid a messy short read. Or is that intentional?
The OSD doesn't really have the concept of file layouts. Those are a cephfs concept and are managed by the MDS. All the OSD understands is objects, and multiple (usually 4M) objects back a single cephfs inode. OSD objects can be sparse with byte granularity as well, so an object really only stores what the clients have written. If the client leaves a sparse area in an object, or never creates an object at all because it would be completely sparse, that's OK and is preferable since it takes up less space in the backing store. This is also what makes fscrypt possible -- we have to know which parts are actually sparse to properly decrypt the data. The upshot is that cephfs clients must be prepared to zero-fill these holes in OSD data for the caller on a read. A hole could even be at the end of the file, which is why we have to carefully track the file size.
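To make the clamp-and-zero-fill idea concrete, here is a minimal standalone sketch. This is not Ceph's actual client code; the helper names `clamp_read_len` and `zero_fill` are invented for illustration, and the tracked inode size stands in for what the client learns from the MDS.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Clamp a requested read [off, off+len) against the tracked inode size:
// bytes at or past EOF are never returned to the caller.
inline uint64_t clamp_read_len(uint64_t off, uint64_t len,
                               uint64_t inode_size) {
  if (off >= inode_size)
    return 0;                          // read starts at/after EOF
  return std::min(len, inode_size - off);
}

// The OSD may return fewer bytes than the clamped length because the
// object is sparse at the end (or doesn't exist at all). Inside the
// file, those holes must read back as zeros, so pad up to clamped_len.
inline std::vector<uint8_t> zero_fill(std::vector<uint8_t> osd_bytes,
                                      uint64_t clamped_len) {
  osd_bytes.resize(clamped_len, 0);    // zero-fill the sparse tail
  return osd_bytes;
}
```

Only after clamping against the inode size does a short return value actually mean EOF; a short read straight from the OSD does not.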
dc87488 to 24e465e
a20a1e8 to 5f879a8
I've updated this patch set. I hope I've addressed the comments. I've also added a unit test and resolved several issues that arose from testing. I will shortly start testing from Ganesha.
src/client/Client.cc
Outdated
int Client::uninline_data(Inode *in, Context *onfinish)
{
  if (!in->inline_data.length()) {
    // TODO - should we drop the client_lock here?
Sorry - I wasn't following this PR closely. I'll take a look next week.
Although the onfinish context signals a condition variable on completion, it's a no-op if there's no inline data to be uninlined, since in that case the onfinish context is not handed over to an asynchronous network operation to signal its completion. So, IMO, we don't need to drop the client_lock here.
Thanks. Removed the comment.
Also, onuninline is now only done for the write path AND is done synchronously, so all the async onuninline bits that I had added are gone.
ajarr left a comment:
I've gone through the write path and Client::_write in detail. I have a couple of minor questions/comments.
I still need to go through the read path, Client::_read(), in detail, and I'm yet to go through the fsync-related changes.
src/client/Client.h
Outdated
iofinished_r = 0;
onuninlinefinished_r = 0;
iofinished = false;
onuninlinefinished = onuninline == nullptr;
We don't always have to uninline a file. If so, there will be no onuninline Context, so we need to mark it already finished.
I should comment to that effect though...
@ajarr means the type of onuninlinefinished is bool, you should assign it to false instead of nullptr.
@lxbsz thanks! That's what I meant. onuninlinefinished is bool.
Right, onuninlinefinished is a bool, but it isn't being assigned onuninline. It's being assigned the result of (onuninline == nullptr), and the == operator yields a bool.
Ah OK! It looked like onuninlinefinished = onuninline = false. Thanks!
Yes, a comment mentioning "We don't always have to uninline a file. If so, there will be no onuninline Context, so we need to mark it already finished." should also help.
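The confusion above boils down to one line, so here is a tiny standalone illustration (the `Context` stand-in type and `uninline_already_finished` helper are invented for this sketch, not Ceph code): the line assigns the result of a pointer comparison, which is a bool, not the pointer itself.

```cpp
struct Context {};  // stand-in for Ceph's completion Context type

// If there is no uninline Context, there is nothing to wait for, so
// the uninline step should be marked as already finished. The ==
// comparison yields a bool, which is what gets assigned.
bool uninline_already_finished(const Context *onuninline) {
  return onuninline == nullptr;
}
```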
Removing/changing the special onuninline handling, so I think this one becomes moot.
src/client/Client.h
Outdated
iofinished_r = 0;
onuninlinefinished_r = 0;
iofinished = false;
onuninlinefinished = onuninline == nullptr;
We don't always have to uninline a file. If so, there will be no onuninline Context, so we need to mark it already finished.
I should comment to that effect though...
Got it. A comment will help.
Removing/changing the special onuninline handling, so I think this one becomes moot.
src/client/Client.cc
Outdated
// time
lat = ceph_clock_now();
lat -= start;
logger->tinc(l_c_wrlat, lat);
The above line was replaced by:
++nr_write_request;
update_io_stat_write(lat);
Please see the following lines in the commit, 967e24fe5c0efd9d7#diff-7a3052fe46aebfed0382c9d0bb9880ea1328add824e0b10c5d551ddfee282cd1R10572
Thanks for that, I put that change in.
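The replaced pattern is just "timestamp delta feeds a per-I/O stat". As a standalone model (std::chrono stands in for ceph_clock_now, and `IoStats`/`record_write_latency` are invented stand-ins for the counters and update_io_stat_write, not Ceph's actual types):

```cpp
#include <chrono>
#include <cstdint>

// Model of the write-latency accounting discussed above: count the
// request and accumulate the elapsed time for the stat.
struct IoStats {
  uint64_t nr_write_request = 0;
  std::chrono::nanoseconds total_write_lat{0};

  void record_write_latency(std::chrono::nanoseconds lat) {
    ++nr_write_request;       // one more write request observed
    total_write_lat += lat;   // accumulate its latency
  }
};
```

A caller would capture `start` before issuing the write and pass `now - start` (cast to nanoseconds) on completion, mirroring the `lat = ceph_clock_now(); lat -= start;` pattern in the quoted hunk.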
 * License version 2.1, as published by the Free Software
 * Foundation. See file COPYING.
 *
 */

    if (rc < 0)
      goto done;
  } else if (onfinish) {
    // handle _sync_read asynchronously...
Thanks for the comments. I'm able to understand this better.
ajarr left a comment:
I went through Client::_read() in detail. I've a few more questions/comments.
I am finding Client::C_Read_Sync_Async::finish() tricky; I'm still going through it a couple more times.
I also need to go through the fsync-related changes.
src/client/Client.cc
Outdated
lat = ceph_clock_now();
lat -= start;
clnt->logger->tinc(l_c_read, lat);
The above line was replaced by:
++nr_read_request;
update_io_stat_read(lat);
Please see 967e24fe5c0efd9d7#diff-7a3052fe46aebfed0382c9d0bb9880ea1328add824e0b10c5d551ddfee282cd1R10111
Thanks for that, I put that change in.
// Caller holds client_lock so we don't need to take it.

if (r >= 0) {
  if (is_read_async) {
I don't follow the need for this if...else statement. Can you please elaborate?
Ah, for that one, I had a question about whether I needed to do anything to deal with a short read. Handling would be different depending on whether the read is sync or async (a separate sync/async from what I'm doing...).
}

ceph_assert(r >= 0);
if (movepos) {
Do we need to call update_read_io_size() above this line, as we do during success: in Client::_read()?
Yes, that will be fixed in the next update.
src/client/Client.cc
Outdated
{
  bool fini;

  clnt->client_lock.lock();
Similar to Client::C_Read_Finisher::finish_io(), wouldn't the caller still hold the client_lock? I don't see where it's dropped before calling this method, finish_onuninline().
Hmm, uninline_data can complete immediately, clearly with the client_lock held.
If it actually runs asynchronously, it's not clear to me whether the client_lock is held or not (it's all done in the Objecter); I'm thinking not.
The original onuninline using a C_SaferCond doesn't need the client_lock to signal or wait on the condition, so it doesn't care whether client_lock is held.
Considering we really shouldn't have to uninline data, maybe I should just make uninline synchronous: always use a C_SaferCond and wait for it to be done before returning, so _read (for example) calls finish_onuninline with the client_lock held.
As far as I understand, yes, I think we should be okay to make uninline synchronous here and in the write path. I suspect that'd simplify the code?
Cool, I have done that then.
 */
void Client::C_Read_Sync_Async::finish(int r)
{
  clnt->client_lock.lock();
Wouldn't the caller hold the client_lock? I don't see where client_lock is getting dropped in Client::_read() before this method is called.
finish will be called via read_trunc, which looks like it is scheduled, so it runs on another thread. I haven't followed that code path all the way to the gory end. Maybe it does take the client_lock somewhere? But it all goes down into the Filer and Objecter, so it seems unlikely any of that would acquire the client_lock, and I don't see any inline completion that could call finish on the submitting thread, which WOULD still hold the client_lock.
I welcome more analysis; this is one place I got in over my head following and understanding the code.
When reading, we don't even ask for the Fw caps, and can't be sure we have been granted them, so we shouldn't write contents to RADOS. The bug was introduced by commit a0cb524 (client: Read inline data path). Fixes: https://tracker.ceph.com/issues/56553 Signed-off-by: Xiubo Li <xiubli@redhat.com>
For async I/O, we will want to be able to override block_writes_upfront, so rename the member to cfg_block_writes_upfront, add an option to pass block_writes_upfront as a parameter, and add a member access method so the caller can pass cfg_block_writes_upfront. Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com>
Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com>
These bits of code need to be invoked from a separate spot when we introduce async I/O, so break them out now. Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com>
  }
  return 0;
}
io_finish.get() is always called above, but release() is only called when onfinish != nullptr.
Is there a reference leak when onfinish == nullptr?
No, it's a managed pointer (std::unique_ptr). It will go out of scope and be cleaned up on any exit from Client::_read_async(). We release it HERE because it needs to have a life beyond the call to _read_async().
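The ownership pattern being described looks roughly like this standalone sketch (the `Completion` type and `maybe_hand_off` function are invented for illustration; they are not the actual Client code): release() transfers ownership only on the path that outlives the call, and every other path cleans up automatically.

```cpp
#include <memory>

// Toy completion object that records when it is destroyed, so the
// two ownership paths can be observed.
struct Completion {
  bool *destroyed;
  explicit Completion(bool *d) : destroyed(d) {}
  ~Completion() { *destroyed = true; }
};

// If the operation goes async, ownership is handed off via release()
// because the completion must outlive this call. Otherwise the
// unique_ptr destroys it when it goes out of scope -- no leak.
Completion *maybe_hand_off(std::unique_ptr<Completion> io_finish,
                           bool goes_async) {
  if (goes_async)
    return io_finish.release();  // async machinery now owns it
  return nullptr;                // destroyed on return, no leak
}
```

Calling get() earlier in the function is harmless: it only observes the pointer and does not affect ownership, which is why the non-release path cannot leak.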
There is some confusion about the use of sync and async in the existing code. I'm thinking maybe I need to change the wording of my efforts from async to nonblocking.
Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com>
Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com>
Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com>
Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com>
…xt list. To make an async version of fsync (to be used for async write and commit), we need to be able to signal an arbitrary Context on completion of either of these lists. add_async_onfinish_to_context_list adds such a Context to the list. Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com>
We will need the ability to do an async write that finishes with fsync, so we need a non-blocking fsync. Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com>
Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com>
…readv_writev Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com>
OK, I believe this update includes all the required fixes. I'm still not sure whether I need any short-read handling in C_Read_Finisher::finish_io. The client_lock issue in C_Read_Sync_Async::finish still needs more analysis.
I posted a new version changing the title and names from async to nonblocking to reduce confusion. Here is a pastebin of the diff: Here is the nonblocking-io branch: https://github.com/ffilz/ceph/commits/nonblocking-io If folks think this looks good, I will push the changes to this pull request (and update the pull request title). I think this makes things easier to understand and gets rid of Sync_Async in the function names...
@ffilz I suggest just waiting for @mchangir's uninline_data PR #44359, which will add a scrub command to uninline the inline_data, and a following PR which will remove the inline data code from. Or in this PR you can just skip the uninline-related code by adding some comments, or just return not supported if it has inline data?
Dear reviewers, I need to figure out one more thing: maybe I need to drop the client_lock to make the callback into Ganesha. I also possibly have to close the filehandle used. For Ganesha to call ceph_ll_close, the client_lock can't be held, since ll_close will take it. It was reasonable to add non-blocking fsync on write completion, but file close at the end of I/O may be too much to ask... (plus it's a bit tricky in the Ganesha code, since the callback isn't actually managing the lifetime of the filehandle). Can we safely drop the client_lock in the callback?
Please see new pull request #48038
This is the first pass at supporting async I/O calls from Ganesha. I would appreciate another set of eyes to verify that I have appropriately covered all the bases. I am concerned that somehow I have left a path for synchronous completion, which would leave confusion about the return value. The intent is to have all I/O complete async even if it completes before returning to the caller (which is prepared to deal with that situation). I'm also concerned about the best way to detect EOF. Ganesha currently assumes a 0-length read indicates EOF, but maybe we can do better.
Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com>
I haven't tried to complete the checklist; I know there is at least documentation that should be written, and tests are yet to be written.