
rgw/posix: add file read/write ops for rgw::Aio#50635

Draft
cbodley wants to merge 4 commits into ceph:main from cbodley:wip-rgw-posix-aio

Conversation

@cbodley (Contributor) commented Mar 23, 2023

enables asio's uring interfaces, adds bufferlist adapters for asio's ConstBufferSequence and MutableBufferSequence concepts, then implements simple async ops for use with rgw::Aio


@cbodley (Contributor, Author) commented Mar 23, 2023

this piece doesn't depend on optional_yield, but the posix ReadOp and WriteOp will need to construct a boost::asio::random_access_file with the correct executor type:

  • given a real yield_context, you can get its executor the same way that librados_op() does
  • given an empty optional_yield, i hope we can use the asio::system_executor() which runs on an "unspecified system thread pool"

then the posix ReadOp::iterate() can construct an aio throttle with auto aio = rgw::make_throttle(window_size, y), and submit i/os similar to RGWRados::get_obj_iterate_cb() except you'd pass in a file_read_op() instead of librados_op()

@dang (Contributor) left a comment:

Seems straightforward enough.

@github-actions bot commented Jun 8, 2023

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@cbodley (Contributor, Author) commented Aug 5, 2023

@dang @mattbenjamin i was thinking about this a bit more, and i'm skeptical that we want to mimic the rados backend's strategy for reads/writes. RadosDriver makes use of the AioThrottle to transfer several chunks in parallel, because each chunk can likely be satisfied by a different osd

from a local file, i think we're likely to get the best performance by streaming the reads sequentially and letting the os handle readahead. but what about a distributed filesystem like gpfs? would we know enough about the layout to take advantage of any parallelism there?

i pushed a commit that preserves the sequential behavior of POSIXReadOp::iterate(), but uses boost::asio::random_access_file just to suspend the coroutine instead of blocking

@mattbenjamin (Contributor) commented Aug 6, 2023

hi casey,

I think it may be worth experimenting with this more. Local file can be a number of things, including mirrored and striped devices. gpfs could be similar. still, you're probably right that under most conditions i/o will be very sequential, so we might not notice

Matt

@cbodley (Contributor, Author) commented Aug 7, 2023

can we discuss what d3n and d4n's 'ssd driver' should do in the d4n call tomorrow? cc @pritha-srivastava

@mattbenjamin (Contributor):
@mkogan1 ^^

@mkogan1 (Contributor) commented Aug 8, 2023

> hi casey,
>
> I think it may be worth experimenting with this more. Local file can be a number of things, including mirrored and striped devices. gpfs could be similar. still, you're probably right that under most conditions i/o will be very sequential, so we might not notice
>
> Matt

+1 for evaluating configurable parallel transfer of several chunks (not enabled by default).
In recent years RAID controllers have become much faster and can sustain RAID5 I/O across NVMe drives at millions of IOPS, and servers can optionally carry dual such controllers.
https://www.storagereview.com/review/dell-poweredge-direct-drives-vs-perc-12-review

@github-actions bot commented Oct 7, 2023

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Oct 7, 2023
@github-actions bot commented Nov 6, 2023

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Nov 6, 2023
@mattbenjamin (Contributor):

per @mkogan1's comment, let's re-open this

@mattbenjamin mattbenjamin self-assigned this Nov 6, 2023
@mattbenjamin mattbenjamin removed the stale label Nov 6, 2023
@mattbenjamin mattbenjamin reopened this Nov 6, 2023


@github-actions github-actions bot added the stale label Feb 13, 2024

@github-actions github-actions bot closed this Mar 14, 2024
cbodley added 4 commits March 9, 2026 10:37
Signed-off-by: Casey Bodley <cbodley@redhat.com>
there's a conflicting BLOCK_SIZE macro defined inside the liburing headers

Signed-off-by: Casey Bodley <cbodley@redhat.com>
adapters that allow bufferlist to be used as the `buffers` argument in
functions like `asio::async_read()` and `asio::async_write()`

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
@mkogan1 (Contributor) commented Mar 9, 2026

@cbodley rebased and ran a small-object PUT workload (100,000 × 4KB objects into multiple buckets).
The IO/s performance with the PR cherry-picked (screenshot bottom) is lower than before the cherry-pick (screenshot top).
working to see if I can find a way to optimize (maybe BOOST_ASIO_HAS_IO_URING_AS_DEFAULT can help)
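For reference, Boost.Asio's io_uring backend (Boost 1.78+) is selected at compile time; this is a sketch of the relevant switches, not the PR's actual build configuration:

```
-DBOOST_ASIO_HAS_IO_URING             enable io_uring support; required
                                      for asio's file types such as
                                      random_access_file
-DBOOST_ASIO_HAS_IO_URING_AS_DEFAULT  also route sockets and other I/O
                                      objects through io_uring instead
                                      of the default epoll reactor
(link with -luring)
```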
