Skip to content

neorados: Update exerciser/demo to use C++20 coroutines#48129

Closed
adamemerson wants to merge 6 commits intoceph:mainfrom
adamemerson:wip-coro-test
Closed

neorados: Update exerciser/demo to use C++20 coroutines#48129
adamemerson wants to merge 6 commits intoceph:mainfrom
adamemerson:wip-coro-test

Conversation

@adamemerson
Copy link
Contributor

@adamemerson adamemerson commented Sep 15, 2022

Just to work the bugs out and give some demo code.

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@adamemerson
Copy link
Contributor Author

jenkins test make check

@yuvalif
Copy link
Contributor

yuvalif commented Sep 19, 2022

isn't the neorados code currently used with the RGW?
shouldn't the neorados library support both a spawn/yield API and c++20 coroutines API?

@cbodley
Copy link
Contributor

cbodley commented Sep 19, 2022

isn't the neorados code currently used with the RGW? shouldn't the neorados library support both a spawn/yield API and c++20 coroutines API?

it does support both as part of asio's Asynchronous Model, specifically through our handling of the CompletionToken template type

@adamemerson
Copy link
Contributor Author

This looks mostly reasonable, I think? I'm not as familiar with how much background failure RBD has.

https://pulpito.ceph.com/aemerson-2022-09-20_03:59:09-rbd-wip-coro-test-distro-default-smithi/

@yuvalif
Copy link
Contributor

yuvalif commented Sep 20, 2022

do we have teuthology tests running ceph_test_neorados?

@adamemerson
Copy link
Contributor Author

do we have teuthology tests running ceph_test_neorados?

No, not yet.

@cbodley
Copy link
Contributor

cbodley commented Sep 21, 2022

This looks mostly reasonable, I think? I'm not as familiar with how much background failure RBD has.

https://pulpito.ceph.com/aemerson-2022-09-20_03:59:09-rbd-wip-coro-test-distro-default-smithi/

yikes, from those cli_generic.sh failures i see that rbd bench is causing osds to report a bunch of slow requests:

2022-09-20T06:18:21.142 INFO:tasks.ceph.osd.0.smithi171.stderr:2022-09-20T06:18:21.139+0000 7f9299790700 -1 osd.0 158 get_health_metrics reporting 256 slow ops, oldest is osd_op(client.10593.0:2055 3.3 3:db356461:test1::rbd_data.295e2a328d04.00000000000000f0:head [set-alloc-hint object_size 4194304 write_size 4194304,write 3420160~4096 [fadvise_random] in=4096b] snapc 0=[] RETRY=4 ondisk+retry+write+known_if_redirected+supports_pool_eio e158)
2022-09-20T06:18:21.448 INFO:tasks.ceph.osd.1.smithi171.stderr:2022-09-20T06:18:21.443+0000 7f52fc777700 -1 osd.1 158 get_health_metrics reporting 256 slow ops, oldest is osd_op(client.10593.0:2007 3.2 3:50ad0aa8:test1::rbd_data.295e2a328d04.0000000000000060:head [set-alloc-hint object_size 4194304 write_size 4194304,write 1519616~4096 [fadvise_random] in=4096b] snapc 0=[] RETRY=3 ondisk+retry+write+known_if_redirected+supports_pool_eio e158)

do these failures look familiar @idryomov?

@cbodley cbodley requested a review from idryomov September 21, 2022 14:51
@yuvalif yuvalif mentioned this pull request Sep 28, 2022
14 tasks
@cbodley
Copy link
Contributor

cbodley commented Oct 6, 2022

@adamemerson i've been watching other runs of the rbd suite like https://pulpito.ceph.com/dis-2022-10-05_10:59:41-rbd-wip-dis-testing-distro-default-smithi/ and haven't seen any of those cli_generic.sh failures. you might try a rebase and rerun to see if they persist?

@adamemerson
Copy link
Contributor Author

@cbodley
Copy link
Contributor

cbodley commented Oct 7, 2022

looks much better! for the one test_librbd.sh failure, it got stuck on TestLibRBD.ConcurentOperations and never finished. not sure what's going on there, but maybe that's related to https://tracker.ceph.com/issues/53440? cc @idryomov

@cbodley
Copy link
Contributor

cbodley commented Oct 18, 2022

@idryomov a friendly ping to review the rbd suite results

@adamemerson
Copy link
Contributor Author

jenkins test api

@yuvalif
Copy link
Contributor

yuvalif commented Jan 22, 2023

@adamemerson any status update?

@yuvalif yuvalif mentioned this pull request Jan 22, 2023
14 tasks
@adamemerson
Copy link
Contributor Author

@adamemerson any status update?

@idryomov Says he will include it in his next test run of RBD PRs.

@cbodley
Copy link
Contributor

cbodley commented Mar 6, 2023

@adamemerson i know that you've been carrying these commits in your wip-coro-after-reef branch. now that reef has forked, i think we should start by getting just this PR reviewed/tested/merged. i don't want to ask other components to review everything in #49737. but merging this PR could unblock all of the rgw-specific in your feature branch. what do you think?

@cbodley
Copy link
Contributor

cbodley commented Mar 6, 2023

i also see that the rbd suite is looking very green for reef, so that should make it easier to validate the results

@adamemerson
Copy link
Contributor Author

@adamemerson i know that you've been carrying these commits in your wip-coro-after-reef branch. now that reef has forked, i think we should start by getting just this PR reviewed/tested/merged. i don't want to ask other components to review everything in #49737. but merging this PR could unblock all of the rgw-specific in your feature branch. what do you think?

That sounds fine to me, I think.

Comment on lines +1022 to +1038
namespace fmt {
template <>
struct formatter<neorados::Object> : ostream_formatter {};

template <>
struct formatter<neorados::IOContext> : ostream_formatter {};

template <>
struct formatter<neorados::Op> : ostream_formatter {};
} // namespace fmt
Copy link
Contributor

@cbodley cbodley Mar 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this appears to be the cause of centos9 build failures in shaman:

src/neorados/RADOS.cc:38:
src/include/neorados/RADOS.hpp:1024:56: error: expected class-name before ‘{’ token
 1024 | struct formatter<neorados::Object> : ostream_formatter {};
      |                                                        ^
src/include/neorados/RADOS.hpp:1027:59: error: expected class-name before ‘{’ token
 1027 | struct formatter<neorados::IOContext> : ostream_formatter {};
      |                                                           ^
src/include/neorados/RADOS.hpp:1030:52: error: expected class-name before ‘{’ token
 1030 | struct formatter<neorados::Op> : ostream_formatter {};
      |                                                    ^

edit: same for the jammy build

@cbodley
Copy link
Contributor

cbodley commented Mar 12, 2023

passed the rgw suite in https://pulpito.ceph.com/cbodley-2023-03-11_01:08:10-rgw-wip-coro-test-distro-default-smithi/

the rbd suite in https://pulpito.ceph.com/cbodley-2023-03-11_01:12:31-rbd-wip-coro-test-distro-default-smithi/ showed a number of failures in cli_generic.sh, rbd_mirror_stress.sh and rbd_mirror_ha.sh

Previously we had versions of all the calls that took an int64_t and
optional key and namespace.

This didn't really offer much benefit and doubled the maintenance
burden for changing anything.

As such just make the IOContext constructor non-explicit, and make all
its mutators return the same IOContext so people can supply them
builder-style if they want to.

Also don't wrap things in 'optional' when the underlying library
doesn't. The callers just end up having to wrap and unwrap with
value-or repeatedly.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
This form of completion handling is compatible with C++20 Completions
and generally more flexible stuff.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
The lock will continue to be held over the 'dispatch' with C++20
coroutines.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Apparently in some cases we were using references to objects on the
caller stack that happen to no longer exist if the caller's stack goes
away.

C++20 Coroutines tickled this bug.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Use `min` rather than `max` when deciding how much we need to read in
`read()`.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Just as a demonstration to see how well they work and how to put
things together with them.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
@cbodley
Copy link
Contributor

cbodley commented Mar 21, 2023

the 'ceph API tests' check fails to build with clang without the boost 1.81 changes:

src/tools/neorados.cc:68:5: error: no template named 'awaitable' in namespace 'boost::asio'
ba::awaitable<R::IOContext> lookup_pool(R::RADOS& r, const std::string& pname)
~~~~^

@cbodley
Copy link
Contributor

cbodley commented Mar 22, 2023

the rbd suite in https://pulpito.ceph.com/cbodley-2023-03-11_01:12:31-rbd-wip-coro-test-distro-default-smithi/ showed a number of failures in cli_generic.sh, rbd_mirror_stress.sh and rbd_mirror_ha.sh

the --rerun was queued for so long that shaman had deleted its packages. i scheduled a new rbd suite on yesterday's builds

@cbodley
Copy link
Contributor

cbodley commented Mar 22, 2023

the 'ceph API tests' check fails to build with clang without the boost 1.81 changes

@adamemerson do we need to pull in the boost 1.81 commits from #49737?

@cbodley
Copy link
Contributor

cbodley commented Mar 23, 2023

https://pulpito.ceph.com/cbodley-2023-03-22_18:12:21-rbd-wip-coro-test-distro-default-smithi/ shows the same cli_generic.sh, rbd_mirror_stress.sh and rbd_mirror_ha.sh failures as the previous run; they might be caused by this pr, but i wouldn't know how to debug further

@github-actions
Copy link

github-actions bot commented Jun 1, 2023

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants