neorados: Learning from experience by adamemerson · Pull Request #52495 · ceph/ceph

adamemerson · 2023-07-18T01:17:39Z

Being a set of changes to the neorados library to fix bugs, support C++20 coroutines, and generally improve concurrency.

Contribution Guidelines

To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

Checklist

Tracker (select at least one)
- References tracker ticket
- Very recent bug; references commit where it was introduced
- New feature (ticket optional)
- Doc update (no ticket needed)
- Code cleanup (no ticket needed)
Component impact
- Affects Dashboard, opened tracker ticket
- Affects Orchestrator, opened tracker ticket
- No impact that needs to be tracked
Documentation (select at least one)
- Updates relevant documentation
- No doc update is appropriate
Tests (select at least one)
- Includes unit test(s)
- Includes integration test(s)
- Includes bug reproducer
- No tests

Show available Jenkins commands

jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows

idryomov

rbd: Give RBD an independent copy of neorados

Since we'll be iterating quickly, let them have the current semi-stable-ish version.

Hi Adam,

What do you mean by iterating quickly? Do you expect to keep breaking a major (and only, so far) in-tree user of neorados?

They can switch back to the main line version after it settles down a bit.

How long would you expect the stabilization to take? And who is going to own the "switch back" task?

We are almost at the beginning of the release cycle, it's the best time to introduce potentially breaking changes. With that in mind, I would suggest that you take a shot at converting librbd in this PR (once the PR is otherwise good to go).

If the new version is too raw and the conversion isn't worth the effort, I think this should done be the other way around: instead of introducing libneorbdrados, introduce libneoneorados and iterate/experiment there. Moving ~3000 lines of generic code into librbd because the replacement isn't stable enough is really the wrong approach.

cbodley · 2023-07-18T17:48:31Z

thanks @idryomov. i have a strong preference not to fork neorados, which is already a fork of librados

we've intentionally avoided freezing/versioning the neorados API so it could continue to evolve. now that the S release cycle is kicking off, rgw will (finally) be starting to adopt neorados and will desperately need the ability to iterate here

we have a lot of related work already staged on a feature branch https://github.com/ceph/ceph/compare/wip-coro-after-reef. that includes changes to Objecter which effect all librados/neorados clients, and need to be tested and remain stable everywhere. i feel the same way about neorados too - we shouldn't be merging changes to main that aren't tested and stable

part of our frustration just came from delays in #48129 where we weren't able to review the rbd results. how can we avoid blocking on this going forward? my sense is that Adam's neorbdrados fork is a technical solution to a process problem, and i'd much rather work together to iron out the process

cbodley · 2023-07-18T18:04:57Z

@adamemerson i know some of your general plans for neorados, but maybe it would help to write up an outline of proposed changes during the S release? the highest priorities seem to be support for c++20 coroutines and the client interfaces for object classes. the latter shouldn't effect rbd at all. is there much more past that, that would potentially destabilize rbd?

adamemerson · 2023-07-18T19:16:57Z

@adamemerson i know some of your general plans for neorados, but maybe it would help to write up an outline of proposed changes during the S release? the highest priorities seem to be support for c++20 coroutines and the client interfaces for object classes. the latter shouldn't effect rbd at all. is there much more past that, that would potentially destabilize rbd?

Those are the priorities for S cycle since we want to focus on RGW, but that would not make the library stable. (As far as I'm concerned it's not stable, never was, and having people not base more things we have to deal with would be an up-side until it is.)

The two biggest ones not necessary for S (would be nice, though) but necessary for stability are:

Split into a public API that does not expose bufferlist or other Ceph internal types and an in-tree API that inherits form that and does.
Get rid of some of the places where we make new completions/handlers for mid-sequences, likely replacing them with async_compose or co_compose.

I have some other changes in mind that are also not backward compatible, but those mostly involve having more people use it in its current unstable state and getting their feedback about how to dimension things after those big two.

idryomov · 2023-07-18T19:55:08Z

thanks @idryomov. i have a strong preference not to fork neorados, which is already a fork of librados

we've intentionally avoided freezing/versioning the neorados API so it could continue to evolve. now that the S release cycle is kicking off, rgw will (finally) be starting to adopt neorados and will desperately need the ability to iterate here

If by iterating you and Adam mean making sensible changes to the API, that is totally fine with me. What I'm concerned about is Adam's communications that neorados is "semi-stable-ish" and "it's not stable, never was" and -- I'm conjecturing, please correct if I misinterpreted -- the willingness to treat it as such not just when it comes to the API but also to the implementation. Essentially making risky wide-reaching changes, merging them quickly, etc because the bar is believed to be low.

we have a lot of related work already staged on a feature branch https://github.com/ceph/ceph/compare/wip-coro-after-reef. that includes changes to Objecter which effect all librados/neorados clients, and need to be tested and remain stable everywhere. i feel the same way about neorados too - we shouldn't be merging changes to main that aren't tested and stable

That makes two of us! For better or for worse, neorados implementation is assumed to be stable. RBD has been using it since pacific, it's out there in the wild.

part of our frustration just came from delays in #48129 where we weren't able to review the rbd results. how can we avoid blocking on this going forward? my sense is that Adam's neorbdrados fork is a technical solution to a process problem, and i'd much rather work together to iron out the process

My apologies for dropping the ball there. I think I actually included that PR in one of my runs but apparently forgot to comment. Feel free to poke me directly in case PR notifications fall through the cracks.

adamemerson · 2023-07-20T04:53:03Z

Hi Adam,

What do you mean by iterating quickly? Do you expect to keep breaking a major (and only, so far) in-tree user of neorados?

My plan is to not break rbd. That's the entire point.

How long would you expect the stabilization to take? And who is going to own the "switch back" task?

There are two aspects of 'stabilization'. One is simply the API changes, coroutine changes, and general internal cleanups needed internally. That should be completely done by S. Likely fairly quickly in S, though later development may turn up something that needs to be changed.

As to the changes necessary to present an external library, I'd like to get those in by S if possible, but since they're more disruptive and I have in-tree obligations, they might have to wait until T.

I was assuming I would do the 'switch back' task and could do so after making all the changes needed for S if desired. I don't really see it as difficult, especially after we've hammered on it for a while and shaken any bugs out.

We are almost at the beginning of the release cycle, it's the best time to introduce potentially breaking changes. With that in mind, I would suggest that you take a shot at converting librbd in this PR (once the PR is otherwise good to go).

I did exactly that before. That's not the hard part. I just don't want to get blocked for months. This way you don't have to be bound to our timeline and we don't have to be bound to yours.

If the new version is too raw and the conversion isn't worth the effort, I think this should done be the other way around: instead of introducing libneorbdrados, introduce libneoneorados and iterate/experiment there. Moving ~3000 lines of generic code into librbd because the replacement isn't stable enough is really the wrong approach.

Obviously, I disagree. The newer versions are what new users should be using and they're the path to the stable version that should eventually be usable by out-of-tree users.

If RBD will test things in a reasonable timeframe and provide some guidance in debugging failures, then I don't have a problem not splitting things, and it would have some simplifying advantages. However, I don't really see the objection to allowing our schedules to be independent for a time then bringing them together later.

This PR has what's needed to unblock the next step in RGW's async work.

Longer term, the main changes I plan to make are:

Simplify the neorados↔Objecter interface and get rid of some of the allocations
Clean up the back-end to synthesize fewer callbacks.
Document everything properly
Split the interface between include/neorados/RADOS.hpp and src/neorados/RADOSint.hpp (or some other name, you get the idea). With the calls that write into bufferlists kept to the second and the first using Asio's buffer abstractions.
Get some feedback from a few people on readability and clarity and take some of that into account.

I like to think some of my ideas are sensible. The back-end and objecter-interface cleanups might be 'high risk' but since we're trying to be high performance, I want to get rid of the extra allocations and callbacks. I don't consider it to have a particularly low bar, I just don't want to be blocked for long periods of time.

The interface split is likely to be the most outright disruptive. It's not critical for anything in-tree, so doesn't block any other deliverables, but I do consider it essential to exposing an external library and declaring a stable, versioned interface.

idryomov · 2023-07-20T11:43:06Z

We are almost at the beginning of the release cycle, it's the best time to introduce potentially breaking changes. With that in mind, I would suggest that you take a shot at converting librbd in this PR (once the PR is otherwise good to go).

I did exactly that before. That's not the hard part. I just don't want to get blocked for months. This way you don't have to be bound to our timeline and we don't have to be bound to yours.

I think we have (or at least we are supposed to have) the same timeline: one bound to Ceph upstream releases. Once it gets close to the freeze date (which for reef was supposed to be mid-January), changes like these are automatically moved out and review is deprioritized (unless an exception is made).

I have already acknowledged and apologised for not providing a timely review on #48129. The title ("neorados: Update exerciser/demo to use C++20 coroutines") didn't communicate that the library itself was getting tweaked as opposed to just the neorados exerciser tool. Later, test runs were impacted by throbbing build and lab issues which also took months to address.

If RBD will test things in a reasonable timeframe and provide some guidance in debugging failures, then I don't have a problem not splitting things, and it would have some simplifying advantages. However, I don't really see the objection to allowing our schedules to be independent for a time then bringing them together later.

I don't buy the "different schedules" argument, but my objection was more to how the fork was performed and less to the fork itself (although I'm definitely not a fan of the fork). It seems like that discussion can be put aside though as it appears that the fork not needed at all. I'll do my best to help review and test changes needed to adapt librbd to the updated API, as quick as possible.

cbodley · 2023-07-20T18:02:48Z

I like to think some of my ideas are sensible. The back-end and objecter-interface cleanups might be 'high risk' but since we're trying to be high performance, I want to get rid of the extra allocations and callbacks. I don't consider it to have a particularly low bar, I just don't want to be blocked for long periods of time.

@adamemerson i think the rbd suite would provide valuable test coverage to help validate these higher-risk changes. it's better to discover problems there early in development than at the very end when we try to drop the fork. and rbd would enjoy those performance improvements in the meantime

mattbenjamin · 2023-07-20T19:33:36Z

It feels to me as if forcing a flag day on neorados is artificial. Adam's idea of partially decoupling the breaking changes from different high value clients (rados and rbd)--that are relied on for maximal stability by users and the maintainers--is just good software engineering. You're creating pressure to do all the work out-of-tree, I would say. I won't feel pressured to upstream what we ship if that's the easiest path, and then we find that rbd and rados have "deprioritized" reviewing it.

cbodley · 2023-08-02T21:17:28Z

@adamemerson i updated the "common/async: completion uses free functions for defer/dispatch/post" and "common/async: SharedMutex uses free function post" commits to use bind_executor() and see their unit tests passing again

cbodley · 2023-08-03T14:43:25Z

In file included from /home/jenkins-build/build/workspace/ceph-api/src/tools/neorados.cc:36:
/home/jenkins-build/build/workspace/ceph-api/src/include/neorados/RADOS.hpp:1600:41: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
       max, filter = std::move(filter), this](auto&& handler) mutable {
                                        ^
/home/jenkins-build/build/workspace/ceph-api/src/tools/neorados.cc:99:18: note: in instantiation of function template specialization 'neorados::RADOS::enumerate_objects<boost::asio::redirect_error_t<boost::asio::use_awaitable_t<>>>' requested here
      co_await r.enumerate_objects(pool, next, R::Cursor::end(), 1000, {},
                 ^

idryomov · 2023-08-07T11:20:42Z

@cbodley Did you root cause why make check passed by any chance? Was a different compiler used or is it just that neorados tool is somehow not built by make check?

cbodley · 2023-08-07T16:22:45Z

@cbodley Did you root cause why make check passed by any chance? Was a different compiler used or is it just that neorados tool is somehow not built by make check?

looks like it just isn't built for 'make check', but i'm not sure why. i tried uncommenting the cmake install directive in #52862 but that didn't seem to help

idryomov · 2023-08-08T13:19:47Z

looks like it just isn't built for 'make check', but i'm not sure why.

I think this needs to be tracked down and fixed. From my perspective, it's just a coincidence that the API check build happened to include the tool -- Dashboard folks are free to tweak that job at any time. It's make check that is supposed to build everything (that is enabled by default, at least).

cbodley · 2023-12-07T21:07:52Z

The cmpext thing was the only concerning change under src/librbd.

@yuriw could you please reschedule the rbd and rados suites against the builds in https://shaman.ceph.com/builds/ceph/wip-neorados-learning-from-experience/d710c09d2ffe0f0858468330fd3d69f1f4506222/?

I'm assuming it never happened, will run this through the RBD suite myself.

the previous results were in https://pulpito.ceph.com/?branch=wip-neorados-learning-from-experience

Yuri is rescheduling rados/rbd suites - i cc'ed you on the trello

A failure will produce the output `osd_errc::cmpext_mismatch`. On failure, the `unmatch` out parameter, if provided, will be set to the index of the first nonmatching character; On success, it will be set to `-1`. Signed-off-by: Adam Emerson <aemerson@redhat.com>

Omap and Xattr comparison use the same values, but in the Xattr case it's eight bits wide. Rename the enum to `cmp_op` to reflect this, and open-code encoding the assertions so we don't have to allocate an intermediate structure. Signed-off-by: Adam Emerson <aemerson@redhat.com>

A map from strings to value/op pairs is a bit much and a bit annoying for what's essentially a list of assertions. Just use a vector. Also put op in the middle so it matches `cmpxattr`. Signed-off-by: Adam Emerson <aemerson@redhat.com>

Mostly for getting things into and out of them conveniently. Signed-off-by: Adam Emerson <aemerson@redhat.com>

Rather than using `buffer::list`s for initial values and hash results, serialize to/from the expected types. Signed-off-by: Adam Emerson <aemerson@redhat.com>

This includes cls, cmd, and read_operations. Signed-off-by: Adam Emerson <aemerson@redhat.com>

Since both const lvalue and rvalue make sense, instead of duplicating everything, just take a value and move from it. Signed-off-by: Adam Emerson <aemerson@redhat.com>

The test framework in common test can now support erasure-coded pools and be more easily extended generally. Signed-off-by: Adam Emerson <aemerson@redhat.com>

Now that C++20 has designated initializers, this is the least confusing way to handle objects like this. Also switch to using the spaceship operator, while we're at it. Signed-off-by: Adam Emerson <aemerson@redhat.com>

Signed-off-by: Adam Emerson <aemerson@redhat.com>

This includes write_operations, io, misc, and pool. Signed-off-by: Adam Emerson <aemerson@redhat.com>

Signed-off-by: Adam Emerson <aemerson@redhat.com>

Since Objecter has a set used to validate pointers converted from watch cookies before dereferencing them, expose a function to do so. Also make the set a `std::unordered_set` since we use it for only this purpose. Signed-off-by: Adam Emerson <aemerson@redhat.com>

1. Objecter expects a count of seconds, not milliseconds 2. Actually return result Signed-off-by: Adam Emerson <aemerson@redhat.com>

This includes snapshots and watch_notify Signed-off-by: Adam Emerson <aemerson@redhat.com>

To make sure it's built by `make check`. Signed-off-by: Adam Emerson <aemerson@redhat.com>

Signed-off-by: Adam Emerson <aemerson@redhat.com>

idryomov · 2023-12-08T00:24:09Z

Yuri is rescheduling rados/rbd suites - i cc'ed you on the trello

Reverts for https://tracker.ceph.com/issues/63682 need to be included for RBD, so I think I'll do it myself anyway.

idryomov · 2023-12-11T11:20:45Z

No related failures in RBD:

https://pulpito.ceph.com/dis-2023-12-09_23:57:15-rbd-wip-dis-testing-distro-default-smithi/
https://pulpito.ceph.com/dis-2023-12-10_11:10:46-rbd-wip-dis-testing-distro-default-smithi/

https://pulpito.ceph.com/dis-2023-12-10_11:12:57-rbd-wip-dis-testing-distro-default-smithi/
https://pulpito.ceph.com/dis-2023-12-11_01:43:42-rbd-wip-dis-testing-distro-default-smithi/

idryomov

Ack from the RBD perspective

adamemerson requested review from a team as code owners July 18, 2023 01:17

github-actions bot added build/ops cephfs Ceph File System common core mon rbd tests labels Jul 18, 2023

idryomov reviewed Jul 18, 2023

View reviewed changes

cbodley mentioned this pull request Jul 24, 2023

rgw/beast: switch back to polymorphic executor #50055

Merged

cbodley force-pushed the wip-neorados-learning-from-experience branch from 211d6b9 to 25d2b9c Compare August 2, 2023 21:16

adamemerson force-pushed the wip-neorados-learning-from-experience branch 4 times, most recently from edb6b5f to 879a1ca Compare August 9, 2023 07:19

adamemerson added 21 commits December 7, 2023 16:23

neorados: Improve cmp_omap argument structure

6fab415

A map from strings to value/op pairs is a bit much and a bit annoying for what's essentially a list of assertions. Just use a vector. Also put op in the middle so it matches `cmpxattr`. Signed-off-by: Adam Emerson <aemerson@redhat.com>

test/neorados: common_tests gets functions for bufferlists

a668ddb

Mostly for getting things into and out of them conveniently. Signed-off-by: Adam Emerson <aemerson@redhat.com>

neorados: Implement ReadOp::checksum

3775d7e

Rather than using `buffer::list`s for initial values and hash results, serialize to/from the expected types. Signed-off-by: Adam Emerson <aemerson@redhat.com>

test/neorados: Rounding out test coverage, part 1

ef49105

This includes cls, cmd, and read_operations. Signed-off-by: Adam Emerson <aemerson@redhat.com>

neorados: Take claim_appended buffer::lists by value

39ede8b

Since both const lvalue and rvalue make sense, instead of duplicating everything, just take a value and move from it. Signed-off-by: Adam Emerson <aemerson@redhat.com>

neorados/test: Support erasure coded pools

29ad5be

The test framework in common test can now support erasure-coded pools and be more easily extended generally. Signed-off-by: Adam Emerson <aemerson@redhat.com>

neorados: Make neorados::Entry aggregate initializable

b8a1923

Now that C++20 has designated initializers, this is the least confusing way to handle objects like this. Also switch to using the spaceship operator, while we're at it. Signed-off-by: Adam Emerson <aemerson@redhat.com>

neorados: Moved-from operations are equivalent to new

cd99649

Signed-off-by: Adam Emerson <aemerson@redhat.com>

test/neorados: Rounding out test coverage, part 2

1076a09

This includes write_operations, io, misc, and pool. Signed-off-by: Adam Emerson <aemerson@redhat.com>

neorados: execute returns success on empty operations

ecd9411

Signed-off-by: Adam Emerson <aemerson@redhat.com>

osdc: with_osdmap should be const

0dd6486

Signed-off-by: Adam Emerson <aemerson@redhat.com>

neorados: More snapshot support

ccc1b89

Signed-off-by: Adam Emerson <aemerson@redhat.com>

neorados: Notify fixes

1d6b9c8

1. Objecter expects a count of seconds, not milliseconds 2. Actually return result Signed-off-by: Adam Emerson <aemerson@redhat.com>

test/neorados: Rounding out test coverage, part 3

dce2326

This includes snapshots and watch_notify Signed-off-by: Adam Emerson <aemerson@redhat.com>

build: Make tests depend on neorados

5d38a2f

To make sure it's built by `make check`. Signed-off-by: Adam Emerson <aemerson@redhat.com>

qa/rados: Add neorados tests to run

8a07fdb

Signed-off-by: Adam Emerson <aemerson@redhat.com>

libcephfs: Fix order of destruction of context pool

e64d609

Signed-off-by: Adam Emerson <aemerson@redhat.com>

test/os: Add missing include to Fragmentation_simulator.cc

eb27507

Signed-off-by: Adam Emerson <aemerson@redhat.com>

adamemerson force-pushed the wip-neorados-learning-from-experience branch from 36dd22c to eb27507 Compare December 7, 2023 21:27

cbodley mentioned this pull request Dec 8, 2023

librbd: include asio strand headers in librbd.cc #54839

Merged

idryomov approved these changes Dec 11, 2023

View reviewed changes

cbodley merged commit ac48f16 into ceph:main Dec 18, 2023

Matan-B mentioned this pull request Dec 19, 2023

crimson/osd/pg_backend: fix rollback on whiteouts #54609

Merged

14 tasks

adamemerson deleted the wip-neorados-learning-from-experience branch December 20, 2023 20:47

Conversation

adamemerson commented Jul 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Contribution Guidelines

Checklist

Uh oh!

idryomov left a comment

Choose a reason for hiding this comment

Uh oh!

cbodley commented Jul 18, 2023

Uh oh!

cbodley commented Jul 18, 2023

Uh oh!

adamemerson commented Jul 18, 2023

Uh oh!

idryomov commented Jul 18, 2023

Uh oh!

adamemerson commented Jul 20, 2023

Uh oh!

idryomov commented Jul 20, 2023

Uh oh!

cbodley commented Jul 20, 2023

Uh oh!

mattbenjamin commented Jul 20, 2023

Uh oh!

cbodley commented Aug 2, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cbodley commented Aug 3, 2023

Uh oh!

idryomov commented Aug 7, 2023

Uh oh!

cbodley commented Aug 7, 2023

Uh oh!

idryomov commented Aug 8, 2023

Uh oh!

cbodley commented Dec 7, 2023

Uh oh!

idryomov commented Dec 8, 2023

Uh oh!

idryomov commented Dec 11, 2023

Uh oh!

idryomov left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

adamemerson commented Jul 18, 2023 •

edited

Loading

cbodley commented Aug 2, 2023 •

edited

Loading