Skip to content

neorados: Learning from experience#52495

Merged
cbodley merged 42 commits intoceph:mainfrom
adamemerson:wip-neorados-learning-from-experience
Dec 18, 2023
Merged

neorados: Learning from experience#52495
cbodley merged 42 commits intoceph:mainfrom
adamemerson:wip-neorados-learning-from-experience

Conversation

@adamemerson
Copy link
Contributor

@adamemerson adamemerson commented Jul 18, 2023

Being a set of changes to the neorados library to fix bugs, support C++20 coroutines, and generally improve concurrency.

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

Copy link
Contributor

@idryomov idryomov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rbd: Give RBD an independent copy of neorados

Since we'll be iterating quickly, let them have the current semi-stable-ish version.

Hi Adam,

What do you mean by iterating quickly? Do you expect to keep breaking a major (and only, so far) in-tree user of neorados?

They can switch back to the main line version after it settles down a bit.

How long would you expect the stabilization to take? And who is going to own the "switch back" task?

We are almost at the beginning of the release cycle, it's the best time to introduce potentially breaking changes. With that in mind, I would suggest that you take a shot at converting librbd in this PR (once the PR is otherwise good to go).

If the new version is too raw and the conversion isn't worth the effort, I think this should done be the other way around: instead of introducing libneorbdrados, introduce libneoneorados and iterate/experiment there. Moving ~3000 lines of generic code into librbd because the replacement isn't stable enough is really the wrong approach.

@cbodley
Copy link
Contributor

cbodley commented Jul 18, 2023

thanks @idryomov. i have a strong preference not to fork neorados, which is already a fork of librados

we've intentionally avoided freezing/versioning the neorados API so it could continue to evolve. now that the S release cycle is kicking off, rgw will (finally) be starting to adopt neorados and will desperately need the ability to iterate here

we have a lot of related work already staged on a feature branch https://github.com/ceph/ceph/compare/wip-coro-after-reef. that includes changes to Objecter which effect all librados/neorados clients, and need to be tested and remain stable everywhere. i feel the same way about neorados too - we shouldn't be merging changes to main that aren't tested and stable

part of our frustration just came from delays in #48129 where we weren't able to review the rbd results. how can we avoid blocking on this going forward? my sense is that Adam's neorbdrados fork is a technical solution to a process problem, and i'd much rather work together to iron out the process

@cbodley
Copy link
Contributor

cbodley commented Jul 18, 2023

@adamemerson i know some of your general plans for neorados, but maybe it would help to write up an outline of proposed changes during the S release? the highest priorities seem to be support for c++20 coroutines and the client interfaces for object classes. the latter shouldn't effect rbd at all. is there much more past that, that would potentially destabilize rbd?

@adamemerson
Copy link
Contributor Author

@adamemerson i know some of your general plans for neorados, but maybe it would help to write up an outline of proposed changes during the S release? the highest priorities seem to be support for c++20 coroutines and the client interfaces for object classes. the latter shouldn't effect rbd at all. is there much more past that, that would potentially destabilize rbd?

Those are the priorities for S cycle since we want to focus on RGW, but that would not make the library stable. (As far as I'm concerned it's not stable, never was, and having people not base more things we have to deal with would be an up-side until it is.)

The two biggest ones not necessary for S (would be nice, though) but necessary for stability are:

  • Split into a public API that does not expose bufferlist or other Ceph internal types and an in-tree API that inherits form that and does.
  • Get rid of some of the places where we make new completions/handlers for mid-sequences, likely replacing them with async_compose or co_compose.

I have some other changes in mind that are also not backward compatible, but those mostly involve having more people use it in its current unstable state and getting their feedback about how to dimension things after those big two.

@idryomov
Copy link
Contributor

thanks @idryomov. i have a strong preference not to fork neorados, which is already a fork of librados

we've intentionally avoided freezing/versioning the neorados API so it could continue to evolve. now that the S release cycle is kicking off, rgw will (finally) be starting to adopt neorados and will desperately need the ability to iterate here

If by iterating you and Adam mean making sensible changes to the API, that is totally fine with me. What I'm concerned about is Adam's communications that neorados is "semi-stable-ish" and "it's not stable, never was" and -- I'm conjecturing, please correct if I misinterpreted -- the willingness to treat it as such not just when it comes to the API but also to the implementation. Essentially making risky wide-reaching changes, merging them quickly, etc because the bar is believed to be low.

we have a lot of related work already staged on a feature branch https://github.com/ceph/ceph/compare/wip-coro-after-reef. that includes changes to Objecter which effect all librados/neorados clients, and need to be tested and remain stable everywhere. i feel the same way about neorados too - we shouldn't be merging changes to main that aren't tested and stable

That makes two of us! For better or for worse, neorados implementation is assumed to be stable. RBD has been using it since pacific, it's out there in the wild.

part of our frustration just came from delays in #48129 where we weren't able to review the rbd results. how can we avoid blocking on this going forward? my sense is that Adam's neorbdrados fork is a technical solution to a process problem, and i'd much rather work together to iron out the process

My apologies for dropping the ball there. I think I actually included that PR in one of my runs but apparently forgot to comment. Feel free to poke me directly in case PR notifications fall through the cracks.

@adamemerson
Copy link
Contributor Author

Hi Adam,

What do you mean by iterating quickly? Do you expect to keep breaking a major (and only, so far) in-tree user of neorados?

My plan is to not break rbd. That's the entire point.

How long would you expect the stabilization to take? And who is going to own the "switch back" task?

There are two aspects of 'stabilization'. One is simply the API changes, coroutine changes, and general internal cleanups needed internally. That should be completely done by S. Likely fairly quickly in S, though later development may turn up something that needs to be changed.

As to the changes necessary to present an external library, I'd like to get those in by S if possible, but since they're more disruptive and I have in-tree obligations, they might have to wait until T.

I was assuming I would do the 'switch back' task and could do so after making all the changes needed for S if desired. I don't really see it as difficult, especially after we've hammered on it for a while and shaken any bugs out.

We are almost at the beginning of the release cycle, it's the best time to introduce potentially breaking changes. With that in mind, I would suggest that you take a shot at converting librbd in this PR (once the PR is otherwise good to go).

I did exactly that before. That's not the hard part. I just don't want to get blocked for months. This way you don't have to be bound to our timeline and we don't have to be bound to yours.

If the new version is too raw and the conversion isn't worth the effort, I think this should done be the other way around: instead of introducing libneorbdrados, introduce libneoneorados and iterate/experiment there. Moving ~3000 lines of generic code into librbd because the replacement isn't stable enough is really the wrong approach.

Obviously, I disagree. The newer versions are what new users should be using and they're the path to the stable version that should eventually be usable by out-of-tree users.

If RBD will test things in a reasonable timeframe and provide some guidance in debugging failures, then I don't have a problem not splitting things, and it would have some simplifying advantages. However, I don't really see the objection to allowing our schedules to be independent for a time then bringing them together later.

This PR has what's needed to unblock the next step in RGW's async work.

Longer term, the main changes I plan to make are:

  • Simplify the neorados↔Objecter interface and get rid of some of the allocations
  • Clean up the back-end to synthesize fewer callbacks.
  • Document everything properly
  • Split the interface between include/neorados/RADOS.hpp and src/neorados/RADOSint.hpp (or some other name, you get the idea). With the calls that write into bufferlists kept to the second and the first using Asio's buffer abstractions.
  • Get some feedback from a few people on readability and clarity and take some of that into account.

I like to think some of my ideas are sensible. The back-end and objecter-interface cleanups might be 'high risk' but since we're trying to be high performance, I want to get rid of the extra allocations and callbacks. I don't consider it to have a particularly low bar, I just don't want to be blocked for long periods of time.

The interface split is likely to be the most outright disruptive. It's not critical for anything in-tree, so doesn't block any other deliverables, but I do consider it essential to exposing an external library and declaring a stable, versioned interface.

@idryomov
Copy link
Contributor

We are almost at the beginning of the release cycle, it's the best time to introduce potentially breaking changes. With that in mind, I would suggest that you take a shot at converting librbd in this PR (once the PR is otherwise good to go).

I did exactly that before. That's not the hard part. I just don't want to get blocked for months. This way you don't have to be bound to our timeline and we don't have to be bound to yours.

I think we have (or at least we are supposed to have) the same timeline: one bound to Ceph upstream releases. Once it gets close to the freeze date (which for reef was supposed to be mid-January), changes like these are automatically moved out and review is deprioritized (unless an exception is made).

I have already acknowledged and apologised for not providing a timely review on #48129. The title ("neorados: Update exerciser/demo to use C++20 coroutines") didn't communicate that the library itself was getting tweaked as opposed to just the neorados exerciser tool. Later, test runs were impacted by throbbing build and lab issues which also took months to address.

If RBD will test things in a reasonable timeframe and provide some guidance in debugging failures, then I don't have a problem not splitting things, and it would have some simplifying advantages. However, I don't really see the objection to allowing our schedules to be independent for a time then bringing them together later.

I don't buy the "different schedules" argument, but my objection was more to how the fork was performed and less to the fork itself (although I'm definitely not a fan of the fork). It seems like that discussion can be put aside though as it appears that the fork not needed at all. I'll do my best to help review and test changes needed to adapt librbd to the updated API, as quick as possible.

@cbodley
Copy link
Contributor

cbodley commented Jul 20, 2023

I like to think some of my ideas are sensible. The back-end and objecter-interface cleanups might be 'high risk' but since we're trying to be high performance, I want to get rid of the extra allocations and callbacks. I don't consider it to have a particularly low bar, I just don't want to be blocked for long periods of time.

@adamemerson i think the rbd suite would provide valuable test coverage to help validate these higher-risk changes. it's better to discover problems there early in development than at the very end when we try to drop the fork. and rbd would enjoy those performance improvements in the meantime

@mattbenjamin
Copy link
Contributor

It feels to me as if forcing a flag day on neorados is artificial. Adam's idea of partially decoupling the breaking changes from different high value clients (rados and rbd)--that are relied on for maximal stability by users and the maintainers--is just good software engineering. You're creating pressure to do all the work out-of-tree, I would say. I won't feel pressured to upstream what we ship if that's the easiest path, and then we find that rbd and rados have "deprioritized" reviewing it.

@cbodley cbodley force-pushed the wip-neorados-learning-from-experience branch from 211d6b9 to 25d2b9c Compare August 2, 2023 21:16
@cbodley
Copy link
Contributor

cbodley commented Aug 2, 2023

@adamemerson i updated the "common/async: completion uses free functions for defer/dispatch/post" and "common/async: SharedMutex uses free function post" commits to use bind_executor() and see their unit tests passing again

@cbodley
Copy link
Contributor

cbodley commented Aug 3, 2023

In file included from /home/jenkins-build/build/workspace/ceph-api/src/tools/neorados.cc:36:
/home/jenkins-build/build/workspace/ceph-api/src/include/neorados/RADOS.hpp:1600:41: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
       max, filter = std::move(filter), this](auto&& handler) mutable {
                                        ^
/home/jenkins-build/build/workspace/ceph-api/src/tools/neorados.cc:99:18: note: in instantiation of function template specialization 'neorados::RADOS::enumerate_objects<boost::asio::redirect_error_t<boost::asio::use_awaitable_t<>>>' requested here
      co_await r.enumerate_objects(pool, next, R::Cursor::end(), 1000, {},
                 ^

@idryomov
Copy link
Contributor

idryomov commented Aug 7, 2023

@cbodley Did you root cause why make check passed by any chance? Was a different compiler used or is it just that neorados tool is somehow not built by make check?

@cbodley
Copy link
Contributor

cbodley commented Aug 7, 2023

@cbodley Did you root cause why make check passed by any chance? Was a different compiler used or is it just that neorados tool is somehow not built by make check?

looks like it just isn't built for 'make check', but i'm not sure why. i tried uncommenting the cmake install directive in #52862 but that didn't seem to help

@idryomov
Copy link
Contributor

idryomov commented Aug 8, 2023

looks like it just isn't built for 'make check', but i'm not sure why.

I think this needs to be tracked down and fixed. From my perspective, it's just a coincidence that the API check build happened to include the tool -- Dashboard folks are free to tweak that job at any time. It's make check that is supposed to build everything (that is enabled by default, at least).

@adamemerson adamemerson force-pushed the wip-neorados-learning-from-experience branch 4 times, most recently from edb6b5f to 879a1ca Compare August 9, 2023 07:19
@cbodley
Copy link
Contributor

cbodley commented Dec 7, 2023

The cmpext thing was the only concerning change under src/librbd.

@yuriw could you please reschedule the rbd and rados suites against the builds in https://shaman.ceph.com/builds/ceph/wip-neorados-learning-from-experience/d710c09d2ffe0f0858468330fd3d69f1f4506222/?

I'm assuming it never happened, will run this through the RBD suite myself.

the previous results were in https://pulpito.ceph.com/?branch=wip-neorados-learning-from-experience

Yuri is rescheduling rados/rbd suites - i cc'ed you on the trello

A failure will produce the output `osd_errc::cmpext_mismatch`. On
failure, the `unmatch` out parameter, if provided, will be set to the
index of the first nonmatching character; On success, it will be set
to `-1`.

Signed-off-by: Adam Emerson <aemerson@redhat.com>
Omap and Xattr comparison use the same values, but in the Xattr case
it's eight bits wide.

Rename the enum to `cmp_op` to reflect this, and open-code encoding
the assertions so we don't have to allocate an intermediate structure.

Signed-off-by: Adam Emerson <aemerson@redhat.com>
A map from strings to value/op pairs is a bit much and a bit annoying
for what's essentially a list of assertions. Just use a vector.

Also put op in the middle so it matches `cmpxattr`.

Signed-off-by: Adam Emerson <aemerson@redhat.com>
Mostly for getting things into and out of them conveniently.

Signed-off-by: Adam Emerson <aemerson@redhat.com>
Rather than using `buffer::list`s for initial values and hash results,
serialize to/from the expected types.

Signed-off-by: Adam Emerson <aemerson@redhat.com>
This includes cls, cmd, and read_operations.

Signed-off-by: Adam Emerson <aemerson@redhat.com>
Since both const lvalue and rvalue make sense, instead of duplicating
everything, just take a value and move from it.

Signed-off-by: Adam Emerson <aemerson@redhat.com>
The test framework in common test can now support erasure-coded pools
and be more easily extended generally.

Signed-off-by: Adam Emerson <aemerson@redhat.com>
Now that C++20 has designated initializers, this is the least
confusing way to handle objects like this.

Also switch to using the spaceship operator, while we're at it.

Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
This includes write_operations, io, misc, and pool.

Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
Since Objecter has a set used to validate pointers converted from
watch cookies before dereferencing them, expose a function to do so.

Also make the set a `std::unordered_set` since we use it for only this
purpose.

Signed-off-by: Adam Emerson <aemerson@redhat.com>
1. Objecter expects a count of seconds, not milliseconds
2. Actually return result

Signed-off-by: Adam Emerson <aemerson@redhat.com>
This includes snapshots and watch_notify

Signed-off-by: Adam Emerson <aemerson@redhat.com>
To make sure it's built by `make check`.

Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
@adamemerson adamemerson force-pushed the wip-neorados-learning-from-experience branch from 36dd22c to eb27507 Compare December 7, 2023 21:27
@idryomov
Copy link
Contributor

idryomov commented Dec 8, 2023

Yuri is rescheduling rados/rbd suites - i cc'ed you on the trello

Reverts for https://tracker.ceph.com/issues/63682 need to be included for RBD, so I think I'll do it myself anyway.

Copy link
Contributor

@idryomov idryomov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack from the RBD perspective

@cbodley cbodley merged commit ac48f16 into ceph:main Dec 18, 2023
@adamemerson adamemerson deleted the wip-neorados-learning-from-experience branch December 20, 2023 20:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants