neorados: Learning from experience#52495
Conversation
idryomov
left a comment
There was a problem hiding this comment.
rbd: Give RBD an independent copy of neorados
Since we'll be iterating quickly, let them have the current semi-stable-ish version.
Hi Adam,
What do you mean by iterating quickly? Do you expect to keep breaking a major (and only, so far) in-tree user of neorados?
They can switch back to the main line version after it settles down a bit.
How long would you expect the stabilization to take? And who is going to own the "switch back" task?
We are almost at the beginning of the release cycle, it's the best time to introduce potentially breaking changes. With that in mind, I would suggest that you take a shot at converting librbd in this PR (once the PR is otherwise good to go).
If the new version is too raw and the conversion isn't worth the effort, I think this should done be the other way around: instead of introducing libneorbdrados, introduce libneoneorados and iterate/experiment there. Moving ~3000 lines of generic code into librbd because the replacement isn't stable enough is really the wrong approach.
|
thanks @idryomov. i have a strong preference not to fork neorados, which is already a fork of librados we've intentionally avoided freezing/versioning the neorados API so it could continue to evolve. now that the S release cycle is kicking off, rgw will (finally) be starting to adopt neorados and will desperately need the ability to iterate here we have a lot of related work already staged on a feature branch https://github.com/ceph/ceph/compare/wip-coro-after-reef. that includes changes to part of our frustration just came from delays in #48129 where we weren't able to review the rbd results. how can we avoid blocking on this going forward? my sense is that Adam's neorbdrados fork is a technical solution to a process problem, and i'd much rather work together to iron out the process |
|
@adamemerson i know some of your general plans for neorados, but maybe it would help to write up an outline of proposed changes during the S release? the highest priorities seem to be support for c++20 coroutines and the client interfaces for object classes. the latter shouldn't effect rbd at all. is there much more past that, that would potentially destabilize rbd? |
Those are the priorities for S cycle since we want to focus on RGW, but that would not make the library stable. (As far as I'm concerned it's not stable, never was, and having people not base more things we have to deal with would be an up-side until it is.) The two biggest ones not necessary for S (would be nice, though) but necessary for stability are:
I have some other changes in mind that are also not backward compatible, but those mostly involve having more people use it in its current unstable state and getting their feedback about how to dimension things after those big two. |
If by iterating you and Adam mean making sensible changes to the API, that is totally fine with me. What I'm concerned about is Adam's communications that neorados is "semi-stable-ish" and "it's not stable, never was" and -- I'm conjecturing, please correct if I misinterpreted -- the willingness to treat it as such not just when it comes to the API but also to the implementation. Essentially making risky wide-reaching changes, merging them quickly, etc because the bar is believed to be low.
That makes two of us! For better or for worse, neorados implementation is assumed to be stable. RBD has been using it since pacific, it's out there in the wild.
My apologies for dropping the ball there. I think I actually included that PR in one of my runs but apparently forgot to comment. Feel free to poke me directly in case PR notifications fall through the cracks. |
My plan is to not break rbd. That's the entire point.
There are two aspects of 'stabilization'. One is simply the API changes, coroutine changes, and general internal cleanups needed internally. That should be completely done by S. Likely fairly quickly in S, though later development may turn up something that needs to be changed. As to the changes necessary to present an external library, I'd like to get those in by S if possible, but since they're more disruptive and I have in-tree obligations, they might have to wait until T. I was assuming I would do the 'switch back' task and could do so after making all the changes needed for S if desired. I don't really see it as difficult, especially after we've hammered on it for a while and shaken any bugs out.
I did exactly that before. That's not the hard part. I just don't want to get blocked for months. This way you don't have to be bound to our timeline and we don't have to be bound to yours.
Obviously, I disagree. The newer versions are what new users should be using and they're the path to the stable version that should eventually be usable by out-of-tree users. If RBD will test things in a reasonable timeframe and provide some guidance in debugging failures, then I don't have a problem not splitting things, and it would have some simplifying advantages. However, I don't really see the objection to allowing our schedules to be independent for a time then bringing them together later. This PR has what's needed to unblock the next step in RGW's async work. Longer term, the main changes I plan to make are:
I like to think some of my ideas are sensible. The back-end and objecter-interface cleanups might be 'high risk' but since we're trying to be high performance, I want to get rid of the extra allocations and callbacks. I don't consider it to have a particularly low bar, I just don't want to be blocked for long periods of time. The interface split is likely to be the most outright disruptive. It's not critical for anything in-tree, so doesn't block any other deliverables, but I do consider it essential to exposing an external library and declaring a stable, versioned interface. |
I think we have (or at least we are supposed to have) the same timeline: one bound to Ceph upstream releases. Once it gets close to the freeze date (which for reef was supposed to be mid-January), changes like these are automatically moved out and review is deprioritized (unless an exception is made). I have already acknowledged and apologised for not providing a timely review on #48129. The title ("neorados: Update exerciser/demo to use C++20 coroutines") didn't communicate that the library itself was getting tweaked as opposed to just the
I don't buy the "different schedules" argument, but my objection was more to how the fork was performed and less to the fork itself (although I'm definitely not a fan of the fork). It seems like that discussion can be put aside though as it appears that the fork not needed at all. I'll do my best to help review and test changes needed to adapt librbd to the updated API, as quick as possible. |
@adamemerson i think the rbd suite would provide valuable test coverage to help validate these higher-risk changes. it's better to discover problems there early in development than at the very end when we try to drop the fork. and rbd would enjoy those performance improvements in the meantime |
|
It feels to me as if forcing a flag day on neorados is artificial. Adam's idea of partially decoupling the breaking changes from different high value clients (rados and rbd)--that are relied on for maximal stability by users and the maintainers--is just good software engineering. You're creating pressure to do all the work out-of-tree, I would say. I won't feel pressured to upstream what we ship if that's the easiest path, and then we find that rbd and rados have "deprioritized" reviewing it. |
211d6b9 to
25d2b9c
Compare
|
@adamemerson i updated the "common/async: completion uses free functions for defer/dispatch/post" and "common/async: SharedMutex uses free function post" commits to use bind_executor() and see their unit tests passing again |
|
|
@cbodley Did you root cause why |
looks like it just isn't built for 'make check', but i'm not sure why. i tried uncommenting the cmake |
I think this needs to be tracked down and fixed. From my perspective, it's just a coincidence that the API check build happened to include the tool -- Dashboard folks are free to tweak that job at any time. It's |
edb6b5f to
879a1ca
Compare
the previous results were in https://pulpito.ceph.com/?branch=wip-neorados-learning-from-experience Yuri is rescheduling rados/rbd suites - i cc'ed you on the trello |
A failure will produce the output `osd_errc::cmpext_mismatch`. On failure, the `unmatch` out parameter, if provided, will be set to the index of the first nonmatching character; On success, it will be set to `-1`. Signed-off-by: Adam Emerson <aemerson@redhat.com>
Omap and Xattr comparison use the same values, but in the Xattr case it's eight bits wide. Rename the enum to `cmp_op` to reflect this, and open-code encoding the assertions so we don't have to allocate an intermediate structure. Signed-off-by: Adam Emerson <aemerson@redhat.com>
A map from strings to value/op pairs is a bit much and a bit annoying for what's essentially a list of assertions. Just use a vector. Also put op in the middle so it matches `cmpxattr`. Signed-off-by: Adam Emerson <aemerson@redhat.com>
Mostly for getting things into and out of them conveniently. Signed-off-by: Adam Emerson <aemerson@redhat.com>
Rather than using `buffer::list`s for initial values and hash results, serialize to/from the expected types. Signed-off-by: Adam Emerson <aemerson@redhat.com>
This includes cls, cmd, and read_operations. Signed-off-by: Adam Emerson <aemerson@redhat.com>
Since both const lvalue and rvalue make sense, instead of duplicating everything, just take a value and move from it. Signed-off-by: Adam Emerson <aemerson@redhat.com>
The test framework in common test can now support erasure-coded pools and be more easily extended generally. Signed-off-by: Adam Emerson <aemerson@redhat.com>
Now that C++20 has designated initializers, this is the least confusing way to handle objects like this. Also switch to using the spaceship operator, while we're at it. Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
This includes write_operations, io, misc, and pool. Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
Since Objecter has a set used to validate pointers converted from watch cookies before dereferencing them, expose a function to do so. Also make the set a `std::unordered_set` since we use it for only this purpose. Signed-off-by: Adam Emerson <aemerson@redhat.com>
1. Objecter expects a count of seconds, not milliseconds 2. Actually return result Signed-off-by: Adam Emerson <aemerson@redhat.com>
This includes snapshots and watch_notify Signed-off-by: Adam Emerson <aemerson@redhat.com>
To make sure it's built by `make check`. Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
Signed-off-by: Adam Emerson <aemerson@redhat.com>
36dd22c to
eb27507
Compare
Reverts for https://tracker.ceph.com/issues/63682 need to be included for RBD, so I think I'll do it myself anyway. |
idryomov
left a comment
There was a problem hiding this comment.
Ack from the RBD perspective
Being a set of changes to the neorados library to fix bugs, support C++20 coroutines, and generally improve concurrency.
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows