osd/rados/rgw/cephfs: Modernize cls interface with compile time safety#66258
osd/rados/rgw/cephfs: Modernize cls interface with compile time safety#66258
Conversation
bill-scales
left a comment
There was a problem hiding this comment.
How are you going to identify the set of class methods that need to be annotated as read only to get the performance benefit of direct reads?
Testing different types of I/O and looking for ops that don't get treated as direct reads is one method, but unless you have very good test coverage you might miss something, especially if an exec call is only added to ops when a specific feature (e.g. rbd mirroring) is in use.
I suspect it might be easier to just modify as many of the cls client code functions (and implementations in neorados) to add the read only flag rather than try and find cases via testing. Maybe that should be a 2nd PR?
There is a balance between these two approaches. The first is conservative and relies on testing identifying the necessary commands. Since we only need balanced reads to work for "most" reads for any one user, I was not worrying about edge cases. I was going with the test-and-see approach as a result. The other method involves code inspection, then updating the cls clients in both src/cls and src/neorados/cls. An API exists for a client to manually specify these class calls, so there might be holes there too. Since this is a manual process and there are significant numbers of read only calls, I was worried about making mistakes. Since human error is a possibility, regression testing of every cls method is a requirement - this seemed like significant amounts of work and I am not sure we have the time and resources for this. Another approach could be to look for any class method which has a decent unit test and only update the exec call for code paths which are tested. Clearly there is still room for human error in identifying which tests exist, but I think this might be a good half-way approach. |
|
jenkins test api |
|
jenkins test api |
|
is there some way the rados client library can base this direct-read decision solely on the use of that wouldn't help with |
There is currently no policing in librados, or the OSD that the client correctly used ObjectReadOperation vs ObjectWriteOperation. I did consider simply overriding the ObjectOperation::exec() functions in ObjectReadOperation and assuming they are all read only. However, if any client has mistakenly used an ObjectReadOperation::exec() incorrectly, then this change would prevent that client from working. I considered this to be too high risk. I am open to arguments to the contrary on this, as overrides would be a really neat way of implementing this. Unfortunately, this would not work for NEORADOS, where I can't see a way of being explicit about read vs write. |
|
thanks @aainscow,
i was hoping we could handle this in the
i think it would be nice to catch any misuse*. ceph components may not have enough test coverage to rely on osd enforcement to catch everything, but a code audit probably could. worth discussing, at least? edit: *especially now that #56180 relies on this for local/balanced read affinity |
|
@cbodley In response to your last comment, I would be interested in your take on my new templates, in the second commit. There is quite a bit of boiler plate there in order to keep the legacy API/ABI in place, but I think this is a nice approach to policing that the client correctly marks class call/methods as read only or not. When using the new interface:
I think this should make balanced reads much safer, as well as providing a safe interface for my split-reads work. |
|
@adamemerson - I have included you on the review, as you were last to modify the exec code in neorados. |
|
@SrinivasaBharath I have removed the needs_qa. If you have not started a run, no need to include this PR. If you have started a run, do not abort it. The old code should not cause any regressions. |
|
@aainscow i like the idea of compile-time checks, but i wonder how much of this is already covered by the "cls client" interfaces like cls_version_client.h: void cls_version_set(librados::ObjectWriteOperation& op, obj_version& ver);
void cls_version_inc(librados::ObjectWriteOperation& op);
void cls_version_inc(librados::ObjectWriteOperation& op, obj_version& ver, VersionCond cond);
void cls_version_read(librados::ObjectReadOperation& op, obj_version *objv);
void cls_version_check(librados::ObjectOperation& op, obj_version& ver, VersionCond cond);assuming ceph code is calling these functions instead of making raw calls to librados::ObjectReadOperation op;
cls_version_inc(op);the fact that |
|
@cbodley Clients which call cls_* code are protected, so this gives safety, assuming the developer is well behaved and sticks to these functions. The cls_* code itself, however, is unprotected. With balanced reads and EC split ops, data-integrity is at risk if a class call is marked as read only when it is not, so I wanted to add paranoid levels of compile-time protection. If we are going to refactor all read-only execs to use the new read-only class call, then I think this code provides protection for the refactoring that we will require. IMO, the new interface is much cleaner (in C++ at least). Having class and method strings hanging around in multiple places is "code smell". |
|
i agree that the exec interface isn't great, but as long as we're committed to librados api/abi compatibility there will always be an unsafe way to use it. so i think the osd's enforcement is critical to making this safe. you're currently relying on a new CEPH_OSD_CLS_FLAG_READ_ONLY flag for that, so need a reliable way for the client to mark read-only execs but does the osd not already have some logic to reject client writes that were misdirected at a non-primary osd? if not, maybe that would be a more general solution to this class of problems? |
|
That is a good point, so perhaps "data integrity" is going too far. The OSD will never action a write for a read-only op and only a read-only op would ever be executed. Any call would still, however, be broken (the op will be failed back to the client) and as such, a more robust interface is still important. So I think we have three ways to go:
I am going to dwell on this... Option (2) makes me uneasy.... |
4fdf9ea to
98f02ad
Compare
|
@adamemerson I have addressed your reviews with explicit commits. Please verify:
|
adamemerson
left a comment
There was a problem hiding this comment.
Looks good to me! Drop the string_view change. Everything else is fine and should be good compile-only.
fdf4293 to
7481e15
Compare
|
jenkins test make check |
|
Latest push was to fill in a missing signed-off statement |
|
jenkins test make check |
|
jenkins test make check |
7481e15 to
dca0c73
Compare
|
jenkins test make check |
|
Recent changes mean I need to request a new review |
|
@aainscow (and @SrinivasaBharath ) - the only new failure in the test run is a CLS compilation |
54fff2f to
c7e95c8
Compare
|
@batrick I have rebased this to squash all the correcionss following their review. Only new addition is addressing the compile failure in the test run and new release notes as requested. |
Status Quo: Librados currently uses string-based identifiers for RADOS class method calls (exec) in both ObjectReadOperation and ObjectWriteOperation. The API does not distinguish between methods that modify object state and those that are read-only. Problem: This lack of semantic enforcement has become a critical safety gap with the introduction of EC Direct Reads. Because the EC direct read path is optimized for performance, it may process the same read-only operation multiple times (e.g., across different shards or during retries). Previously, while technically a violation of API semantics, a "read-that-writes" would often function correctly because operations were processed in-order and without retries on the read path. However, in the context of EC direct reads, executing a non-idempotent write through the read path poses a silent and critical risk to data integrity. Solution: Introduce a template-based trait system to encode method semantics into the API. By tagging methods with traits (MethodRead or MethodWrite), the compiler now prevents state-modifying methods from being added to ObjectReadOperation. This ensures that only side-effect-free methods reach the EC direct read path, eliminating the risk of accidental double-writes at the source. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Migrate existing internal RADOS classes and the associated test infrastructure to utilize the new template-based exec() interface. While the previous commit introduced the mechanism, this change applies it across the codebase to eliminate legacy string-based calls in internal logic. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
The osd code was not present, the radosacl tool did nothing useful and the scratchtoolpp was testing something that doesn't exist. key_value_store is optionally compiled into the key store, but seems to serve no useful purpose. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
There much mocking of exec going on in these tests. An extensive search and replace was required to find them all. Note that the failure looks like memory corruption when manipulating the ops array - but the actually issue is the mocking not intercepting the exec_impl call (and similar) correctly. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Update the Rados Gateway driver to utilize the new trait-based class method interface. This replaces legacy string-based exec() calls with the typesafe template-based alternatives. Key Changes: Updated reset_stats in buckets.cc to use cls::user::method::reset_user_stats2. Updated RGWRadosBILogTrimCR in rgw_cr_rados.cc to use cls::rgw::method::bi_log_trim. This change ensures that RGW operations are correctly categorized as reads or writes at compile-time, aligning with the broader effort to improve RADOS class method safety and OSD routing efficiency. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
c7e95c8 to
3faf8da
Compare
|
jenkins test api |
1 similar comment
|
jenkins test api |
|
Re-run of failed tests worked: Re-scheduling for QA |
This is a preparation PR for EC direct reads, but is also provides protection for balanced reads.
This PR does the following:
Following reviews of earlier versions and discussion at the CDM, we are NOT:
A later PR will add policing to the OSD:
Tracker: https://tracker.ceph.com/issues/73986
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins test classic perfJenkins Job | Jenkins Job Definitionjenkins test crimson perfJenkins Job | Jenkins Job Definitionjenkins test signedJenkins Job | Jenkins Job Definitionjenkins test make checkJenkins Job | Jenkins Job Definitionjenkins test make check arm64Jenkins Job | Jenkins Job Definitionjenkins test submodulesJenkins Job | Jenkins Job Definitionjenkins test dashboardJenkins Job | Jenkins Job Definitionjenkins test dashboard cephadmJenkins Job | Jenkins Job Definitionjenkins test apiJenkins Job | Jenkins Job Definitionjenkins test docsReadTheDocs | Github Workflow Definitionjenkins test ceph-volume allJenkins Jobs | Jenkins Jobs Definitionjenkins test windowsJenkins Job | Jenkins Job Definitionjenkins test rook e2eJenkins Job | Jenkins Job DefinitionYou must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.