Skip to content

osd/rados/rgw/cephfs: Modernize cls interface with compile time safety#66258

Open
aainscow wants to merge 6 commits intoceph:mainfrom
aainscow:read_only_execs
Open

osd/rados/rgw/cephfs: Modernize cls interface with compile time safety#66258
aainscow wants to merge 6 commits intoceph:mainfrom
aainscow:read_only_execs

Conversation

@aainscow
Copy link
Contributor

@aainscow aainscow commented Nov 14, 2025

This is a preparation PR for EC direct reads, but is also provides protection for balanced reads.

This PR does the following:

  1. Adds a new C++ interface, with compile time safety, which prevents any code from attempting to call an exec which may do a write without marking the OP as write.
  2. Deletes any C++ interface which can be used unsafely.
  3. Enforces that IOCtx::exec() and IOCtx::aio_exec() are read only.
  4. Refactors all clients within Ceph to use the new interface.
  5. Deletes unused clients (cls_acl, cls_crypto and key_value_store) + associated tests

Following reviews of earlier versions and discussion at the CDM, we are NOT:

  • Deprecating the C interfaces and associated bindings to other languages.
  • Providing any safety in the client for the C / other languages interfaces

A later PR will add policing to the OSD:

  • Umbrella clients sending an illegal exec (i.e. one where a read contains a write) will be failed by the OSD.
  • Pre-Umbrella clients sending an illegal exec will cause a health warning to be generated, encouraging the user to update their client.

Tracker: https://tracker.ceph.com/issues/73986

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

You must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.

Copy link
Contributor

@bill-scales bill-scales left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are you going to identify the set of class methods that need to be annotated as read only to get the performance benefit of direct reads?

Testing different types of I/O and looking for ops that don't get treated as direct reads is one method, but unless you have very good test coverage you might miss something, especially if an exec call is only added to ops when a specific feature (e.g. rbd mirroring) is in use.

I suspect it might be easier to just modify as many of the cls client code functions (and implementations in neorados) to add the read only flag rather than try and find cases via testing. Maybe that should be a 2nd PR?

@aainscow
Copy link
Contributor Author

aainscow commented Nov 17, 2025

How are you going to identify the set of class methods that need to be annotated as read only to get the performance benefit of direct reads?

Testing different types of I/O and looking for ops that don't get treated as direct reads is one method, but unless you have very good test coverage you might miss something, especially if an exec call is only added to ops when a specific feature (e.g. rbd mirroring) is in use.

I suspect it might be easier to just modify as many of the cls client code functions (and implementations in neorados) to add the read only flag rather than try and find cases via testing. Maybe that should be a 2nd PR?

There is a balance between these two approaches. The first is conservative and relies on testing identifying the necessary commands. Since we only need balanced reads to work for "most" reads for any one user, I was not worrying about edge cases. I was going with the test-and-see approach as a result.

The other method involves code inspection, then updating the cls clients in both src/cls and src/neorados/cls. An API exists for a client to manually specify these class calls, so there might be holes there too. Since this is a manual process and there are significant numbers of read only calls, I was worried about making mistakes. Since human error is a possibility, regression testing of every cls method is a requirement - this seemed like significant amounts of work and I am not sure we have the time and resources for this.

Another approach could be to look for any class method which has a decent unit test and only update the exec call for code paths which are tested. Clearly there is still room for human error in identifying which tests exist, but I think this might be a good half-way approach.

@aainscow
Copy link
Contributor Author

jenkins test api

@aainscow
Copy link
Contributor Author

jenkins test api

@cbodley
Copy link
Contributor

cbodley commented Nov 17, 2025

is there some way the rados client library can base this direct-read decision solely on the use of ObjectWriteOperation vs ObjectReadOperation, without requiring applications to mark their calls to exec?

that wouldn't help with IoCtx::exec(), but https://tracker.ceph.com/issues/65889 led to two commits af17631 and 4612195 that replaced some IoCtx::exec() with calls to ObjectWriteOperation::exec(). if this implies IoCtx::exec() should be used only for reads, then that could be flagged as read-only and enforced on the osd with EIO

@aainscow
Copy link
Contributor Author

is there some way the rados client library can base this direct-read decision solely on the use of ObjectWriteOperation vs ObjectReadOperation, without requiring applications to mark their calls to exec?

that wouldn't help with IoCtx::exec(), but https://tracker.ceph.com/issues/65889 led to two commits af17631 and 4612195 that replaced some IoCtx::exec() with calls to ObjectWriteOperation::exec(). if this implies IoCtx::exec() should be used only for reads, then that could be flagged as read-only and enforced on the osd with EIO

There is currently no policing in librados, or the OSD that the client correctly used ObjectReadOperation vs ObjectWriteOperation. I did consider simply overriding the ObjectOperation::exec() functions in ObjectReadOperation and assuming they are all read only. However, if any client has mistakenly used an ObjectReadOperation::exec() incorrectly, then this change would prevent that client from working. I considered this to be too high risk.

I am open to arguments to the contrary on this, as overrides would be a really neat way of implementing this.

Unfortunately, this would not work for NEORADOS, where I can't see a way of being explicit about read vs write.

@cbodley
Copy link
Contributor

cbodley commented Nov 17, 2025

thanks @aainscow,

I did consider simply overriding the ObjectOperation::exec() functions in ObjectReadOperation and assuming they are all read only.

i was hoping we could handle this in the ObjectReadOperation overloads for librados::IoCtx::operate()/aio_operate() (and neorados::RADOS::execute() for ReadOp) without needing to treat exec() as a special case

However, if any client has mistakenly used an ObjectReadOperation::exec() incorrectly, then this change would prevent that client from working. I considered this to be too high risk.

i think it would be nice to catch any misuse*. ceph components may not have enough test coverage to rely on osd enforcement to catch everything, but a code audit probably could. worth discussing, at least?

edit: *especially now that #56180 relies on this for local/balanced read affinity

@aainscow
Copy link
Contributor Author

@cbodley In response to your last comment, I would be interested in your take on my new templates, in the second commit.

There is quite a bit of boiler plate there in order to keep the legacy API/ABI in place, but I think this is a nice approach to policing that the client correctly marks class call/methods as read only or not.

When using the new interface:

  • It is a compile-time error to use a read-only class call with a write method.
  • The same mechanism defines the client and OSD definition of read/write/promote.
  • The cls/method strings are defined in one place only. (note I have only converted cls_version so far).

I think this should make balanced reads much safer, as well as providing a safe interface for my split-reads work.

@aainscow
Copy link
Contributor Author

@adamemerson - I have included you on the review, as you were last to modify the exec code in neorados.

@aainscow
Copy link
Contributor Author

@SrinivasaBharath I have removed the needs_qa. If you have not started a run, no need to include this PR. If you have started a run, do not abort it. The old code should not cause any regressions.

@cbodley
Copy link
Contributor

cbodley commented Nov 18, 2025

@aainscow i like the idea of compile-time checks, but i wonder how much of this is already covered by the "cls client" interfaces like cls_version_client.h:

void cls_version_set(librados::ObjectWriteOperation& op, obj_version& ver);
void cls_version_inc(librados::ObjectWriteOperation& op);
void cls_version_inc(librados::ObjectWriteOperation& op, obj_version& ver, VersionCond cond);
void cls_version_read(librados::ObjectReadOperation& op, obj_version *objv);
void cls_version_check(librados::ObjectOperation& op, obj_version& ver, VersionCond cond);

assuming ceph code is calling these functions instead of making raw calls to exec(), i'm not sure we're gaining much from the new templates - for example, this already fails to compile:

librados::ObjectReadOperation op;
cls_version_inc(op);

the fact that cls_version_check() takes the base librados::ObjectOperation& does make it difficult to flag its exec as read-only, but does ec-direct-read really need special handling for exec? once we submit the op to Objecter, it'll be flagged as CEPH_OSD_FLAG_READ and/or CEPH_OSD_FLAG_WRITE - can't the direct-read decision rely on that alone?

@aainscow
Copy link
Contributor Author

aainscow commented Nov 18, 2025

@cbodley Clients which call cls_* code are protected, so this gives safety, assuming the developer is well behaved and sticks to these functions.

The cls_* code itself, however, is unprotected. With balanced reads and EC split ops, data-integrity is at risk if a class call is marked as read only when it is not, so I wanted to add paranoid levels of compile-time protection.

If we are going to refactor all read-only execs to use the new read-only class call, then I think this code provides protection for the refactoring that we will require.

IMO, the new interface is much cleaner (in C++ at least). Having class and method strings hanging around in multiple places is "code smell".

@cbodley
Copy link
Contributor

cbodley commented Nov 18, 2025

i agree that the exec interface isn't great, but as long as we're committed to librados api/abi compatibility there will always be an unsafe way to use it. so i think the osd's enforcement is critical to making this safe. you're currently relying on a new CEPH_OSD_CLS_FLAG_READ_ONLY flag for that, so need a reliable way for the client to mark read-only execs

but does the osd not already have some logic to reject client writes that were misdirected at a non-primary osd? if not, maybe that would be a more general solution to this class of problems?

@aainscow
Copy link
Contributor Author

That is a good point, so perhaps "data integrity" is going too far. The OSD will never action a write for a read-only op and only a read-only op would ever be executed. Any call would still, however, be broken (the op will be failed back to the client) and as such, a more robust interface is still important.

So I think we have three ways to go:

  1. Stick with commit one only + maybe hunt down some more use-cases we need to split ops in RGW.
  2. Allow through all calls which are marked as balanced reads and trust the client. I would need to find a new way of policing this in the OSD, but maybe thats OK.
  3. The refactor I have proposed (or something similar).

I am going to dwell on this... Option (2) makes me uneasy....

@aainscow aainscow requested a review from idryomov November 19, 2025 14:22
@adamemerson adamemerson self-assigned this Nov 21, 2025
@aainscow aainscow changed the title osd: Introduce read only class calls and associated APIs osd/rados/rgw/cephfs: Modernise cls interface with compile time safety Nov 24, 2025
@aainscow aainscow requested a review from a team as a code owner November 24, 2025 21:29
@aainscow
Copy link
Contributor Author

aainscow commented Feb 4, 2026

@adamemerson I have addressed your reviews with explicit commits. Please verify:

  1. These latest commits are compile-only changes and do not need a new teuthology qa run.
  2. You are happy with the changes - I wondered if you would prefer that I carry the string_view deeper into the interfaces

Copy link
Contributor

@adamemerson adamemerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Drop the string_view change. Everything else is fine and should be good compile-only.

@aainscow aainscow force-pushed the read_only_execs branch 2 times, most recently from fdf4293 to 7481e15 Compare February 5, 2026 19:33
@aainscow
Copy link
Contributor Author

aainscow commented Feb 5, 2026

jenkins test make check

@aainscow
Copy link
Contributor Author

aainscow commented Feb 5, 2026

Latest push was to fill in a missing signed-off statement

@aainscow
Copy link
Contributor Author

aainscow commented Feb 5, 2026

jenkins test make check

@adamemerson
Copy link
Contributor

jenkins test make check

@aainscow
Copy link
Contributor Author

jenkins test make check

@aainscow
Copy link
Contributor Author

aainscow commented Mar 6, 2026

Recent changes mean I need to request a new review

@ronen-fr
Copy link
Contributor

@aainscow (and @SrinivasaBharath ) - the only new failure in the test run is a CLS compilation
issue, and as such - may well be related to the PR.
See skanta-2026-03-08_04:40:13-rados-wip-bharath7-testing-2026-03-07-1317-distro-default-trial/94050/

@aainscow
Copy link
Contributor Author

@batrick I have rebased this to squash all the correcionss following their review. Only new addition is addressing the compile failure in the test run and new release notes as requested.

Status Quo:
Librados currently uses string-based identifiers for RADOS class method calls
(exec) in both ObjectReadOperation and ObjectWriteOperation. The API does
not distinguish between methods that modify object state and those that
are read-only.

Problem:
This lack of semantic enforcement has become a critical safety gap with the
introduction of EC Direct Reads. Because the EC direct read path is optimized
for performance, it may process the same read-only operation multiple times
(e.g., across different shards or during retries).

Previously, while technically a violation of API semantics, a "read-that-writes"
would often function correctly because operations were processed in-order and
without retries on the read path. However, in the context of EC direct reads,
executing a non-idempotent write through the read path poses a silent and
critical risk to data integrity.

Solution:
Introduce a template-based trait system to encode method semantics into the
API. By tagging methods with traits (MethodRead or MethodWrite), the compiler
now prevents state-modifying methods from being added to ObjectReadOperation.
This ensures that only side-effect-free methods reach the EC direct read
path, eliminating the risk of accidental double-writes at the source.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Migrate existing internal RADOS classes and the associated test infrastructure
to utilize the new template-based exec() interface. While the previous
commit introduced the mechanism, this change applies it across the
codebase to eliminate legacy string-based calls in internal logic.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
The osd code was not present, the radosacl tool did nothing useful and the scratchtoolpp
was testing something that doesn't exist.

key_value_store is optionally compiled into the key store, but seems to serve no useful purpose.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
There much mocking of exec going on in these tests. An extensive
search and replace was required to find them all.

Note that the failure looks like memory corruption when manipulating
the ops array - but the actually issue is the mocking not intercepting
the exec_impl call (and similar) correctly.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Update the Rados Gateway driver to utilize the new trait-based class
method interface. This replaces legacy string-based exec() calls with
the typesafe template-based alternatives.

Key Changes:

Updated reset_stats in buckets.cc to use cls::user::method::reset_user_stats2.

Updated RGWRadosBILogTrimCR in rgw_cr_rados.cc to use cls::rgw::method::bi_log_trim.

This change ensures that RGW operations are correctly categorized as
reads or writes at compile-time, aligning with the broader effort to
improve RADOS class method safety and OSD routing efficiency.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
@aainscow
Copy link
Contributor Author

jenkins test api

1 similar comment
@aainscow
Copy link
Contributor Author

jenkins test api

@aainscow
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.