crimson/osd/pg_backend: needn't check if os.exist by liu-chunmei · Pull Request #63144 · ceph/ceph

liu-chunmei · 2025-05-06T23:15:15Z

Drops the os.exists check for omap_get_vals_by_keys.

Fixes the ./bin/ceph_test_cls_cmpomap errors in the noexist tests.

https://tracker.ceph.com/issues/71225

Contribution Guidelines

To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

Tracker (select at least one)
- References tracker ticket
- Very recent bug; references commit where it was introduced
- New feature (ticket optional)
- Doc update (no ticket needed)
- Code cleanup (no ticket needed)
Component impact
- Affects Dashboard, opened tracker ticket
- Affects Orchestrator, opened tracker ticket
- No impact that needs to be tracked
Documentation (select at least one)
- Updates relevant documentation
- No doc update is appropriate
Tests (select at least one)
- Includes unit test(s)
- Includes integration test(s)
- Includes bug reproducer
- No tests

Available Jenkins commands:

- jenkins test classic perf
- jenkins test crimson perf
- jenkins test signed
- jenkins test make check
- jenkins test make check arm64
- jenkins test submodules
- jenkins test dashboard
- jenkins test dashboard cephadm
- jenkins test api
- jenkins test docs
- jenkins test ceph-volume all
- jenkins test windows
- jenkins test rook e2e

Matan-B

ceph_test_cls_cmpomap is only run from suites/rgw/verify/tasks/cls.yaml and therefore has never been tested with Crimson.

Can we add ceph_test_cls_cmpomap to either the make check unit tests or our suite, so that we avoid regressions?

Matan-B · 2025-05-07T12:09:25Z

src/crimson/osd/pg_backend.cc

  object_stat_sum_t& delta_stats) const
 {
-  if (!os.exists || os.oi.is_whiteout()) {
-    logger().debug("{}: object does not exist: {}", __func__, os.oi.soid);


Can you please explain in the commit message why it is expected to operate on non-existing objects?

Because the test case expects a false value to be returned, not an error code.

What do you mean by "false value"? Error codes are used internally - the raw error code value (int) is returned as an answer to the client eventually.

Looking at the cmp_set_vals_noexist_str test:

EXPECT_EQ(do_cmp_set_vals(oid, Mode::String, Op::EQ, {{"eq", value}}), 0);

0 is expected, while Crimson returns -2 (-ENOENT).

I'm not sure I understand why Classic returns 0 here, since the object store itself also returns -2 in omap_get_values:

if (!o || !o->exists) {
  r = -ENOENT;
  goto out;
}

I think that either:

- Classic should also return -2 and the test itself should be fixed.
- ~~CEPH_OSD_OP_FLAG_FAILOK is set, which hides the error and returns 0 instead (why?)~~

Going back to my previous comment:

Can you please explain why is it expected to operate on non-existing objects in the commit message?

We should not change the behavior just because the test expects it, since the test itself might be wrong.

Lastly, I think that my comment about adding ceph_test_cls_cmpomap to make check or to the crimson suite was missed. What do you think?

For classic, 0 is returned for non-existing objects (when it shouldn't be):

if (oi.is_omap()) {
  osd->store->omap_get_values(ch, ghobject_t(soid), keys_to_get, &out);
} // else return empty omap entries ...

The non-existing object does not have the omap flag set (oi.is_omap() is false), so the store is never queried and no error is returned.
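To make the classic path concrete, here is a minimal C++ sketch. `ObjectInfo`, `store_get_values_model` and `classic_get_vals_model` are hypothetical stand-ins, not Ceph types or functions: `omap` plays the role of `oi.is_omap()`, and the store model fails with `-ENOENT` for a missing object, as `omap_get_values` does. How classic handles a store error inside the omap branch is elided (`...`) in the quote, so the sketch simply propagates it; treat that part as an assumption. What it illustrates is that a missing object never reaches the store, so the op reports 0 with an empty result.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified stand-ins; these are NOT Ceph's real types.
struct ObjectInfo {
  bool exists = false;  // does the object exist?
  bool omap = false;    // plays the role of object_info_t::is_omap()
};

using OmapValues = std::map<std::string, std::string>;

// Models the object store lookup, which fails with -ENOENT for a
// missing object (cf. the `!o || !o->exists` check in omap_get_values).
int store_get_values_model(const ObjectInfo& oi, const OmapValues& stored,
                           const std::set<std::string>& keys, OmapValues* out) {
  if (!oi.exists) {
    return -ENOENT;
  }
  for (const auto& key : keys) {
    auto it = stored.find(key);
    if (it != stored.end()) {
      out->insert(*it);
    }
  }
  return 0;
}

// Models classic's op handling: the store is consulted only when the
// omap flag is set; otherwise `out` stays empty and the result is 0.
// Propagating the store's return value is an assumption of this sketch.
int classic_get_vals_model(const ObjectInfo& oi, const OmapValues& stored,
                           const std::set<std::string>& keys, OmapValues* out) {
  if (oi.omap) {
    return store_get_values_model(oi, stored, keys, out);
  }
  return 0;  // else: empty omap entries, no error
}
```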

For Crimson:

if (oi.is_omap()) {
  return store->omap_get_values(coll, ghobject_t{oi.soid}, keys_to_get);
} else {
  return crimson::ct_error::enodata::make();
}

and then:

}).handle_error_interruptible(
  crimson::ct_error::enodata::handle([&osd_op] {
    ..
    osd_op.rval = 0;
    return ll_read_errorator::now();
  }),

We actually return ENODATA since this object has no omap, but then we overwrite the error code with zero (probably to align with classic).
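A similar sketch of the Crimson path, with the same caveats: `Result`, `Err`, `maybe_get_vals_model` and `get_vals_by_keys_model` are hypothetical, and a `std::variant` stands in for Crimson's errorated futures. It shows how mapping ENODATA to `rval = 0` makes a missing object look to the client exactly as it does on classic.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in for an errorated future: either values or an error tag.
enum class Err { enodata };
using OmapValues = std::map<std::string, std::string>;
using Result = std::variant<OmapValues, Err>;

struct ObjectInfo {
  bool exists = false;
  bool omap = false;  // plays the role of object_info_t::is_omap()
};

// Models maybe_get_omap_vals_by_keys(): only objects with the omap flag
// reach the store; anything else (including a missing object) is ENODATA.
Result maybe_get_vals_model(const ObjectInfo& oi, const OmapValues& stored,
                            const std::set<std::string>& keys) {
  if (!oi.omap) {
    return Err::enodata;
  }
  OmapValues out;
  for (const auto& key : keys) {
    if (auto it = stored.find(key); it != stored.end()) {
      out.insert(*it);
    }
  }
  return out;
}

// Models the enodata handler: the error is swallowed and the op reports
// rval = 0 with an empty result, matching classic.
int get_vals_by_keys_model(const ObjectInfo& oi, const OmapValues& stored,
                           const std::set<std::string>& keys, OmapValues* out) {
  Result r = maybe_get_vals_model(oi, stored, keys);
  if (auto* vals = std::get_if<OmapValues>(&r)) {
    *out = std::move(*vals);
  }
  // The only error modeled here is ENODATA, which maps to rval = 0.
  return 0;
}
```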

I actually think we should return ENOENT here in both cases and change the test itself.
What do you think?

@Matan-B We can't change the behavior without auditing users first. OMAPGETVALS, OMAPGETHEADER, and OMAPGETKEYS all seem to behave the same way: an empty but successful return.

Short term, I'd suggest matching classic's behavior since this is a public interface other components already rely on.

Longer term, I do agree that the behavior is inconsistent with most of the other read ops, but changing the behavior would mean auditing rgw and rbd to make sure it won't break anything.
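One consequence of the "empty, but successful" contract is that, from the return code and payload alone, a client cannot tell a missing object apart from an existing object whose omap simply lacks the requested keys. The sketch below models that; `ObjectState`, `ReadReply`, `get_vals_by_keys_model` and `same_reply` are hypothetical names for illustration only, not Ceph or librados API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model of the "empty, but successful" read contract.
struct ObjectState {
  bool exists = false;
  std::map<std::string, std::string> omap;  // empty if the object has no omap
};

struct ReadReply {
  int rval = 0;
  std::map<std::string, std::string> vals;
};

ReadReply get_vals_by_keys_model(const ObjectState& os,
                                 const std::set<std::string>& keys) {
  ReadReply reply;  // rval defaults to 0, vals to empty
  if (!os.exists) {
    return reply;   // missing object: empty, but successful
  }
  for (const auto& key : keys) {
    auto it = os.omap.find(key);
    if (it != os.omap.end()) {
      reply.vals.insert(*it);
    }
  }
  return reply;
}

// Two replies are indistinguishable to a client if rval and payload match.
bool same_reply(const ReadReply& a, const ReadReply& b) {
  return a.rval == b.rval && a.vals == b.vals;
}
```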

Ah, so until the longer-term solution is in place, we could do something like this:

if (!os.exists || os.oi.is_whiteout()) {
  logger().debug("{}: object does not exist: {} returning empty result",
                 __func__, os.oi.soid);
  // Although an error should be expected here since the object doesn't exist,
  // we want to match classic's behavior as other components already rely on it.
  // Return an empty, but successful return instead.
  //return crimson::ct_error::enoent::make();
  return ll_read_errorator::now();
}

Instead of removing the check completely, we could keep it and explain what's going on to avoid future confusion, since the error "removal" is not trivial.
@athanatos, @liu-chunmei - What do you think?

As suggested earlier, let's also add this to our regular testing.
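The two options being weighed (keeping the guard but returning success, versus removing it as this PR does) can be compared with a small model. It assumes, as noted earlier in the thread, that a non-existing object does not carry the omap flag; whiteouts are not modeled. All names (`ObjectState`, `read_omap_model`, `with_guard`, `without_guard`) are hypothetical. Under that assumption both variants return 0 with an empty result for a missing object, which is why the explicit check is redundant and why a comment helps explain where the error is being swallowed.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using Vals = std::map<std::string, std::string>;

// Hypothetical simplified object state; whiteouts are not modeled.
struct ObjectState {
  bool exists = false;
  bool omap = false;  // plays the role of os.oi.is_omap()
  Vals values;
};

// Shared tail of both variants: a non-omap object yields an empty but
// successful result (the ENODATA -> rval = 0 translation).
int read_omap_model(const ObjectState& os, const std::set<std::string>& keys, Vals* out) {
  if (!os.omap) {
    return 0;
  }
  for (const auto& key : keys) {
    if (auto it = os.values.find(key); it != os.values.end()) {
      out->insert(*it);
    }
  }
  return 0;
}

// Variant A: keep the existence guard, but return success instead of ENOENT.
int with_guard(const ObjectState& os, const std::set<std::string>& keys, Vals* out) {
  if (!os.exists) {
    return 0;  // deliberately not -ENOENT, to match classic
  }
  return read_omap_model(os, keys, out);
}

// Variant B: no guard at all, which is what this PR does.
int without_guard(const ObjectState& os, const std::set<std::string>& keys, Vals* out) {
  return read_omap_model(os, keys, out);
}
```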

Matan-B

I think that Crimson might actually be correct here.

Matan-B · 2025-05-15T09:12:14Z


liu-chunmei · 2025-06-24T23:49:31Z

jenkins test make check

liu-chunmei · 2025-06-24T23:49:48Z

jenkins test make check arm64

liu-chunmei · 2025-06-24T23:50:35Z

jenkins retest this please

liu-chunmei · 2025-06-25T00:36:55Z

@Matan-B and @athanatos: removing that two-line check is reasonable to me, and the remaining logic works just as Matan described:
For Crimson:

maybe_get_omap_vals_by_keys(
  crimson::os::FuturizedStore::Shard* store,
  const crimson::os::CollectionRef& coll,
  const object_info_t& oi,
  const std::set<std::string>& keys_to_get)
{
  if (oi.is_omap()) {
    return store->omap_get_values(coll, ghobject_t{oi.soid}, keys_to_get);
  } else {
    return crimson::ct_error::enodata::make();
  }
}

and then:

PGBackend::omap_get_vals_by_keys(
...
return maybe_get_omap_vals_by_keys(store, coll, os.oi, keys_to_get)
...   
 }).handle_error_interruptible(
      crimson::ct_error::enodata::handle([&osd_op] {
        ..
        osd_op.rval = 0;
        return ll_read_errorator::now();
      }),

liu-chunmei · 2025-06-25T02:53:24Z

@Matan-B As I recall, we set osd_op.rval = 0 only to satisfy the test and keep the return value the same as classic.

Matan-B

LGTM, two comments:

The test will pass now due to:

    }).handle_error_interruptible(
      crimson::ct_error::enodata::handle([&osd_op] {
        ..
        osd_op.rval = 0;
        return ll_read_errorator::now();
      }),

Can you please add a comment above:

        // Although an error should be expected here since the object doesn't exist,
        // we want to match classic's behavior as clients possibly rely on it.
        // Return an empty, but successful return instead.
        osd_op.rval = 0;

Lastly, can we please add that to the crimson-rados suite?
Thanks!

addressed


liu-chunmei · 2025-06-25T07:20:40Z

Can you please add a comment above:

        // Although an error should be expected here since the object doesn't exist,
        // we want to match classic's behavior as clients possibly rely on it.
        // Return an empty, but successful return instead.
        osd_op.rval = 0;

Lastly, can we please add that to the crimson-rados suite? Thanks!

Done: added test_cls_cmpomap.sh to basic/tasks/cls.yaml.

Matan-B · 2025-07-06T08:08:34Z

https://shaman.ceph.com/builds/ceph/wip-matanb-crimson-only-testing-6-jul/

liu-chunmei · 2025-07-06T22:09:04Z

@Matan-B The teuthology test results are at: https://pulpito.ceph.com/liucm-2025-07-02_21:32:59-crimson-rados-wip-liucm-omap-noexist-crimson-only-distro-crimson-debug-smithi/
There seem to be no new errors, only "Command failed on smithi045 with status 1: 'sudo chmod 777 /var/log/ceph'" and slow ops. Please take a look, thanks!

Matan-B · 2025-07-07T06:30:24Z

2025-07-04T19:26:39.996 INFO:tasks.workunit.client.0.smithi055.stdout:[----------] 31 tests from CmpOmap (1858 ms total)
2025-07-04T19:26:39.997 INFO:tasks.workunit.client.0.smithi055.stdout:
2025-07-04T19:26:39.997 INFO:tasks.workunit.client.0.smithi055.stdout:[----------] Global test environment tear-down
2025-07-04T19:26:40.169 INFO:tasks.workunit.client.0.smithi055.stdout:[==========] 31 tests from 1 test suite ran. (3445 ms total)
2025-07-04T19:26:40.169 INFO:tasks.workunit.client.0.smithi055.stdout:[  PASSED  ] 31 tests.

liu-chunmei requested a review from a team as a code owner May 6, 2025 23:15

github-actions bot added the crimson label May 6, 2025

liu-chunmei requested review from a team and Matan-B and removed request for a team May 6, 2025 23:15

Matan-B added this to Crimson May 7, 2025

Matan-B moved this to Awaits review in Crimson May 7, 2025

Matan-B reviewed May 7, 2025


Matan-B moved this from Awaits review to In Progress in Crimson May 7, 2025

liu-chunmei force-pushed the omap_noexist branch from 2989e2f to 080de9a May 14, 2025 23:04

Matan-B previously requested changes May 15, 2025


Matan-B added the crimson backport tentacle label May 15, 2025

liu-chunmei force-pushed the omap_noexist branch from 080de9a to 88c874f June 25, 2025 00:38

Matan-B reviewed Jun 25, 2025


Matan-B self-requested a review June 25, 2025 06:57

liu-chunmei added 2 commits June 25, 2025 07:04

crimson/osd/pg_backend: needn't check if os.exist for

1002889

omap_get_vals_by_keys fix ./bin/ceph_test_cls_cmpomap error for noexist tests since the test case expect false value return, not error code. Signed-off-by: Chunmei Liu <chunmei.liu@ibm.com>

qa: add test_cls_cmpomap.sh to crimson-rados basic tasks

4461f23

Signed-off-by: Chunmei Liu <chunmei.liu@ibm.com>

liu-chunmei force-pushed the omap_noexist branch from 88c874f to 4461f23 June 25, 2025 07:18

github-actions bot added the tests label Jun 25, 2025

Matan-B approved these changes Jun 25, 2025


Matan-B moved this from In Progress to Needs QA in Crimson Jun 25, 2025

Matan-B added needs-crimson-qa and removed crimson backport tentacle labels Jun 25, 2025

Matan-B added the wip-matan-testing label Jul 6, 2025

Matan-B merged commit 9323ea2 into ceph:main Jul 7, 2025
13 checks passed

Matan-B moved this from Needs QA to Merged (Main) in Crimson Jul 7, 2025

Matan-B mentioned this pull request Jul 13, 2025

crimson: implement crimson Omap iterate interface #62530 (Merged, 11 tasks)


3 participants
