Skip to content

crimson/osd/pg_backend: needn't check if os.exist #63144

Merged
Matan-B merged 2 commits intoceph:mainfrom
liu-chunmei:omap_noexist
Jul 7, 2025
Merged

crimson/osd/pg_backend: needn't check if os.exist #63144
Matan-B merged 2 commits intoceph:mainfrom
liu-chunmei:omap_noexist

Conversation

@liu-chunmei
Copy link
Contributor

for omap_get_vals_by_keys

fix ./bin/ceph_test_cls_cmpomap error for noexist tests

https://tracker.ceph.com/issues/71225

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

@liu-chunmei liu-chunmei requested a review from a team as a code owner May 6, 2025 23:15
@liu-chunmei liu-chunmei requested review from a team and Matan-B and removed request for a team May 6, 2025 23:15
@Matan-B Matan-B added this to Crimson May 7, 2025
@Matan-B Matan-B moved this to Awaits review in Crimson May 7, 2025
Copy link
Contributor

@Matan-B Matan-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ceph_test_cls_cmpomap is being run from suites/rgw/verify/tasks/cls.yaml and therefore was never tested with Crimson.

Can we add ceph_test_cls_cmpomap either into makecheck unit tests or alternatively into our suite so we would avoid regressions?

object_stat_sum_t& delta_stats) const
{
if (!os.exists || os.oi.is_whiteout()) {
logger().debug("{}: object does not exist: {}", __func__, os.oi.soid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain why is it expected to operate on non-exsiting objects in the commit messsage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

Copy link
Contributor

@Matan-B Matan-B May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the test case expect false value return, not error code.

What do you mean by "false value"? Error codes are used internally - the raw error code value (int) is returned as an answer to the client eventually.

Looking at the cmp_set_vals_noexist_str test:

  EXPECT_EQ(do_cmp_set_vals(oid, Mode::String, Op::EQ, {{"eq", value}}), 0);

0 is expected while Crimson returns -2.


I'm not sure I understand why Classic is returning 0 here since the object store actually returns -2 as well in omap_get_values:

  if (!o || !o->exists) {
    r = -ENOENT;
    goto out;
  }

I think that either:

  1. Classic should also return -2 and the test itself should be fixed.
  2. CEPH_OSD_OP_FLAG_FAILOK is set, which hides the error and returns 0 instead - (why?)

Going back to my previous comment:

Can you please explain why is it expected to operate on non-existing objects in the commit message?

We should not change the behavior because the tests expects so as the test itself might possibly be wrong.

Lastly, I think that my comment regarding adding ceph_test_cls_cmpomap into makecheck or the crimson suite was missed. What do you think?

Copy link
Contributor

@Matan-B Matan-B May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 is returned (when it shouldn't) for non-existing objects for classic:

	if (oi.is_omap()) {
	  osd->store->omap_get_values(ch, ghobject_t(soid), keys_to_get, &out);
	} // else return empty omap entries
        ...

The non-existing object is not an omap so no error is returned.

For Crimson:

  if (oi.is_omap()) {
    return store->omap_get_values(coll, ghobject_t{oi.soid}, keys_to_get);
  } else {
    return crimson::ct_error::enodata::make();
  }

and then:

    }).handle_error_interruptible(
      crimson::ct_error::enodata::handle([&osd_op] {
        ..
        osd_op.rval = 0;
        return ll_read_errorator::now();
      }),

We actually return enodata since this object is not an omap - but then we change the error code to zero (probably to align with classic).

I actually think we should return ENOENT here in both cases and change the test itself.
What do you think?

Copy link
Contributor

@athanatos athanatos May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Matan-B Can't change the behavior without auditing users first. OMAPGETVALS, OMAPGETHEADER, and OMAPGETKEYS all seem to behave the same way -- empty, but successful return.

Short term, I'd suggest matching classic's behavior since this is a public interface other components already rely on.

Longer term, I do agree that the behavior is inconsistent with most of the other read ops, but changing the behavior would mean auditing rgw and rbd to make sure it won't break anything.

Copy link
Contributor

@Matan-B Matan-B May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so until the longer term solution is there we could do something like this:

  if (!os.exists || os.oi.is_whiteout()) {
    logger().debug("{}: object does not exist: {} returning empty result", __func__, os.oi.soid);
    // Although an error should be expected here since the object doesn't exist,
    // we want to match classic's behavior as other components already rely on it.
    // Return an empty, but successful return instead.
    //return crimson::ct_error::enoent::make();
    return ll_read_errorator::now();
  }

Instead of removing the check completely, we could keep it and explain what's going on to avoid future confusion here. Since the error "removal" is not trivial.
@athanatos, @liu-chunmei - What do you think?

As suggested earlier, let's also add this to our regular testing.

@Matan-B Matan-B moved this from Awaits review to In Progress in Crimson May 7, 2025
Copy link
Contributor

@Matan-B Matan-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that Crimson is actually might be correct here.

object_stat_sum_t& delta_stats) const
{
if (!os.exists || os.oi.is_whiteout()) {
logger().debug("{}: object does not exist: {}", __func__, os.oi.soid);
Copy link
Contributor

@Matan-B Matan-B May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 is returned (when it shouldn't) for non-existing objects for classic:

	if (oi.is_omap()) {
	  osd->store->omap_get_values(ch, ghobject_t(soid), keys_to_get, &out);
	} // else return empty omap entries
        ...

The non-existing object is not an omap so no error is returned.

For Crimson:

  if (oi.is_omap()) {
    return store->omap_get_values(coll, ghobject_t{oi.soid}, keys_to_get);
  } else {
    return crimson::ct_error::enodata::make();
  }

and then:

    }).handle_error_interruptible(
      crimson::ct_error::enodata::handle([&osd_op] {
        ..
        osd_op.rval = 0;
        return ll_read_errorator::now();
      }),

We actually return enodata since this object is not an omap - but then we change the error code to zero (probably to align with classic).

I actually think we should return ENOENT here in both cases and change the test itself.
What do you think?

@liu-chunmei
Copy link
Contributor Author

jenkins test make check

@liu-chunmei
Copy link
Contributor Author

jenkins test make check arm64

@liu-chunmei
Copy link
Contributor Author

jenkins retest this please

@liu-chunmei
Copy link
Contributor Author

@Matan-B and @athanatos remove that 2 line check and the remaining logic just like matan said as follows is reasonable for me.
For Crimson: (

maybe_get_omap_vals_by_keys(
  crimson::os::FuturizedStore::Shard* store,
  const crimson::os::CollectionRef& coll,
  const object_info_t& oi,
  const std::set<std::string>& keys_to_get)
{
  if (oi.is_omap()) {
    return store->omap_get_values(coll, ghobject_t{oi.soid}, keys_to_get);
  } else {
    return crimson::ct_error::enodata::make();
  }
}

and then:

PGBackend::omap_get_vals_by_keys(
...
return maybe_get_omap_vals_by_keys(store, coll, os.oi, keys_to_get)
...   
 }).handle_error_interruptible(
      crimson::ct_error::enodata::handle([&osd_op] {
        ..
        osd_op.rval = 0;
        return ll_read_errorator::now();
      }),

@liu-chunmei
Copy link
Contributor Author

@Matan-B I remember we return osd_op.rval = 0; just to meet the test requirements to keep the same as classic return.

Copy link
Contributor

@Matan-B Matan-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, two comments:

The test will pass now due to:

    }).handle_error_interruptible(
      crimson::ct_error::enodata::handle([&osd_op] {
        ..
        osd_op.rval = 0;
        return ll_read_errorator::now();
      }),

Can you please add a comment above:

        // Although an error should be expected here since the object doesn't exist,
        // we want to match classic's behavior as clients possibly rely on it.
        // Return an empty, but successful return instead.
        osd_op.rval = 0;

Lastly, can we please add that to the crimson-rados suite?
Thanks!

@Matan-B Matan-B dismissed their stale review June 25, 2025 06:57

addressed

@Matan-B Matan-B self-requested a review June 25, 2025 06:57
omap_get_vals_by_keys

fix ./bin/ceph_test_cls_cmpomap error for noexist tests
since the test case expect false value return, not error code.

Signed-off-by: Chunmei Liu <chunmei.liu@ibm.com>
Signed-off-by: Chunmei Liu <chunmei.liu@ibm.com>
@liu-chunmei
Copy link
Contributor Author

Can you please add a comment above:

        // Although an error should be expected here since the object doesn't exist,
        // we want to match classic's behavior as clients possibly rely on it.
        // Return an empty, but successful return instead.
        osd_op.rval = 0;

Lastly, can we please add that to the crimson-rados suite? Thanks!

done, add test_cls_cmpomap.sh to basic/tasks/cls.yaml

@Matan-B Matan-B moved this from In Progress to Needs QA in Crimson Jun 25, 2025
@Matan-B
Copy link
Contributor

Matan-B commented Jul 6, 2025

@liu-chunmei
Copy link
Contributor Author

@Matan-B the teuthology test result is at : https://pulpito.ceph.com/liucm-2025-07-02_21:32:59-crimson-rados-wip-liucm-omap-noexist-crimson-only-distro-crimson-debug-smithi/
seems no new errors, just Command failed on smithi045 with status 1: 'sudo chmod 777 /var/log/ceph' and slow op, please take a look, Thanks!

@Matan-B
Copy link
Contributor

Matan-B commented Jul 7, 2025

2025-07-04T19:26:39.996 INFO:tasks.workunit.client.0.smithi055.stdout:[----------] 31 tests from CmpOmap (1858 ms total)
2025-07-04T19:26:39.997 INFO:tasks.workunit.client.0.smithi055.stdout:
2025-07-04T19:26:39.997 INFO:tasks.workunit.client.0.smithi055.stdout:[----------] Global test environment tear-down
2025-07-04T19:26:40.169 INFO:tasks.workunit.client.0.smithi055.stdout:[==========] 31 tests from 1 test suite ran. (3445 ms total)
2025-07-04T19:26:40.169 INFO:tasks.workunit.client.0.smithi055.stdout:[  PASSED  ] 31 tests.

@Matan-B Matan-B merged commit 9323ea2 into ceph:main Jul 7, 2025
13 checks passed
@Matan-B Matan-B moved this from Needs QA to Merged (Main) in Crimson Jul 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Merged (Main)

Development

Successfully merging this pull request may close these issues.

3 participants