crimson/osd/recovery_backend: scan_for_backfill tries to load the metadata even if the obc is already in the cache while its existed is false #60989

Closed
xxhdx1985126 wants to merge 1 commit into ceph:main from xxhdx1985126:wip-69154

Conversation

@xxhdx1985126
Contributor

Fixes: https://tracker.ceph.com/issues/69154
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

@xxhdx1985126
Contributor Author

jenkins test api

Contributor

@Matan-B Matan-B left a comment

This PR contains 2 changes: the first avoids the obc reload (which lgtm), while the second removes the !existed case. Previously we returned in that case; now we reload the metadata. Can you either explain why or separate the commits? Thank you!

Comment on lines -246 to -250
} else {
  // if the object does not exist here, it must have been removed
  // between the collection_list_partial and here. This can happen
  // for the first item in the range, which is usually last_backfill.
}
Contributor

Can you please explain why this case is no longer relevant?

Contributor Author

I think this case still exists: the later "load_metadata" call also returns an empty obc, so the object won't be put into the backfill interval. So it looks to me like this PR doesn't conflict with the main branch in this case, am I right?

Contributor

Yes, we will still hit this case when trying to reload although it's less explicit now.
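
A minimal sketch of why the removed else branch is still covered implicitly, assuming a loader that returns nothing when the object has been removed. The names `ObjectState`, `Loader`, `scan_range`, and `example_scan`, as well as the integer version, are hypothetical stand-ins for illustration and are not the crimson API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the state an obc would carry.
struct ObjectState {
  bool exists;
  int version;
};

// Hypothetical loader: returns std::nullopt when no metadata is found,
// loosely mirroring a load that yields an empty obc.
using Loader = std::function<std::optional<ObjectState>(const std::string&)>;

// Builds a toy "backfill interval" (object name -> version) from a listing.
std::map<std::string, int> scan_range(const std::vector<std::string>& listed,
                                      const Loader& load) {
  std::map<std::string, int> interval;
  for (const auto& name : listed) {
    const auto state = load(name);
    if (state && state->exists) {
      interval.emplace(name, state->version);
    }
    // Otherwise the object was removed between listing and loading,
    // so it is skipped without needing an explicit else branch.
  }
  return interval;
}

// Usage: "b" was listed but removed before its metadata could be loaded.
std::map<std::string, int> example_scan() {
  const Loader load =
      [](const std::string& name) -> std::optional<ObjectState> {
    if (name == "b") {
      return std::nullopt;
    }
    return ObjectState{true, 1};
  };
  auto interval = scan_range({"a", "b", "c"}, load);
  assert(interval.count("b") == 0);  // the removed object is not in the interval
  return interval;
}
```

In this sketch the removed object simply never reaches the interval, which matches the point that the case is still hit, only less explicitly.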

obc = pg.obc_registry.maybe_get_cached_obc(object);
}
if (obc) {
if (obc->obs.exists) {
Contributor

@Matan-B Matan-B Jan 20, 2025

scan_for_backfill tries to load the metadata even if the obc is already in the cache while its existed is false

I might be missing something: when the above if condition (obc->obs.exists) is false, we return at line 251 without loading the metadata. Right?

Meaning we have 3 cases (before the patch):

  1. obc found in cache && exists is true -> emplace it.
  2. obc found in cache && exists is false -> return
  3. obc not found -> load_metadata

With this patch the 3 cases are:

  1. obc found in cache && exists is true -> emplace it.
  2. obc found in cache && exists is false -> load_metadata
  3. obc not found -> load_metadata

This change is not reflected in the commit message/title, am I right? Is this the intended behavior?
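
The two case lists above can be written as a small decision table. This is only a sketch under assumptions: `CachedObc`, `Action`, `decide_before`, `decide_after`, and `check_three_cases` are hypothetical names used for illustration, and the real code works on crimson obc types rather than `std::optional`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a cache hit; `exists` mirrors obc->obs.exists.
struct CachedObc {
  bool exists;
};

enum class Action { Emplace, Return, LoadMetadata };

// Behaviour before the patch: a cached obc with exists == false returns early.
Action decide_before(const std::optional<CachedObc>& obc) {
  if (obc) {
    return obc->exists ? Action::Emplace : Action::Return;
  }
  return Action::LoadMetadata;
}

// Behaviour with the patch: anything other than "cached and exists"
// falls through to load_metadata.
Action decide_after(const std::optional<CachedObc>& obc) {
  if (obc && obc->exists) {
    return Action::Emplace;
  }
  return Action::LoadMetadata;
}

// Usage: only case 2 differs between the two versions.
void check_three_cases() {
  assert(decide_before(CachedObc{true}) == decide_after(CachedObc{true}));
  assert(decide_before(std::nullopt) == decide_after(std::nullopt));
  assert(decide_before(CachedObc{false}) == Action::Return);
  assert(decide_after(CachedObc{false}) == Action::LoadMetadata);
}
```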

Contributor

Let me know if this is correct:


crimson/osd/recovery_backend: fix scan_for_backfill metadata reload

Currently there are 3 cases (before the patch):

obc found in cache && exists is true -> emplace it.
obc found in cache && exists is false -> return
obc not found -> load_metadata

We should reload the metadata **also** when exists is false in order to
exclude the possibility that there's an ongoing obc load.

Fixes: https://tracker.ceph.com/issues/69154
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

Contributor Author

Yes, that's right:-) Will update the commit message.

metadata even if the obc is already in the cache while its existed
is false

Currently there are 3 cases (before the patch):

obc found in cache && exists is true -> emplace it.
obc found in cache && exists is false -> return
obc not found -> load_metadata

We should reload the metadata **also** when exists is false in order to
exclude the possibility that there's an ongoing obc load.

Fixes: https://tracker.ceph.com/issues/69154
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
@xxhdx1985126
Contributor Author

@Matan-B I've just updated the commit message, please take a look again, thanks:-)

@Matan-B Matan-B self-requested a review January 23, 2025 10:23
Contributor

@Matan-B Matan-B left a comment

This seems to work. However, looking again, I think that we can reuse the object loader logic here and avoid being responsible for the reload/load.
What do you think about: #61536?

@athanatos
Contributor

I also think it's probably better to use the new obc manager machinery rather than explicitly checking loaded.

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@xxhdx1985126
Copy link
Contributor Author

closing this PR in favour of #61536

3 participants