Skip to content

osd: return appropriate error if the object is not manifest#45137

Merged
athanatos merged 1 commit intoceph:masterfrom
myoungwon:wip-51627-2
Apr 11, 2022
Merged

osd: return appropriate error if the object is not manifest#45137
athanatos merged 1 commit intoceph:masterfrom
myoungwon:wip-51627-2

Conversation

@myoungwon
Copy link
Member

@myoungwon myoungwon commented Feb 24, 2022

Unlike existing object, manifest object requires valid adjacent
clones to calculate references.

If a user sends reference_chunk()---this is only for manifest object---
to the original object (not manifest object) which has not recovered snapshots,
the OSD triggers assert() because reference_chunk() try to
find adjacent clones, resulting in the assert(). This is because the original object
does not need to wait the recovery of snapshot.

So, this PR return an error if the object is not manifeset.

fixes: https://tracker.ceph.com/issues/54509

Signed-off-by: Myoungwon Oh myoungwon.oh@samsung.com

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@neha-ojha
Copy link
Member

jenkins test make check

@neha-ojha
Copy link
Member

@myoungwon could you please take a look at https://tracker.ceph.com/issues/51627#note-9?

@myoungwon
Copy link
Member Author

Sure

@myoungwon
Copy link
Member Author

ceph_assert(ctx->obc);
if (!ctx->obc->obs.oi.has_manifest() || !ctx->obc->obs.oi.manifest.is_chunked()) {
return -ENOLINK;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it should be in the implementation of get_manifest_ref_count?

@athanatos
Copy link
Contributor

LGTM pending test run

@neha-ojha
Copy link
Member

@myoungwon LibRadosTwoPoolsPP.ManifestRollbackRefcount is failing consistently in the test run+rerun of this PR, can you check what's going on? I am aware of https://tracker.ceph.com/issues/53219, but this failure was not very repeatable, AFAICT.

http://pulpito.front.sepia.ceph.com/yuriw-2022-03-18_13:48:12-rados-wip-yuri6-testing-2022-03-17-1547-distro-default-smithi/6745170/
http://pulpito.front.sepia.ceph.com/yuriw-2022-03-18_00:42:20-rados-wip-yuri6-testing-2022-03-17-1547-distro-default-smithi/6743703/

@neha-ojha
Copy link
Member

jenkins test make check

@myoungwon
Copy link
Member Author

@neha-ojha Ok, I'll check the issue.

@athanatos athanatos self-requested a review March 23, 2022 04:07
@athanatos
Copy link
Contributor

The second commit seems to revert the first. Is that intentional?

@myoungwon
Copy link
Member Author

Yes, I didn't catch a corner case before---the head is not manifest, but its snapshot has manifest. In this case, returning an error (because the head is not manifest) will not calculate reference count correctly. Therefore, I added the second commit---this commit try to search all snapshot the object ha. If this change is acceptable, I'll squash the two commits into one.

@athanatos
Copy link
Contributor

AFAICT, this is right. Please update PR description, squash the commits, and add the needs-qa flag since this needs to be retested.

If a user sends reference_chunk() to the original object
(not manifest object) which has not recovered snapshots,
the OSD triggers assert() because
reference_chunk() try to find adjacent unrecovered clones,
resulting in the assert().
This is because the original object does not wait the recovery
of snapshots.

To avoid this, this commit add a condition to check a base snapshot
is readable whether the object is manifest or not.
If the base snapshot is valid and the snapshot is manifest
, osd try to calculate reference count.

fixes: https://tracker.ceph.com/issues/54509

Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
@yuriw
Copy link
Contributor

yuriw commented Apr 11, 2022

@athanatos this passed tests, pls approve and merge at will
ref: https://trello.com/c/kCTebTFa

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants