osd: add has_manifest_chunk to count chunks in snapshot#38767
osd: add has_manifest_chunk to count chunks in snapshot#38767athanatos merged 4 commits intoceph:masterfrom
Conversation
src/osd/PrimaryLogPG.cc
Outdated
| { | ||
| if (!obc->obs.oi.has_manifest()) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Probably want to remove this check from cls_has_chunk as well.
|
What is cls_has_chunk suppose to do? Why do we want to return true if any snapshot has that chunk? Do we use it to validate whether a particular chunk reference is live? Wouldn't it actually have to return the number of references to a particular chunk (an object might reference the same chunk twice at different offsets within the object)? |
|
cls_has_chunk is used for chunk scrub by ceph_dedup_tool in order to find out the reference mismatch. Due to the false-positive design on reference counting,
what we should do here is to examine that CHUNKA exists in the low tier, and then check that the manifest object A contains CHUNKA's reference. If not, we should remove the CHUNKA. |
c2865cc to
fddc6a1
Compare
|
Ok, what about the multiple reference case I mention above? |
|
As your comment, there is a case as below, In this case, cl_has_chunk() should return the number of references to decrease the reference count in CHUNKA. |
|
Yeah, that's what I'm asking about. Do we record how many references each backref holds? |
|
Done. Can you take a look? |
5bc4aea to
0cc6aa0
Compare
src/osd/PrimaryLogPG.cc
Outdated
| } | ||
| } | ||
|
|
||
| int PrimaryLogPG::has_manifest_chunk(ObjectContextRef obc, std::string& fp_oid) |
0cc6aa0 to
78ed661
Compare
src/osd/PrimaryLogPG.cc
Outdated
| obc_g ? &(obc_g->obs.oi.manifest) : nullptr , | ||
| nullptr, | ||
| refs); | ||
| if (!refs.is_empty()) { |
There was a problem hiding this comment.
You don't need to do this check, the for loop before would just immediately terminate.
|
@athanatos I addressed your comment. |
|
Needs a rados suite run prior to merge. |
cls_has_chunk does not cover snapshotted manifest object. This leads to unexpected behavior during chunk scrub. Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
Signed-off-by: Myoungwon Oh <myoungwon.oh@samsung.com>
a860d7f to
1109a59
Compare
|
I just rebased this PR to run a rados suite on top of the latest master. https://pulpito.ceph.com/myoungwon-2021-01-21_09:53:20-rados-wip-manifest-chunk-scrub-distro-basic-smithi/ |
|
@myoungwon There are a ton of dead and failed runs there. |
|
Hm.. The dead and fail cases are related to this PR?
In addition, the failed case is already tracked by https://tracker.ceph.com/issues/48786, https://tracker.ceph.com/issues/48915. |
cls_has_chunk does not cover snapshotted manifest object.
This leads to unexpected behavior during chunk scrub.
Signed-off-by: Myoungwon Oh myoungwon.oh@samsung.com
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume tox