osd: Remove leaked clone objects (SnapMapper malformed key)#52971
osd: Remove leaked clone objects (SnapMapper malformed key)#52971
Conversation
67bd50a to
e31aa95
Compare
|
@ronen-fr Thank you for the comments. Since this is PR is in early WIP stage, I prefer to first verify that this approach is indeed valid and reliable (for backporting) before addressing them. Do you find any design flaws with this approach? |
Working on that... |
|
I don't want to add a librados api for this purpose. librados api commands need to be maintained long term and shouldn't be created for one-off fixes. Instead, I'd suggest a mon command. As an aside, we do probably want there to be a script which re-removes snaps in smallish batches (probably specified as an argument to the above command) which are confirmed to be trimmed before moving onto the next one. As such, the command should either take a snapshot to remove or a set (comma delimited?). That avoids needing to re-trim all of the previously removed snapshots on a cluster at once, and allows specifically re-removing as little as a single snap. |
6ddf32b to
ecf7a37
Compare
f0135a3 to
cd8ad69
Compare
Moved to a new mon command.
Not relevant anymore, see new comment.
Added a lower and upper snap id bounds. Example usage for rbd:
|
cd8ad69 to
f7d9e23
Compare
|
@athanatos, I updated the monitor command to simply iterate up to the latest Note: In earlier discussions I assumed that the purged snaps kv entries are not complete. However, that seem to be wrong as they are not being trimmed in any way. |
| @@ -3093,6 +3093,22 @@ will start to track new ops received afterwards."; | |||
| pg_recovery_stats.reset(); | |||
| } | |||
|
|
|||
There was a problem hiding this comment.
I don't understand how this command is supposed to work. Add brief, precise comments to the header declarations of convert_malformed_key and convert_malformed in the format:
/**
* method_name
*
* Explanation of what method_name *does*, not how it works. A reader should be able
* to use this method correctly by reading this comment without needing to read the
* implementation.
*/
If you need to elaborate on something in the implementation, do it in the definition body.
From what I can tell from the implementation, it looks like the goal here is to remove the malformed key and write back (for an ec pool) all 128 possible legacy keys. What's the plan for ultimately removing the extras? The OSD has enough information to map the hobject_t to a PG. From there, if the OSD only has a single shard of the PG (which it almost always will), you can infer the shard. You could pass a lambda in to do that mapping. You could simply skip any objects in PGs that happen to have multiple shards present (lambda returns NOSHARD or something).
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
e6b7cd9 to
009fbfe
Compare
I'm not sure I understand how,
Malformed keys aren't sorted since they share the same prefix with the correct ones. Other comments were addressed. |
/** * convert_malformed * * Scans the SnapMapper for malformed keys created by a bug * during upgrade from N (and earlier) to O (up to 16.2.11). * Each detected key is replaced with a valid key instead. * See: https://tracker.ceph.com/issues/62596 * */ * TODO: use the OSD to get information to map the hobject_t to a PG * instead of adding all possible EC shard prefixes. Converts each <MAPPING_PREFIX><poolid>_<snapid>_ ("malformed key") to <LEGACY_MAPPING_PREFIX><snapid>_<shard_id>_<hobject_t::to_str()> by running: `ceph daemon osd.<id> fix_malformed_snapmapper_keys` to each (affected) osd in the acting set. Fixes: Part 1/3 - https://tracker.ceph.com/issues/62596 Signed-off-by: Matan Breizman <mbreizma@redhat.com>
009fbfe to
adf25db
Compare
adf25db to
2979e2b
Compare
/** * remove_possible_keys * * SnapMapper::convert_malformed() inserts all the possible keys. * Remove the extra keys inserted once `force_reremove_snap` was used. * */ * TODO: bound to transaction size Fixes: Part 3/3 - https://tracker.ceph.com/issues/62596 Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2979e2b to
ba6c967
Compare
|
Step 2/3 was separated into #53235 since it may stand as a standalone step which unnecessarily relates to the issue addressed in this PR. |
|
@athanatos, the malformed keys are not unique and therefore are useless. The last malformed key will override the previous ones, marking as DNM until verified. |
|
Closing, See tracker for resolving this issue. |
This PR aims to to solve a clone objects leak to clusters affected by: https://tracker.ceph.com/issues/56147
Step 1: (for each affected osd)
Using a new asock command (
fix_malformed_snapmapper_keys) that converts the malformed SnapMapper keys to the correct structure.To cover clone objects that belong to an EC pool, all possible shard prefixes are inserted. (Total of 128 keys).
Step 2: (mon command)
Separated into osd: Add force-reremove-snap mon command #53235
Since the correct key exists,
ceph osd pool force_reremove_snap <pool> <lower bound> <upper bound>mon command can be used to remove any non-existing (already removed) snapshots.Step 3: (for each affected osd)
Clean up the extra inserted keys in step 1 after re-deleting the snapshots by using (
cleanup_snapmapper_possible_keys)The leaked clone objects associated to the re-deleted snapshot will be removed once the key is fixed.
TO DO:
Fixes: https://tracker.ceph.com/issues/62596
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows