osd: Multiple fixes to optimized EC and peering#63873
Conversation
|
Thanks for the clear comments (and for the code quality in general...). |
|
LGTM. Minor notes still left to respond/fix, but not important enough to delay approval. |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
The op.is_delete() function only returns true if the op is not ALSO doing something else (e.g. a clone). This causes issues with clearing the new EC extent cache. Also improve some debug and code cleanliness. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
…ized EC. The problem was that incorrect shard_versions were being detected by scrub. The comment in the code explains the detailed problem and solution. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Code changes to prevent create then erase of empty shard. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Inserting the first parity buffer was causing the ro-range within the SEM to be incorrectly calculated. Simple fix and I have added some unit tests to defend this error in the future. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Some client OPs are able to generate transactions which delete an object and then write it again. This is used by the copy-from ops. If such a write is not 4k aligned, then the new EC code was incorrectly doing a read-modify write on the misaligned 4k. This causes some garbage to be written to the backend OSD, off the end of the object. This is only a problem if the object is later extended without the end being written. Problematic sequence is: 1. Create two objects (A and B) of size X and Y where: X > Y, (Y % 4096) != 0 2. copy_from OP B -> A 3. Extend B without writing offset Y+1 This will result in a corrupt data buffer at Y+1 without this fix. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
Scrub revealed a bug whereby the non-primary shards were being given a version number in the OI which did not match the expected version in the authoritative OI. A secondary issue is that all attributes were being pushed to the non primary shards, whereas only OI is actually needed. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com> # Conflicts: # src/osd/osd_types.h
If a ReadOp from EC contains two objects where one object only reads from a single shard, but other onjects require other shards, then this bug can be hit. The fix should make it clear what the issue is Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
In old EC, the full stripe was always read and written. In new EC, we only attempt to recover the shards that were missing. If an old OSD is available, the read can be directed there. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
…ct is recovering. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
get_all_remaining_reads() is used for two purposes: 1. If a read unexpectedly fails, recover from other shards. 2. If a shard is missing, but is allowed to be missing (typically due to unequal shard sizes), we rely on this function to return no new reads, without an error. The test failure we saw case (2), but I think case (1) is important. Most of the time we probably would not notice, but if insufficient redundancy exists without the for_recovery being set, then this will result in recovery failing. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
This was a particular edge case whereby the need to do an encode and a decode as part of recovery was causing EC to attempt to do a decode off the end of the shard, despite this being unnecessary. Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
|
@yuriw @ljflores @SrinivasaBharath Fix pushed. I have sent to ceph-ci and I will see if I can figure out how to re-run some tests! |
|
@aainscow ping me if you need any help with that! I will also let Bharath know that we need to redo the batch for this. So know that it is getting tested individually here: https://tracker.ceph.com/issues/71921 |
@aainscow @bill-scales I actually just found this same issue (the Objecter issue) on a different batch, so I think that points to this being unrelated. |
Also found this one in a different batch, so likely unrelated: https://tracker.ceph.com/issues/71931 |
|
jenkins test make check arm64 |
|
jenkins test signed |
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
|
All ready to go. I want to double check with @rzarzynski that he is good with the merging. |
|
|
||
| void add(const pg_log_entry_t& e, bool applied = true) { | ||
| void add(const pg_log_entry_t& e, enum NonPrimary nonprimary, bool applied, pg_info_t *info, LogEntryHandler *h) { | ||
| mark_writeout_from(e.version); |
There was a problem hiding this comment.
the 'enum' should be removed in some future fix
This PR is a result of several weeks/months of teuthology tests for EC, running in a branch where the optimised EC is always on.
The commits themselves describe the individual problems that are being fixed and each fix is relatively self contained. Where multiple attempts at a fix were made, these have been squashed, but largely each commit reflects the result of a teuthology test failure.
There is one commit which is a collection of cosmetic/debug/typos, rather than spreading these out over other commits.
Tracker: https://tracker.ceph.com/issues/71814
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins test classic perfJenkins Job | Jenkins Job Definitionjenkins test crimson perfJenkins Job | Jenkins Job Definitionjenkins test signedJenkins Job | Jenkins Job Definitionjenkins test make checkJenkins Job | Jenkins Job Definitionjenkins test make check arm64Jenkins Job | Jenkins Job Definitionjenkins test submodulesJenkins Job | Jenkins Job Definitionjenkins test dashboardJenkins Job | Jenkins Job Definitionjenkins test dashboard cephadmJenkins Job | Jenkins Job Definitionjenkins test apiJenkins Job | Jenkins Job Definitionjenkins test docsReadTheDocs | Github Workflow Definitionjenkins test ceph-volume allJenkins Jobs | Jenkins Jobs Definitionjenkins test windowsJenkins Job | Jenkins Job Definitionjenkins test rook e2eJenkins Job | Jenkins Job Definition