osd,crimson/osd: rework of replica read and related state#56677
Initial short test runs on EC and balanced reads test cases seem clean:
b8e338f to 1e3d12b
1e3d12b to d50fb42
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved.
d50fb42 to 1035752
New commits add a mechanism that allows replicas to serve replica reads at last_update after a delay.
1035752 to 20c4e77
20c4e77 to 7cb3c48
7cb3c48 to 072d57e
Signed-off-by: Samuel Just <sjust@redhat.com>
…_updated Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
The name last_update_ondisk is misleading as it suggests a local property like last_update_applied rather than a pg-global property. Clarify the name and add a much more specific comment. Signed-off-by: Samuel Just <sjust@redhat.com>
Avoid maintaining pg_committed_to if pg is not active. We can't guarantee that last_update won't become divergent, so it doesn't provide useful information. Signed-off-by: Samuel Just <sjust@redhat.com>
The purpose of this rename is merely to clarify that the necessary condition on ec roll-forward is that the pg has committed up to that point. Along with subsequent commits, this will clarify that both ec and replicated pools propagate pg_committed_to for related if not identical reasons.

Because EC::submit_transaction already did

  op->roll_forward_to = std::max(min_last_complete_ondisk, rmw_pipeline.committed_to);

there's no difference in behavior, as rmw_pipeline.committed_to is updated immediately after the notification to the PG that the write completed.

Signed-off-by: Samuel Just <sjust@redhat.com>
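The roll-forward expression quoted above can be sketched in isolation. This is a minimal illustration, not the Ceph implementation: `Version` is a hypothetical stand-in for Ceph's `eversion_t`, and `roll_forward_to` here is a free function invented for the example. It only shows that `std::max` selects the later of the two bounds, so the result equals `committed_to` whenever that bound is at least as new as `min_last_complete_ondisk`.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for Ceph's eversion_t, ordered by (epoch, version).
struct Version {
  std::uint32_t epoch;
  std::uint64_t version;
};

inline bool operator<(const Version& a, const Version& b) {
  return std::tie(a.epoch, a.version) < std::tie(b.epoch, b.version);
}

inline bool operator==(const Version& a, const Version& b) {
  return std::tie(a.epoch, a.version) == std::tie(b.epoch, b.version);
}

// Same shape as the expression in the commit message: the roll-forward
// bound is the later of the two candidate versions.
inline Version roll_forward_to(const Version& min_last_complete_ondisk,
                               const Version& committed_to) {
  return std::max(min_last_complete_ondisk, committed_to);
}
```

For example, `roll_forward_to(Version{3, 10}, Version{3, 12})` yields `Version{3, 12}`, the `committed_to` argument.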
…last_complete_ondisk This commit updates the bulk of the interface pathways in crimson and classic to refer to pg_committed_to rather than min_last_complete_ondisk and changes the replica side to maintain pg_committed_to instead. This commit shouldn't actually cause any behavior change -- we're still passing min_last_complete_ondisk (which is a valid lower bound for pg_committed_to!). Signed-off-by: Samuel Just <sjust@redhat.com>
…nsaction

This commit actually changes the bound we're propagating. This solves two bugs:
- Using min_last_complete_ondisk caused replicas to be two update rounds behind rather than one.
- Replicas don't actually have enough information to set min_last_complete_ondisk on activation, so we couldn't serve replica reads until the first write. pg_committed_to, on the other hand, is fine as the activation last_update cannot become divergent.

Moreover, last_complete won't advance past missing objects, causing min_last_complete_ondisk to be blocked by any replica missing an object. Note that the replica read pathway separately checks whether the target is missing locally, so that property was not needed.

Fixes: https://tracker.ceph.com/issues/65086
Fixes: https://tracker.ceph.com/issues/65085
Signed-off-by: Samuel Just <sjust@redhat.com>
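The difference between the two bounds can be illustrated with a toy model. Everything here is an assumption made for the sketch: versions are plain integers rather than `eversion_t`, `OsdState` and its fields are hypothetical, and `pg_committed_to` is modelled as "the newest version committed on every acting OSD", which is a simplified reading of the commit messages rather than the actual PeeringState logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy version type; real code uses eversion_t.
using toy_version = std::uint64_t;

// Hypothetical per-OSD view of a single PG.
struct OsdState {
  toy_version last_update_committed;  // newest log entry committed on this OSD
  toy_version last_complete_ondisk;   // stalls before the oldest missing object
};

// Old bound: held back by any OSD whose last_complete is stuck
// behind a missing object.
inline toy_version min_last_complete_ondisk(const std::vector<OsdState>& acting) {
  assert(!acting.empty());
  toy_version bound = acting.front().last_complete_ondisk;
  for (const auto& osd : acting) {
    bound = std::min(bound, osd.last_complete_ondisk);
  }
  return bound;
}

// New bound (assumed definition): newest version committed on all
// acting OSDs, regardless of which objects are still missing.
inline toy_version pg_committed_to(const std::vector<OsdState>& acting) {
  assert(!acting.empty());
  toy_version bound = acting.front().last_update_committed;
  for (const auto& osd : acting) {
    bound = std::min(bound, osd.last_update_committed);
  }
  return bound;
}
```

With three OSDs that have all committed up to version 10, one of which is missing an object written at version 5, the old bound stays at 4 while the modelled `pg_committed_to` is 10. In this model the missing object only matters to the separate per-object check mentioned in the commit message.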
…Missing and related Signed-off-by: Samuel Just <sjust@redhat.com>
…ries This matches the behavior for normal IOs. Signed-off-by: Samuel Just <sjust@redhat.com>
It wouldn't actually be wrong for the primary to trim the log right up to the pg_committed_to bound it is propagating (though it generally won't). Signed-off-by: Samuel Just <sjust@redhat.com>
See comment for details. Modifies ECBackend::submit_transaction to use the passed pg_committed_to unconditionally, adds a comment to explain, and adds a comment to RMWPipeline::pg_committed_to to clarify that it may lag PeeringState::pg_committed_to. Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
Fixes: https://tracker.ceph.com/issues/65299 Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
…notify_t Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
…remove We don't support balanced reads on ec pools. Additionally, the yaml actually specifies 'balanced_reads' rather than 'balance_reads' and therefore has no actual effect. Signed-off-by: Samuel Just <sjust@redhat.com>
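The reason a misspelled key "has no actual effect" can be shown with a small sketch of an option lookup that falls back to a default. This is not the test harness code: `TestConfig` and `get_flag` are hypothetical names, and the sketch assumes only that unknown keys are ignored rather than rejected.

```cpp
#include <map>
#include <string>

// Hypothetical parsed test configuration: option name -> boolean value.
using TestConfig = std::map<std::string, bool>;

// Look up an option, falling back to a default when the key is absent.
// A misspelled key in the config is never consulted, so it is silently ignored.
inline bool get_flag(const TestConfig& config, const std::string& key,
                     bool default_value = false) {
  auto it = config.find(key);
  return it == config.end() ? default_value : it->second;
}
```

A config containing only `balanced_reads: true` makes `get_flag(config, "balance_reads")` return the default `false`, so the test runs as if the option had never been set.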
e4515b9 to dda683b
jenkins test api
The previous run uncovered an initialization-order bug in intrusive_timer, which is fixed in the new version.
New test run: https://pulpito.ceph.com/sjust-2024-10-22_15:57:45-rados-wip-sjust-testing-2024-10-21-distro-default-smithi/
Remaining failures appear to be a subset of main: https://pulpito.ceph.com/teuthology-2024-10-27_20:00:17-rados-main-distro-default-smithi/
I think this is ready to merge. I'll merge later this week if there are no objections or outstanding reviews to wait for. @Matan-B @rzarzynski @cyx1231st
This PR does a few related things:
Together, the above address https://tracker.ceph.com/issues/65086 and https://tracker.ceph.com/issues/65085.
The commits are structured to break the above into steps. See commit messages and included code comments for further details.