osd,crimson/osd: rework of replica read and related state by athanatos · Pull Request #56677 · ceph/ceph

athanatos · 2024-04-04T00:15:50Z

This PR does a few related things:

- rename last_update_ondisk to pg_committed_to and clarify its semantics
- update replicas (ec and replicated) to maintain pg_committed_to rather than min_last_complete_ondisk
- populate primary->replica messages (replicated and ec) with pg_committed_to rather than min_last_complete_ondisk

Together, these changes address https://tracker.ceph.com/issues/65086 and https://tracker.ceph.com/issues/65085.

The commits are structured to break the above into steps. See commit messages and included code comments for further details.
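The description above can be illustrated with a minimal sketch of the replica-side bookkeeping. It assumes a simplified `Ev` stand-in for Ceph's `eversion_t` and a hypothetical `ReplicaState` type; neither is Ceph's actual code. It shows two properties the change relies on: the replica only ever advances the `pg_committed_to` it receives from the primary, and a read is safe on the replica only if the object's last write is at or below that bound.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical, simplified stand-in for eversion_t: (epoch, version),
// ordered lexicographically.
struct Ev {
  uint32_t epoch = 0;
  uint64_t version = 0;
  friend bool operator<(const Ev& a, const Ev& b) {
    return std::tie(a.epoch, a.version) < std::tie(b.epoch, b.version);
  }
  friend bool operator==(const Ev& a, const Ev& b) {
    return a.epoch == b.epoch && a.version == b.version;
  }
  friend bool operator<=(const Ev& a, const Ev& b) { return !(b < a); }
};

// Hypothetical replica-side state (illustration only).
struct ReplicaState {
  Ev last_update;      // newest log entry applied locally
  Ev pg_committed_to;  // highest committed bound received from the primary

  // Called for each primary->replica message that carries a log entry
  // together with the primary's current pg_committed_to.
  void on_primary_message(const Ev& entry, const Ev& committed_to) {
    if (last_update < entry) last_update = entry;
    // The bound is monotonic: a stale value never moves it backwards.
    if (pg_committed_to < committed_to) pg_committed_to = committed_to;
  }

  // An object last written at `v` may be read here only if that write is
  // known to be committed on every acting-set member.
  bool can_serve_read(const Ev& v) const { return v <= pg_committed_to; }
};
```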

athanatos · 2024-04-04T00:17:08Z

Initial short test runs on EC and balanced reads test cases seem clean:

src/crimson/osd/osd_operations/client_request.cc

github-actions · 2024-06-09T15:30:31Z

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

athanatos · 2024-06-12T02:42:51Z

New commits add a mechanism that allows replicas to serve replica reads at last_update after a delay.
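The delayed-update idea can be sketched with an integer clock in place of a real timer. Everything here (`PctScheduler`, `send_write`, `on_all_committed`, `tick`) is hypothetical; the actual change uses the `PCT_UPDATE_DELAY` pool option, the `MOSDPGPCT` message and `intrusive_timer` listed among the commits. The sketch illustrates one point: a write's repop can only carry the bound committed before that write, so without a follow-up message replicas stay one write behind. Once the PG has been idle for the delay, the primary pushes its current bound and replicas can read at `last_update`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical primary-side scheduler (illustration only).
struct PctScheduler {
  uint64_t delay;                    // stand-in for a per-pool delay
  std::optional<uint64_t> deadline;  // pending standalone update, if any
  uint64_t last_update = 0;          // newest write issued by the primary
  uint64_t committed_to = 0;         // committed on every shard
  uint64_t sent_to_replicas = 0;     // bound the replicas currently know

  explicit PctScheduler(uint64_t d) : delay(d) {}

  // The repop for `version` carries the bound committed so far, which
  // cannot include `version` itself.
  void send_write(uint64_t version) {
    last_update = version;
    sent_to_replicas = committed_to;
    deadline.reset();  // a newer write is in flight
  }

  // Every shard acked `version` at time `now`: arm the delayed update.
  void on_all_committed(uint64_t version, uint64_t now) {
    committed_to = version;
    deadline = now + delay;
  }

  // Polled periodically; returns true when it "sends" the standalone update.
  bool tick(uint64_t now) {
    if (deadline && now >= *deadline && sent_to_replicas < committed_to) {
      sent_to_replicas = committed_to;
      deadline.reset();
      return true;
    }
    return false;
  }
};
```

With a delay of 10 ticks, a single write committed at tick 100 reaches the replicas as a committed bound at tick 110, after which `sent_to_replicas` equals `last_update`.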

github-actions · 2024-06-16T09:06:07Z

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

github-actions · 2024-07-28T08:23:20Z

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

The name last_update_ondisk is misleading as it suggests a local property like last_update_applied rather than a pg-global property. Clarify the name and add a much more specific comment. Signed-off-by: Samuel Just <sjust@redhat.com>

The purpose of this rename is merely to clarify that the necessary condition on ec roll-forward is that the pg has committed up to that point. Along with subsequent commits, this will clarify that both ec and replicated pools propagate pg_committed_to for related if not identical reasons. Because EC::submit_transaction already did op->roll_forward_to = std::max(min_last_complete_ondisk, rmw_pipeline.committed_to); there's no difference in behavior as rmw_pipeline.committed_to is updated immediately after the notification to the PG that the write completed. Signed-off-by: Samuel Just <sjust@redhat.com>
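To illustrate why the rename does not change behavior, here is the roll-forward expression quoted above next to its renamed form, with versions reduced to plain integers and hypothetical function names. As long as the pipeline's `committed_to` is already at least `min_last_complete_ondisk`, which the commit message attributes to `rmw_pipeline.committed_to` being updated right after the write-completion notification, both forms return the pipeline's value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Before: the EC roll-forward bound as quoted in the commit message.
// (Hypothetical function; versions simplified to integers.)
uint64_t roll_forward_to_before(uint64_t min_last_complete_ondisk,
                                uint64_t rmw_committed_to) {
  return std::max(min_last_complete_ondisk, rmw_committed_to);
}

// After: the same expression phrased in terms of pg_committed_to, which
// states the actual requirement: the PG has committed up to this point.
uint64_t roll_forward_to_after(uint64_t pg_committed_to,
                               uint64_t rmw_committed_to) {
  return std::max(pg_committed_to, rmw_committed_to);
}
```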

…last_complete_ondisk This commit updates the bulk of the interface pathways in crimson and classic to refer to pg_committed_to rather than min_last_complete_ondisk and changes the replica side to maintain pg_committed_to instead. This commit shouldn't actually cause any behavior change -- we're still passing min_last_complete_ondisk (which is a valid lower bound for pg_committed_to!). Signed-off-by: Samuel Just <sjust@redhat.com>
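A simplified model of why min_last_complete_ondisk is a valid lower bound for pg_committed_to: each shard's `last_complete` never exceeds what that shard has persisted, so the minimum of the former cannot exceed the minimum of the latter. `ShardInfo` and both functions are hypothetical; computing `pg_committed_to` as a minimum over per-shard state is a simplification, not how Ceph tracks it. The second shard also shows the effect described in the commit that follows: one missing object holds `last_complete`, and with it the old bound, back.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical per-shard view kept by the primary (versions as integers).
struct ShardInfo {
  uint64_t last_update_ondisk;  // newest log entry this shard has persisted
  uint64_t last_complete;       // every object is up to date through here
};

// Old bound: minimum last_complete across the acting set.
uint64_t min_last_complete_ondisk(const std::vector<ShardInfo>& shards) {
  uint64_t m = std::numeric_limits<uint64_t>::max();
  for (const auto& s : shards) m = std::min(m, s.last_complete);
  return m;
}

// Simplified new bound: newest version persisted by every shard.
uint64_t pg_committed_to(const std::vector<ShardInfo>& shards) {
  uint64_t m = std::numeric_limits<uint64_t>::max();
  for (const auto& s : shards) m = std::min(m, s.last_update_ondisk);
  return m;
}
```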

…nsaction This commit actually changes the bound we're propagating. This solves two bugs:
- Using min_last_complete_ondisk caused replicas to be two update rounds behind rather than one.
- Replicas don't actually have enough information to set min_last_complete_ondisk on activation, so we couldn't serve replica reads until the first write. pg_committed_to, on the other hand, is fine, as the activation last_update cannot become divergent.

Moreover, last_complete won't advance past missing objects, so any object missing on a replica blocks min_last_complete_ondisk. Note that the replica read pathway separately checks whether the target is missing locally, so that property was not needed.

Fixes: https://tracker.ceph.com/issues/65086
Fixes: https://tracker.ceph.com/issues/65085
Signed-off-by: Samuel Just <sjust@redhat.com>
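A hypothetical sketch of the activation difference: a replica relying on the old bound has nothing to go on until the first write message arrives, whereas pg_committed_to can be seeded from the activation last_update. The `on_interval_change` reset reflects the separate rule that pg_committed_to is not maintained while the PG is not active. `ReplicaBounds` and its methods are illustrative names, not Ceph's.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical replica bookkeeping contrasting the two bounds.
struct ReplicaBounds {
  bool active = false;
  // Old scheme: only learned from the first primary->replica write message.
  std::optional<uint64_t> min_last_complete_ondisk;
  // New scheme: can be seeded at activation.
  std::optional<uint64_t> pg_committed_to;

  void activate(uint64_t activation_last_update) {
    active = true;
    // The activation last_update cannot become divergent, so it is a safe
    // committed bound immediately, without waiting for a write.
    pg_committed_to = activation_last_update;
  }

  void on_interval_change() {
    // While not active, last_update may still turn out to be divergent,
    // so neither bound is trusted.
    active = false;
    min_last_complete_ondisk.reset();
    pg_committed_to.reset();
  }

  // A read of an object last written at `v` is allowed only under a known bound.
  static bool readable(const std::optional<uint64_t>& bound, uint64_t v) {
    return bound.has_value() && v <= *bound;
  }
};
```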

See comment for details. Modifies ECBackend::submit_transaction to use the passed pg_committed_to unconditionally, adds a comment to explain, and adds a comment to RMWPipeline::pg_committed_to to clarify that it may lag PeeringState::pg_committed_to. Signed-off-by: Samuel Just <sjust@redhat.com>

…remove We don't support balanced reads on ec pools. Additionally, the yaml actually specifies 'balanced_reads' rather than 'balance_reads' and therefore has no actual effect. Signed-off-by: Samuel Just <sjust@redhat.com>

athanatos · 2024-10-29T00:29:42Z

jenkins test api

athanatos · 2024-10-29T00:32:38Z

The previous run uncovered an initialization-order bug in intrusive_timer, fixed in the new version.

New test run: https://pulpito.ceph.com/sjust-2024-10-22_15:57:45-rados-wip-sjust-testing-2024-10-21-distro-default-smithi/
Some env failures, did a rerun: https://pulpito.ceph.com/sjust-2024-10-25_19:15:47-rados-wip-sjust-testing-2024-10-21-distro-default-smithi/

Remaining failures appear to be a subset of main: https://pulpito.ceph.com/teuthology-2024-10-27_20:00:17-rados-main-distro-default-smithi/

I think this is ready to merge. I'll merge later this week if there are no objections or outstanding reviews to wait for. @Matan-B @rzarzynski @cyx1231st

athanatos · 2024-11-02T00:53:27Z

jenkins test api

athanatos requested review from gregsfortytwo and yehudasa April 4, 2024 00:15

athanatos requested review from a team as code owners April 4, 2024 00:15

github-actions bot added core crimson labels Apr 4, 2024

athanatos force-pushed the sjust/for-review/wip-replica-read branch 3 times, most recently from b8e338f to 1e3d12b on April 9, 2024 20:56

cyx1231st reviewed Apr 10, 2024

src/crimson/osd/osd_operations/client_request.cc (outdated, resolved)

athanatos force-pushed the sjust/for-review/wip-replica-read branch from 1e3d12b to d50fb42 on April 10, 2024 23:26

github-actions bot added the needs-rebase label Jun 9, 2024

athanatos force-pushed the sjust/for-review/wip-replica-read branch from d50fb42 to 1035752 on June 12, 2024 02:41

github-actions bot added common mon tests needs-rebase and removed needs-rebase labels Jun 12, 2024

athanatos force-pushed the sjust/for-review/wip-replica-read branch from 1035752 to 20c4e77 on June 17, 2024 18:19

github-actions bot removed the needs-rebase label Jun 17, 2024

athanatos force-pushed the sjust/for-review/wip-replica-read branch from 20c4e77 to 7cb3c48 on June 17, 2024 20:09

rzarzynski requested review from Matan-B and rzarzynski July 8, 2024 18:40

github-actions bot added the needs-rebase label Jul 28, 2024

athanatos force-pushed the sjust/for-review/wip-replica-read branch from 7cb3c48 to 072d57e on August 7, 2024 02:19

athanatos added 25 commits October 18, 2024 20:33

osd/PrimaryLogPG: cosmetic fix for long debug line

f46e469

Signed-off-by: Samuel Just <sjust@redhat.com>

osd/PeeringState: remove unused PeeringState::append_log_with_trim_to…

7eebc62

…_updated Signed-off-by: Samuel Just <sjust@redhat.com>

osd: remove support for replicas without OSD_REPOP_MLCOD

bd4fa93

Signed-off-by: Samuel Just <sjust@redhat.com>

osd/PeeringState: refine pg_committed_to semantics

9d72303

Avoid maintaining pg_committed_to if pg is not active. We can't guarantee that last_update won't become divergent, so it doesn't provide useful information. Signed-off-by: Samuel Just <sjust@redhat.com>

osd,crimson/osd: roll_forward_to->pg_committed_to for MOSDPGUpdateLog…

407350d

…Missing and related Signed-off-by: Samuel Just <sjust@redhat.com>

osd,crimson/osd: pg_committed_to rather than mlcod for submit_log_ent…

a2d4faf

…ries This matches the behavior for normal IOs. Signed-off-by: Samuel Just <sjust@redhat.com>

osd/PrimaryLogPG: adjust assert in log_operation

8116c88

It wouldn't actually be wrong for the primary to trim the log right up to the pg_committed_to bound it is propagating (though it generally won't). Signed-off-by: Samuel Just <sjust@redhat.com>
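The relaxed condition can be sketched as a one-line check. The exact form of the assert in log_operation is not shown here, so `trim_bound_ok` is a hypothetical helper that only expresses the rule stated in the commit message: trimming up to, but not past, the propagated pg_committed_to is acceptable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check (versions simplified to integers). A strict `<` would
// reject trimming exactly up to the committed bound, which is legal even
// though the primary generally won't trim that far.
bool trim_bound_ok(uint64_t trim_to, uint64_t pg_committed_to) {
  return trim_to <= pg_committed_to;
}
```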

osd,crimson/osd: remove external interfaces for mlcod

ba246b1

Signed-off-by: Samuel Just <sjust@redhat.com>

crimson/.../client_request: minor cosmetic simplification

8c31d84

Signed-off-by: Samuel Just <sjust@redhat.com>

crimson/.../client_request: bounce replica read if missing

8c4c22c

Fixes: https://tracker.ceph.com/issues/65299 Signed-off-by: Samuel Just <sjust@redhat.com>
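A sketch of the decision a replica makes for an incoming read, assuming a hypothetical `decide_replica_read` helper that takes a set of missing object names and integer versions. The missing-object check corresponds to this commit; the version check against pg_committed_to is the gate discussed in the PR description. How the bounce is signaled back to the client is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

enum class ReplicaReadResult { Serve, BounceToPrimary };

// Hypothetical replica-read decision (illustration only).
ReplicaReadResult decide_replica_read(const std::set<std::string>& missing,
                                      const std::string& oid,
                                      uint64_t object_version,
                                      uint64_t pg_committed_to) {
  // The replica does not have a current copy: let the primary handle it.
  if (missing.count(oid) != 0) return ReplicaReadResult::BounceToPrimary;
  // The last write to the object might not be committed everywhere yet.
  if (object_version > pg_committed_to) return ReplicaReadResult::BounceToPrimary;
  return ReplicaReadResult::Serve;
}
```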

osd,crimson/osd: add perf counters for replica reads

990051f

Signed-off-by: Samuel Just <sjust@redhat.com>

osdc: add replica read perf counters to Objecter

d760935

Signed-off-by: Samuel Just <sjust@redhat.com>

osd/PeeringState: proc_replica_info->proc_replica_notify, pass in pg_…

d20325a

…notify_t Signed-off-by: Samuel Just <sjust@redhat.com>

osd: introduce acting set specific feature vector

8e14ce0

Signed-off-by: Samuel Just <sjust@redhat.com>
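One plausible reading of an acting-set-specific feature vector is the intersection of the feature bits of all acting-set members, so that a newer message such as MOSDPGPCT is only sent when every member understands it. That interpretation, `acting_features`, and the `FEATURE_PCT` bit are assumptions for illustration, not Ceph's definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical feature bit for the async pct update message.
constexpr uint64_t FEATURE_PCT = 1ull << 3;

// Features usable within the acting set: bitwise AND over all members.
// An empty acting set yields all bits set (the identity for AND).
uint64_t acting_features(const std::vector<uint64_t>& member_features) {
  uint64_t f = ~uint64_t{0};
  for (uint64_t m : member_features) f &= m;
  return f;
}
```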

osd/osd_types: add PCT_UPDATE_DELAY pool option

f4b0589

Signed-off-by: Samuel Just <sjust@redhat.com>

messages: add MOSDPGPCT

75236e9

Signed-off-by: Samuel Just <sjust@redhat.com>

common/intrusive_timer.h: introduce intrusive_timer

aee7b30

Signed-off-by: Samuel Just <sjust@redhat.com>

osd: wire up async primary->replica pct updates

8ab313f

Signed-off-by: Samuel Just <sjust@redhat.com>

qa/tasks/rados: set pct_update_delay if balance_reads is set

87c8a9c

Signed-off-by: Samuel Just <sjust@redhat.com>

athanatos force-pushed the sjust/for-review/wip-replica-read branch from e4515b9 to dda683b on October 22, 2024 04:16

athanatos merged commit 048ce81 into ceph:main Nov 4, 2024

athanatos merged 26 commits into ceph:main from athanatos:sjust/for-review/wip-replica-read

3 participants
