osd: implement per-pg leases to avoid stale reads#29236
Merged
tchaikov merged 32 commits into ceph:master on Sep 29, 2019
6c1c9d2 to
c0605fc
Compare
athanatos
reviewed
Jul 25, 2019
src/messages/MOSDPing.h
Outdated
utime_t ping_stamp;  ///< when the PING was sent
ceph::signedspan mono_ping_stamp;  ///< relative to sender's clock
ceph::signedspan mono_send_stamp;  ///< replier's send stamp
boost::optional<ceph::time_detail::signedspan> delta_ub;  ///< ping sender
c0605fc to
af7d7ea
Compare
af7d7ea to
c7179fe
Compare
eef5dd8 to
d697e11
Compare
d697e11 to
2f7e969
Compare
Contributor
retest this please.
2f7e969 to
c6b39a5
Compare
c6b39a5 to
7e908f4
Compare
578c7fa to
83ca87c
Compare
2211f62 to
f5fc93a
Compare
f5fc93a to
f6a1b5c
Compare
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
PG is laggy (unreadable) because ping(s) are delayed. Signed-off-by: Sage Weil <sage@redhat.com>
PG is waiting for previous intervals' readable intervals to expire. Signed-off-by: Sage Weil <sage@redhat.com>
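The two commit messages above describe two reasons a PG may be unreadable: its own lease has lapsed (laggy), or leases from prior intervals may still be live (wait). A minimal sketch of that distinction follows. The type `PgLeaseView`, the function `classify`, and the order in which the checks are made are assumptions for illustration only, not the code in this PR; only the name `prior_readable_until_ub` appears in the commit messages.

```cpp
#include <chrono>

// Hypothetical, simplified model; not Ceph's actual types or API.
using mono_time = std::chrono::steady_clock::duration;  // offset from a fixed monotonic origin

enum class ReadState { Readable, Wait, Laggy };

struct PgLeaseView {
  mono_time readable_until{};           // end of this interval's current lease
  mono_time prior_readable_until_ub{};  // upper bound on prior intervals' readable time
};

// Assumed ordering: wait out prior intervals first, then check our own lease.
inline ReadState classify(const PgLeaseView& pg, mono_time now) {
  if (now < pg.prior_readable_until_ub) {
    return ReadState::Wait;   // a prior interval might still be serving reads
  }
  if (now >= pg.readable_until) {
    return ReadState::Laggy;  // our lease expired, e.g. because pings were delayed
  }
  return ReadState::Readable;
}
```

In this sketch a read arriving in either the `Wait` or `Laggy` state would not be served until the state returns to `Readable`.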
This is the simplest strategy--much simpler than queueing them and waking them up again later. Signed-off-by: Sage Weil <sage@redhat.com>
e47e8f5 to
ef29f4d
Compare
Contributor
LGTM
ef29f4d to
565f723
Compare
Contributor
could you please help add dummy methods for crimson?
Signed-off-by: Sage Weil <sage@redhat.com>
Keep track of which OSDs from the prior set we care about that affect the prior_readable_until_ub. Note that it is only the *down* OSDs that we have to track here, since everything in the *probe* set we will already contact during peering (they are still up), guaranteeing that those PGs are aware of the interval change and are no longer readable in the prior interval. Signed-off-by: Sage Weil <sage@redhat.com>
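As a rough illustration of the selection rule in the commit message above: of the OSDs that appear in prior intervals, only those that are currently down need to be remembered, because the up ones are contacted during peering anyway. The helper name `select_prior_readable_down_osds` and the use of plain `std::set<int>` for OSD ids are hypothetical; the real code works on Ceph's own map and prior-set structures.

```cpp
#include <set>

// Hypothetical helper, not the PR's implementation.
// prior_osds: OSDs that served the PG in prior intervals.
// up_osds:    OSDs that are currently up (and will be probed during peering).
inline std::set<int> select_prior_readable_down_osds(
    const std::set<int>& prior_osds, const std::set<int>& up_osds) {
  std::set<int> down;
  for (int osd : prior_osds) {
    if (up_osds.count(osd) == 0) {
      down.insert(osd);  // cannot be contacted, so its old lease must be waited out
    }
  }
  return down;
}
```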
If we see that a prior_readable_down_osd is known to be dead, we can remove it from the set. And if the set is empty, we can skip the rest of our waiting period and leave the WAIT state. Signed-off-by: Sage Weil <sage@redhat.com>
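The pruning step described above could look roughly like the following sketch. `PriorReadableTracker`, its `waiting` flag, and the `is_dead` callback are assumptions made for illustration; only `prior_readable_down_osds` is a name taken from the commit message, and how the PR actually learns that an OSD is dead is not shown here.

```cpp
#include <functional>
#include <set>

// Hypothetical sketch, not the actual peering code.
struct PriorReadableTracker {
  std::set<int> prior_readable_down_osds;  // down OSDs from prior intervals
  bool waiting = true;                     // true while in the WAIT state

  // Remove every OSD that is_dead reports as known-dead. If none remain,
  // leave WAIT early. Returns true when the wait is over.
  bool prune(const std::function<bool(int)>& is_dead) {
    for (auto it = prior_readable_down_osds.begin();
         it != prior_readable_down_osds.end();) {
      if (is_dead(*it)) {
        it = prior_readable_down_osds.erase(it);
      } else {
        ++it;
      }
    }
    if (prior_readable_down_osds.empty()) {
      waiting = false;  // nobody left who could still hold an old lease
    }
    return !waiting;
  }
};
```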
We want to renew before we prepare or send activate messages so that we have the opportunity to include leases in them (coming soon!). And we do not want to send explicit lease messages until we know that the peers have activated. In particular, we want to avoid queueing a notify (via pending_activators) and then sending a lease that will arrive before it. Signed-off-by: Sage Weil <sage@redhat.com>
The lease goes out with the MOSDPGLog or info, and the ack comes back with the info. We no longer need to renew the lease explicitly in all_activated_and_committed() because we *just* piggybacked on activation. We can just wait for the normal renew event to fire. Signed-off-by: Sage Weil <sage@redhat.com>
We only do this for primary -> replica, so we only need to proc_lease() from the replica states. Signed-off-by: Sage Weil <sage@redhat.com>
The 'replica' term does not map well onto EC pools. More importantly, the implementation is often wrong for EC pools, where role may be 0 or 1 independent of whether the OSD is the primary or not. Introduce 'nonprimary' to mean an acting OSD that is not the primary. Signed-off-by: Sage Weil <sage@redhat.com>
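To make the definition concrete, here is a small sketch of a 'nonprimary' test that depends only on acting-set membership and the primary's id, not on the role number. The function `is_nonprimary` and its parameters are hypothetical and are not the accessor added by this PR.

```cpp
#include <algorithm>
#include <vector>

// Illustrative only: "nonprimary" = member of the acting set that is not the primary.
// whoami:  this OSD's id
// primary: id of the acting primary
// acting:  ids in the acting set
inline bool is_nonprimary(int whoami, int primary, const std::vector<int>& acting) {
  return whoami != primary &&
         std::find(acting.begin(), acting.end(), whoami) != acting.end();
}
```

Because the check never looks at the role, it gives the same answer for replicated and EC pools.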
Signed-off-by: Sage Weil <sage@redhat.com>
If there are no down OSDs from prior intervals, then the normal peering process will end up contacting all of the prior OSDs and ensuring that their prior interval is terminated during peering. Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
These are stubs; the reschedule one (at minimum) probably needs a meaningful implementation in order for the PG to peer in some cases. Signed-off-by: Sage Weil <sage@redhat.com>
910d232 to
da2dc1c
Compare
Member
Author
@tchaikov added stubs so that it builds, but the implementations probably need to be fleshed out in order for the PG to successfully peer in many cases.
Contributor
thanks @liewegas! i will come up with a follow-up PR.
Contributor
tchaikov
reviewed
Sep 29, 2019
- \(POOL_APP_NOT_ENABLED\)
- \(SLOW_OPS\)
- \(PG_AVAILABILITY\)
- \(PG_DEGRADED\)
Contributor
for posterity, the PG_DEGRADED failure was addressed by the change after this PR was picked up by my batch.