Create peer-recovery retention leases #43190
Conversation
This creates a peer-recovery retention lease for every shard during recovery, ensuring that the replication group retains history for future peer recoveries. It also ensures that leases for active shard copies do not expire, and leases for inactive shard copies expire immediately if the shard is fully allocated. Relates #41536
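The expiry rule in the summary above can be pictured with a minimal sketch. This is a hypothetical simplification for illustration only, not the real ReplicationTracker API:

```java
// Hypothetical sketch of the expiry rule in the PR summary: leases for active
// shard copies never expire, while leases for inactive copies expire immediately
// once the shard is fully allocated. Names and shape are illustrative only.
public class LeaseExpirySketch {

    static boolean isExpired(boolean copyIsActive, boolean shardFullyAllocated) {
        if (copyIsActive) {
            return false; // an active copy keeps its lease indefinitely
        }
        return shardFullyAllocated; // an inactive copy loses its lease once the shard is fully allocated
    }

    public static void main(String[] args) {
        System.out.println(isExpired(true, true));   // active copy: lease retained
        System.out.println(isExpired(false, true));  // inactive, fully allocated: lease expires
        System.out.println(isExpired(false, false)); // inactive, not fully allocated: lease retained
    }
}
```

The point of keeping an inactive copy's lease while the shard is not fully allocated is that the missing copy may still come back and recover from the retained history.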
Pinging @elastic/es-distributed
ywelsch
left a comment
I've done a first pass and left some questions, mainly to get a better understanding of the scope of the change.
final List<?> leases = (List<?>) retentionLeasesStats.get("leases");
assertThat(leases, empty());
for (final Object lease : leases) {
    assertThat(((Map<?, ?>) lease).get("source"), equalTo(ReplicationTracker.PEER_RECOVERY_RETENTION_LEASE_SOURCE));
can we instead assert the absence of CCR leases?
Not as robustly as I'd like, no. We could assert that there are no leases with source "ccr", but that's much weaker than asserting that the only remaining leases are PRRLs, similar to how we previously asserted that there were no leases at all.
Can we use toMapExcludingPeerRecoveryRetentionLeases here?
Not very easily. Here we are on the other side of the high-level REST API, and this doesn't include indices stats, so we don't have access to a RetentionLeases object. It would be quite some work to build one.
ah, I did not realize that it's a rest test.
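For context, the REST-side assertion under discussion boils down to checking the "source" field of each lease in the parsed stats response. A minimal sketch, assuming a response already parsed into List/Map form (all names here are hypothetical):

```java
import java.util.List;
import java.util.Map;

// Sketch of the assertion under discussion, against a (hypothetical) stats
// response parsed into nested Lists and Maps: every remaining lease must have
// source "peer recovery", which is stronger than merely checking that no
// lease has source "ccr".
public class LeaseSourceCheck {

    static final String PEER_RECOVERY_SOURCE = "peer recovery";

    static boolean onlyPeerRecoveryLeases(List<Map<String, Object>> leases) {
        return leases.stream().allMatch(lease -> PEER_RECOVERY_SOURCE.equals(lease.get("source")));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> leases = List.of(
                Map.of("id", "peer_recovery/node-0", "source", PEER_RECOVERY_SOURCE),
                Map.of("id", "leftover-lease", "source", "ccr"));
        System.out.println(onlyPeerRecoveryLeases(leases)); // false: a CCR lease remains
    }
}
```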
server/src/main/java/org/elasticsearch/index/seqno/ReplicationTracker.java
this.pendingInSync = new HashSet<>();
this.routingTable = null;
this.replicationGroup = null;
assert Version.V_EMPTY.equals(indexSettings.getIndexVersionCreated()) == false;
is this to catch issues where tests have not been properly set up?
Yes, if this is unset then the crucial assertions are skipped, which is Very Bad™.
if (retentionLeases.get(leaseId) == null) {
    /*
     * We might have got here via a rolling upgrade from an older version that doesn't create peer recovery retention
     * leases for every shard copy, but in this case we do not expect any leases to exist.
this might also be a recovery from store?
What about when we become primary due to a primary relocation? Do we need to do this as well?
This comment is explaining the following if (indexSettings.getIndexVersionCreated().onOrAfter(Version.V_8_0_0)). Covering the cases when the index was created in an earlier version is out of scope here.
In a primary relocation the new primary, being a tracked replica, already has a lease.
> Covering the cases when the index was created in an earlier version is out of scope here.
Did you mean there will be another change here? Why don't we do it now ;). The relocating target should not have a lease if the old primary was on an old version.
This change is already a substantial +665/-185, and I think it's unwise to bring BWC into scope at this time. Note that this PR is against a feature branch, not master, so we're ok with missing features for now.
        && shard.indexSettings().getIndexMetaData().getState() != IndexMetaData.State.CLOSE) {
    runUnderPrimaryPermit(() -> {
        try {
            // blindly create the lease. TODO integrate this with the recovery process
I'm not sure what you mean by "blindly" here and what integration you're referring to.
With this change, retention leases have no impact on the recovery process, nor do we make any attempt to restrict a lease to history we have any hope of retaining; e.g. with a file-based recovery we add a lease covering all history.
In due course the recovery process will be made more dependent on leases.
        : routingTable.activeShards() + " vs " + shardAllocationId;
assert replicationGroup.getReplicationTargets().equals(Collections.singletonList(primaryShard));

// Safe to call innerAddRetentionLease() without a subsequent sync since there are no other members of this replication
we don't need a sync, but why not do one anyway? This will persist the leases locally on disk.
Doing a sync on the cluster applier thread isn't possible as things stand because of the reroute phase; it would also mean waiting for the sync to complete, which is something we try to avoid on the applier thread.
We could explicitly persist the leases when calling activatePrimaryMode but I don't think it's necessary to do so.
 * Advance the peer-recovery retention lease for all tracked shard copies, for use in tests until advancing these leases is done
 * properly. TODO remove this.
 */
public synchronized void advancePeerRecoveryRetentionLeasesToGlobalCheckpoints() {
why are we not automatically advancing the leases when the global checkpoints advance? Is it because it breaks some tests right now?
Mainly because I think this change is already large enough without this feature, and we haven't settled definitively on whether these leases should be GCP-based. Advancing the leases is needed in only a few places in tests, and I haven't tried advancing them more eagerly.
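The test-only helper being discussed can be pictured roughly as follows. This is a hypothetical simplification (plain maps instead of the real lease objects), not the actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of advancing peer-recovery retention leases: for each
// tracked copy, bump the retained-from sequence number of its lease to just
// past that copy's global checkpoint, never moving a lease backwards.
// All names here are hypothetical.
public class AdvanceLeasesSketch {

    static void advanceToGlobalCheckpoints(Map<String, Long> leaseRetainedFromSeqNo,
                                           Map<String, Long> globalCheckpoints) {
        // a lease retaining ops strictly above the GCP starts retaining from GCP + 1
        globalCheckpoints.forEach((allocationId, gcp) ->
                leaseRetainedFromSeqNo.merge(allocationId, gcp + 1, Math::max));
    }

    public static void main(String[] args) {
        Map<String, Long> leases = new HashMap<>(Map.of("copy-0", 3L, "copy-1", 10L));
        advanceToGlobalCheckpoints(leases, Map.of("copy-0", 7L, "copy-1", 5L));
        System.out.println(leases.get("copy-0")); // 8: advanced past the global checkpoint
        System.out.println(leases.get("copy-1")); // 10: a lease is never moved backwards
    }
}
```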
DaveCTurner
left a comment
Thanks @ywelsch, I responded.
dnhatn
left a comment
Thanks @DaveCTurner. I left some comments.
/**
 * Source for peer recovery retention leases; see {@link ReplicationTracker#addPeerRecoveryRetentionLease}.
 */
public static final String PEER_RECOVERY_RETENTION_LEASE_SOURCE = "peer recovery";
How about moving this constant and two related static methods to RetentionLease class instead?
I don't think RetentionLease should know about this special kind of retention lease.
.stream()
.collect(Collectors.groupingBy(lease -> currentTimeMillis - lease.timestamp() > retentionLeaseMillis));
.collect(Collectors.groupingBy(lease -> {
    if (lease.source().equals(PEER_RECOVERY_RETENTION_LEASE_SOURCE)) {
Can we make this check a method of RetentionLease?
As in #43190 (comment) I don't think RetentionLease should know about this special kind of retention lease.
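The grouping quoted in the diff above can be sketched as a runnable fragment. The Lease record and method shape here are hypothetical stand-ins for the real RetentionLease API, and the sketch deliberately ignores the immediate expiry of leases for inactive copies:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the groupingBy change under review: partition leases
// into expired (true) and retained (false), exempting peer-recovery leases
// from the timestamp-based expiry applied to other sources.
public class LeaseGrouping {

    record Lease(String id, String source, long timestamp) {}

    static final String PEER_RECOVERY_RETENTION_LEASE_SOURCE = "peer recovery";

    static Map<Boolean, List<Lease>> groupByExpiry(List<Lease> leases, long currentTimeMillis, long retentionLeaseMillis) {
        return leases.stream().collect(Collectors.groupingBy(lease -> {
            if (lease.source().equals(PEER_RECOVERY_RETENTION_LEASE_SOURCE)) {
                return false; // peer-recovery leases are not subject to timestamp expiry
            }
            return currentTimeMillis - lease.timestamp() > retentionLeaseMillis;
        }));
    }

    public static void main(String[] args) {
        List<Lease> leases = List.of(
                new Lease("peer_recovery/node-0", PEER_RECOVERY_RETENTION_LEASE_SOURCE, 0L),
                new Lease("stale-lease", "other", 0L));
        Map<Boolean, List<Lease>> grouped = groupByExpiry(leases, 1_000L, 100L);
        System.out.println(grouped.get(true).size());  // 1: the stale non-PRRL lease expired
        System.out.println(grouped.get(false).size()); // 1: the peer-recovery lease is retained
    }
}
```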
 */
static Map<String, RetentionLease> toMap(final RetentionLeases retentionLeases) {
    return retentionLeases.leases;
public static Map<String, RetentionLease> toMapExcludingPeerRecoveryRetentionLeases(final RetentionLeases retentionLeases) {
Can we move this method to the tests? Maybe to the test framework?
 * containing the persistent node ID calculated by {@link ReplicationTracker#getPeerRecoveryRetentionLeaseId}, and retain operations
 * with sequence numbers strictly greater than the given global checkpoint.
 */
public void addPeerRecoveryRetentionLease(String nodeId, long globalCheckpoint, ActionListener<ReplicationResponse> listener) {
Can we remove this method and prepare these parameters in IndexShard instead?
We could, but I think it's appropriate to do this here, since you need it when working with the ReplicationTracker in isolation, e.g. in PeerRecoveryRetentionLeaseExpiryTests.
.flatMap(n -> StreamSupport.stream(getLeaderCluster().getInstance(IndicesService.class, n).spliterator(), false))
.flatMap(n -> StreamSupport.stream(n.spliterator(), false))
.filter(indexShard -> indexShard.shardId().getIndexName().equals("index1"))
.filter(indexShard -> indexShard.routingEntry().primary())
This is used in tests only but we should make it more robust. See #40386 (comment).
This is here to call the temporary advancePeerRecoveryRetentionLeasesToGlobalCheckpoints method, pending implementation of the proper way to advance the leases. Once that happens, it'll be gone. Are you saying that this test sometimes fails? I don't expect the primaries on the leader to move around during this test.
runUnderPrimaryPermit(() -> {
    try {
        // blindly create the lease. TODO integrate this with the recovery process
        shard.addPeerRecoveryRetentionLease(request.targetNode().getId(), startingSeqNo - 1, establishRetentionLeaseStep);
The second parameter of addPeerRecoveryRetentionLease is "global checkpoint" which does not match startingSeqNo - 1.
Indeed, but it will be once we only copy operations that are above the GCP :)
dnhatn
left a comment
LGTM given this PR will go into a feature branch. Thanks @DaveCTurner.