
Retrying replication requests on replica doesn't call onRetry #21189

Merged: bleskes merged 3 commits into elastic:master from bleskes:retry_on_replica on Oct 31, 2016

Conversation

@bleskes (Contributor) commented Oct 30, 2016

A replication request may arrive at a replica before the replica's node has processed a required mapping update. In these cases the TransportReplicationAction will retry the request once a new cluster state arrives. Sadly, that retry logic failed to call ReplicationRequest#onRetry, causing duplicates in the append-only use case.

This PR fixes this and also the test which missed the check. I also added an assertion which would have helped find the source of the duplicates.

This was discovered by https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=opensuse/174/

The test also surfaces an issue with mapping updates on the master (they are potentially performed on a live index :( ) but this will be fixed in another PR.

Relates #20211
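As a rough illustration, the retry path can be sketched like this in plain Java; `ReplicaAction` and its fields are hypothetical stand-ins for TransportReplicationAction's actual machinery, not the real code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the bug: a replica that cannot yet apply a request
// (stale cluster state) re-dispatches it, and must call onRetry() first so
// the engine can disable its append-only optimization for this request.
class ReplicationRequest {
    final AtomicBoolean retrySet = new AtomicBoolean(false);

    void onRetry() {
        retrySet.set(true);
    }
}

class ReplicaAction {
    boolean clusterStateReady = false;
    int applied = 0;

    void execute(ReplicationRequest request) {
        if (!clusterStateReady) {
            request.onRetry();          // the missing call this PR adds
            clusterStateReady = true;   // pretend the awaited cluster state arrived
            execute(request);           // re-dispatch the request
            return;
        }
        applied++;                      // apply the operation to the shard
    }
}
```

With the `onRetry()` call in place, the engine sees the retry flag set and can fall back to a deduplicating path instead of blindly appending.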

@s1monw (Contributor) commented Oct 30, 2016

bummer that we missed that one... glad we have it fixed now.

throw new AssertionError("doc [" + index.type() + "][" + index.id() + "] exists in version map (version " + versionValue + ")");
}
try (final Searcher searcher = acquireSearcher("assert doc doesn't exist")) {
final long docsWithId = searcher.reader().totalTermFreq(index.uid());
s1monw (Contributor) commented on the snippet above:
this is tricky: TTF will not respect deletes. I think you need to fetch the doc though; it's not much overhead compared to this.
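For background, here is a toy model in plain Java (not Lucene) of why the two counts can disagree: deletes in Lucene are tombstones until segments merge, so per-term statistics such as totalTermFreq still see deleted documents, while a searcher-level count filters them out:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy index: documents carry an id term; deletes only mark a tombstone,
// mirroring how Lucene keeps term statistics until segments merge.
class ToyIndex {
    private final List<String> docs = new ArrayList<>();
    private final Set<Integer> deleted = new HashSet<>();

    void add(String id) {
        docs.add(id);
    }

    void delete(String id) {
        for (int i = 0; i < docs.size(); i++) {
            if (docs.get(i).equals(id)) deleted.add(i);
        }
    }

    // Analogue of a raw term statistic (totalTermFreq): ignores tombstones.
    long totalTermFreq(String id) {
        return docs.stream().filter(id::equals).count();
    }

    // Analogue of a searcher-level count of a term query: respects deletes.
    long liveCount(String id) {
        long n = 0;
        for (int i = 0; i < docs.size(); i++) {
            if (docs.get(i).equals(id) && !deleted.contains(i)) n++;
        }
        return n;
    }
}
```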

(TransportReplicationAction.ConcreteShardRequest<Request>) capturedRequest.request;
assertThat(concreteShardRequest.getRequest(), equalTo(request));
assertThat(concreteShardRequest.getRequest().isRetrySet.get(), equalTo(true));
assertThat(concreteShardRequest.getTargetAllocationID(),
s1monw (Contributor) commented on the snippet above:
👍

@s1monw (Contributor) commented Oct 30, 2016

LGTM except for the assert

throw new AssertionError("doc [" + index.type() + "][" + index.id() + "] exists in version map (version " + versionValue + ")");
}
try (final Searcher searcher = acquireSearcher("assert doc doesn't exist")) {
final long docsWithId = searcher.searcher().count(new TermQuery(index.uid()));
bleskes (Contributor, author) commented on the snippet above:
@s1monw like this?

s1monw (Contributor) replied:

yeah this looks better but I think we can still have a doc in the index when we haven't refreshed AND the doc is deleted? I think we should not check the searcher if versionValue.delete() == true?
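The shape of the hardened check can be sketched as follows; `VersionValue` here is a minimal hypothetical stand-in for the engine's version map entry, and `searcherCount` abstracts the `searcher().count(...)` call. The point is to trust a delete tombstone in the version map rather than the searcher, since an unrefreshed index may still contain the deleted doc:

```java
// Hypothetical stand-in for the engine's per-doc version map entry.
class VersionValue {
    private final boolean delete;

    VersionValue(boolean delete) {
        this.delete = delete;
    }

    boolean delete() {
        return delete;
    }
}

class AssertDocDoesNotExist {
    // Returns true when the "doc does not exist" assertion should pass.
    static boolean check(VersionValue versionValue, long searcherCount) {
        if (versionValue != null) {
            // Doc is tracked in the version map: a delete entry is fine
            // (the doc may still be visible until a refresh), while a live
            // entry means the doc exists and the assertion must fail.
            return versionValue.delete();
        }
        // No version map entry: fall back to the searcher, which respects
        // deletes, so a count of 0 means the doc truly does not exist.
        return searcherCount == 0;
    }
}
```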

bleskes (Contributor, author) replied:

@s1monw good one. I hardened the check

@s1monw (Contributor) commented Oct 31, 2016

LGTM

@s1monw (Contributor) commented Oct 31, 2016

test this please

@bleskes bleskes merged commit e7cfe10 into elastic:master Oct 31, 2016
@bleskes bleskes deleted the retry_on_replica branch October 31, 2016 12:43
@bleskes (Contributor, author) commented Oct 31, 2016

thx @s1monw . I'll give it a few hours on CI before back porting.

@s1monw (Contributor) commented Oct 31, 2016

++ thanks @bleskes

bleskes added a commit that referenced this pull request Nov 1, 2016
A replication request may arrive at a replica before the replica's node has processed a required mapping update. In these cases the TransportReplicationAction will retry the request once a new cluster state arrives. Sadly, that retry logic failed to call `ReplicationRequest#onRetry`, causing duplicates in the append-only use case.

This commit fixes this and also the test which missed the check. I also added an assertion which would have helped find the source of the duplicates.

This was discovered by https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=opensuse/174/

Relates #20211
bleskes added a commit that referenced this pull request Nov 1, 2016
@bleskes (Contributor, author) commented Nov 1, 2016

This is now pushed to 5.0.1 & 5.1.0 as well

bleskes added a commit that referenced this pull request Nov 15, 2016
When processing a mapping update, the master currently creates an `IndexService` and uses its mapper service to do the hard work. However, if the master is also a data node and already has an instance of `IndexService`, we currently reuse the `MapperService` of that instance. Sadly, since mapping updates change the in-memory objects, a mapping change that is rejected later on during cluster state publishing still leaves a side effect on the index in question, bypassing the cluster state safety mechanism.

This commit removes this optimization and replaces the `IndexService` creation with a direct creation of a `MapperService`.

Also, this fixes an issue where multiple mapping updates from multiple shards for the same field caused unneeded cluster state publishing, as the current code always created a new cluster state.

These issues were discovered while researching #21189
bleskes added a commit that referenced this pull request Nov 16, 2016
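The side-effect problem described in that commit is a general one: merging a proposed change into a live, mutable object mutates it even when the change is later rejected. A minimal sketch of the fix's shape, using plain Java stand-ins rather than the actual MapperService code: validate against a throwaway copy and publish only on success:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an index's field mappings.
class Mappings {
    final Map<String, String> fields = new HashMap<>();

    Mappings copy() {
        Mappings c = new Mappings();
        c.fields.putAll(fields);
        return c;
    }

    // Merging mutates this instance; an invalid update throws midway.
    void merge(Map<String, String> update) {
        for (Map.Entry<String, String> e : update.entrySet()) {
            String existing = fields.get(e.getKey());
            if (existing != null && !existing.equals(e.getValue())) {
                throw new IllegalArgumentException("conflicting type for " + e.getKey());
            }
            fields.put(e.getKey(), e.getValue());
        }
    }
}

class MappingUpdater {
    // Validate on a scratch copy, mirroring how the fix builds a fresh
    // MapperService instead of reusing the live index's instance: a
    // rejected update leaves the live mappings untouched.
    static void apply(Mappings live, Map<String, String> update) {
        Mappings scratch = live.copy();
        scratch.merge(update);               // may throw; live is untouched
        live.fields.putAll(scratch.fields);  // publish only after success
    }
}
```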
@clintongormley clintongormley added :Distributed/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. and removed :Engine labels Feb 13, 2018
@clintongormley clintongormley added :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Distributed/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. labels Feb 13, 2018

Labels

>bug critical :Distributed/Engine v5.0.1 v5.1.1 v6.0.0-alpha1
