ShardBulkAction ignore primary response on primary #38901
Merged
henningandersen merged 3 commits into elastic:master on Feb 15, 2019
Previously, if a version conflict occurred and a previous primary response was present, the original primary response would be used both for sending to the replica and for replying to the client. This was an attempt to fix issues with conflicts after relocations, where a bulk request would experience a closed shard halfway through and thus have to retry on the new primary. With sequence numbers, this leads to an issue: if a primary is demoted (for example, by a network partition), it will send along the original response in the request. In case of a conflict on the new primary, the old response is sent to the replica. That data could be stale, leading to inconsistency between primary and replica. Relocations now do an explicit hand-off from the old to the new primary and ensure that no operations are active while doing this. The above is thus no longer necessary. This change removes the special handling of conflicts and ignores primary responses when executing shard bulk requests on the primary.
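The behaviour described above can be illustrated with a minimal, self-contained sketch. The types `ItemRequest`, `ItemResponse`, `SketchPrimary` and `PrimaryResponseSketch` are hypothetical stand-ins invented for this illustration; they are not Elasticsearch classes and do not reflect the actual implementation in this PR. The sketch only assumes that a request can carry a response from an earlier primary and that the executing primary overwrites it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are not Elasticsearch classes.
final class ItemResponse {
    final boolean conflict;
    final long seqNo; // -1 when no operation was applied

    ItemResponse(boolean conflict, long seqNo) {
        this.conflict = conflict;
        this.seqNo = seqNo;
    }
}

final class ItemRequest {
    final String id;
    final boolean createOnly;     // fail if the document already exists
    ItemResponse primaryResponse; // may carry a response from an earlier (demoted) primary

    ItemRequest(String id, boolean createOnly) {
        this.id = id;
        this.createOnly = createOnly;
    }
}

final class SketchPrimary {
    private final Map<String, Long> docs = new HashMap<>();
    private long nextSeqNo;

    SketchPrimary(long firstSeqNo) {
        this.nextSeqNo = firstSeqNo;
    }

    // Any response already attached to the request is ignored and overwritten
    // with the outcome computed on this primary.
    ItemResponse execute(ItemRequest request) {
        ItemResponse result;
        if (request.createOnly && docs.containsKey(request.id)) {
            result = new ItemResponse(true, -1L); // version conflict
        } else {
            long seqNo = nextSeqNo++;
            docs.put(request.id, seqNo);
            result = new ItemResponse(false, seqNo);
        }
        request.primaryResponse = result;
        return result;
    }
}

final class PrimaryResponseSketch {
    public static void main(String[] args) {
        SketchPrimary oldPrimary = new SketchPrimary(0L);
        SketchPrimary newPrimary = new SketchPrimary(13L);

        ItemRequest request = new ItemRequest("doc-1", true);
        oldPrimary.execute(request);                        // succeeds on the old primary
        newPrimary.execute(new ItemRequest("doc-1", true)); // another writer creates the doc
        ItemResponse retried = newPrimary.execute(request); // retry after demotion

        System.out.println("conflict on new primary: " + retried.conflict);
        System.out.println("request carries fresh response: " + (request.primaryResponse == retried));
    }
}
```

In this sketch the retried request ends up carrying the conflict computed by the new primary rather than the success recorded by the old one, so whatever is forwarded to a replica reflects the new primary's state.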
Collaborator
Pinging @elastic/es-distributed
ywelsch approved these changes Feb 14, 2019
        assertThat(response.getSeqNo(), equalTo(13L));
    }

    private void randomSetIgnoredPrimaryResponse(BulkItemRequest primaryRequest) {

    // once this has proven to work out fine in all cases, we can revert this to randomly picking the conflict mode.
    public void testAckedIndexCreateOnly() throws Exception {
Contributor
nit: testAckedIndexingWithCreateOpType
        testAckedIndexing(ConflictMode.create);
    }

    public void testAckedIndexExternalVersioning() throws Exception {
Contributor
nit: testAckedIndexingWithExternalVersioning
    final List<Exception> exceptedExceptions = new CopyOnWriteArrayList<>();

    logger.info("starting indexers");
    // final ConflictMode conflictMode = ConflictMode.randomMode();
Contributor
I think it's ok to choose this randomly instead of having three separate tests, especially as this test typically takes a bit of time to run.
Contributor
Author
I wanted my PR builds to have all 3 variants running. Will change to randomMode before merging to master.
        .setTimeout(timeout);

    if (conflictMode == ConflictMode.external) {
        indexRequestBuilder.setVersion(10).setVersionType(VersionType.EXTERNAL);
Contributor
Randomly choose a version, e.g. randomIntBetween(1, 10)?
Better naming of test methods and use a random external version.
Collapse 3 tests into one and pick the mode randomly instead.
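The two review suggestions (pick the conflict mode randomly, and use a random external version) could look roughly like the sketch below. Only `ConflictMode`, its `create` and `external` values, `randomMode` and `randomIntBetween(1, 10)` appear in this conversation; the third mode name `none`, the `Random` parameters, and the `ConflictModeSketch` class are assumptions made so that the example is self-contained, and they do not reproduce the test framework's helpers.

```java
import java.util.Random;

// Hypothetical sketch; the mode name "none" and the Random parameters are assumptions.
enum ConflictMode {
    none, external, create;

    // Picks one of the modes uniformly at random.
    static ConflictMode randomMode(Random random) {
        ConflictMode[] modes = values();
        return modes[random.nextInt(modes.length)];
    }
}

final class ConflictModeSketch {
    // Stand-in for a randomIntBetween(min, max) helper with inclusive bounds.
    static int randomIntBetween(Random random, int min, int max) {
        return min + random.nextInt(max - min + 1);
    }

    public static void main(String[] args) {
        Random random = new Random();
        ConflictMode mode = ConflictMode.randomMode(random);
        if (mode == ConflictMode.external) {
            // A random external version instead of a hard-coded 10.
            int version = randomIntBetween(random, 1, 10);
            System.out.println("external versioning with version " + version);
        } else {
            System.out.println("conflict mode: " + mode);
        }
    }
}
```

Choosing the mode per run keeps a single test method while still covering every variant over repeated CI runs, which is the trade-off discussed in the review comments above.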
Contributor
Author
@elasticmachine run elasticsearch-ci/1
henningandersen added a commit that referenced this pull request Feb 15, 2019
Previously, if a version conflict occurred and a previous primary response was present, the original primary response would be used both for sending to the replica and for replying to the client. This was done in the past as an attempt to fix issues with conflicts after relocations, where a bulk request would experience a closed shard halfway through and thus have to retry on the new primary. It could then fail on its own update. With sequence numbers, this leads to an issue: if a primary is demoted (for example, by a network partition), it will send along the original response in the request. In case of a conflict on the new primary, the old response is sent to the replica. That data could be stale, leading to inconsistency between primary and replica. Relocations now do an explicit hand-off from the old to the new primary and ensure that no operations are active while doing this. The above is thus no longer necessary. This change removes the special handling of conflicts and ignores primary responses when executing shard bulk requests on the primary.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 15, 2019
* elastic/master:
  Avoid double term construction in DfsPhase (elastic#38716)
  Fix typo in DateRange docs (yyy → yyyy) (elastic#38883)
  Introduced class reuses follow parameter code between ShardFollowTasks (elastic#38910)
  Ensure random timestamps are within search boundary (elastic#38753)
  [CI] Muting method testFollowIndex in IndexFollowingIT
  Update Lucene snapshot repo for 7.0.0-beta1 (elastic#38946)
  SQL: Doc on syntax (identifiers in particular) (elastic#38662)
  Upgrade to Gradle 5.2.1 (elastic#38880)
  Tie break search shard iterator comparisons on cluster alias (elastic#38853)
  Also mmap cfs files for hybridfs (elastic#38940)
  Build: Fix issue with test status logging (elastic#38799)
  Adapt FullClusterRestartIT on master (elastic#38856)
  Fix testAutoFollowing test to use createLeaderIndex() helper method.
  Migrate muted auto follow rolling upgrade test and unmute this test (elastic#38900)
  ShardBulkAction ignore primary response on primary (elastic#38901)
  Recover peers from translog, ignoring soft deletes (elastic#38904)
  Fix NPE on Stale Index in IndicesService (elastic#38891)
  Smarter CCR concurrent file chunk fetching (elastic#38841)
  Fix intermittent failure in ApiKeyIntegTests (elastic#38627)
  re-enable SmokeTestWatcherWithSecurityIT (elastic#38814)
Previously, if a version conflict occurred and a previous primary
response was present, the original primary response would be used both
for sending to replica and back to client. This was an attempt to fix
issues with conflicts after relocations where a bulk request would
experience a closed shard halfway through and thus have to retry on the
new primary.
With sequence numbers, this leads to an issue, since if a primary is
demoted (network partitions), it will send along the original response
in the request. In case of a conflict on the new primary, the old
response is sent to the replica. That data could be stale, leading to
inconsistency between primary and replica.
Relocations now do an explicit hand-off from old to new primary and
ensure that no operations are active while doing this. The above is thus
no longer necessary. This change removes the special handling of conflicts
and ignores primary responses when executing shard bulk requests on the
primary.
In a follow-up PR, we should consider removing the mutation of the request and
thus not send along the old primary response to the new primary.