Skip to content

[Segment Replication] Update PrimaryShardAllocator to prefer replicas with higher replication checkpoint#4041

Merged
dreamer-89 merged 11 commits intoopensearch-project:mainfrom
dreamer-89:segrep_failover_primary_selection
Aug 18, 2022
Merged

[Segment Replication] Update PrimaryShardAllocator to prefer replicas with higher replication checkpoint#4041
dreamer-89 merged 11 commits intoopensearch-project:mainfrom
dreamer-89:segrep_failover_primary_selection

Conversation

@dreamer-89
Copy link
Copy Markdown
Member

@dreamer-89 dreamer-89 commented Jul 29, 2022

Signed-off-by: Suraj Singh surajrider@gmail.com

Description

Update the PrimaryShardAllocator (PSA) to include highest ReplicationCheckpoint (RC) info for evaluating shard to be promoted as primary.

Today, PSA orders shard copies from nodes holding active copy of unassigned primary shard and orders using below comparators and selects the first the first shard (satisfying existing allocation deciders).

  1. Readable shards with no exception on index open
  2. Existing primary. Shards which are marked primary are preferred over replica copies.

As part of this change, we are adding another comparator (runs after above two) which orders shard copies based on highest RC (a > b or a == b && c > d here a, b are primary terms & c,d are segment info version fields of RC. Please check #4112 for details). This is to ensure in case of failover master always chooses the shard copy which is furthest ahead (after satisfying first 2 comparators) and keeps the translog operations to rerun at minimum.

Please note, there are two places where shard promotion logic is used. This PR handles the PrimaryShardAllocator. The other one is RoutingNodes.failShard which will be taken up in follow up PR. Please check #3988 for more details.

Pending

Integration tests which needs #3989 changes (performs InternalEngine <-> NRTEngineReplication) in main.

Issues Resolved

#3988

Related

#2212
#4112

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dreamer-89 dreamer-89 requested review from a team and reta as code owners July 29, 2022 00:21
@dreamer-89 dreamer-89 marked this pull request as draft July 29, 2022 00:21
@dreamer-89 dreamer-89 force-pushed the segrep_failover_primary_selection branch 8 times, most recently from ebd18e5 to 5b5b36d Compare August 3, 2022 21:49
@dreamer-89 dreamer-89 force-pushed the segrep_failover_primary_selection branch from 5b5b36d to f0e264c Compare August 4, 2022 00:08
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Aug 4, 2022

Gradle Check (Jenkins) Run Completed with:

@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Aug 4, 2022

Codecov Report

❌ Patch coverage is 62.85714% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.61%. Comparing base (6993ac9) to head (1f07d13).
⚠️ Report is 4749 commits behind head on main.

Files with missing lines Patch % Lines
...ateway/TransportNodesListGatewayStartedShards.java 47.61% 8 Missing and 3 partials ⚠️
.../opensearch/test/gateway/TestGatewayAllocator.java 71.42% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #4041      +/-   ##
============================================
- Coverage     70.68%   70.61%   -0.07%     
+ Complexity    57125    57041      -84     
============================================
  Files          4603     4603              
  Lines        274535   274563      +28     
  Branches      40209    40216       +7     
============================================
- Hits         194058   193886     -172     
- Misses        64196    64355     +159     
- Partials      16281    16322      +41     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
@opensearch-project opensearch-project deleted a comment from github-actions bot Aug 4, 2022
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets make sure we update this to V_2_3_0 when this is backported?

@dreamer-89 dreamer-89 force-pushed the segrep_failover_primary_selection branch from dd79408 to de07dac Compare August 9, 2022 18:24
@dreamer-89
Copy link
Copy Markdown
Member Author

dreamer-89 commented Aug 9, 2022

@Bukhtawar @andrross @mch2 : Addressed the review comments, please have a look.

@dreamer-89
Copy link
Copy Markdown
Member Author

dreamer-89 commented Aug 9, 2022

Trying to write an IT which simulate no active shard copies (to ensure RoutingNodes#failShard doesn't select any random active replica) during primary failure but at the same time shard copies should be part of in-sync allocation ids. This is not a blocker for this PR.

https://gist.github.com/dreamer-89/9e6a781e57dbf1008c97034103b6a9a8

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Aug 9, 2022

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Suraj Singh <surajrider@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Aug 9, 2022

Gradle Check (Jenkins) Run Completed with:

@dreamer-89
Copy link
Copy Markdown
Member Author

Failure looks unrelated and not reproducible locally. Refiring

REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.indices.replication.SegmentReplicationTargetServiceTests.testReplicationOnDone" -Dtests.seed=18F9CB5CD66407EA -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ms -Dtests.timezone=Europe/Sofia -Druntime.java=17

org.opensearch.indices.replication.SegmentReplicationTargetServiceTests > testReplicationOnDone FAILED
    org.mockito.exceptions.verification.TooFewActualInvocations: 
    segmentReplicationTargetService.onNewCheckpoint(
        ReplicationCheckpoint{shardId=[index][0], primaryTerm=94, segmentsGen=3, seqNo=-1, version=9},
        <any>
    );
    Wanted 2 times:
    -> at org.opensearch.indices.replication.SegmentReplicationTargetServiceTests.testReplicationOnDone(SegmentReplicationTargetServiceTests.java:236)
    But was 1 time:
    -> at org.opensearch.indices.replication.SegmentReplicationTargetServiceTests.testReplicationOnDone(SegmentReplicationTargetServiceTests.java:230)
        at __randomizedtesting.SeedInfo.seed([18F9CB5CD66407EA:2371B911C98E848]:0)
        at app//org.opensearch.indices.replication.SegmentReplicationTargetServiceTests.testReplicationOnDone(SegmentReplicationTargetServiceTests.java:236)

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Aug 9, 2022

Gradle Check (Jenkins) Run Completed with:

@dreamer-89
Copy link
Copy Markdown
Member Author

@Bukhtawar @andrross @mch2 : Addressed the review comments, please have a look.

Copy link
Copy Markdown
Contributor

@Bukhtawar Bukhtawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dreamer-89 dreamer-89 merged commit d308a29 into opensearch-project:main Aug 18, 2022
dreamer-89 added a commit to dreamer-89/OpenSearch that referenced this pull request Aug 18, 2022
… with higher replication checkpoint (opensearch-project#4041)

* [Segment Replication] Update PrimaryShardAllocator to prefer replicas having higher replication checkpoint

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Use empty replication checkpoint to avoid NPE

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Update NodeGatewayStartedShards to optionally wire in/out ReplicationCheckpoint field

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Use default replication checkpoint causing EOF errors on empty checkpoint

* Add indexSettings to GatewayAllocator to allow ReplicationCheckpoint comparator only for segrep enabled indices

* Add unit tests for primary term first replica promotion & comparator fix

* Fix NPE on empty IndexMetadata

* Remove settings from AllocationService and directly inject in GatewayAllocator

* Add more unit tests and minor code clean up

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Address review comments & integration test

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Fix comparator on null ReplicationCheckpoint

Signed-off-by: Suraj Singh <surajrider@gmail.com>

Signed-off-by: Suraj Singh <surajrider@gmail.com>
dreamer-89 added a commit that referenced this pull request Aug 18, 2022
… with higher replication checkpoint (#4041) (#4252)

* [Segment Replication] Update PrimaryShardAllocator to prefer replicas having higher replication checkpoint

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Use empty replication checkpoint to avoid NPE

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Update NodeGatewayStartedShards to optionally wire in/out ReplicationCheckpoint field

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Use default replication checkpoint causing EOF errors on empty checkpoint

* Add indexSettings to GatewayAllocator to allow ReplicationCheckpoint comparator only for segrep enabled indices

* Add unit tests for primary term first replica promotion & comparator fix

* Fix NPE on empty IndexMetadata

* Remove settings from AllocationService and directly inject in GatewayAllocator

* Add more unit tests and minor code clean up

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Address review comments & integration test

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Fix comparator on null ReplicationCheckpoint

Signed-off-by: Suraj Singh <surajrider@gmail.com>

Signed-off-by: Suraj Singh <surajrider@gmail.com>

Signed-off-by: Suraj Singh <surajrider@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants