Skip to content

Segment Replication - Add additional unit tests for update & delete ops.#4237

Merged
dreamer-89 merged 2 commits intoopensearch-project:mainfrom
mch2:units
Aug 17, 2022
Merged

Segment Replication - Add additional unit tests for update & delete ops.#4237
dreamer-89 merged 2 commits intoopensearch-project:mainfrom
mch2:units

Conversation

@mch2
Copy link
Copy Markdown
Member

@mch2 mch2 commented Aug 16, 2022

Signed-off-by: Marc Handalian handalm@amazon.com

Description

This adds index shard level tests for update & delete ops while using segrep.

Issues Resolved

related #4216

Check List

  • [ x] New functionality includes testing.
    • [ x] All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • [x ] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@mch2 mch2 requested review from a team and reta as code owners August 16, 2022 21:46
@github-actions
Copy link
Copy Markdown
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Copy Markdown
Contributor

Gradle Check (Jenkins) Run Completed with:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We can remove _doc mapping as mapping types are deprecated.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We can count number of docs deleted and assert docsAfterDelete.size() == numDocs-deletedDocs

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After update & Delete i'm asserting the equality of each doc id not just counts.

@dreamer-89
Copy link
Copy Markdown
Member

dreamer-89 commented Aug 16, 2022

@mch2 : You will need to rebase from main

1: Task failed with an exception.
-----------
* What went wrong:
Execution failed for task ':distribution:bwc:staged:buildBwcLinuxTar'.
> Building 2.2.0 didn't generate expected file /var/jenkins/workspace/gradle-check/search/distribution/bwc/staged/build/bwc/checkout-2.2/distribution/archives/linux-tar/build/distributions/opensearch-min-2.2.0-SNAPSHOT-linux-x64.tar.gz

mch2 added 2 commits August 16, 2022 16:18
…perations.

Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Gradle Check (Jenkins) Run Completed with:

@mch2
Copy link
Copy Markdown
Member Author

mch2 commented Aug 17, 2022

org.opensearch.index.fielddata.SortedSetDVStringFieldDataTests > testSortMissingLast FAILED
    java.lang.AssertionError: expected:<1796> but was:<1766>
        at __randomizedtesting.SeedInfo.seed([96E3585621F3ED7A:7E9CD7DF9752B87B]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at org.junit.Assert.assertEquals(Assert.java:633)
        at 

@github-actions
Copy link
Copy Markdown
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Aug 17, 2022

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.67%. Comparing base (5f2e66b) to head (505765a).
⚠️ Report is 4712 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #4237      +/-   ##
============================================
- Coverage     70.78%   70.67%   -0.11%     
+ Complexity    57218    57121      -97     
============================================
  Files          4605     4606       +1     
  Lines        274695   274707      +12     
  Branches      40228    40228              
============================================
- Hits         194441   194154     -287     
- Misses        63955    64289     +334     
+ Partials      16299    16264      -35     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dreamer-89 dreamer-89 merged commit 3a97d4c into opensearch-project:main Aug 17, 2022
Rishikesh1159 pushed a commit to Rishikesh1159/OpenSearch that referenced this pull request Aug 17, 2022
…ps. (opensearch-project#4237)

* Segment Replication - Add additional unit tests for update & delete operations.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix spotless.

Signed-off-by: Marc Handalian <handalm@amazon.com>

Signed-off-by: Marc Handalian <handalm@amazon.com>
Rishikesh1159 added a commit that referenced this pull request Aug 17, 2022
…aining segment replication changes (#4243)

* [Segment Replication] Checkpoint Replay on Replica Shard (#3658)

* Adding Latest Recevied checkpoint, replay checkpoint logic along with tests

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* [Segment Replication] Wire up segment replication with peer recovery and add ITs. (#3743)

* Add null check when computing max segment version.

With segment replication enabled it is possible Lucene does not set the SegmentInfos
min segment version, leaving the default value as null.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Update peer recovery to set the translogUUID of replicas to the UUID generated on the primary.

This change updates the UUID when the translog is created to the value stored in the passed segment userdata.
This is to ensure during failover scenarios that the replica can be promoted and not have a uuid mismatch with the value stored in user data.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Wire up Segment Replication under the feature flag.

This PR wires up segment replication and adds some initial integration tests.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Add test to ensure replicas use primary translog uuid with segrep.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Update SegmentReplicationIT to assert previous commit points are valid and SegmentInfos can be built.
Fix nitpicks in PR feedback.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix test with Assert.fail to include a message.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Disable shard idle with segment replication. (#4118)

This change disables shard idle when segment replication is enabled.
Primary shards will only push out new segments on refresh, we do not want to block this based on search behavior.
Replicas will only refresh on an externally provided SegmentInfos, so we do not want search requests to hang waiting for a refresh.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix isAheadOf logic for ReplicationCheckpoint comparison (#4112)

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Fix waitUntil refresh policy for segrep enabled indices. (#4137)

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Add IndexShard#getLatestReplicationCheckpoint behind segrep enable feature flag (#4163)

* Add IndexShard#getLatestReplicationCheckpoint behind segrep enable feature flag

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Address review comment. Move tests to SegmentReplicationIndexShardTests

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Add segrep enbaled index settings in TargetServiceTests, SourceHandlerTests

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* [Segment Replication] Fix OngoingSegmentReplications to key by allocation ID instead of DiscoveryNode. (#4182)

* Fix OngoingSegmentReplications to key by allocation ID instead of DiscoveryNode.

This change fixes segrep to work with multiple shards per node by keying ongoing replications on
allocation ID.  This also updates cancel methods to ensure state is properly cleared on shard cancel.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Clean up cancel methods.

Signed-off-by: Marc Handalian <handalm@amazon.com>

Signed-off-by: Marc Handalian <handalm@amazon.com>

* [Bug] [Segment Replication] Update store metadata recovery diff logic to ignore missing files causing exception (#4185)

* Update Store for segment replication dif

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

Signed-off-by: Poojita Raj <poojiraj@amazon.com>

* Update recoveryDiff logic to ingore missing files causing exception on replica during copy

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Address review comments

Signed-off-by: Suraj Singh <surajrider@gmail.com>

Signed-off-by: Poojita Raj <poojiraj@amazon.com>
Signed-off-by: Suraj Singh <surajrider@gmail.com>
Co-authored-by: Poojita Raj <poojiraj@amazon.com>

* [Segment Replication] Adding PrimaryMode check before publishing checkpoint and processing a received checkpoint. (#4157)

* Adding PrimaryMode check before publishing checkpoint.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Applying spotless check

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Moving segrep specific tests to SegmentReplicationIndexShardTests.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Adding logic and tests for rejecting checkpoints if shard is in PrimaryMode.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Applying ./gradlew :server:spotlessApply.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Applying ./gradlew :server:spotlessApply

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Changing log level to warn in shouldProcessCheckpoint() of IndexShard.java class.

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* Removing unnecessary lazy logging in shouldProcessCheckpoint().

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>

* [Segment Replication] Wait for documents to replicate to replica shards (#4236)

* [Segment Replication] Add thread sleep to account for replica lag in delete operations test

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Address review comments, assertBusy on doc count rather than sleep

Signed-off-by: Suraj Singh <surajrider@gmail.com>

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Segment Replication - Remove unnecessary call to markAllocationIdAsInSync. (#4224)

This PR Removes an unnecessary call to markAllocationIdAsInSync on the primary shard when replication events complete.
Recovery will manage this initial call.

Signed-off-by: Marc Handalian <handalm@amazon.com>

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Segment Replication - Add additional unit tests for update & delete ops. (#4237)

* Segment Replication - Add additional unit tests for update & delete operations.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix spotless.

Signed-off-by: Marc Handalian <handalm@amazon.com>

Signed-off-by: Marc Handalian <handalm@amazon.com>

Signed-off-by: Rishikesh1159 <rishireddy1159@gmail.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Suraj Singh <surajrider@gmail.com>
Signed-off-by: Poojita Raj <poojiraj@amazon.com>
Co-authored-by: Marc Handalian <handalm@amazon.com>
Co-authored-by: Suraj Singh <surajrider@gmail.com>
Co-authored-by: Poojita Raj <poojiraj@amazon.com>
@mch2 mch2 deleted the units branch January 11, 2023 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants