Batch translog sync/upload per x ms for remote-backed indexes #5854

Merged
gbbafna merged 17 commits into opensearch-project:main from ashking94:5692
Jan 29, 2023

@ashking94
Member

@ashking94 ashking94 commented Jan 13, 2023

Description

Translog sync handles both the local fsync and the translog upload to the remote store. Currently, buffering happens only implicitly, as a side effect of the remote store upload being a time-consuming operation. However, every upload carries the extra cost of a network interaction on top of the actual file upload. If we buffer for a Pareto-optimal duration, we can save on those additional network interaction costs and achieve lower overall latency and higher indexing throughput than the non-buffered approach. There is also a delay optimisation in place: if an upload takes considerable time, the next run is scheduled with a correspondingly shorter interval, which keeps the overall buffer interval in check.
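To make the idea concrete, here is a minimal sketch of interval-based batching with the delay adjustment described above. It is not the code from this PR: the class `BufferedSyncSketch`, its methods, and the rule "next delay = buffer interval minus time spent in the last run, floored at zero" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

/** Hypothetical sketch of buffered sync/upload; not the class added by this PR. */
final class BufferedSyncSketch<T> {
    private final long bufferIntervalMillis;
    // Stand-in for "fsync locally, then upload once to the remote store".
    private final Consumer<List<T>> batchWriter;
    private final ConcurrentLinkedQueue<T> pending = new ConcurrentLinkedQueue<>();

    BufferedSyncSketch(long bufferIntervalMillis, Consumer<List<T>> batchWriter) {
        this.bufferIntervalMillis = bufferIntervalMillis;
        this.batchWriter = batchWriter;
    }

    /** Queues an item; nothing is written until the next run. */
    void put(T item) {
        pending.add(item);
    }

    /**
     * Drains everything queued so far, writes it with a single call, and
     * returns how long the caller should wait before scheduling the next run.
     */
    long runOnce() {
        long startNanos = System.nanoTime();
        List<T> batch = new ArrayList<>();
        T item;
        while ((item = pending.poll()) != null) {
            batch.add(item);
        }
        if (!batch.isEmpty()) {
            batchWriter.accept(batch); // one network interaction for the whole batch
        }
        long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000L;
        return nextDelayMillis(bufferIntervalMillis, elapsedMillis);
    }

    /** Assumed rule: shorten the next delay by the time the last run took. */
    static long nextDelayMillis(long bufferIntervalMillis, long lastRunMillis) {
        return Math.max(0L, bufferIntervalMillis - lastRunMillis);
    }
}
```

In this sketch a scheduler would call `runOnce()` and re-schedule itself after the returned delay. A slow upload therefore shortens the next wait instead of pushing every later run further out.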

Credits for some of the code - Ashwin Pankaj, Laxman Muttineni

Issues Resolved

This solves #5692

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ashking94 ashking94 self-assigned this Jan 13, 2023
@ashking94 ashking94 added the Storage:Durability (Issues and PRs related to the durability framework), Performance (This is for any performance related enhancements or bugs), v2.6.0 ('Issues and PRs related to version v2.6.0'), distributed framework, and skip-changelog labels Jan 13, 2023
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationIT.testCancellation

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testRelocateWhileContinuouslyIndexingAndWaitingForRefresh
      1 org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness

@codecov-commenter

codecov-commenter commented Jan 16, 2023

Codecov Report

❌ Patch coverage is 75.60976% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.72%. Comparing base (715ff72) to head (105cbc5).
⚠️ Report is 4290 commits behind head on main.

Files with missing lines | Patch % | Lines
...mmon/util/concurrent/BufferedAsyncIOProcessor.java | 68.96% | 9 Missing ⚠️
...earch/common/util/concurrent/AsyncIOProcessor.java | 66.66% | 4 Missing ⚠️
...in/java/org/opensearch/index/shard/IndexShard.java | 78.94% | 4 Missing ⚠️
...org/opensearch/cluster/metadata/IndexMetadata.java | 76.92% | 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5854      +/-   ##
============================================
- Coverage     70.75%   70.72%   -0.04%     
+ Complexity    58720    58704      -16     
============================================
  Files          4771     4772       +1     
  Lines        280818   280887      +69     
  Branches      40568    40572       +4     
============================================
- Hits         198704   198663      -41     
- Misses        65824    65860      +36     
- Partials      16290    16364      +74     


@ashking94 ashking94 changed the title from "Batch translog upload per x ms to allow high index throughput" to "Batch translog upload per x ms for remote-backed indexes to allow high index throughput" Jan 18, 2023
@ashking94 ashking94 changed the title from "Batch translog upload per x ms for remote-backed indexes to allow high index throughput" to "Batch translog upload per x ms for remote-backed indexes" Jan 18, 2023
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationIT.testDeleteOperations

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testWriteRead
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testSnapshotWithLargeSegmentFiles
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testSnapshotAndRestore
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testReadNonExistingPath
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testMultipleSnapshotAndRollback
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testList
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testIndicesDeletedFromRepository
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testContainerCreationAndDeletion
      1 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testDeleteOperations
      1 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockIsRemovedWhenAnyNodesNotExceedHighWatermark

ashking94 and others added 8 commits January 25, 2023 13:54
Signed-off-by: Ashish Singh <ssashish@amazon.com>
Signed-off-by: Ashish Singh <ssashish@amazon.com>
Signed-off-by: Ashish Singh <ssashish@amazon.com>
Signed-off-by: Ashish Singh <ssashish@amazon.com>
Co-authored-by: Ashwin Pankaj <appankaj@amazon.com>
Co-authored-by: Laxman Muttineni <muttil@amazon.com>
Signed-off-by: Ashish Singh <ssashish@amazon.com>
Signed-off-by: Ashish Singh <ssashish@amazon.com>
Signed-off-by: Ashish Singh <ssashish@amazon.com>
Signed-off-by: Ashish Singh <ssashish@amazon.com>
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockWithAReadOnlyBlock
      1 org.opensearch.action.admin.cluster.tasks.PendingTasksBlocksIT.testPendingTasksWithClusterNotRecoveredBlock

Member

@andrross andrross left a comment

Couple minor comments, otherwise looks good.

Signed-off-by: Ashish Singh <ssashish@amazon.com>
@gbbafna gbbafna merged commit af566e1 into opensearch-project:main Jan 29, 2023
@gbbafna gbbafna added the backport 2.x (Backport to 2.x branch) label Jan 29, 2023
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-5854-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 af566e156fefdba192343ba8f7fce84f17d2a07a
# Push it to GitHub
git push --set-upstream origin backport/backport-5854-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-5854-to-2.x.

@ashking94 ashking94 added and removed the backport 2.x (Backport to 2.x branch) label Jan 30, 2023
ashking94 added a commit to ashking94/OpenSearch that referenced this pull request Jan 30, 2023
…arch-project#5854)

* Batch translog upload per x ms to allow high index throughput

Signed-off-by: Ashish Singh <ssashish@amazon.com>
Co-authored-by: Ashwin Pankaj <appankaj@amazon.com>
Co-authored-by: Laxman Muttineni <muttil@amazon.com>
Signed-off-by: Ashish Singh <ssashish@amazon.com>
gbbafna pushed a commit that referenced this pull request Jan 30, 2023
…indexes (#5854) (#6066)

* Batch translog sync/upload per x ms for remote-backed indexes (#5854)

Signed-off-by: Ashish Singh <ssashish@amazon.com>
Co-authored-by: Ashwin Pankaj <appankaj@amazon.com>
Co-authored-by: Laxman Muttineni <muttil@amazon.com>

Labels

backport 2.x (Backport to 2.x branch), distributed framework, Performance (This is for any performance related enhancements or bugs), skip-changelog, Storage:Durability (Issues and PRs related to the durability framework), v2.6.0 ('Issues and PRs related to version v2.6.0')

6 participants