Skip to content

Make sure listener is resolved when file queue is cleared#89929

Merged
pxsalehi merged 7 commits intoelastic:mainfrom
pxsalehi:ps-220908-handle-cleared-queue
Sep 9, 2022
Merged

Make sure listener is resolved when file queue is cleared#89929
pxsalehi merged 7 commits intoelastic:mainfrom
pxsalehi:ps-220908-handle-cleared-queue

Conversation

@pxsalehi
Copy link
Copy Markdown
Member

@pxsalehi pxsalehi commented Sep 8, 2022

Handle a bug introduced in #88209 while refactoring how file upload tasks run in a shard snapshot. The corner case where the queue of files to snapshot gets cleared when a file snapshot runs into an exception is not addressed in that PR.

Closes #89927
Closes #89956

@pxsalehi pxsalehi added the :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs label Sep 8, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team. label Sep 8, 2022
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @pxsalehi, I've created a changelog YAML for you.

@pxsalehi
Copy link
Copy Markdown
Member Author

pxsalehi commented Sep 8, 2022

I would like to also somehow refactor that part a bit, and add a test for this corner case. Whether here or in another PR. Since there is a failing test, we might want to merge this. I have added a unit test for this which fails consistently w/o the fix (https://github.com/pxsalehi/elasticsearch/tree/ps-220908-reprod-cleared-queue has the test but not the fix).

@pxsalehi pxsalehi added >test Issues or PRs that are addressing/adding tests >bug and removed >bug labels Sep 8, 2022
@pxsalehi pxsalehi force-pushed the ps-220908-handle-cleared-queue branch from e8ab227 to d9c3ca6 Compare September 8, 2022 15:03
Copy link
Copy Markdown
Contributor

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this so quickly @pxsalehi ! The fix looks fine, just one complaint about a test detail :)

files.add(ShardSnapshotTaskRunnerTests.dummyFileInfo());
}
repository.snapshotFiles(context, files, allFilesUploadListener);
assertBusy(() -> assertTrue(listenerCalled.get()));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of dislike this pattern in tests, it causes test to be needlessly slow in aggregate because of the sleeps in the busy polling from assertTrue. Maybe just use a PlainActionFuture for the listener above and then wait for its completion here?

@pxsalehi pxsalehi merged commit 7164e35 into elastic:main Sep 9, 2022
@pxsalehi
Copy link
Copy Markdown
Member Author

pxsalehi commented Sep 9, 2022

Thanks, Armin!

weizijun added a commit to weizijun/elasticsearch that referenced this pull request Sep 9, 2022
* main: (176 commits)
  Fix RandomSamplerAggregatorTests testAggregationSamplingNestedAggsScaled test failure (elastic#89958)
  [Downsampling] Replace document map with SMILE encoded doc (elastic#89495)
  Remove full cluster state from error logging in MasterService (elastic#89960)
  [ML] Truncate categorization fields (elastic#89827)
  [TSDB] Removed `summary` and `histogram` metric types (elastic#89937)
  Update testNodeSelectorRouting so that it does not depend on iteration order (elastic#89879)
  Make sure listener is resolved when file queue is cleared (elastic#89929)
  [Stable plugin api] Extensible annotation (elastic#89903)
  Fix double sending of response in TransportOpenIdConnectPrepareAuthenticationAction (elastic#89930)
  Make sure ivy repo directory exists before downloading artifacts
  Use 'file://' scheme for local repository URL
  Use DRA artifacts for release build CI jobs
  Log unsuccessful attempts to get credentials from web identity tokens (elastic#88241)
  Script: Write Field API path manipulation (elastic#89889)
  Fetch health info action (elastic#89820)
  Fix memory leak in TransportDeleteExpiredDataAction (elastic#89935)
  [ML] Performance improvements for categorization jobs (elastic#89824)
  [DOCS] Revert changes for ES_JAVA_OPTS (elastic#89931)
  Fix deadlock bug exposed by a test (elastic#89934)
  [Downsampling] Remove `FieldValueFetcher` validator (elastic#89497)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Meta label for distributed team. >test Issues or PRs that are addressing/adding tests v8.5.0

Projects

None yet

3 participants