Skip to content

Fix SnapshotStatus Transport Action Doing IO on Transport Thread#68023

Merged
original-brownbear merged 1 commit intoelastic:masterfrom
original-brownbear:improve-snapshot-status-api
Jan 28, 2021
Merged

Fix SnapshotStatus Transport Action Doing IO on Transport Thread#68023
original-brownbear merged 1 commit intoelastic:masterfrom
original-brownbear:improve-snapshot-status-api

Conversation

@original-brownbear
Copy link
Copy Markdown
Contributor

There is a small chance here that #67947 would cause the callback
for the repository data to run on a transport or CS updater thread
and do a lot of IO to fetch SnapshotInfo.

Fixed by always forking to the generic pool for the callback.
Added test that triggers lots of deserializing repository data from
cache on the transport thread concurrently which triggers this bug
relatively reliable (more than half the runs) but is still reasonably
fast (under 5s).

There is a small chance here that #67947 would cause the callback
for the repository data to run on a transport or CS updater thread
and do a lot of IO to fetch `SnapshotInfo`.

Fixed by always forking to the generic pool for the callback.
Added test that triggers lots of deserializing repository data from
cache on the transport thread concurrently which triggers this bug
relatively reliable (more than half the runs) but is still reasonably
fast (under 5s).
@original-brownbear original-brownbear added >non-issue :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.12.0 labels Jan 26, 2021
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team. label Jan 26, 2021
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@original-brownbear
Copy link
Copy Markdown
Contributor Author

Sorry for the oversight in #67947 @fcofdez (I thought I had all spots covered but I missed the RepositoriesService indirection :) Now we should be good.

Copy link
Copy Markdown
Contributor

@fcofdez fcofdez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good catch!

@original-brownbear
Copy link
Copy Markdown
Contributor Author

Thanks Francisco!

@original-brownbear original-brownbear merged commit f5c64af into elastic:master Jan 28, 2021
@original-brownbear original-brownbear deleted the improve-snapshot-status-api branch January 28, 2021 05:58
original-brownbear added a commit that referenced this pull request Jan 28, 2021
) (#68092)

There is a small chance here that #67947 would cause the callback
for the repository data to run on a transport or CS updater thread
and do a lot of IO to fetch `SnapshotInfo`.

Fixed by always forking to the generic pool for the callback.
Added test that triggers lots of deserializing repository data from
cache on the transport thread concurrently which triggers this bug
relatively reliable (more than half the runs) but is still reasonably
fast (under 5s).
original-brownbear added a commit that referenced this pull request Feb 3, 2021
Same as #68023 but even less likely (couldn't really find a quick way
to write a test for it for that reason).
Fix is the same, fork off to the generic pool for listener handling.
Also, this allows removing the forking in the transport action since we don't do any long
runnning work on the calling thread any longer in the restore method.
original-brownbear added a commit that referenced this pull request Feb 3, 2021
Same as #68023 but even less likely (couldn't really find a quick way
to write a test for it for that reason).
Fix is the same, fork off to the generic pool for listener handling.
Also, this allows removing the forking in the transport action since we don't do any long
runnning work on the calling thread any longer in the restore method.
original-brownbear added a commit that referenced this pull request May 18, 2021
…#73196)

The callback to loading the repository-data may not run on generic in the uncached case
because of the repo data deduplication logic.
The same issue was fixed for the snapshot status API in #68023
original-brownbear added a commit that referenced this pull request Jun 29, 2021
…#73196) (#74695)

The callback to loading the repository-data may not run on generic in the uncached case
because of the repo data deduplication logic.
The same issue was fixed for the snapshot status API in #68023
@original-brownbear original-brownbear restored the improve-snapshot-status-api branch April 18, 2023 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue Team:Distributed Meta label for distributed team. v7.12.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants