Fix SnapshotStatus Transport Action Doing IO on Transport Thread by original-brownbear · Pull Request #68023 · elastic/elasticsearch

original-brownbear · 2021-01-26T19:40:32Z

There is a small chance here that #67947 would cause the callback for the repository data to run on a transport or cluster-state (CS) updater thread and do a lot of IO to fetch `SnapshotInfo`. Fixed by always forking to the generic pool for the callback. Added a test that triggers lots of concurrent deserialization of repository data from the cache on the transport thread. It reproduces this bug relatively reliably (in more than half the runs) but is still reasonably fast (under 5s).
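A minimal sketch of the pattern, assuming plain `java.util.concurrent` executors stand in for Elasticsearch's transport and generic thread pools. This is not the PR's code: `forkTo`, `callbackThreadPrefixes`, and the `transport-*`/`generic-*` thread names are hypothetical. Wrapping the listener means that whichever thread completes it, the IO-heavy body runs on the generic pool. In the spirit of the added test, it fires many concurrent cached completions from transport threads and reports where the callbacks ran.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class ForkToGenericSketch {

    // Fixed-size pool whose threads are named "<prefix>-<n>" so we can tell where work ran.
    static ExecutorService namedPool(String prefix, int size) {
        AtomicInteger counter = new AtomicInteger();
        return Executors.newFixedThreadPool(size, r -> {
            Thread t = new Thread(r, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        });
    }

    // Hypothetical helper: whatever thread calls the returned consumer, the listener body runs on `executor`.
    static <T> Consumer<T> forkTo(Executor executor, Consumer<T> listener) {
        return value -> executor.execute(() -> listener.accept(value));
    }

    // Fires `requests` concurrent "cached" completions from transport threads and returns the
    // set of thread-name prefixes on which the (potentially IO-heavy) callback actually ran.
    static Set<String> callbackThreadPrefixes(boolean fork, int requests) {
        ExecutorService transport = namedPool("transport", 4);
        ExecutorService generic = namedPool("generic", 4);
        Set<String> prefixes = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(requests);
        try {
            Consumer<String> heavyCallback = repositoryData -> {
                // Stand-in for fetching SnapshotInfo: only record which pool we are on.
                String name = Thread.currentThread().getName();
                prefixes.add(name.substring(0, name.indexOf('-')));
                done.countDown();
            };
            Consumer<String> listener = fork ? forkTo(generic, heavyCallback) : heavyCallback;
            for (int i = 0; i < requests; i++) {
                // Cached repository data is available immediately, so the listener is completed inline.
                transport.execute(() -> listener.accept("cached-repository-data"));
            }
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("callbacks did not complete in time");
            }
            return prefixes;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            transport.shutdownNow();
            generic.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("without fork: " + callbackThreadPrefixes(false, 100)); // [transport]
        System.out.println("with fork:    " + callbackThreadPrefixes(true, 100));  // [generic]
    }
}
```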


elasticmachine · 2021-01-26T19:40:35Z

Pinging @elastic/es-distributed (Team:Distributed)

original-brownbear · 2021-01-26T19:41:23Z

Sorry for the oversight in #67947, @fcofdez. I thought I had all spots covered, but I missed the `RepositoriesService` indirection :) Now we should be good.

fcofdez

LGTM, good catch!

original-brownbear · 2021-01-28T05:58:50Z

Thanks Francisco!

…) (#68092) There is a small chance here that #67947 would cause the callback for the repository data to run on a transport or cluster-state (CS) updater thread and do a lot of IO to fetch `SnapshotInfo`. Fixed by always forking to the generic pool for the callback. Added a test that triggers lots of concurrent deserialization of repository data from the cache on the transport thread. It reproduces this bug relatively reliably (in more than half the runs) but is still reasonably fast (under 5s).

Same as #68023 but even less likely (which is also why I couldn't find a quick way to write a test for it). The fix is the same: fork off to the generic pool for listener handling. This also allows removing the forking in the transport action, since the restore method no longer does any long-running work on the calling thread.
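A minimal sketch of that design choice, assuming the service owns the threading decision: because the hypothetical `SketchRestoreService.restore` forks its listener to the generic pool itself, a transport action can call it directly from a transport thread without forking first. The class, method, and thread names are illustrative only, not Elasticsearch's actual restore API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

public class ForkInsideServiceSketch {

    // Hypothetical service that owns the threading decision for its listener.
    static final class SketchRestoreService {
        private final ExecutorService generic;

        SketchRestoreService(ExecutorService generic) {
            this.generic = generic;
        }

        // Safe to call from any thread: the potentially long-running listener handling is forked
        // to generic, so nothing heavy happens on the caller's thread.
        void restore(String snapshot, Consumer<String> listener) {
            generic.execute(() -> listener.accept(snapshot + "@" + Thread.currentThread().getName()));
        }
    }

    static ExecutorService singleThread(String name) {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true);
            return t;
        });
    }

    // Simulates the transport action calling restore() directly, without its own fork, from a
    // transport thread. Returns "<snapshot>@<name of the thread the listener ran on>".
    static String restoreFromTransportThread() {
        ExecutorService transport = singleThread("transport-1");
        ExecutorService generic = singleThread("generic-1");
        try {
            SketchRestoreService service = new SketchRestoreService(generic);
            CompletableFuture<String> result = new CompletableFuture<>();
            transport.execute(() -> service.restore("snap-1", result::complete));
            return result.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            transport.shutdownNow();
            generic.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(restoreFromTransportThread()); // snap-1@generic-1
    }
}
```

Keeping the fork inside the service means every caller gets correct threading for free, instead of each call site having to remember to fork.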

…#73196) The callback for loading the repository data may not run on the generic pool in the uncached case because of the repository-data deduplication logic. The same issue was fixed for the snapshot status API in #68023.
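A sketch of how request deduplication can move a callback onto an unexpected thread, assuming a simple shared-load deduplicator (the hypothetical `SharedLoad` below; Elasticsearch's actual deduplication logic may differ). A caller that arrives while an uncached load is already in flight is notified on whatever thread finishes that load, not on generic. Forking the listener restores the expected threading. The thread names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

public class DedupThreadingSketch {

    // Hypothetical deduplicator: concurrent requests share one in-flight load of the repository data.
    static final class SharedLoad<T> {
        private final List<Consumer<T>> waiting = new ArrayList<>();
        private boolean inFlight = false;

        // Registers a listener; returns true only if the caller must perform the load itself.
        synchronized boolean register(Consumer<T> listener) {
            waiting.add(listener);
            if (inFlight) {
                return false;
            }
            inFlight = true;
            return true;
        }

        // Called by whichever thread finished the load; notifies all waiters on that same thread.
        void complete(T value) {
            List<Consumer<T>> toNotify;
            synchronized (this) {
                toNotify = new ArrayList<>(waiting);
                waiting.clear();
                inFlight = false;
            }
            toNotify.forEach(listener -> listener.accept(value));
        }
    }

    static ExecutorService singleThread(String name) {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true);
            return t;
        });
    }

    static <T> Consumer<T> forkTo(Executor executor, Consumer<T> listener) {
        return value -> executor.execute(() -> listener.accept(value));
    }

    // Returns the name of the thread on which a deduplicated (second) caller's callback ran.
    static String waiterRanOn(boolean fork) {
        ExecutorService loader = singleThread("snapshot-loader");
        ExecutorService generic = singleThread("generic-1");
        try {
            SharedLoad<String> load = new SharedLoad<>();
            CompletableFuture<String> ranOn = new CompletableFuture<>();
            // The first caller finds no load in flight and will perform the uncached load itself.
            boolean firstLoads = load.register(data -> { });
            // The second caller (e.g. a mount action) only waits for the shared result.
            Consumer<String> callback = data -> ranOn.complete(Thread.currentThread().getName());
            boolean secondLoads = load.register(fork ? forkTo(generic, callback) : callback);
            if (!firstLoads || secondLoads) {
                throw new IllegalStateException("unexpected deduplication state");
            }
            // The load finishes on the loader's thread, which notifies every waiter inline.
            loader.execute(() -> load.complete("repository-data"));
            return ranOn.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            loader.shutdownNow();
            generic.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("without fork: " + waiterRanOn(false)); // snapshot-loader
        System.out.println("with fork:    " + waiterRanOn(true));  // generic-1
    }
}
```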

…#73196) (#74695) The callback for loading the repository data may not run on the generic pool in the uncached case because of the repository-data deduplication logic. The same issue was fixed for the snapshot status API in #68023.

original-brownbear added the >non-issue, :Distributed/Snapshot/Restore, v8.0.0, and v7.12.0 labels Jan 26, 2021

elasticmachine added the Team:Distributed label Jan 26, 2021

original-brownbear requested a review from fcofdez January 26, 2021 19:40

fcofdez approved these changes Jan 27, 2021


original-brownbear merged commit f5c64af into elastic:master Jan 28, 2021

original-brownbear deleted the improve-snapshot-status-api branch January 28, 2021 05:58

original-brownbear mentioned this pull request Jan 28, 2021

Fix SnapshotStatus Transport Action Doing IO on Transport Thread (#68023) #68092

Merged

original-brownbear mentioned this pull request Feb 2, 2021

Fix Threading in Snapshot Restore #68390

Merged

original-brownbear mentioned this pull request Feb 3, 2021

Fix Threading in Snapshot Restore (#68390) #68438

Merged

original-brownbear mentioned this pull request May 18, 2021

Fix Edge-Case Threading Bug in TransportMountSearchableSnapshotAction #73196

Merged

original-brownbear mentioned this pull request Jun 29, 2021

Fix Edge-Case Threading Bug in TransportMountSearchableSnapshotAction (#73196) #74695

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

original-brownbear restored the improve-snapshot-status-api branch April 18, 2023 20:46


original-brownbear merged 1 commit into elastic:master from original-brownbear:improve-snapshot-status-api


4 participants
