Skip to content

Simplify Shard Snapshot Upload Code#48155

Merged
original-brownbear merged 8 commits intoelastic:masterfrom
original-brownbear:better-concurrency-snapshot-upload
Oct 22, 2019
Merged

Simplify Shard Snapshot Upload Code#48155
original-brownbear merged 8 commits intoelastic:masterfrom
original-brownbear:better-concurrency-snapshot-upload

Conversation

@original-brownbear
Copy link
Copy Markdown
Contributor

The code here was needlessly complicated when it
enqueued all file uploads up-front. Instead, we can
go with a cleaner worker + queue pattern here by taking
the max-parallelism from the threadpool info.

Also, I slightly simplified the rethrow and
listener (step listener is pointless when you add the callback in the next line)
handling it since I noticed that we were needlessly rethrowing in the same
code and that wasn't worth a separate PR.

The code here was needlessly complicated when it
enqueued all file uploads up-front. Instead, we can
go with a cleaner worker + queue pattern here by taking
the max-parallelism from the threadpool info.

Also, I slightly simplified the rethrow and
listener (step listener is pointless when you add the callback in the next line)
handling it since I noticed that we were needlessly rethrowing in the same
code and that wasn't worth a separate PR.
@original-brownbear original-brownbear added >non-issue :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.5.0 labels Oct 16, 2019
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@original-brownbear
Copy link
Copy Markdown
Contributor Author

@tlrx as complained about here #47560 (comment) :)

@original-brownbear original-brownbear removed the request for review from tlrx October 16, 2019 18:21
@Override
public Info info(String name) {
throw new UnsupportedOperationException();
return infos.computeIfAbsent(name, n -> new Info(n, ThreadPoolType.FIXED, random.nextInt(10) + 1));
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Motivation for the map complication here: by using a random size for this pool here we still get the same kind of upload-ordering coverage we had before in the SnapshotResiliencyTests

} finally {
store.decRef();
}
} else if (snapshotStatus.isAborted()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check this before incrementing the store refcount? And avoid to have to open the file and read a first byte to detect the snapshot got aborted?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say it doesn't matter really. With the changed execution from this it's super unlikely that another upload worker is started and actually polls a task from the queue for an aborted snapshot (since one of the workers will run into the abort anyway and drain the files queue). I don't think it's worth the additional code and complication.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

Copy link
Copy Markdown
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Armin!

@original-brownbear
Copy link
Copy Markdown
Contributor Author

Thanks Tanguy!

@original-brownbear original-brownbear merged commit 5b3ebea into elastic:master Oct 22, 2019
@original-brownbear original-brownbear deleted the better-concurrency-snapshot-upload branch October 22, 2019 11:15
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Oct 22, 2019
* elastic/master:
  [Docs] Fix opType options in IndexRequest API example. (elastic#48290)
  Simplify Shard Snapshot Upload Code (elastic#48155)
  Mute ClassificationIT tests (elastic#48338)
  Reenable azure repository tests and remove some randomization in http servers  (elastic#48283)
  Use an env var for the classpath of jar hell task (elastic#48240)
  Refactor FIPS BootstrapChecks to simple checks (elastic#47499)
  Add "format" to "range" queries resulted from optimizing a logical AND (elastic#48073)
  [DOCS][Transform] document limitation regarding rolling upgrade with 7.2, 7.3 (elastic#48118)
  Fail with a better error when if there are no ingest nodes (elastic#48272)
  Fix executing enrich policies stats (elastic#48132)
  Use MultiFileTransfer in CCR remote recovery (elastic#44514)
  Make BytesReference an interface (elastic#48171)
  Also validate source index at put enrich policy time. (elastic#48254)
  Add 'javadoc' task to lifecycle check tasks (elastic#48214)
  Remove option to enable direct buffer pooling (elastic#47956)
  [DOCS] Add 'Selecting gateway and seed nodes' section to CCS docs (elastic#48297)
  Add Enrich Origin (elastic#48098)
  fix incorrect comparison (elastic#48208)
original-brownbear added a commit that referenced this pull request Oct 22, 2019
The code here was needlessly complicated when it
enqueued all file uploads up-front. Instead, we can
go with a cleaner worker + queue pattern here by taking
the max-parallelism from the threadpool info.

Also, I slightly simplified the rethrow and
listener (step listener is pointless when you add the callback in the next line)
handling it since I noticed that we were needlessly rethrowing in the same
code and that wasn't worth a separate PR.
@original-brownbear original-brownbear restored the better-concurrency-snapshot-upload branch January 6, 2021 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue v7.6.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants