Simplify Shard Snapshot Upload Code#48155
original-brownbear merged 8 commits into elastic:master from original-brownbear:better-concurrency-snapshot-upload
Conversation
The code here was needlessly complicated: it enqueued all file uploads up-front. Instead, we can use a cleaner worker + queue pattern by taking the max parallelism from the threadpool info. I also slightly simplified the rethrow and listener handling (a step listener is pointless when you add the callback in the next line), since I noticed we were needlessly rethrowing in the same code and that wasn't worth a separate PR.
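The worker + queue pattern described above can be sketched roughly like this. This is a minimal, hypothetical illustration, not the actual Elasticsearch code: `uploadFile` and `maxParallelism` are stand-in names, and in the real change the parallelism comes from the snapshot threadpool's info.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class UploadWorkers {

    // Hypothetical stand-in for the actual file upload.
    static void uploadFile(String file, AtomicInteger uploaded) {
        uploaded.incrementAndGet();
    }

    public static void main(String[] args) throws Exception {
        List<String> files = List.of("segments_1", "_0.cfs", "_0.cfe", "_0.si");
        int maxParallelism = 2; // would come from the threadpool info

        // Instead of enqueueing one upload task per file up-front, enqueue the
        // files themselves and start at most min(maxParallelism, #files)
        // workers that drain the shared queue until it is empty.
        Queue<String> queue = new ConcurrentLinkedQueue<>(files);
        int workers = Math.min(maxParallelism, files.size());
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        AtomicInteger uploaded = new AtomicInteger();
        for (int i = 0; i < workers; i++) {
            executor.execute(() -> {
                String file;
                while ((file = queue.poll()) != null) {
                    uploadFile(file, uploaded);
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("uploaded=" + uploaded.get());
    }
}
```

The key point is that the number of threads is bounded by the pool size rather than by the number of files, and idle workers simply exit when the queue runs dry.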
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
@tlrx as complained about here #47560 (comment) :)
    @Override
    public Info info(String name) {
-       throw new UnsupportedOperationException();
+       return infos.computeIfAbsent(name, n -> new Info(n, ThreadPoolType.FIXED, random.nextInt(10) + 1));
Motivation for the map complication here: by using a random size for this pool, we still get the same kind of upload-ordering coverage we had before in the SnapshotResiliencyTests.
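The caching behavior behind the diff above can be sketched as follows. This is a simplified stand-in, not the real `ThreadPool.Info` class: the point is that `computeIfAbsent` picks the random size once per pool name and then keeps returning the same cached value, so the randomized size stays consistent within a test run.

```java
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

public class RandomPoolInfo {

    // Minimal hypothetical stand-in for the pool info: a name and a max size.
    record Info(String name, int max) {}

    public static void main(String[] args) {
        Map<String, Info> infos = new ConcurrentHashMap<>();
        Random random = new Random();

        // The first lookup picks a random max size in [1, 10]; the map caches
        // the result so repeated lookups for the same pool stay consistent.
        Info first = infos.computeIfAbsent("snapshot",
            n -> new Info(n, random.nextInt(10) + 1));
        Info second = infos.computeIfAbsent("snapshot",
            n -> new Info(n, random.nextInt(10) + 1));

        System.out.println(first == second);
        System.out.println(first.max() >= 1 && first.max() <= 10);
    }
}
```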
| } finally { | ||
| store.decRef(); | ||
| } | ||
| } else if (snapshotStatus.isAborted()) { |
Should we check this before incrementing the store refcount, and avoid having to open the file and read a first byte to detect that the snapshot was aborted?
I would say it doesn't really matter. With the changed execution from this PR, it's very unlikely that another upload worker is started and actually polls a task from the queue for an aborted snapshot, since one of the workers will run into the abort anyway and drain the files queue. I don't think it's worth the additional code and complication.
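The "one worker drains the queue" argument can be illustrated with a small sketch. This is a hypothetical simplification of the worker loop, not the actual code: once any worker observes the abort flag, it clears the shared queue so no further uploads are even attempted.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class AbortDrain {
    public static void main(String[] args) {
        Queue<String> files = new ConcurrentLinkedQueue<>();
        files.add("_0.cfs");
        files.add("_0.cfe");
        files.add("_0.si");

        AtomicBoolean aborted = new AtomicBoolean(false);
        AtomicInteger uploaded = new AtomicInteger();

        // Worker loop: as soon as the abort is observed, drain whatever is
        // still queued so other workers find an empty queue and exit.
        String file;
        while ((file = files.poll()) != null) {
            if (aborted.get()) {
                files.clear(); // drain: drop everything still queued
                break;
            }
            uploaded.incrementAndGet();
            aborted.set(true); // simulate the abort arriving after one upload
        }
        System.out.println("uploaded=" + uploaded.get()
            + " remaining=" + files.size());
    }
}
```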
Thanks Tanguy!
* elastic/master:
  [Docs] Fix opType options in IndexRequest API example. (elastic#48290)
  Simplify Shard Snapshot Upload Code (elastic#48155)
  Mute ClassificationIT tests (elastic#48338)
  Reenable azure repository tests and remove some randomization in http servers (elastic#48283)
  Use an env var for the classpath of jar hell task (elastic#48240)
  Refactor FIPS BootstrapChecks to simple checks (elastic#47499)
  Add "format" to "range" queries resulted from optimizing a logical AND (elastic#48073)
  [DOCS][Transform] document limitation regarding rolling upgrade with 7.2, 7.3 (elastic#48118)
  Fail with a better error when if there are no ingest nodes (elastic#48272)
  Fix executing enrich policies stats (elastic#48132)
  Use MultiFileTransfer in CCR remote recovery (elastic#44514)
  Make BytesReference an interface (elastic#48171)
  Also validate source index at put enrich policy time. (elastic#48254)
  Add 'javadoc' task to lifecycle check tasks (elastic#48214)
  Remove option to enable direct buffer pooling (elastic#47956)
  [DOCS] Add 'Selecting gateway and seed nodes' section to CCS docs (elastic#48297)
  Add Enrich Origin (elastic#48098)
  fix incorrect comparison (elastic#48208)