Snapshot of a searchable snapshot should be empty#66162
Snapshot of a searchable snapshot should be empty#66162DaveCTurner merged 12 commits intoelastic:masterfrom
Conversation
Today if you take a snapshot of a searchable snapshot index then we treat it like a normal index and copy (any unchanged parts of) its contents the the repository. This is often a complete copy, doubling the snapshot storage required, since a searchable snapshot index typically has a different name from the original index; it may also be that we are taking a snapshot into a different repository. The content of a searchable snapshot is already held in a snapshot, and its index metadata indicates how to find this content, so it is wasteful to copy anything new into the repository. This commit adjusts things so that a searchable snapshot shard presents itself to the snapshotter as if it contained no segments, and adjusts things to account for the consequent mismatch at restore time. Closes elastic#66110
|
Pinging @elastic/es-distributed (Team:Distributed) |
|
I can't decide whether I think this is an elegant solution or an awful hack :) Probably best to assume the latter when reviewing it, and think about ways this might come back to haunt us in the future. |
| final Directory directory = engineConfig.getStore().directory(); | ||
| final String oldestSegmentsFile = Arrays.stream(directory.listAll()) | ||
| .filter(s -> s.startsWith(IndexFileNames.SEGMENTS + "_")) | ||
| .min(Comparator.naturalOrder()) |
There was a problem hiding this comment.
Oops, this will almost always work except for cases where we add another character to the encoded generation (think "1" < "10" < "2"). I'll address this.
There was a problem hiding this comment.
I think if we want to go for a hack this is the shortest possible one pretty much :)
I'm +1 to this approach, it's somewhat similar to how we hack around things in the recovery and I like how it's pretty risk free by not changing anything of substance about the snapshot process.
Just one open question + tests seem to fail
server/src/main/java/org/elasticsearch/index/snapshots/blobstore/SnapshotFiles.java
Outdated
Show resolved
Hide resolved
...hable-snapshots/src/main/java/org/elasticsearch/index/store/InMemoryNoOpCommitDirectory.java
Show resolved
Hide resolved
...snapshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/SearchableSnapshots.java
Outdated
Show resolved
Hide resolved
original-brownbear
left a comment
There was a problem hiding this comment.
LGTM thanks David!
henningandersen
left a comment
There was a problem hiding this comment.
Did an initial read and have a couple of suggestions.
server/src/main/java/org/elasticsearch/repositories/blobstore/FileRestoreContext.java
Show resolved
Hide resolved
| @Override | ||
| public IndexCommitRef acquireIndexCommitForSnapshot() throws EngineException { | ||
| store.incRef(); | ||
| return new IndexCommitRef(emptyIndexCommit, store::decRef); |
There was a problem hiding this comment.
If we can skip the restore completely (see previous comment), could we then instead return null here and also skip the snapshotting completely?
There was a problem hiding this comment.
Not trivially, no, we want the snapshot of this shard to succeed (without having done much) so we can't just bail out, and we use a null commit to indicate "find the latest commit" further down the line.
|
I'm not merging this quite yet because it doesn't account for frozen indices: in the frozen case, we don't use the |
|
darn it, also doesn't account for closed indices, which we hard-code to have a |
needs more work for the frozen/closed case
|
I have adjusted how this PR injects the searchable-snapshots-specific snapshotting behaviour to avoid having it depend on the engine implementation directly, since we cannot control this in the frozen/closed case. I've elected to re-use the |
|
@elasticmachine please run elasticsearch-ci/2
|
Today if you take a snapshot of a searchable snapshot index then we treat it like a normal index and copy (any unchanged parts of) its contents the the repository. This is often a complete copy, doubling the snapshot storage required, since a searchable snapshot index typically has a different name from the original index; it may also be that we are taking a snapshot into a different repository. The content of a searchable snapshot is already held in a snapshot, and its index metadata indicates how to find this content, so it is wasteful to copy anything new into the repository. This commit adjusts things so that a searchable snapshot shard presents itself to the snapshotter as if it contained no segments, and adjusts things to account for the consequent mismatch at restore time. Closes #66110
* elastic/master: (33 commits) Add searchable snapshot cache folder to NodeEnvironment (elastic#66297) [DOCS] Add dynamic runtime fields to docs (elastic#66194) Add HDFS searchable snapshot integration (elastic#66185) Support canceling cross-clusters search requests (elastic#66206) Mute testCacheSurviveRestart (elastic#66289) Fix cat tasks api params in spec and handler (elastic#66272) Snapshot of a searchable snapshot should be empty (elastic#66162) [ML] DFA _explain API should not fail when none field is included (elastic#66281) Add action to decommission legacy monitoring cluster alerts (elastic#64373) move rollup_index param out of RollupActionConfig (elastic#66139) Improve FieldFetcher retrieval of fields (elastic#66160) Remove unsed fields in `RestAnalyzeAction` (elastic#66215) Simplify searchable snapshot CacheKey (elastic#66263) Autoscaling remove feature flags (elastic#65973) Improve searchable snapshot mount time (elastic#66198) [ML] Report cause when datafeed extraction encounters error (elastic#66167) Remove suggest reference in some API specs (elastic#66180) Fix warning when installing a plugin for different ESversion (elastic#66146) [ML] make `xpack.ml.max_ml_node_size` and `xpack.ml.use_auto_machine_memory_percent` dynamically settable (elastic#66132) [DOCS] Add `require_alias` to Bulk API (elastic#66259) ...
Today if you take a snapshot of a searchable snapshot index then we
treat it like a normal index and copy (any unchanged parts of) its
contents the the repository. This is often a complete copy, doubling the
snapshot storage required, since a searchable snapshot index typically
has a different name from the original index; it may also be that we are
taking a snapshot into a different repository. The content of a
searchable snapshot is already held in a snapshot, and its index
metadata indicates how to find this content, so it is wasteful to copy
anything new into the repository.
This commit adjusts things so that a searchable snapshot shard presents
itself to the snapshotter as if it contained no segments, and adjusts
things to account for the consequent mismatch at restore time.
Closes #66110