Trigger explicit loading of blob container and snapshot in the pre-recovery hook#54729
Conversation
|
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore) |
|
ML related failures + Docker build failure @elasticmachine run elasticsearch-ci/2 |
|
Running tests again after an unrelated failure |
DaveCTurner
left a comment
There was a problem hiding this comment.
Looks good, I left a few small suggestions.
...hable-snapshots/src/main/java/org/elasticsearch/index/store/SearchableSnapshotDirectory.java
Outdated
Show resolved
Hide resolved
|
|
||
| private synchronized boolean invariant() { | ||
| assert loaded ^ snapshot == null; | ||
| assert loaded ^ blobContainer == null; |
There was a problem hiding this comment.
Technically ok, but I think a logical XOR is clearer than a bitwise XOR:
| assert loaded ^ blobContainer == null; | |
| assert loaded != (blobContainer == null); |
...hable-snapshots/src/main/java/org/elasticsearch/index/store/SearchableSnapshotDirectory.java
Show resolved
Hide resolved
...hable-snapshots/src/main/java/org/elasticsearch/index/store/SearchableSnapshotDirectory.java
Show resolved
Hide resolved
| final BlobContainer blobContainer = this.blobContainer.get(); | ||
| assert blobContainer != null; | ||
| final BlobContainer blobContainer = this.blobContainer; | ||
| assert invariant(); |
There was a problem hiding this comment.
I don't think we need to assert this here since we're not changing any state?
| final BlobStoreIndexShardSnapshot snapshot = this.snapshot.get(); | ||
| assert snapshot != null; | ||
| final BlobStoreIndexShardSnapshot snapshot = this.snapshot; | ||
| assert invariant(); |
There was a problem hiding this comment.
I don't think we need to assert this here since we're not changing any state?
| } | ||
|
|
||
| private List<BlobStoreIndexShardSnapshot.FileInfo> files() { | ||
| final BlobStoreIndexShardSnapshot snapshot = snapshot(); |
There was a problem hiding this comment.
Relates my earlier comment - let's return an empty list if loaded == false, then we can be sure that snapshot() will return non-null.
There was a problem hiding this comment.
Sure. I must admit I find this code confusing today :/
...hable-snapshots/src/main/java/org/elasticsearch/index/store/SearchableSnapshotDirectory.java
Show resolved
Hide resolved
|
Thanks David! |
Today the shards of searchable snapshots are allocated with a naive `ExistingShardsAllocator` which selects the first valid node for each shard. Thanks to elastic#54729 we can now allow these shards to fall through to the balanced shards allocator so that they are allocated in a more balanced fashion. Relates elastic#50999
Today the shards of searchable snapshots are allocated with a naive `ExistingShardsAllocator` which selects the first valid node for each shard. Thanks to #54729 we can now allow these shards to fall through to the balanced shards allocator so that they are allocated in a more balanced fashion. Relates #50999
Today the shards of searchable snapshots are allocated with a naive `ExistingShardsAllocator` which selects the first valid node for each shard. Thanks to #54729 we can now allow these shards to fall through to the balanced shards allocator so that they are allocated in a more balanced fashion. Relates #50999
In #53961 we defer the construction of Searchable Snapshot Directory's
BlobContainerand aBlobStoreIndexShardSnapshotinstances so that these objects are created when they are first accessed, which we expect to be during the recovery process. At the same time, assertions were added to ensure that the construction is effectively executed under a generic or snapshot thread.Sadly, this is not always the case because there is always a short window of time between the creation of the
IndexShardand the time the objects are created during the recovery process. It is also possible that other components of Elasticsearch trigger the creation of the blob container and snapshot.We identified the following places:
This commit changes the way
BlobContainerand aBlobStoreIndexShardSnapshotinstances are created so that it does not happen on first access anymore but the objects are now created using a specificloadSnapshot()method.This method is explicitly called during the
pre-recoveryphase.Until this method is called, the
SearchableSnapshotDirectoryacts as if it was empty: the listAll() method returns an empty array. Having this behavior allows the identified access points to not fail and not trigger the snapshot loading before we explicitly load it in the pre-recovery hook.