Skip to content

Trigger explicit loading of blob container and snapshot in the pre-recovery hook#54729

Merged
tlrx merged 3 commits intoelastic:feature/searchable-snapshotsfrom
tlrx:load-snapshot
Apr 6, 2020
Merged

Trigger explicit loading of blob container and snapshot in the pre-recovery hook#54729
tlrx merged 3 commits intoelastic:feature/searchable-snapshotsfrom
tlrx:load-snapshot

Conversation

@tlrx
Copy link
Copy Markdown
Member

@tlrx tlrx commented Apr 3, 2020

In #53961 we defer the construction of Searchable Snapshot Directory's BlobContainer and a
BlobStoreIndexShardSnapshot instances so that these objects are created when they are first accessed, which we expect to be during the recovery process. At the same time, assertions were added to ensure that the construction is effectively executed under a generic or snapshot thread.

Sadly, this is not always the case because there is always a short window of time between the creation of the IndexShard and the time the objects are created during the recovery process. It is also possible that other components of Elasticsearch trigger the creation of the blob container and snapshot.

We identified the following places:

  • computing avg shard size of index shards in IndexService.createShard() (this is triggered when relocating a primary shard under the cluster state applier thread)
  • computing indices stats in TransportIndicesStatsAction which calls indexShard.storeStats() which calls estimateSizeInBytes() (this is triggered by InternalClusterInfoService under the management thread pool)
  • computing shard store metadata in IndexShard.snapshotStoreMetadata while calls failIfCorrupted(Directory) (this is triggered by TransportNodesListShardStoreMetadata, executed under the fetch_shard_store thread pool)
  • TransportNodesListGatewayStartedShards should also use failIfCorrupted(Directory) at some point (triggered by the GatewayAllocator and executed under the fetch_shard_started thread pool)

This commit changes the way BlobContainer and a BlobStoreIndexShardSnapshot instances are created so that it does not happen on first access anymore but the objects are now created using a specific loadSnapshot() method.

This method is explicitly called during the pre-recovery phase.

Until this method is called, the SearchableSnapshotDirectory acts as if it was empty: the listAll() method returns an empty array. Having this behavior allows the identified access points to not fail and not trigger the snapshot loading before we explicitly load it in the pre-recovery hook.

@tlrx tlrx added >enhancement :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Apr 3, 2020
@tlrx tlrx requested a review from DaveCTurner April 3, 2020 16:13
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented Apr 3, 2020

ML related failures + Docker build failure

@elasticmachine run elasticsearch-ci/2
@elasticmachine run elasticsearch-ci/packaging-sample-unix-docker

@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented Apr 6, 2020

Running tests again after an unrelated failure
@elasticmachine run elasticsearch-ci/1

@tlrx tlrx changed the title Load snapshot in pre-recovery Trigger explicit loading of blob container and snapshot in the pre-recovery hook Apr 6, 2020
Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I left a few small suggestions.


private synchronized boolean invariant() {
assert loaded ^ snapshot == null;
assert loaded ^ blobContainer == null;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically ok, but I think a logical XOR is clearer than a bitwise XOR:

Suggested change
assert loaded ^ blobContainer == null;
assert loaded != (blobContainer == null);

final BlobContainer blobContainer = this.blobContainer.get();
assert blobContainer != null;
final BlobContainer blobContainer = this.blobContainer;
assert invariant();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to assert this here since we're not changing any state?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right

final BlobStoreIndexShardSnapshot snapshot = this.snapshot.get();
assert snapshot != null;
final BlobStoreIndexShardSnapshot snapshot = this.snapshot;
assert invariant();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to assert this here since we're not changing any state?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right

}

private List<BlobStoreIndexShardSnapshot.FileInfo> files() {
final BlobStoreIndexShardSnapshot snapshot = snapshot();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relates my earlier comment - let's return an empty list if loaded == false, then we can be sure that snapshot() will return non-null.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I must admit I find this code confusing today :/

@tlrx tlrx requested a review from DaveCTurner April 6, 2020 09:41
Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tlrx tlrx merged commit 104ba40 into elastic:feature/searchable-snapshots Apr 6, 2020
@tlrx tlrx deleted the load-snapshot branch April 6, 2020 10:25
@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented Apr 6, 2020

Thanks David!

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Apr 7, 2020
Today the shards of searchable snapshots are allocated with a naive
`ExistingShardsAllocator` which selects the first valid node for each shard.
Thanks to elastic#54729 we can now allow these shards to fall through to the balanced
shards allocator so that they are allocated in a more balanced fashion.

Relates elastic#50999
DaveCTurner added a commit that referenced this pull request Apr 8, 2020
Today the shards of searchable snapshots are allocated with a naive
`ExistingShardsAllocator` which selects the first valid node for each shard.
Thanks to #54729 we can now allow these shards to fall through to the balanced
shards allocator so that they are allocated in a more balanced fashion.

Relates #50999
DaveCTurner added a commit that referenced this pull request Apr 8, 2020
Today the shards of searchable snapshots are allocated with a naive
`ExistingShardsAllocator` which selects the first valid node for each shard.
Thanks to #54729 we can now allow these shards to fall through to the balanced
shards allocator so that they are allocated in a more balanced fashion.

Relates #50999
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants