Skip to content

Skip translog creation and Lucene commits when recovering searchable snapshot shards#118606

Merged
tlrx merged 3 commits intoelastic:mainfrom
tlrx:2024/12/12/skip-translog-creation-for-searchable-snapshots
Dec 16, 2024
Merged

Skip translog creation and Lucene commits when recovering searchable snapshot shards#118606
tlrx merged 3 commits intoelastic:mainfrom
tlrx:2024/12/12/skip-translog-creation-for-searchable-snapshots

Conversation

@tlrx
Copy link
Copy Markdown
Member

@tlrx tlrx commented Dec 12, 2024

In order to leverage Lucene N-2 version support for searchable snapshots, we'd like to avoid executing Lucene commits during searchable snapshots shards recovery. This is because Lucene commits require to open an IndexWriter, something that Lucene does not support for N-2 versions.

Today when searchable snapshot shards are recovering they create a translog on disk as well as a Lucene commit:

  • the translog is created as an empty translog with a new UUID and an initial global checkpoint value that is the same as the LOCAL_CHECKPOINT_KEY stored in the last Lucene commit data from the snapshot.
  • a Lucene commit is executed to associate the translog with the Lucene index by storing new translog UUID in the Lucene commit data.
  • later during recovery, the replication tracker is initialized with a global checkpoint value equals to the LOCAL_CHECKPOINT_KEY stored in the Lucene commit.

We can skip the creation of the translog because searchable snapshot shard do not need one, and it's only use to store the local checkpoint locally to be read later during recovery. If we don't have a translog then we don't need to associate it with the Lucene index, so we can skip the commit too.

This change introduce an hasTranslog method that is used to know when it is safe to NOT create a translog, in which case the global checkpoint is read from the last Lucene commit during primary shard recovery from snapshot, peer-recovery and recovery from existing store.

In case an existing translog exist on disk, it will be cleaned up.

They are also few discoveries around some assertions introduced with snapshot based recoveries, as well as a cached estimation of the size of directories that was refreshed due to Lucene commit but now requires to be "marked as stale".

@tlrx tlrx added :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. v9.0.0 labels Dec 12, 2024
@tlrx tlrx force-pushed the 2024/12/12/skip-translog-creation-for-searchable-snapshots branch 2 times, most recently from d029b12 to a19e312 Compare December 13, 2024 12:53
@tlrx tlrx changed the title [Draft] Skip translog creation when recovering searchable snapshot shards Skip translog creation and Lucene commits when recovering searchable snapshot shards Dec 13, 2024
@tlrx tlrx marked this pull request as ready for review December 13, 2024 14:20
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Indexing (obsolete) Meta label for Distributed Indexing team. Obsolete. Please do not use. label Dec 13, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

*/
public boolean hasTranslog() {
// Expect no translog files to exist for searchable snapshots
return false == indexSettings.getIndexMetadata().isSearchableSnapshot();
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I limited the scope to searchable snapshot shards (including closed searchable snapshot shards).

try {
final var translogLocation = indexShard.shardPath().resolveTranslog();
if (indexShard.hasTranslog() == false) {
Translog.deleteAll(translogLocation);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to relax this for regular read-only indices later.

@tlrx tlrx force-pushed the 2024/12/12/skip-translog-creation-for-searchable-snapshots branch from a19e312 to 5e0224b Compare December 13, 2024 16:15
Copy link
Copy Markdown
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Left a few minor comments, but no need for an additional round.

assertThat(shardRouting.toString(), unwrappedDir, instanceOf(ByteBuffersDirectory.class));

final ByteBuffersDirectory inMemoryDir = (ByteBuffersDirectory) unwrappedDir;
assertThat(inMemoryDir.listAll(), arrayWithSize(1));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we leave this in place but verify there are no files modified - at least until we possibly remove the in-memory-no-op-commit-directory?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense yes, that's a good reason to keep this.

@tlrx tlrx force-pushed the 2024/12/12/skip-translog-creation-for-searchable-snapshots branch from 5e0224b to 7af707c Compare December 16, 2024 10:33
@tlrx tlrx merged commit b461baf into elastic:main Dec 16, 2024
@tlrx tlrx deleted the 2024/12/12/skip-translog-creation-for-searchable-snapshots branch December 16, 2024 11:46
@tlrx
Copy link
Copy Markdown
Member Author

tlrx commented Dec 16, 2024

Thanks Henning & Luca!

tlrx added a commit that referenced this pull request Feb 4, 2025
Since #118606 searchable snapshots shards are not expected to 
write files on disk, with the exception of archives indices mounted 
as searchable snapshots which require to rewrite the segment infos 
file in a newer version.

Ideally we should be able to remove the usage of the 
InMemoryNoOpCommitDirectory for non-archives searchable 
snapshots indices and only rely on SearchableSnapshotDirectory 
that throws on write operations. Similarly, starting 9.0 searchable 
snapshots shards do not write files on disk and therefore should 
be able to use a Directory implementation that forbids writes. 
Searchable snapshots shards for indices created before 9.0 
require a mutable directory for peer-recoveries.

In this change, we only allow writes for archives indices and 
searchable snapshots created before 9.0.

Relates ES-10438
fzowl pushed a commit to voyage-ai/elasticsearch that referenced this pull request Feb 4, 2025
…1210)

Since elastic#118606 searchable snapshots shards are not expected to 
write files on disk, with the exception of archives indices mounted 
as searchable snapshots which require to rewrite the segment infos 
file in a newer version.

Ideally we should be able to remove the usage of the 
InMemoryNoOpCommitDirectory for non-archives searchable 
snapshots indices and only rely on SearchableSnapshotDirectory 
that throws on write operations. Similarly, starting 9.0 searchable 
snapshots shards do not write files on disk and therefore should 
be able to use a Directory implementation that forbids writes. 
Searchable snapshots shards for indices created before 9.0 
require a mutable directory for peer-recoveries.

In this change, we only allow writes for archives indices and 
searchable snapshots created before 9.0.

Relates ES-10438
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. >non-issue Team:Distributed Indexing (obsolete) Meta label for Distributed Indexing team. Obsolete. Please do not use. v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants