Add recovery infrastructure hook to work with older Lucene indices#81056
Add recovery infrastructure hook to work with older Lucene indices#81056ywelsch merged 30 commits intoelastic:masterfrom
Conversation
|
Pinging @elastic/es-distributed (Team:Distributed) |
server/build.gradle
Outdated
| ignoreClasses 'org.apache.lucene.index.LazySoftDeletesDirectoryReaderWrapper' | ||
|
|
||
| // only used temporarily, will be removed in a follow-up | ||
| ignoreClasses 'org.apache.lucene.index.SegmentInfosHelper' |
There was a problem hiding this comment.
The SegmentInfosHelper class is used by this PR to load ES 6 (= Lucene 7) indices, which allows us to test that the recovery infrastructure as added by the PR works as intended. The class here will be removed in a follow-up step that will take a different, but more involved approach to load older indices (it's out of focus for this PR).
| } else if ("mappings".equals(currentFieldName)) { | ||
| // don't try to parse these for now | ||
| parser.skipChildren(); | ||
| } else if ("in_sync_allocations".equals(currentFieldName)) { |
There was a problem hiding this comment.
This is just copy paste from the regular IndexMetadata parsing code. in_sync_allocations are required to exist in IndexMetadata for a restore to go through.
| // store can still have existing files but they will be deleted just before being | ||
| // restored. | ||
| recoveryTargetMetadata = store.getMetadata(null, true); | ||
| if (isSearchableSnapshotStore(store.indexSettings().getSettings())) { |
There was a problem hiding this comment.
We skip some of these steps here for searchable snapshots (as they are effectively useless, and require opening up the index, which we can defer to a later stage by special-casing this here).
| private void afterRestore(SnapshotFiles snapshotFiles, Store store, StoreFileMetadata restoredSegmentsFile) { | ||
| try { | ||
| if (isSearchableSnapshotStore(store.indexSettings().getSettings()) == false) { | ||
| Lucene.pruneUnreferencedFiles(restoredSegmentsFile.name(), store.directory()); |
There was a problem hiding this comment.
This code does not look to do anything, and is actually in the way as it opens the Lucene index before it's converted. No tests are breaking with this code, and it's unclear whether it still serves a purpose (even after some historic digging)
There was a problem hiding this comment.
Yes, it looks like we clean up according to the snapshot contents anyway - this second cleanup was added later on in #8969 but there's no indication why it was needed. I think you're right anyway, no need to open the index here.
server/src/main/java/org/elasticsearch/snapshots/RestoreService.java
Outdated
Show resolved
Hide resolved
| * 2.0 and the Server Side Public License, v 1; you may not use this file except | ||
| * in compliance with, at your election, the Elastic License 2.0 or the Server | ||
| * Side Public License, v 1. | ||
| * 2.0; you may not use this file except in compliance with the Elastic License |
There was a problem hiding this comment.
this code got moved from qa to x-pack:qa
DaveCTurner
left a comment
There was a problem hiding this comment.
Mainly looked at the changes to snapshots-of-searchable-snapshots code, and I like this change. It's so much easier now that we can write searchable-snapshot-specific code in :server.
I left a couple of other comments too.
server/src/main/java/org/elasticsearch/snapshots/RestoreService.java
Outdated
Show resolved
Hide resolved
| */ | ||
| @SuppressWarnings("CheckStyle") | ||
| @SuppressForbidden(reason = "Lucene class") | ||
| public class OldSegmentInfos implements Cloneable, Iterable<SegmentCommitInfo> { |
There was a problem hiding this comment.
This class is just temporarily here until we have proper shaded versions of Lucene. It just helps embedding ES 6 indices in order to let the tests pass.
No need for a review on this one.
server/src/main/java/org/elasticsearch/index/CompositeIndexEventListener.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/repositories/blobstore/FileRestoreContext.java
Show resolved
Hide resolved
|
|
||
| @Override | ||
| public void onIndexModule(IndexModule indexModule) { | ||
| indexModule.addIndexEventListener(new IndexEventListener() { |
There was a problem hiding this comment.
I suppose that we can be more restrictive here and only register the IndexEventListener once for snapshot builds with ALLOW_BWC_INDICES_SETTING setting?
There was a problem hiding this comment.
We only ran the listener for snapshot builds. I've now moved the isSnapshotBuild check in 3357ec3 to the registration of the listener. At this point, we don't know anything about the repository, so making it conditional on the ALLOW_BWC_INDICES_SETTING won't work. As this conditional logic will go away (and be replaced by license checks etc. in follow-ups) I think we're good here.
| final ImmutableOpenMap.Builder<ShardId, ShardRestoreStatus> shardsBuilder = ImmutableOpenMap.builder(); | ||
|
|
||
| final Version minIndexCompatibilityVersion = currentState.getNodes().getMaxNodeVersion().minimumIndexCompatibilityVersion(); | ||
| final Version minIndexCompatibilityVersion = skipVersionChecks(repositoryMetadata) |
There was a problem hiding this comment.
We also need some special handling in allocation deciders to only restore old Lucene indices on nodes >= 8.1
There was a problem hiding this comment.
This PR is not addressing allocation-related bits, focusing on recovery bits only. There are quite a few more things to do for these indices to be properly handled at the allocation level. For example, restarting a node with a BWC index will currently fail (will detect old index and refuse to start up). This will be addressed in follow-ups.
| // so far only added basic infrastructure for reading 6.0.0+ indices | ||
| if (Build.CURRENT.isSnapshot() && oldVersion.onOrAfter(Version.fromString("6.0.0"))) { | ||
| // restore index | ||
| RestoreSnapshotResponse restoreSnapshotResponse = client.snapshot() |
There was a problem hiding this comment.
Can we also test the restore and mount of old Lucene indices over closed indices too?
There was a problem hiding this comment.
I've extended the test to cover this.
tlrx
left a comment
There was a problem hiding this comment.
LGTM - I only made minor comments that you can feel free to address or not.
I did not review the pure Lucene parts, I'd expect someone more knowledgeable to review it.
server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/repositories/blobstore/FileRestoreContext.java
Show resolved
Hide resolved
| import java.util.Iterator; | ||
| import java.util.List; | ||
|
|
||
| public class BWCLucene70Codec extends Codec { |
There was a problem hiding this comment.
Can we add some documentation about what this class does?
There was a problem hiding this comment.
The Lucene bits are temporary and will be completely reworked in a follow-up. Will address it there.
| final String segmentFileName = SegmentInfos.getLastCommitSegmentsFileName(directory); | ||
| if (segmentFileName != null) { | ||
| long generation = SegmentInfos.generationFromSegmentsFileName(segmentFileName); | ||
| try (ChecksumIndexInput input = directory.openChecksumInput(segmentFileName, IOContext.READ)) { |
There was a problem hiding this comment.
I'm not sure why a ChecksumIndexInput is used here? It seems that a regular IndexInput can be used, or maybe there's something I miss.
There was a problem hiding this comment.
This method is going away in a follow-up.
| CodecUtil.checkIndexHeaderSuffix(input, Long.toString(generation, Character.MAX_RADIX)); | ||
|
|
||
| Version luceneVersion = Version.fromBits(input.readVInt(), input.readVInt(), input.readVInt()); | ||
| int indexCreatedVersion = input.readVInt(); |
There was a problem hiding this comment.
going away in a follow-up.
| int indexCreatedVersion = input.readVInt(); | ||
| return luceneVersion; | ||
| } catch (Exception e) { | ||
| // ignore |
There was a problem hiding this comment.
Should we fail the shard if we cannot read the Lucene version?
There was a problem hiding this comment.
addressed in follow-up
| public class InMemoryNoOpCommitDirectory extends FilterDirectory { | ||
|
|
||
| private final Directory realDirectory; | ||
| private final Set<String> deletedFiles = new HashSet<>(); |
| assertThat(snapshotStatus.getStats().getTotalFileCount(), greaterThan(0)); | ||
|
|
||
| // so far only added basic infrastructure for reading 6.0.0+ indices | ||
| if (Build.CURRENT.isSnapshot() && oldVersion.onOrAfter(Version.fromString("6.0.0"))) { |
There was a problem hiding this comment.
At some point we could reintroduce the Version constants for old Elasticsearch versions.
There was a problem hiding this comment.
This logic will completely change in the follow-up.
| // mount as searchable snapshot | ||
| RestoreSnapshotResponse mountSnapshotResponse = client.searchableSnapshots() | ||
| .mountSnapshot( | ||
| new MountSnapshotRequest("testrepo", "snap1", "test").storage(MountSnapshotRequest.Storage.FULL_COPY) |
There was a problem hiding this comment.
Can we randomize this between full/partial?
There was a problem hiding this comment.
needed another small tweak. I have this already in the follow-up.
|
Thanks @tlrx and @DaveCTurner ! |
This is a simple infrastructure PR for more interesting follow-ups.
We add a simple hook to the snapshot recovery infrastructure that allows making changes to the incoming Lucene files before they are processed the core system. This allows a plugin to make changes to the incoming Lucene files before they are exposed to the rest of the system, e.g. upgrade old Lucene version files to a newer version.
We also add a setting for snapshot repositories to allow restoring older indices (only available in snapshot builds).