index shard should be able to cancel check index on close. by bleskes · Pull Request #18839 · elastic/elasticsearch

bleskes · 2016-06-13T15:08:39Z

If someone sets `index.shard.check_on_startup`, shard start-up can be slow (by design: it diligently goes and checks all the data). If the shard is closed during that time, the store reference is kept around and, through the shard-level locks, prevents a new copy of the shard from being allocated to this node. This is especially tricky if the shard is closed because of a cancelled recovery, which may be restarted soon.
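A minimal, hypothetical model of the lock-retention problem described above. `ShardStore` and `ShardLock` are illustrative names, not Elasticsearch classes. The sketch assumes the store is reference-counted, starts with one reference owned by the shard, and releases the shard lock only when the count reaches zero, so a slow check that holds its own reference keeps the lock alive after the shard closes.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountedStoreDemo {

    // Hypothetical stand-in for the node-level shard lock: while it is held,
    // no other copy of the same shard can be allocated to this node.
    static final class ShardLock {
        private volatile boolean held = true;

        boolean isHeld() { return held; }

        void release() { held = false; }
    }

    // Hypothetical ref-counted store. It starts with one reference owned by the
    // shard; the shard lock is released only when the count drops to zero.
    static final class ShardStore {
        private final AtomicInteger refCount = new AtomicInteger(1);
        private final ShardLock lock;

        ShardStore(ShardLock lock) { this.lock = lock; }

        void incRef() {
            while (true) {
                int current = refCount.get();
                if (current <= 0) {
                    throw new IllegalStateException("store is already closed");
                }
                if (refCount.compareAndSet(current, current + 1)) {
                    return;
                }
            }
        }

        void decRef() {
            int remaining = refCount.decrementAndGet();
            if (remaining == 0) {
                lock.release(); // last reference gone: other shard copies may now be allocated
            } else if (remaining < 0) {
                throw new IllegalStateException("decRef called too many times");
            }
        }
    }

    public static void main(String[] args) {
        ShardLock lock = new ShardLock();
        ShardStore store = new ShardStore(lock);

        store.incRef(); // a slow startup check takes its own reference to the store
        store.decRef(); // the shard is closed, e.g. because its recovery was cancelled
        System.out.println("lock held after shard close: " + lock.isHeld()); // prints true

        store.decRef(); // only when the check finally returns is the lock released
        System.out.println("lock held after check ends: " + lock.isHeld()); // prints false
    }
}
```

In this model, cancelling the check shortens the window between the shard closing and that final `decRef()`, which is the window in which a new copy cannot be allocated.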

This PR adds a `CancellableThreads` instance to each `IndexShard` and performs the index check under it, so that the check can be cancelled when the shard is closed. This assumes that:

- Interrupting a `checkIndex` is safe.
- Interrupting a `checkIndex` actually makes it stop.
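The real `CancellableThreads` class (it appears in the stack trace below as `CancellableThreads.executeIO`) is not shown in this thread. The sketch below is a simplified, hypothetical stand-in named `SimpleCancellableThreads`, assuming the mechanism is: register the calling thread while the work runs, and have `cancel(...)` interrupt every registered thread. `Interruptible` and `CancelledException` are likewise illustrative names.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for a "cancellable threads" helper.
public class SimpleCancellableThreads {

    // Work that may block on IO and can be stopped by interrupting its thread.
    @FunctionalInterface
    public interface Interruptible {
        void run() throws InterruptedException, IOException;
    }

    // Thrown to callers whose work was (or would have been) cancelled.
    public static final class CancelledException extends RuntimeException {
        CancelledException(String reason) {
            super("operation was cancelled: " + reason);
        }
    }

    private final Set<Thread> threads = new HashSet<>();
    private boolean cancelled = false;
    private String reason;

    // Runs work on the calling thread, registered so that cancel() can interrupt it.
    public void executeIO(Interruptible work) throws IOException {
        synchronized (this) {
            if (cancelled) {
                throw new CancelledException(reason); // never start work after cancellation
            }
            threads.add(Thread.currentThread());
        }
        Exception failure = null;
        boolean wasCancelled;
        String cancelReason;
        try {
            work.run();
        } catch (InterruptedException | IOException e) {
            failure = e; // e.g. InterruptedException or ClosedByInterruptException
        } finally {
            synchronized (this) {
                threads.remove(Thread.currentThread());
                wasCancelled = cancelled;
                cancelReason = reason;
                if (wasCancelled) {
                    // Clear an interrupt that cancel() may have delivered. cancel() holds the
                    // same lock, so it cannot interrupt this thread after it has been removed.
                    Thread.interrupted();
                }
            }
        }
        if (wasCancelled) {
            CancelledException cancelledException = new CancelledException(cancelReason);
            if (failure != null) {
                cancelledException.addSuppressed(failure);
            }
            throw cancelledException;
        }
        if (failure instanceof IOException) {
            throw (IOException) failure;
        }
        if (failure != null) {
            // Interrupted by someone other than cancel(): restore the flag for the caller.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted without cancellation", failure);
        }
    }

    // Marks this instance cancelled and interrupts every thread currently inside executeIO.
    public synchronized void cancel(String reason) {
        if (cancelled) {
            return;
        }
        cancelled = true;
        this.reason = reason;
        for (Thread thread : threads) {
            thread.interrupt();
        }
    }
}
```

Under this sketch, the shard would wrap its index check in `executeIO(...)` and call `cancel(...)` from its close path. The helper only delivers the interrupt; whether the check then stops promptly and leaves the store in a safe state is exactly what the two assumptions listed above take for granted.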

Relates to #12011

PS. We are also discussing not doing a full index check on peer recovery, and instead only doing a checksum check.
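To illustrate the checksum-only alternative mentioned above: instead of analysing the index structure, such a check re-reads each file and compares a stored checksum with a freshly computed one. The sketch assumes a hypothetical file layout in which the last 8 bytes hold the CRC32 of everything before them as a big-endian long. Lucene's real codec footer differs (it also carries a magic number and an algorithm ID), so this shows the idea only. Such a check catches corrupted bytes but not the structural problems a full index check looks for.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Checksum-only verification under a hypothetical file layout:
// [body bytes][8-byte big-endian CRC32 of the body].
public class FooterChecksum {

    // Builds file contents by appending the CRC32 of the body as a footer.
    public static byte[] withFooter(byte[] body) {
        CRC32 crc = new CRC32();
        crc.update(body, 0, body.length);
        return ByteBuffer.allocate(body.length + Long.BYTES)
                .put(body)
                .putLong(crc.getValue())
                .array();
    }

    // Recomputes the CRC32 of the body and compares it with the stored footer.
    // Reads the whole file into memory for brevity; a real check would stream it.
    public static boolean verify(Path file) throws IOException {
        byte[] data = Files.readAllBytes(file);
        if (data.length < Long.BYTES) {
            return false; // too short to even contain a footer
        }
        int bodyLength = data.length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(data, 0, bodyLength);
        long stored = ByteBuffer.wrap(data, bodyLength, Long.BYTES).getLong();
        return stored == crc.getValue();
    }
}
```

The cost is a single sequential read per file, which is why it is attractive for peer recovery compared with a full check.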

bleskes · 2016-06-13T15:08:51Z

@s1monw wdyt?

s1monw · 2016-06-20T07:59:31Z

LGTM

```
Throwable #1: java.lang.RuntimeException: file handle leaks: [SeekableByteChannel(/var/lib/jenkins/workspace/elastic+elasticsearch+master+g1gc/core/build/testrun/integTest/J0/temp/org.elasticsearch.search.suggest.CompletionSuggestSearch2xIT_518545A20D129C8C-001/tempDir-001/data/nodes/1/indices/4sTECv6WSJOJsw9L4CGamg/0/index/segments_1), SeekableByteChannel(/var/lib/jenkins/workspace/elastic+elasticsearch+master+g1gc/core/build/testrun/integTest/J0/temp/org.elasticsearch.search.suggest.CompletionSuggestSearch2xIT_518545A20D129C8C-001/tempDir-001/data/nodes/1/indices/4sTECv6WSJOJsw9L4CGamg/0/index/segments_1)]
    at __randomizedtesting.SeedInfo.seed([518545A20D129C8C]:0)
    at org.apache.lucene.mockfile.LeakFS.onClose(LeakFS.java:63)
    at org.apache.lucene.mockfile.FilterFileSystem.close(FilterFileSystem.java:77)
    at org.apache.lucene.mockfile.FilterFileSystem.close(FilterFileSystem.java:78)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Exception
    at org.apache.lucene.mockfile.LeakFS.onOpen(LeakFS.java:46)
    at org.apache.lucene.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:81)
    at org.apache.lucene.mockfile.HandleTrackingFS.newByteChannel(HandleTrackingFS.java:271)
    at org.apache.lucene.mockfile.FilterFileSystemProvider.newByteChannel(FilterFileSystemProvider.java:212)
    at org.apache.lucene.mockfile.HandleTrackingFS.newByteChannel(HandleTrackingFS.java:240)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.newByteChannel(Files.java:407)
    at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:77)
    at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:94)
    at org.apache.lucene.util.LuceneTestCase.slowFileExists(LuceneTestCase.java:2695)
    at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:737)
    at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:94)
    at org.elasticsearch.common.lucene.Lucene$1.doBody(Lucene.java:237)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:685)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:637)
    at org.elasticsearch.common.lucene.Lucene.checkSegmentInfoIntegrity(Lucene.java:242)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.loadMetadata(Store.java:847)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:740)
    at org.elasticsearch.index.store.Store.getMetadata(Store.java:260)
    at org.elasticsearch.index.store.Store.getMetadata(Store.java:240)
    at org.elasticsearch.index.shard.IndexShard.doCheckIndex(IndexShard.java:1310)
    at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:102)
    at org.elasticsearch.index.shard.IndexShard.checkIndex(IndexShard.java:1288)
    at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:921)
    at org.elasticsearch.index.shard.IndexShard.skipTranslogRecovery(IndexShard.java:964)
    at org.elasticsearch.indices.recovery.RecoveryTarget.prepareForTranslogOperations(RecoveryTarget.java:297)
    at
```

index shard should be able to cancel check index on close.

51606b9

bleskes added resiliency v5.0.0-alpha4 labels Jun 13, 2016

clintongormley added >enhancement :Distributed/Recovery labels Jun 14, 2016

bleskes merged commit 0cb4e57 into elastic:master Jun 20, 2016

bleskes deleted the index_shard_cancel_on_close branch June 20, 2016 10:16

bleskes merged 1 commit into elastic:master from bleskes:index_shard_cancel_on_close
3 participants