Skip to content

snapshots fail if HDFS is not available when Elasticsearch starts #997

@inqueue

Description

@inqueue

Elasticsearch 2.4.4 + repository-hdfs

Snapshots fail if the HDFS repo is not available when Elasticsearch starts, even though the repo was brought up while the Elasticsearch service is running.

Repro:

  1. Stopped ES
  2. Stopped Hadoop HDFS
  3. Started ES while HDFS is down
  4. Started HDFS
  5. Attempted to GET /_snapshot/repo and then attempted to PUT /_snapshot/repo

The API command fails with:

{
  "error": {
    "root_cause": [
      {
        "type": "repository_missing_exception",
        "reason": "[may2017] missing"
      }
    ],
    "type": "repository_missing_exception",
    "reason": "[may2017] missing"
  },
  "status": 404
}

Full cluster log is available upon request.

[2017-05-16 19:28:14,286][WARN ][repositories             ] [cops-4] failed to create repository [hdfs][may2017]
org.elasticsearch.common.inject.CreationException: Guice creation errors:

1) Error injecting constructor, java.net.ConnectException: Call From node-4.dev.domain.com/10.237.50.1 to node-2.dev.domain.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
  at org.elasticsearch.repositories.hdfs.HdfsRepository.<init>(Unknown Source)
  while locating org.elasticsearch.repositories.hdfs.HdfsRepository
  while locating org.elasticsearch.repositories.Repository

1 error
	at org.elasticsearch.common.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:360)
	at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:178)
	at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
	at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:154)
	at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:55)
	at org.elasticsearch.repositories.RepositoriesService.createRepositoryHolder(RepositoriesService.java:404)
	at org.elasticsearch.repositories.RepositoriesService.clusterChanged(RepositoriesService.java:299)
	at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:622)
	at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:784)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Call From node-4.dev.domain.com/10.237.50.1 to node-2.dev.domain.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
	at org.apache.hadoop.ipc.Client.call(Client.java:1480)
	at org.apache.hadoop.ipc.Client.call(Client.java:1407)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy23.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
	at org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1450)
	at org.elasticsearch.repositories.hdfs.HdfsRepository.doGetFileSystem(HdfsRepository.java:125)
	at org.elasticsearch.repositories.hdfs.HdfsRepository.access$000(HdfsRepository.java:57)
	at org.elasticsearch.repositories.hdfs.HdfsRepository$2.run(HdfsRepository.java:102)
	at org.elasticsearch.repositories.hdfs.HdfsRepository$2.run(HdfsRepository.java:99)
	at java.security.AccessController.doPrivileged(Native Method)
	at org.elasticsearch.repositories.hdfs.HdfsRepository.getFileSystem(HdfsRepository.java:99)
	at org.elasticsearch.repositories.hdfs.SecurityUtils.execute(SecurityUtils.java:34)
	at org.elasticsearch.hadoop.hdfs.blobstore.HdfsBlobStore.mkdirs(HdfsBlobStore.java:58)
	at org.elasticsearch.hadoop.hdfs.blobstore.HdfsBlobStore.<init>(HdfsBlobStore.java:54)
	at org.elasticsearch.repositories.hdfs.HdfsRepository.<init>(HdfsRepository.java:90)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:50)
	at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
	at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
	at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:54)
	at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
	at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:886)
	at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
	at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
	at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
	at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:201)
	at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:193)
	at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:879)
	at org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(InjectorBuilder.java:193)
	at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:175)
	... 12 more
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
	at org.apache.hadoop.ipc.Client.call(Client.java:1446)
	... 57 more

Is there any way to try establishing a connection to the repo after Elasticsearch has started? Does this belong in https://github.com/elastic/elasticsearch?

@imotov We spoke about this issue back in late April. Please let me know whatever additional information is needed.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions