snapshots fail if HDFS is not available when Elasticsearch starts #25119

@inqueue

cloned from elastic/elasticsearch-hadoop#997

Elasticsearch 2.4.4 + repository-hdfs

Snapshots fail if the HDFS repository is not available when Elasticsearch starts, even if HDFS is brought up afterwards while the Elasticsearch service is still running.

Repro:

  1. Stopped ES
  2. Stopped Hadoop HDFS
  3. Started ES while HDFS was down
  4. Started HDFS
  5. Attempted GET /_snapshot/repo, then attempted PUT /_snapshot/repo

The API command fails with:

{
  "error": {
    "root_cause": [
      {
        "type": "repository_missing_exception",
        "reason": "[may2017] missing"
      }
    ],
    "type": "repository_missing_exception",
    "reason": "[may2017] missing"
  },
  "status": 404
}

The full cluster log is available upon request; the relevant excerpt follows:

[2017-05-16 19:28:14,286][WARN ][repositories             ] [cops-4] failed to create repository [hdfs][may2017]
org.elasticsearch.common.inject.CreationException: Guice creation errors:

1) Error injecting constructor, java.net.ConnectException: Call From node-4.dev.domain.com/10.237.50.1 to node-2.dev.domain.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
  at org.elasticsearch.repositories.hdfs.HdfsRepository.<init>(Unknown Source)
  while locating org.elasticsearch.repositories.hdfs.HdfsRepository
  while locating org.elasticsearch.repositories.Repository

1 error
	at org.elasticsearch.common.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:360)
	at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:178)
	at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
	at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:154)
	at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:55)
	at org.elasticsearch.repositories.RepositoriesService.createRepositoryHolder(RepositoriesService.java:404)
	at org.elasticsearch.repositories.RepositoriesService.clusterChanged(RepositoriesService.java:299)
	at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:622)
	at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:784)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Call From node-4.dev.domain.com/10.237.50.1 to node-2.dev.domain.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
	at org.apache.hadoop.ipc.Client.call(Client.java:1480)
	at org.apache.hadoop.ipc.Client.call(Client.java:1407)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy23.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
	at org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1450)
	at org.elasticsearch.repositories.hdfs.HdfsRepository.doGetFileSystem(HdfsRepository.java:125)
	at org.elasticsearch.repositories.hdfs.HdfsRepository.access$000(HdfsRepository.java:57)
	at org.elasticsearch.repositories.hdfs.HdfsRepository$2.run(HdfsRepository.java:102)
	at org.elasticsearch.repositories.hdfs.HdfsRepository$2.run(HdfsRepository.java:99)
	at java.security.AccessController.doPrivileged(Native Method)
	at org.elasticsearch.repositories.hdfs.HdfsRepository.getFileSystem(HdfsRepository.java:99)
	at org.elasticsearch.repositories.hdfs.SecurityUtils.execute(SecurityUtils.java:34)
	at org.elasticsearch.hadoop.hdfs.blobstore.HdfsBlobStore.mkdirs(HdfsBlobStore.java:58)
	at org.elasticsearch.hadoop.hdfs.blobstore.HdfsBlobStore.<init>(HdfsBlobStore.java:54)
	at org.elasticsearch.repositories.hdfs.HdfsRepository.<init>(HdfsRepository.java:90)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:50)
	at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
	at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
	at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:54)
	at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
	at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:886)
	at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
	at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
	at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
	at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:201)
	at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:193)
	at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:879)
	at org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(InjectorBuilder.java:193)
	at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:175)
	... 12 more
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
	at org.apache.hadoop.ipc.Client.call(Client.java:1446)
	... 57 more

Is there any way to try establishing a connection to the repo after Elasticsearch has started?
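
For illustration only: the trace above shows the HDFS connection being opened inside the `HdfsRepository` constructor (via `HdfsBlobStore.<init>`), so a refused connection at that point means the repository is never created. One general pattern that would allow a later attempt is lazy initialization, where the connection is opened on first use and a failed attempt is not cached. The sketch below is a minimal, hypothetical Java example of that pattern. `LazyResource` and `Factory` are invented names, not Elasticsearch or Hadoop APIs, and a plain string stands in for the file system handle; it is not a claim about how the plugin is or should be implemented.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for illustration; not part of Elasticsearch or Hadoop. */
public class LazyResource<T> {

    /** Factory that may fail, for example because the remote service is down. */
    @FunctionalInterface
    public interface Factory<T> {
        T create() throws IOException;
    }

    private final Factory<T> factory;
    private T instance; // stays null until the first successful create()

    public LazyResource(Factory<T> factory) {
        this.factory = factory; // no I/O here, so construction cannot fail
    }

    /** Returns the cached instance, or tries to create it. A failure is not cached. */
    public synchronized T get() throws IOException {
        if (instance == null) {
            instance = factory.create();
        }
        return instance;
    }

    public static void main(String[] args) throws IOException {
        AtomicInteger attempts = new AtomicInteger();
        // Stand-in for an HDFS connection: refuses the first attempt, then succeeds.
        LazyResource<String> fs = new LazyResource<>(() -> {
            if (attempts.incrementAndGet() == 1) {
                throw new IOException("Connection refused");
            }
            return "connected";
        });

        try {
            fs.get();
        } catch (IOException e) {
            System.out.println("first call failed: " + e.getMessage());
        }
        System.out.println("second call: " + fs.get());
    }
}
```

With this shape, the object holding the connection can be constructed while the remote service is down, and each later `get()` retries until one attempt succeeds.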

@imotov @jbaiera
