Java minimal rest client hangs with default SniffOnFailure listener enabled #25701

@ssesha

Elasticsearch version: 2.4.0

Elasticsearch java rest client version: 5.2.2

Plugins installed: []

JVM version (java -version): 1.8.0_131

OS version (uname -a if on a Unix-like system): Ubuntu 16.04

Description of the problem including expected versus actual behavior:
The default SniffOnFailureListener on the REST client blocks the HttpAsyncClient reactor thread when a request encounters a java.net.ConnectException.

Steps to reproduce:

  1. Start two Elasticsearch nodes and let the sniffer pick them up.
  2. Shut down one node.
  3. The client tries to connect to that node, the connection fails, the listener tries to sniff, and the client hangs until maxRetryTimeoutMillis elapses.

The failed callback triggers the sniffer: https://github.com/elastic/elasticsearch/blob/master/client/rest/src/main/java/org/elasticsearch/client/RestClient.java#L374

However, the failed callback is handled by the reactor thread of the underlying HttpAsyncClient. The sniffer does a blocking performRequest using the same client instance, and the HttpClient can't handle that request because the reactor thread is blocked. It's effectively a deadlock until the SyncResponseListener timeout of maxRetryTimeoutMillis expires, and no requests can be served at all during this period. 😰
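To illustrate the pattern without any Elasticsearch or HttpClient dependency, here is a minimal sketch that uses only java.util.concurrent. It assumes a single-thread executor as a stand-in for the reactor thread; the class and method names (ReactorBlockDemo, nestedBlockingCallTimesOut) are hypothetical and not part of the RestClient code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ReactorBlockDemo {

    /**
     * Returns true if the nested blocking call timed out, i.e. the single
     * "reactor" thread could not serve it because it was busy waiting for it.
     */
    static boolean nestedBlockingCallTimesOut(long timeoutMillis) throws Exception {
        // Stand-in for the reactor thread (assumption: exactly one thread).
        ExecutorService reactor = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for the failed() callback, which runs on the reactor thread.
            Callable<Boolean> failedCallback = () -> {
                // Stand-in for the sniffer's blocking performRequest: the request
                // needs the same reactor thread in order to complete.
                Callable<String> sniffRequest = () -> "nodes";
                Future<String> sniff = reactor.submit(sniffRequest);
                try {
                    sniff.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return false; // the request completed, so nothing stalled
                } catch (TimeoutException e) {
                    return true; // stalled for the full timeout
                }
            };
            return reactor.submit(failedCallback).get();
        } finally {
            reactor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("timed out: " + nestedBlockingCallTimesOut(200));
    }
}
```

Running main prints `timed out: true`: the inner task can never start while the only thread is parked in get(), which mirrors the hang until maxRetryTimeoutMillis described above.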

I found a similar issue, https://issues.apache.org/jira/browse/HTTPCLIENT-1805, where the suggestion is to avoid potentially blocking or long-running operations in the callbacks, especially in the failed callback, since they could block the reactor thread.

I guess the solution would be to trigger the retries as well as the sniffer on a separate thread pool internal to the RestClient, so that the HttpClient's dispatcher and reactor threads are freed up as soon as possible.
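A rough sketch of that idea, again using only java.util.concurrent rather than the real client: a single-thread executor stands in for the reactor thread, and a second, hypothetical executor (callbackPool) stands in for the proposed internal thread pool. The names OffloadDemo and offloadedSniffCompletes are made up for illustration; this is not how RestClient is implemented.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class OffloadDemo {

    /**
     * Returns true if the blocking "sniff" completes within the timeout when
     * it is handed off to a separate pool instead of running on the reactor.
     */
    static boolean offloadedSniffCompletes(long timeoutMillis) throws Exception {
        // Stand-in for the reactor thread (assumption: exactly one thread).
        ExecutorService reactor = Executors.newSingleThreadExecutor();
        // Hypothetical pool dedicated to retries and sniffing.
        ExecutorService callbackPool = Executors.newSingleThreadExecutor();
        try {
            // The blocking work: it still needs the reactor to serve its request,
            // but it now waits on a callbackPool thread.
            Callable<Boolean> sniffTask = () -> {
                Callable<String> sniffRequest = () -> "nodes";
                Future<String> sniff = reactor.submit(sniffRequest);
                try {
                    sniff.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false;
                }
            };
            // Stand-in for the failed() callback: it only hands the work off
            // and returns, so the reactor thread is free again right away.
            Callable<Future<Boolean>> failedCallback = () -> callbackPool.submit(sniffTask);
            Future<Boolean> sniffResult = reactor.submit(failedCallback).get();
            return sniffResult.get();
        } finally {
            reactor.shutdownNow();
            callbackPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("sniff completed: " + offloadedSniffCompletes(2000));
    }
}
```

Because the callback returns as soon as it has submitted the task, the reactor thread can pick up the sniff request, and the wait happens on a thread that the request does not depend on.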
