Memory leak in RestClient when elasticsearch takes more than 30 seconds to respond #33342
Description
Elasticsearch version (bin/elasticsearch --version):
Version: 6.1.2, Build: 5b1fea5/2018-01-10T02:35:59.208Z, JVM: 1.8.0_181
Plugins installed: [] None
JVM version (java -version):
1.8.0_181
OS version (uname -a if on a Unix-like system):
CentOS Linux release 7.5.1804 (Core)
Description of the problem including expected versus actual behavior:
org.apache.http.concurrent.BasicFuture objects accumulate in the JVM heap until they consume most of it, eventually causing an OutOfMemoryError. This happens when Elasticsearch is under load and either does not respond to a performRequest call or takes more than 30 seconds to respond.
The error reported by the library in this case is:
java.io.IOException: listener timeout after waiting for [30000] ms
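One plausible reading of this symptom, stated as an assumption rather than a confirmed diagnosis: the blocking call gives up waiting once the listener timeout expires, but the pending exchange is never cancelled, so its future and the request body it holds stay reachable. The JDK-only sketch below imitates that pattern. `ListenerTimeoutSketch`, `Pending`, `inFlight`, and this `performRequest` are hypothetical names and do not reproduce RestClient internals.

```java
import java.io.IOException;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ListenerTimeoutSketch {
    // A pending exchange: the future plus the request body it keeps alive.
    public static final class Pending {
        final byte[] body;
        final CompletableFuture<String> future = new CompletableFuture<>();
        Pending(byte[] body) { this.body = body; }
    }

    // Stand-in for whatever keeps pending exchanges strongly reachable.
    public static final Queue<Pending> inFlight = new ConcurrentLinkedQueue<>();

    // Waits up to timeoutMillis for a response that never arrives, then gives up
    // WITHOUT cancelling or removing the pending exchange.
    public static String performRequest(byte[] body, long timeoutMillis)
            throws IOException, InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        Pending pending = new Pending(body);
        pending.future.whenComplete((response, error) -> latch.countDown());
        inFlight.add(pending);
        if (!latch.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new IOException("listener timeout after waiting for [" + timeoutMillis + "] ms");
        }
        return pending.future.join();
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 5; i++) {
            try {
                performRequest(new byte[1024 * 1024], 10);
            } catch (IOException e) {
                System.out.println(e.getMessage());
            }
        }
        // Every timed-out call left its future and its 1 MB body behind.
        System.out.println("exchanges still referenced: " + inFlight.size());
    }
}
```

In this sketch the caller sees only the IOException, while the abandoned entries remain in `inFlight` until something completes or removes them.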
Steps to reproduce:
- To simulate a loaded Elasticsearch server, start a web server that answers every request only after a long delay (100 seconds in the script below).
Example: a Node.js server running the following script:
var http = require('http');
var timeout = 100000; // delay each response by 100 seconds
http.createServer(function (req, res) {
    // Hold the request open, then answer with a plain-text body.
    setTimeout(function () {
        res.writeHead(200, {'Content-Type': 'text/plain'});
        res.end("Hello I am awake");
    }, timeout);
}).listen(9200);
- Create a Java thread pool of 32 threads and continuously submit tasks so that every thread in the pool stays busy.
- Each task should make a performRequest call with a StringEntity of about 1 MB on a single RestClient instance shared across all threads. In our scenario, we send a _bulk request to Elasticsearch with 500 update operations.
- Inspect the heap every 15 minutes. You will see a buildup of BasicFuture objects that are strongly referenced from a GC root and are never released.
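The client side of these steps can be sketched with JDK classes alone. This is an assumption about how such a harness might look, not the reporter's code: `LoadHarness`, `payload`, `run`, and the `send` callback are hypothetical, and the actual performRequest call on the shared RestClient would go inside the callback.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class LoadHarness {
    // Builds a filler payload of sizeBytes ASCII characters;
    // a real run would build a _bulk body of similar size instead.
    public static String payload(int sizeBytes) {
        byte[] filler = new byte[sizeBytes];
        Arrays.fill(filler, (byte) 'x');
        return new String(filler, StandardCharsets.US_ASCII);
    }

    // Submits `tasks` jobs to a fixed pool of `threads` workers; each job hands
    // one payload to `send`. Returns how many calls threw a RuntimeException.
    public static int run(int threads, int tasks, int payloadBytes, Consumer<String> send)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger failures = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                try {
                    send.accept(payload(payloadBytes));
                } catch (RuntimeException e) {
                    failures.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return failures.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder callback; replace with a call on the single shared client.
        int failed = run(32, 64, 1024 * 1024, body -> { });
        System.out.println("failed calls: " + failed);
    }
}
```

To approximate the continuous submission described above, `run` could be called in a loop while heap snapshots are taken.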
Provide logs (if relevant):
Workaround:
This issue is not seen when RestClientBuilder.setMaxRetryTimeoutMillis(100000) is used, which results in a SocketTimeoutException instead of the listener timeout, as shown below.
Socket timeout exception:
11:11:17.216 [I/O dispatcher 1] DEBUG org.apache.http.impl.nio.client.InternalIODispatch - http-outgoing-0 [ACTIVE] Timeout
11:11:17.216 [I/O dispatcher 1] DEBUG org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionImpl - http-outgoing-0 127.0.0.1:52928<->127.0.0.1:9200[ACTIVE] [rw:w]: Shutdown
11:11:17.218 [I/O dispatcher 1] DEBUG org.apache.http.impl.nio.client.InternalHttpAsyncClient - [exchange: 1] connection aborted
11:11:17.218 [I/O dispatcher 1] DEBUG org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager - Releasing connection: [id: http-outgoing-0][route: {}->http://127.0.0.1:9200][total kept alive: 0; route allocated: 1 of 10; total allocated: 1 of 30]
11:11:17.218 [I/O dispatcher 1] DEBUG org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager - Connection released: [id: http-outgoing-0][route: {}->http://127.0.0.1:9200][total kept alive: 0; route allocated: 0 of 10; total allocated: 0 of 30]
11:11:17.222 [I/O dispatcher 1] DEBUG org.elasticsearch.client.RestClient - request [POST http://127.0.0.1:9200/_bulk] failed
java.net.SocketTimeoutException: null
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:375)
    at org.apache.http.impl.nio.client.InternalRequestExecutor.timeout(InternalRequestExecutor.java:116)
    at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92)
    at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39)
    at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:263)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:492)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:213)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
    at java.lang.Thread.run(Thread.java:748)
11:11:17.223 [I/O dispatcher 1] DEBUG org.elasticsearch.client.RestClient - added host [http://127.0.0.1:9200] to blacklist
11:11:17.224 [I/O dispatcher 1] DEBUG org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionImpl - http-outgoing-0 0.0.0.0:52928<->127.0.0.1:9200[CLOSED][]: Shutdown
11:11:17.224 [I/O dispatcher 1] DEBUG org.apache.http.impl.nio.client.InternalIODispatch - http-outgoing-0 [CLOSED]: Disconnected
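For reference, the workaround can be sketched as a client configuration. This is a minimal sketch, assuming the 6.x low-level REST client is on the classpath and the socket timeout is left at its default; `WorkaroundClient` and `build` are hypothetical names, and the host and port are taken from the logs in this report.

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

public class WorkaroundClient {
    // Builds a client whose listener (max retry) timeout is raised to 100 seconds,
    // so the socket timeout is expected to fire first and abort the exchange.
    public static RestClient build() {
        return RestClient.builder(new HttpHost("127.0.0.1", 9200, "http"))
                .setMaxRetryTimeoutMillis(100000)
                .build();
    }
}
```

The value 100000 is the one quoted in the workaround; any value above the effective socket timeout should, under the same assumption, lead to the same SocketTimeoutException path.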