OOM due to large number of requests in TransportService.clientHandlers #50241

@njustyq

Describe the feature:

Elasticsearch version (bin/elasticsearch --version): 6.3.2

Plugins installed: [repository-hdfs]

JVM version (java -version): 10.0.2

OS version (uname -a if on a Unix-like system): CentOS 7.2

Description of the problem including expected versus actual behavior:
Our cluster has 24 data nodes, each with a 31 GB heap and four 1.7 TB SSDs, plus 3 master nodes with an 8 GB heap each. Write load is about 8,000 TPS. We performed a rolling upgrade (6.3.2 to 6.3.2) of the cluster using the following steps:
① Set allocation to none.
② Restart the data node.
③ Set allocation to all.
Then wait for the cluster health to go from yellow to green.
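For readers who want to reproduce the procedure, here is a minimal Python sketch of steps ① and ③ and the health check, using only the standard library. It assumes "allocation" refers to the `cluster.routing.allocation.enable` cluster setting and that it was changed as a transient setting; the issue does not say which scope was used. The helper names `allocation_request` and `is_green` and the base URL are hypothetical.

```python
import json
import urllib.request


def allocation_request(base_url, mode):
    """Build (but do not send) a PUT _cluster/settings request that sets
    cluster.routing.allocation.enable to `mode` ("none" or "all")."""
    if mode not in ("none", "all"):
        raise ValueError("mode must be 'none' or 'all'")
    # Assumption: a transient setting; the issue does not state the scope.
    body = {"transient": {"cluster.routing.allocation.enable": mode}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def is_green(base_url):
    """Return True if GET _cluster/health reports status "green"."""
    url = base_url.rstrip("/") + "/_cluster/health"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["status"] == "green"
```

To send a request, pass it to `urllib.request.urlopen`, for example `urllib.request.urlopen(allocation_request("http://localhost:9200", "none"))` before restarting a node, then the same call with `"all"` afterwards, polling `is_green` until the cluster recovers.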
After I had upgraded some of the data nodes and waited a while, I found that the master node was running old-generation GCs, and it later ran out of memory (OOM).
I loaded a heap dump from the master node into Eclipse Memory Analyzer and found that 87.57% of the memory is used by the TransportService.clientHandlers hash map.
Most of the RequestHolder entries were for actions such as indices:monitor/stats[n], indices:monitor/recovery[n], or cluster:monitor/stats[n]. Below are screenshots of the heap dump:
[Images: heap dump screenshots "client", "1", and "2"]

I then used the OQL query
SELECT toString(action) FROM org.elasticsearch.transport.TransportService$RequestHolder
to count the RequestHolder entries per action. The result is as follows:

[Image: "action" (OQL query result)]
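The per-action tally can also be computed outside the heap analyzer. The sketch below assumes the OQL result has been exported as plain text with one action string per row; the function name `tally_actions` and the sample rows are hypothetical and are not the figures from this heap dump.

```python
from collections import Counter


def tally_actions(action_strings):
    """Count RequestHolder entries per transport action name,
    returning (action, count) pairs sorted by descending count."""
    counts = Counter()
    for raw in action_strings:
        action = raw.strip()
        if action:  # skip blank rows in the export
            counts[action] += 1
    return counts.most_common()


# Hypothetical sample rows, for illustration only.
rows = [
    "indices:monitor/stats[n]",
    "indices:monitor/stats[n]",
    "cluster:monitor/stats[n]",
    "",
]
for action, count in tally_actions(rows):
    print(f"{count:>8}  {action}")
```

Sorting by count makes it easy to see which monitoring action dominates the pending requests.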

So, is there a bug here, or is the master overloaded?
