Elasticsearch version (bin/elasticsearch --version): 7.10.1
Plugins installed: Cloud
JVM version (java -version): 15.0.1+9
OS version (uname -a if on a Unix-like system): Cloud
Description of the problem including expected versus actual behavior:
All generic threads are waiting with the following stack trace:
at java.util.concurrent.locks.LockSupport.park(Ljava/lang/Object;)V (LockSupport.java:211)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Ljava/util/concurrent/locks/AbstractQueuedSynchronizer$Node;IZZZJ)I (AbstractQueuedSynchronizer.java:714)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(I)V (AbstractQueuedSynchronizer.java:937)
at java.util.concurrent.locks.ReentrantLock$Sync.lock()V (ReentrantLock.java:153)
at java.util.concurrent.locks.ReentrantLock.lock()V (ReentrantLock.java:322)
at org.elasticsearch.action.bulk.BulkProcessor.internalAdd(Lorg/elasticsearch/action/DocWriteRequest;)V (BulkProcessor.java:379)
at org.elasticsearch.action.bulk.BulkProcessor.add(Lorg/elasticsearch/action/DocWriteRequest;)Lorg/elasticsearch/action/bulk/BulkProcessor; (BulkProcessor.java:361)
at org.elasticsearch.action.bulk.BulkProcessor.add(Lorg/elasticsearch/action/index/IndexRequest;)Lorg/elasticsearch/action/bulk/BulkProcessor; (BulkProcessor.java:347)
at org.elasticsearch.xpack.ilm.history.ILMHistoryStore.lambda$putAsync$0(Lorg/elasticsearch/action/index/IndexRequest;Lorg/elasticsearch/xpack/ilm/history/ILMHistoryItem;)V (ILMHistoryStore.java:150)
at org.elasticsearch.xpack.ilm.history.ILMHistoryStore$$Lambda$8091+0x0000000801eda370.run()V (Unknown Source)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run()V (ThreadContext.java:678)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:630)
at java.lang.Thread.run()V (Thread.java:832)
Meanwhile one of the scheduler threads is holding the lock on which they're waiting, and is blocked here:
jdk.internal.misc.Unsafe.park(ZJ)V (Native Method)
java.util.concurrent.locks.LockSupport.park(Ljava/lang/Object;)V (LockSupport.java:211)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Ljava/util/concurrent/locks/AbstractQueuedSynchronizer$Node;IZZZJ)I (AbstractQueuedSynchronizer.java:714)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(I)V (AbstractQueuedSynchronizer.java:1046)
java.util.concurrent.Semaphore.acquire()V (Semaphore.java:318)
org.elasticsearch.action.bulk.BulkRequestHandler.execute(Lorg/elasticsearch/action/bulk/BulkRequest;J)V (BulkRequestHandler.java:59)
org.elasticsearch.action.bulk.BulkProcessor.execute(Lorg/elasticsearch/action/bulk/BulkRequest;J)V (BulkProcessor.java:454)
org.elasticsearch.action.bulk.BulkProcessor.execute()V (BulkProcessor.java:463)
org.elasticsearch.action.bulk.BulkProcessor.access$400(Lorg/elasticsearch/action/bulk/BulkProcessor;)V (BulkProcessor.java:54)
org.elasticsearch.action.bulk.BulkProcessor$Flush.run()V (BulkProcessor.java:503)
org.elasticsearch.threadpool.Scheduler$ReschedulingRunnable.doRun()V (Scheduler.java:213)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run()V (AbstractRunnable.java:37)
java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; (Executors.java:515)
java.util.concurrent.FutureTask.run()V (FutureTask.java:264)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V (ScheduledThreadPoolExecutor.java:304)
java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor.java:1130)
java.util.concurrent.ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:630)
java.lang.Thread.run()V (Thread.java:832)
The semaphore on which it is waiting is held, apparently, by another ongoing flush. I haven't chased this any further, but I could believe that the ongoing flush needs a generic thread to make progress.
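If that reading is right, the cycle can be sketched in isolation: one thread takes a lock and then blocks on a semaphore whose permit can only be released by threads that themselves need the lock. This is a minimal, hypothetical reproduction of the lock-ordering pattern described above, not the actual BulkProcessor code; the class and method names are invented for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class BulkProcessorDeadlockSketch {

    // Returns true if the "generic thread" managed to take the lock.
    // Under the deadlock pattern it cannot, so this returns false.
    static boolean simulateDeadlock() throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();   // stands in for BulkProcessor's internal lock
        Semaphore inflight = new Semaphore(0);      // stands in for BulkRequestHandler's
                                                    // concurrency semaphore, already exhausted

        // Scheduler thread: the periodic Flush takes the lock, then blocks
        // in Semaphore.acquire() waiting for an in-flight bulk to complete.
        Thread flusher = new Thread(() -> {
            lock.lock();
            try {
                inflight.acquire();                 // blocks: no permits available
            } catch (InterruptedException e) {
                // expected when this sketch shuts the thread down
            } finally {
                lock.unlock();
            }
        });
        flusher.start();
        Thread.sleep(200);                          // let the flusher grab the lock first

        // Generic thread: an add() call needs the same lock. The bulk response
        // that would release the semaphore also needs a generic thread, so
        // neither side can ever make progress.
        boolean acquired = lock.tryLock(500, TimeUnit.MILLISECONDS);
        if (acquired) {
            lock.unlock();                          // not reached under the deadlock
        }

        flusher.interrupt();                        // unwind the sketch
        flusher.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("generic thread got lock: " + simulateDeadlock());
    }
}
```

The sketch uses `tryLock` with a timeout only so that it terminates; the real generic threads use a plain `lock()` and park indefinitely, which matches the stack traces above.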
Steps to reproduce:
Unknown.
Provide logs (if relevant):
Not available, but I can share a heap dump privately.
Workaround:
The deadlocked node must be restarted; it will not recover on its own. If the issue persists, the problematic component can be disabled by setting indices.lifecycle.history_index_enabled: false in the elasticsearch.yml file on each master-eligible node and then restarting them all.