Generic threads deadlock related to ILMHistoryStore indexing #68468

@DaveCTurner

Description

Elasticsearch version (bin/elasticsearch --version): 7.10.1

Plugins installed: Cloud

JVM version (java -version): 15.0.1+9

OS version (uname -a if on a Unix-like system): Cloud

Description of the problem including expected versus actual behavior:

All generic threads are waiting with the following stack trace:

at java.util.concurrent.locks.LockSupport.park(Ljava/lang/Object;)V (LockSupport.java:211)                                                                                                             
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Ljava/util/concurrent/locks/AbstractQueuedSynchronizer$Node;IZZZJ)I (AbstractQueuedSynchronizer.java:714)                             
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(I)V (AbstractQueuedSynchronizer.java:937)                                                                                             
at java.util.concurrent.locks.ReentrantLock$Sync.lock()V (ReentrantLock.java:153)                                                                                                                      
at java.util.concurrent.locks.ReentrantLock.lock()V (ReentrantLock.java:322)                                                                                                                           
at org.elasticsearch.action.bulk.BulkProcessor.internalAdd(Lorg/elasticsearch/action/DocWriteRequest;)V (BulkProcessor.java:379)                                                                       
at org.elasticsearch.action.bulk.BulkProcessor.add(Lorg/elasticsearch/action/DocWriteRequest;)Lorg/elasticsearch/action/bulk/BulkProcessor; (BulkProcessor.java:361)                                   
at org.elasticsearch.action.bulk.BulkProcessor.add(Lorg/elasticsearch/action/index/IndexRequest;)Lorg/elasticsearch/action/bulk/BulkProcessor; (BulkProcessor.java:347)                                
at org.elasticsearch.xpack.ilm.history.ILMHistoryStore.lambda$putAsync$0(Lorg/elasticsearch/action/index/IndexRequest;Lorg/elasticsearch/xpack/ilm/history/ILMHistoryItem;)V (ILMHistoryStore.java:150)
at org.elasticsearch.xpack.ilm.history.ILMHistoryStore$$Lambda$8091+0x0000000801eda370.run()V (Unknown Source)                                                                                         
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run()V (ThreadContext.java:678)                                                                                    
at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor.java:1130)                                                                 
at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:630)                                                                                                                 
at java.lang.Thread.run()V (Thread.java:832)                                                                                                                                                           

Meanwhile, one of the scheduler threads holds the lock on which they are waiting, and is itself blocked here:

jdk.internal.misc.Unsafe.park(ZJ)V (Native Method)                                                                                                                     
java.util.concurrent.locks.LockSupport.park(Ljava/lang/Object;)V (LockSupport.java:211)                                                                                
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Ljava/util/concurrent/locks/AbstractQueuedSynchronizer$Node;IZZZJ)I (AbstractQueuedSynchronizer.java:714)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(I)V (AbstractQueuedSynchronizer.java:1046)                                            
java.util.concurrent.Semaphore.acquire()V (Semaphore.java:318)                                                                                                         
org.elasticsearch.action.bulk.BulkRequestHandler.execute(Lorg/elasticsearch/action/bulk/BulkRequest;J)V (BulkRequestHandler.java:59)                                   
org.elasticsearch.action.bulk.BulkProcessor.execute(Lorg/elasticsearch/action/bulk/BulkRequest;J)V (BulkProcessor.java:454)                                            
org.elasticsearch.action.bulk.BulkProcessor.execute()V (BulkProcessor.java:463)                                                                                        
org.elasticsearch.action.bulk.BulkProcessor.access$400(Lorg/elasticsearch/action/bulk/BulkProcessor;)V (BulkProcessor.java:54)                                         
org.elasticsearch.action.bulk.BulkProcessor$Flush.run()V (BulkProcessor.java:503)                                                                                      
org.elasticsearch.threadpool.Scheduler$ReschedulingRunnable.doRun()V (Scheduler.java:213)                                                                              
org.elasticsearch.common.util.concurrent.AbstractRunnable.run()V (AbstractRunnable.java:37)                                                                            
java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; (Executors.java:515)                                                                           
java.util.concurrent.FutureTask.run()V (FutureTask.java:264)                                                                                                           
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V (ScheduledThreadPoolExecutor.java:304)                                                     
java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor.java:1130)                                    
java.util.concurrent.ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:630)                                                                                    
java.lang.Thread.run()V (Thread.java:832)                                                                                                                              

The semaphore on which it is waiting is apparently held by another ongoing flush. I haven't chased this any further, but I could believe that the ongoing flush needs a generic thread to make progress.
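The cycle described above can be sketched with a minimal, hypothetical Java program. The lock, semaphore, and single-threaded pool stand in for BulkProcessor's internal mutex, its concurrent-request permits, and a saturated generic pool; none of the names below come from the Elasticsearch codebase, and this is only an illustration of the pattern, not the actual code path:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockSketch {
    public static void main(String[] args) throws Exception {
        ReentrantLock mutex = new ReentrantLock();   // stands in for BulkProcessor's internal lock
        Semaphore permits = new Semaphore(1);        // stands in for the concurrent-request permits
        ExecutorService generic = Executors.newSingleThreadExecutor(); // a saturated "generic" pool

        permits.acquire(); // an in-flight bulk request holds the only permit

        // Scheduler thread: the periodic flush takes the lock, then blocks acquiring a permit.
        Thread scheduler = new Thread(() -> {
            mutex.lock();
            try {
                permits.acquire(); // never succeeds: releasing needs a generic thread
                permits.release();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                mutex.unlock();
            }
        });
        scheduler.start();
        Thread.sleep(200); // let the flush take the lock first

        // Generic thread: an ILM history put blocks waiting for the lock...
        generic.submit(() -> { mutex.lock(); mutex.unlock(); });
        // ...so the queued completion that would release the permit never runs.
        generic.submit(permits::release);

        Thread.sleep(500); // give everything time to (fail to) make progress
        System.out.println("lock free: " + (!mutex.isLocked()));
        System.out.println("permit free: " + (permits.availablePermits() > 0));

        scheduler.interrupt(); // break the cycle so the demo can exit
        generic.shutdownNow();
    }
}
```

Neither check succeeds: the flush holds the lock while waiting for a permit, every generic thread is queued behind that lock, and the task that would release the permit is stuck behind them, which matches the three-way wait seen in the stack traces.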

Steps to reproduce:

Unknown.

Provide logs (if relevant):

Not available, but I can share a heap dump privately.

Workaround:

The deadlocked node must be restarted; it will not recover on its own. If the issue persists, the problematic component can be disabled by setting indices.lifecycle.history_index_enabled: false in the elasticsearch.yml file on each master-eligible node and then restarting them all.
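For reference, the workaround setting as it would appear in elasticsearch.yml on each master-eligible node:

```yaml
# Disable ILM history indexing to avoid the deadlocked ILMHistoryStore
indices.lifecycle.history_index_enabled: false
```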

Labels

:Data Management/ILM+SLM, >bug, Team:Data Management (obsolete)
