DisMax query takes CPU hostage until the heat death of the universe #130239

@benwtrent

Elasticsearch Version

8.15

Installed Plugins

No response

Java Version

bundled

OS Version

any

Problem Description

When executing a significantly complicated dismax query, it's possible for the iteration over impacts to seemingly get "stuck".

A CPU thread gets taken hostage at 100% and iterates forever. Task cancellation does nothing, as the thread is stuck in a busy loop that does work without doing any IO.

("Stuck" means running for hours and hours, requiring a server restart to stop.)
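To illustrate why cancellation would have no effect here, below is a minimal sketch. It assumes the cancellation flag is only consulted outside the inner loop, which is what the `CancellableBulkScorer.score` frame sitting above the stuck frames in the stack trace under Logs suggests. The class `CancellationSketch` and the method `runWindow` are hypothetical names for illustration, not Elasticsearch or Lucene APIs.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CancellationSketch {
    /** Hypothetical exception used to unwind a cancelled loop. */
    static final class Cancelled extends RuntimeException {}

    /**
     * Runs up to `steps` iterations of a stand-in inner loop. A cancel request
     * arrives at iteration `cancelAt`. Returns how many iterations did work.
     */
    static int runWindow(int steps, int cancelAt, boolean checkInsideLoop) {
        AtomicBoolean cancelled = new AtomicBoolean(false);
        int ran = 0;
        try {
            for (int i = 0; i < steps; i++) {
                if (i == cancelAt) {
                    cancelled.set(true); // simulates an external cancel request
                }
                // Cooperative cancellation only works if the loop looks at the flag.
                if (checkInsideLoop && cancelled.get()) {
                    throw new Cancelled();
                }
                ran++;
            }
        } catch (Cancelled e) {
            return ran;
        }
        return ran;
    }

    public static void main(String[] args) {
        // Flag never checked inside the loop: the cancel request is ignored.
        System.out.println(runWindow(1000, 10, false)); // prints 1000
        // Flag checked on every iteration: the loop stops at the request.
        System.out.println(runWindow(1000, 10, true));  // prints 10
    }
}
```

With `steps` unbounded, as in a loop that never exits, the first variant would never get back to a point where the flag is read.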

In particular, it gets stuck in this loop:

https://github.com/apache/lucene/blob/42d5806fd69400bb42b7d15f6311ac02d3104efe/lucene/core/src/java/org/apache/lucene/search/ReqOptSumScorer.java#L90..L108

            private int advanceImpacts(int target) throws IOException {
              if (target > upTo) {
                moveToNextBlock(target);
              }

              while (true) {
                if (maxScore >= minScore) {
                  return target;
                }

                if (upTo == NO_MORE_DOCS) {
                  return NO_MORE_DOCS;
                }

                target = upTo + 1;

                moveToNextBlock(target);
              }
            }

Then, inside moveToNextBlock, the ES812ScoreSkipReader impacts check is executed, and this possibly sets the target adversely, so that the loop never terminates.
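The root cause is not confirmed in this report. As a sketch of one way a loop of this shape can fail to terminate, the example below replaces the block lookup with a hypothetical function and assumes the score condition never passes. If the returned block end falls behind the current target, `target = upTo + 1` stops advancing. The class `AdvanceImpactsSketch`, the method `iterationsUntilExit`, and both block functions are illustrative assumptions, not Lucene code.

```java
import java.util.function.IntUnaryOperator;

public class AdvanceImpactsSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /**
     * Mirrors the shape of the quoted loop, assuming maxScore < minScore for
     * every block. `blockEnd` is a hypothetical stand-in for the block lookup.
     * Returns the iteration on which the loop exits, or -1 if the target
     * stops advancing or the iteration budget runs out.
     */
    static int iterationsUntilExit(IntUnaryOperator blockEnd, int target, int maxIterations) {
        int upTo = blockEnd.applyAsInt(target);
        for (int i = 1; i <= maxIterations; i++) {
            if (upTo == NO_MORE_DOCS) {
                return i; // normal exit: ran off the end of the postings
            }
            int next = upTo + 1;
            if (next <= target) {
                return -1; // no forward progress: the real loop would spin here
            }
            target = next;
            upTo = blockEnd.applyAsInt(target);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Well-behaved lookup: 128-doc blocks, exhausted from doc 512 onward.
        IntUnaryOperator sane = t -> t >= 512 ? NO_MORE_DOCS : (t | 127);
        System.out.println(iterationsUntilExit(sane, 0, 100)); // prints 5

        // Misbehaving lookup: always reports a block end behind the target.
        IntUnaryOperator stale = t -> 99;
        System.out.println(iterationsUntilExit(stale, 100, 100)); // prints -1
    }
}
```

A forward-progress check like the one in the sketch is only a diagnostic aid; whether the actual bug has this form would need to be confirmed with a reproduction.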

Steps to Reproduce

@softwaredoug discovered this; I will defer to him.

Logs (if relevant)

/tmp/jstack.4.log:"elasticsearch[eck-elasticsearch-es-default-5][search_worker][T#2]" #78 [163] daemon prio=5 os_prio=0 cpu=771958.29ms elapsed=13461.64s tid=0x00007f5f28013750 nid=163 runnable  [0x00007f5e5d1fd000]
/tmp/jstack.4.log-   java.lang.Thread.State: RUNNABLE
/tmp/jstack.4.log-	at org.apache.lucene.search.DisjunctionScoreBlockBoundaryPropagator.advanceShallow(org.apache.lucene.core@9.11.1/DisjunctionScoreBlockBoundaryPropagator.java:79)
/tmp/jstack.4.log-	at org.apache.lucene.search.DisjunctionMaxScorer.advanceShallow(org.apache.lucene.core@9.11.1/DisjunctionMaxScorer.java:79)
/tmp/jstack.4.log-	at org.apache.lucene.search.ConjunctionScorer.advanceShallow(org.apache.lucene.core@9.11.1/ConjunctionScorer.java:80)
/tmp/jstack.4.log-	at org.apache.lucene.search.ReqOptSumScorer.advanceShallow(org.apache.lucene.core@9.11.1/ReqOptSumScorer.java:274)
/tmp/jstack.4.log-	at org.apache.lucene.search.ReqOptSumScorer$1.moveToNextBlock(org.apache.lucene.core@9.11.1/ReqOptSumScorer.java:82)
/tmp/jstack.4.log-	at org.apache.lucene.search.ReqOptSumScorer$1.advanceImpacts(org.apache.lucene.core@9.11.1/ReqOptSumScorer.java:106)
/tmp/jstack.4.log-	at org.apache.lucene.search.ReqOptSumScorer$1.advanceInternal(org.apache.lucene.core@9.11.1/ReqOptSumScorer.java:129)
/tmp/jstack.4.log-	at org.apache.lucene.search.ReqOptSumScorer$1.nextDoc(org.apache.lucene.core@9.11.1/ReqOptSumScorer.java:112)
/tmp/jstack.4.log-	at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(org.apache.lucene.core@9.11.1/Weight.java:298)
/tmp/jstack.4.log-	at org.apache.lucene.search.Weight$DefaultBulkScorer.score(org.apache.lucene.core@9.11.1/Weight.java:236)
/tmp/jstack.4.log-	at org.elasticsearch.search.internal.CancellableBulkScorer.score(org.elasticsearch.server@8.15.0/CancellableBulkScorer.java:45)
/tmp/jstack.4.log-	at org.apache.lucene.search.BulkScorer.score(org.apache.lucene.core@9.11.1/BulkScorer.java:38)
/tmp/jstack.4.log-	at org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(org.elasticsearch.server@8.15.0/ContextIndexSearcher.java:436)
/tmp/jstack.4.log-	at org.elasticsearch.search.internal.ContextIndexSearcher.search(org.elasticsearch.server@8.15.0/ContextIndexSearcher.java:365)
/tmp/jstack.4.log-	at org.elasticsearch.search.internal.ContextIndexSearcher.lambda$search$3(org.elasticsearch.server@8.15.0/ContextIndexSearcher.java:350)
/tmp/jstack.4.log-	at org.elasticsearch.search.internal.ContextIndexSearcher$$Lambda/0x00007f5fec574000.call(org.elasticsearch.server@8.15.0/Unknown Source)
/tmp/jstack.4.log-	at org.apache.lucene.search.TaskExecutor$TaskGroup.lambda$createTask$0(org.apache.lucene.core@9.11.1/TaskExecutor.java:117)
/tmp/jstack.4.log-	at org.apache.lucene.search.TaskExecutor$TaskGroup$$Lambda/0x00007f5fec519c80.call(org.apache.lucene.core@9.11.1/Unknown Source)
/tmp/jstack.4.log-	at java.util.concurrent.FutureTask.run(java.base@22.0.1/FutureTask.java:317)
