Elasticsearch Version
8.15
Installed Plugins
No response
Java Version
bundled
OS Version
any
Problem Description
When executing a significantly complicated dismax query, it's possible for the iteration over impacts to get "stuck".
A CPU thread gets taken hostage at 100% and iterates forever. Task cancellation does nothing, as the thread is stuck in a busy loop that does no IO.
("Stuck" means running for hours and hours, requiring a server restart to stop.)
In particular, it gets stuck in this loop:
https://github.com/apache/lucene/blob/42d5806fd69400bb42b7d15f6311ac02d3104efe/lucene/core/src/java/org/apache/lucene/search/ReqOptSumScorer.java#L90..L108
private int advanceImpacts(int target) throws IOException {
  if (target > upTo) {
    moveToNextBlock(target);
  }
  while (true) {
    if (maxScore >= minScore) {
      return target;
    }
    if (upTo == NO_MORE_DOCS) {
      return NO_MORE_DOCS;
    }
    target = upTo + 1;
    moveToNextBlock(target);
  }
}
moveToNextBlock then runs the ES812ScoreSkipReader impacts check, which possibly sets the target adversely, so the loop never terminates.
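To make the suspected failure mode concrete, here is a minimal standalone sketch. It is not Lucene code and does not confirm the root cause. It assumes the block is never competitive (so the maxScore >= minScore exit is never taken), which leaves the loop depending entirely on upTo moving forward. The class name, the blockEnd function, and the maxIterations cap are hypothetical and exist only so the sketch terminates.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical illustration only: a simplified stand-in for the loop quoted
// in the report, not the actual ReqOptSumScorer implementation.
final class ImpactsLoopSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /**
     * blockEnd plays the role of moveToNextBlock: given a target doc, it
     * returns the last doc of the block containing it (the new upTo).
     * Assumption: no block is ever competitive, so the only exit is
     * upTo == NO_MORE_DOCS. Returns -1 if maxIterations is exhausted,
     * which is where the real loop would keep spinning.
     */
    static int advanceImpacts(int target, IntUnaryOperator blockEnd, int maxIterations) {
        int upTo = blockEnd.applyAsInt(target);
        for (int i = 0; i < maxIterations; i++) {
            if (upTo == NO_MORE_DOCS) {
                return NO_MORE_DOCS;
            }
            target = upTo + 1;
            upTo = blockEnd.applyAsInt(target);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Well-behaved boundaries: 128-doc blocks, exhausted at doc 1000.
        IntUnaryOperator healthy = t -> t >= 1000 ? NO_MORE_DOCS : (t | 127);
        // Stalled boundary: always reports the same upTo, so target never advances.
        IntUnaryOperator stalled = t -> 41;

        System.out.println(advanceImpacts(0, healthy, 100) == NO_MORE_DOCS);
        System.out.println(advanceImpacts(0, stalled, 100) == -1);
    }
}
```

With the stalled boundary, every pass sets target to 42 and gets back upTo = 41 again, so neither exit condition can ever become true. Whether the real scorer chain returns boundaries like this is exactly what still needs to be established.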
Steps to Reproduce
@softwaredoug discovered this; I will defer to him.
Logs (if relevant)
/tmp/jstack.4.log:"elasticsearch[eck-elasticsearch-es-default-5][search_worker][T#2]" #78 [163] daemon prio=5 os_prio=0 cpu=771958.29ms elapsed=13461.64s tid=0x00007f5f28013750 nid=163 runnable [0x00007f5e5d1fd000]
/tmp/jstack.4.log- java.lang.Thread.State: RUNNABLE
/tmp/jstack.4.log- at org.apache.lucene.search.DisjunctionScoreBlockBoundaryPropagator.advanceShallow(org.apache.lucene.core@9.11.1/DisjunctionScoreBlockBoundaryPropagator.java:79)
/tmp/jstack.4.log- at org.apache.lucene.search.DisjunctionMaxScorer.advanceShallow(org.apache.lucene.core@9.11.1/DisjunctionMaxScorer.java:79)
/tmp/jstack.4.log- at org.apache.lucene.search.ConjunctionScorer.advanceShallow(org.apache.lucene.core@9.11.1/ConjunctionScorer.java:80)
/tmp/jstack.4.log- at org.apache.lucene.search.ReqOptSumScorer.advanceShallow(org.apache.lucene.core@9.11.1/ReqOptSumScorer.java:274)
/tmp/jstack.4.log- at org.apache.lucene.search.ReqOptSumScorer$1.moveToNextBlock(org.apache.lucene.core@9.11.1/ReqOptSumScorer.java:82)
/tmp/jstack.4.log- at org.apache.lucene.search.ReqOptSumScorer$1.advanceImpacts(org.apache.lucene.core@9.11.1/ReqOptSumScorer.java:106)
/tmp/jstack.4.log- at org.apache.lucene.search.ReqOptSumScorer$1.advanceInternal(org.apache.lucene.core@9.11.1/ReqOptSumScorer.java:129)
/tmp/jstack.4.log- at org.apache.lucene.search.ReqOptSumScorer$1.nextDoc(org.apache.lucene.core@9.11.1/ReqOptSumScorer.java:112)
/tmp/jstack.4.log- at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(org.apache.lucene.core@9.11.1/Weight.java:298)
/tmp/jstack.4.log- at org.apache.lucene.search.Weight$DefaultBulkScorer.score(org.apache.lucene.core@9.11.1/Weight.java:236)
/tmp/jstack.4.log- at org.elasticsearch.search.internal.CancellableBulkScorer.score(org.elasticsearch.server@8.15.0/CancellableBulkScorer.java:45)
/tmp/jstack.4.log- at org.apache.lucene.search.BulkScorer.score(org.apache.lucene.core@9.11.1/BulkScorer.java:38)
/tmp/jstack.4.log- at org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(org.elasticsearch.server@8.15.0/ContextIndexSearcher.java:436)
/tmp/jstack.4.log- at org.elasticsearch.search.internal.ContextIndexSearcher.search(org.elasticsearch.server@8.15.0/ContextIndexSearcher.java:365)
/tmp/jstack.4.log- at org.elasticsearch.search.internal.ContextIndexSearcher.lambda$search$3(org.elasticsearch.server@8.15.0/ContextIndexSearcher.java:350)
/tmp/jstack.4.log- at org.elasticsearch.search.internal.ContextIndexSearcher$$Lambda/0x00007f5fec574000.call(org.elasticsearch.server@8.15.0/Unknown Source)
/tmp/jstack.4.log- at org.apache.lucene.search.TaskExecutor$TaskGroup.lambda$createTask$0(org.apache.lucene.core@9.11.1/TaskExecutor.java:117)
/tmp/jstack.4.log- at org.apache.lucene.search.TaskExecutor$TaskGroup$$Lambda/0x00007f5fec519c80.call(org.apache.lucene.core@9.11.1/Unknown Source)
/tmp/jstack.4.log- at java.util.concurrent.FutureTask.run(java.base@22.0.1/FutureTask.java:317)
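The trace shows the stuck nextDoc call sitting underneath CancellableBulkScorer.score. A rough sketch of why cancellation cannot help in that situation follows, assuming cancellation is cooperative and only checked between scoring windows. The class and method names are hypothetical and this is not the Elasticsearch implementation.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntConsumer;

// Hypothetical illustration of cooperative cancellation, not Elasticsearch code.
final class CooperativeCancellationSketch {

    /**
     * Scores windows one at a time and consults the cancellation flag only
     * between windows. Returns the number of windows that completed.
     */
    static int scoreAll(int windows, AtomicBoolean cancelled, IntConsumer scoreWindow) {
        int completed = 0;
        for (int w = 0; w < windows; w++) {
            if (cancelled.get()) {
                break;
            }
            // If this call never returns (a busy loop inside the scorer),
            // the check above is never reached again.
            scoreWindow.accept(w);
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) {
        AtomicBoolean cancelled = new AtomicBoolean(false);
        // Cancellation arrives while window 1 is being scored; it takes
        // effect only once that window has returned.
        int done = scoreAll(5, cancelled, w -> {
            if (w == 1) {
                cancelled.set(true);
            }
        });
        System.out.println(done);
    }
}
```

Under that assumption, a cancel request is honored only after the current window returns, so a window that loops forever holds the thread until the node is restarted.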