Queries that build many Lucene automata can quickly run a node out of memory, as each automaton can take a substantial amount of heap. We had a user with a bool query containing 2000 wildcard subqueries, each matching on a long string, which resulted in a 6 GB+ in-memory representation of all the automata. There are other variations of this problem (e.g. a bool query with many subclauses using fuzzy or regexp matching).
Ideally we would have a way to approximate the memory usage up-front and fail the query early, before letting the node run into OOMs.