Reduce max clause count and account for per-term memory usage #139662

@jimczi

Description

This issue concerns the maximum number of clauses a Lucene query is allowed to execute.
This limit is not user-configurable and is currently computed based on the available heap, with an attempt to account for the memory cost of a single clause. In practice, this estimation is too permissive and implicitly assumes that a single query can consume most of the heap.

Clause count remains a poor proxy for real memory usage, especially for term and phrase queries. Queries that expand into large numbers of terms can consume significant heap due to per-term, per-segment, and per-thread allocations. In practice, term and phrase queries can require ~3–6 KB of heap per segment iterator per term, which quickly multiplies across segments and concurrent execution. Large term sets should instead be handled via the terms query, which has more predictable memory characteristics.
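The multiplication described above can be made concrete with a back-of-envelope sketch. The function, constants, and example numbers below are illustrative assumptions based on the ~3–6 KB per-term, per-segment-iterator figure quoted here; they are not Lucene internals.

```python
# Back-of-envelope sketch of the heap cost described above: term and
# phrase queries can need roughly 3-6 KB of heap per term, per segment
# iterator, and the cost multiplies across concurrent search threads.
# All constants and names here are illustrative assumptions.

KIB = 1024

def estimated_heap_bytes(num_terms: int, num_segments: int,
                         num_threads: int = 1,
                         per_term_cost: int = 6 * KIB) -> int:
    """Worst-case heap estimate for an expanded term/phrase query."""
    return num_terms * num_segments * num_threads * per_term_cost

# A multi-term query that expands to 10,000 terms on a 50-segment
# index, searched with 8 concurrent slices, at a pessimistic 6 KiB/term:
cost = estimated_heap_bytes(10_000, 50, 8)
print(f"{cost / (1024 ** 3):.1f} GiB")  # roughly 22.9 GiB
```

Even with the optimistic 3 KiB figure, the same query would still claim over 11 GiB of worst-case heap, which is why clause count alone is a poor guardrail.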

Todo

  • Reduce the maximum allowed clause count to a more conservative value where appropriate.
  • Track memory estimation for term-based queries (including phrase queries).
  • Reject excessive term expansion early and guide users toward terms queries where appropriate.
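The last item above might look like the following sketch: a guard that rejects a query before execution once its expansion exceeds a budget, with an error message steering the user toward a terms query. The cap, function name, and exception type are hypothetical, not Elasticsearch APIs.

```python
# Hypothetical sketch of early rejection of excessive term expansion.
# The limit and exception type are illustrative, not Elasticsearch APIs.

MAX_EXPANDED_TERMS = 1024  # illustrative cap, not a real default

class TooManyTermsError(Exception):
    """Raised when a query expands past the term budget."""

def check_expansion(expanded_terms: list[str]) -> list[str]:
    """Fail fast before the expanded terms are turned into clauses."""
    if len(expanded_terms) > MAX_EXPANDED_TERMS:
        raise TooManyTermsError(
            f"query expands to {len(expanded_terms)} terms "
            f"(limit {MAX_EXPANDED_TERMS}); consider a terms query, "
            "whose memory use is more predictable"
        )
    return expanded_terms
```

Rejecting at expansion time, rather than after clause construction, avoids allocating the per-term iterators whose cost the estimate above tries to bound.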

Metadata

    Labels

      • :Search Foundations/Search - Catch all for Search Foundations
      • Supportability - Improve our (devs, SREs, support eng, users) ability to troubleshoot/self-service product better.
      • Team:Search Foundations - Meta label for the Search Foundations team in Elasticsearch
