Look into integrating IndexSortDocValuesRangeQuery. #48665

@jtibshirani

We recently added the range query IndexSortDocValuesRangeQuery to Lucene (https://issues.apache.org/jira/browse/LUCENE-7714). It takes advantage of index sorting to run efficiently using only doc values when the following conditions hold; otherwise it must fall back to a delegate range query:

  • The index is sorted, and its primary sort is on the same field as the query.
  • The query field has SortedNumericDocValues.
  • The segment has at most one field value per document.
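To illustrate why these conditions matter: when a segment is sorted on the query field and each document has a single value, the matching documents form one contiguous doc ID range that can be located with two binary searches. The following is a minimal plain-Java sketch of that idea, not the Lucene implementation. The class and method names are hypothetical, a `long[]` stands in for the segment's doc values, and it assumes an ascending sort with every document having a value.

```java
// Hypothetical sketch: a long[] indexed by doc ID stands in for doc values.
// Assumes ascending index sort and exactly one value per document.
final class SortedRangeSketch {

    /** Returns {firstDoc, endDocExclusive} for docs with lower <= value <= upper. */
    static int[] matchingDocRange(long[] valuesByDocId, long lower, long upper) {
        // First doc whose value is >= lower.
        int from = lowerBound(valuesByDocId, lower);
        // First doc whose value is > upper; avoid overflow on upper + 1.
        int to = (upper == Long.MAX_VALUE)
                ? valuesByDocId.length
                : lowerBound(valuesByDocId, upper + 1);
        // An inverted range (lower > upper) yields an empty result.
        return new int[] {from, Math.max(from, to)};
    }

    /** Index of the first element >= target, or a.length if there is none. */
    static int lowerBound(long[] a, long target) {
        int lo = 0;
        int hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

If any document carried more than one value, or the segment were sorted on a different field, the matches would no longer be guaranteed to be contiguous, which is why the query needs a delegate to fall back to.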

We should investigate whether it could be helpful to integrate this query into Elasticsearch. In particular, it may allow us to avoid indexing the field used for the primary index sort, while still supporting range queries. There are a few open questions around the integration:

  • If the field is not indexed, then there is no efficient range query to fall back to. Could we somehow ensure that the documents contain only one field value, to make sure we can run efficiently on doc values?
  • The query has only been benchmarked against dense doc values, and may not show good performance on sparse doc values. Is there anything we can do to improve the sparse case, or at least guard against it?
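One way to frame these questions is as a per-segment decision about which execution path is safe. The sketch below is purely illustrative: the class, enum, parameters, and the density threshold are all assumptions made for discussion, not existing Lucene or Elasticsearch APIs, and a suitable threshold (if any) would have to come from benchmarking.

```java
// Hypothetical decision helper; none of these names exist in Lucene or Elasticsearch.
final class DocValuesPathGuard {

    enum Plan { DOC_VALUES_BINARY_SEARCH, FALLBACK_QUERY, NO_EFFICIENT_PATH }

    static Plan choose(String primarySortField,   // null if the index is unsorted
                       String queryField,
                       boolean singleValuedSegment,
                       boolean fieldIsIndexed,
                       int docsWithField,
                       int maxDoc,
                       double minDensity) {       // assumed tuning knob, e.g. 0.5
        boolean sortedOnField = queryField.equals(primarySortField);
        // Fraction of documents in the segment that have a value for the field.
        boolean dense = maxDoc > 0 && (double) docsWithField / maxDoc >= minDensity;

        if (sortedOnField && singleValuedSegment && dense) {
            return Plan.DOC_VALUES_BINARY_SEARCH;
        }
        if (fieldIsIndexed) {
            // An indexed field still has an efficient delegate range query.
            return Plan.FALLBACK_QUERY;
        }
        // Not indexed and not eligible: this is the case the open questions are about.
        return Plan.NO_EFFICIENT_PATH;
    }
}
```

The `NO_EFFICIENT_PATH` outcome is the interesting one: enforcing single-valued fields at index time would remove one way of reaching it, while the sparse case would still need either a guard like the one above or an improvement to the query itself.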

Labels: :Search/Search, >enhancement, Team:Search
