Optimize `count()` for BooleanQuery disjunction

### Description

Context: we (Amazon's customer-facing product search team, and also AWS) are trying to understand the impressive performance advantage Tantivy (a Rust search engine) has over Lucene, iterating in [this GitHub repo](https://github.com/Tony-X/search-benchmark-game).  That repo is roughly a merger of Lucene's benchmarking code ([luceneutil](https://github.com/mikemccand/luceneutil)), including its tasks and `enwiki` corpus, and the [open source Tantivy benchmark](https://github.com/quickwit-oss/search-benchmark-game).  Tantivy is impressively fast :)

This issue is a spinoff from [this fascinating comment](https://github.com/Tony-X/search-benchmark-game/issues/30#issuecomment-1579761787) by @fulmicoton, creator and maintainer of [Tantivy](https://github.com/quickwit-oss/tantivy).

Tantivy optimizes `count()` for `BooleanQuery` disjunctions with an approach much like the one Lucene's `BooleanScorer` uses for scoring: it collects matches into a windowed bitset covering N docs at a time, and then pop-counts the set bits in each window.  This is not technically a sub-linear implementation: it is still linear in the number of postings, but I suspect with a smaller constant factor than the default `count()` fallback Lucene implements.
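To make the idea concrete, here is a minimal sketch of windowed bitset counting over plain sorted `int[]` postings. It is neither Tantivy's nor Lucene's actual code: the class name `WindowedDisjunctionCount`, the `count` signature, and the 2048-doc window are all assumptions chosen for illustration, and real postings would come from iterators rather than arrays.

```java
import java.util.Arrays;

/** Hypothetical sketch of windowed disjunction counting; not Lucene or Tantivy code. */
final class WindowedDisjunctionCount {
    // Window size in docs. Illustrative value only; must be a multiple of 64.
    static final int WINDOW_SIZE = 2048;

    /**
     * Counts the distinct doc IDs that appear in at least one postings list.
     * Each postings[i] is assumed sorted ascending with IDs in [0, maxDoc).
     */
    static int count(int[][] postings, int maxDoc) {
        long[] bits = new long[WINDOW_SIZE / 64]; // one bit per doc in the window
        int[] pos = new int[postings.length];     // read cursor for each clause
        int total = 0;

        // long arithmetic so base + WINDOW_SIZE cannot overflow near Integer.MAX_VALUE
        for (long base = 0; base < maxDoc; base += WINDOW_SIZE) {
            long end = Math.min(base + WINDOW_SIZE, (long) maxDoc);

            // Mark every match from every clause that falls inside this window.
            for (int c = 0; c < postings.length; c++) {
                int[] docs = postings[c];
                int p = pos[c];
                while (p < docs.length && docs[p] < end) {
                    int offset = (int) (docs[p] - base);
                    bits[offset >>> 6] |= 1L << (offset & 63);
                    p++;
                }
                pos[c] = p;
            }

            // Docs matched by several clauses set the same bit, so they count once.
            for (long word : bits) {
                total += Long.bitCount(word);
            }
            Arrays.fill(bits, 0L); // reset for the next window
        }
        return total;
    }
}
```

For example, `WindowedDisjunctionCount.count(new int[][] {{1, 5, 9}, {5, 7, 3000}}, 4000)` visits all six postings but counts doc 5 only once. The work stays linear in the total number of postings, as noted above; the potential saving is that deduplication becomes a bit-OR and counting becomes one `Long.bitCount` per 64 docs.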

Perhaps, for all cases where `BooleanQuery` uses the windowed `BooleanScorer`, we could also implement this `count()` optimization.

From my read of Lucene's `BooleanWeight.count`, I don't think Lucene has this optimization?  Maybe we should port over Tantivy's optimization?  It should make disjunctive counting quite a bit faster?
