Add storage and query optimization roadmap #79

Merged: tjgreen42 merged 2 commits into main from optimization-roadmap on Dec 17, 2025

@tjgreen42 (Collaborator) commented Dec 17, 2025

Summary

  • Adds design doc (OPTIMIZATION_ROADMAP.md) covering planned optimizations:

    • Block-Max WAND algorithm for O(k log n) top-k queries (vs current O(n))
    • Block-aligned posting storage with skip lists
    • FOR/PFOR compression targeting 50%+ space reduction
    • Fieldnorm quantization using Lucene's SmallFloat scheme
    • Phased implementation plan (v0.0.4 through v0.0.7)
  • Updates README to clarify "not yet recommended" production status

The roadmap draws from analysis of Tantivy and Lucene implementations, prioritizing asymptotic gains (BMW) over constant-factor gains (compression).
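To illustrate why block-level score bounds can skip work, here is a minimal sketch of the block-max idea for a single term. It is not the roadmap's implementation and not full Block-Max WAND (which coordinates several terms); the function name `top_k_with_block_max` and the in-memory block layout are assumptions made for illustration only.

```python
import heapq

def top_k_with_block_max(blocks, k):
    """Return (top-k hits, number of blocks skipped).

    blocks: list of (block_max_score, [(doc_id, score), ...]).
    Hypothetical layout: each block carries an upper bound on its scores.
    """
    if k <= 0:
        return [], 0
    heap = []      # min-heap of (score, doc_id); heap[0] is the current threshold
    skipped = 0
    for block_max, postings in blocks:
        # Once k hits are held, a block whose upper bound cannot beat the
        # weakest retained hit is skipped without reading its postings.
        if len(heap) == k and block_max <= heap[0][0]:
            skipped += 1
            continue
        for doc_id, score in postings:
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True), skipped

blocks = [
    (5.0, [(1, 5.0), (2, 1.0)]),
    (0.9, [(3, 0.9), (4, 0.2)]),   # bound is below the threshold: skipped
    (7.0, [(5, 7.0), (6, 3.0)]),
]
hits, skipped = top_k_with_block_max(blocks, k=2)
print(hits, skipped)
```

The more blocks fall below the running threshold, the fewer postings are scored, which is where the gain over a full scan would come from.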

@tjgreen42 force-pushed the optimization-roadmap branch 23 times, most recently from cd5cf26 to 99ec41c on December 17, 2025 06:04
Design doc covering:
- Block-Max WAND for O(k log n) top-k queries
- Block-aligned posting storage with skip lists
- FOR/PFOR compression for 50%+ space reduction
- Fieldnorm quantization (Lucene SmallFloat scheme)
- Phased implementation plan (v0.0.4 through v0.0.7)

Also updates README to clarify "not yet recommended" status.
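As a rough illustration of the frame-of-reference (FOR) idea mentioned above, the sketch below stores one block of integers as a base value plus fixed-width offsets. It is a toy under stated assumptions, not the roadmap's on-disk format: the names `for_encode` and `for_decode` are hypothetical, and PFOR's exception handling for outliers is not shown.

```python
def for_encode(values):
    """Encode a non-empty block as (base, bit width, count, packed bytes)."""
    base = min(values)
    offsets = [v - base for v in values]
    width = max(offsets).bit_length()        # bits needed for the largest offset
    packed = 0
    for i, off in enumerate(offsets):
        packed |= off << (i * width)         # place each offset in its own slot
    nbytes = (len(values) * width + 7) // 8  # round up to whole bytes
    return base, width, len(values), packed.to_bytes(nbytes, "little")

def for_decode(base, width, count, payload):
    """Invert for_encode."""
    packed = int.from_bytes(payload, "little")
    mask = (1 << width) - 1
    return [base + ((packed >> (i * width)) & mask) for i in range(count)]

doc_ids = [1000, 1003, 1001, 1010, 1007]
base, width, count, payload = for_encode(doc_ids)
print(base, width, len(payload))             # base value, bits per offset, payload bytes
print(for_decode(base, width, count, payload) == doc_ids)
```

In this toy input the largest offset is 10, so each value needs 4 bits and the five values pack into 3 bytes of payload. Real savings depend on how tightly clustered the values in each block are.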
@tjgreen42 force-pushed the optimization-roadmap branch from 99ec41c to 29d056c on December 17, 2025 06:10
- Add multi-segment query execution section (per-segment BMW, merge results)
- Add WAND vs MAXSCORE comparison (Lucene switched to MAXSCORE in 2022)
- Clarify block structure: skip index entries separate from posting data
- Add IDF computation note (sum df across segments)
- Add DELETE handling section (punt to Postgres visibility for now)
- Simplify migration: require REINDEX, no backward compatibility
- Document buffer manager overhead benchmark results (validated on-disk hash)
- Update references with Elastic and ECIR 2020 papers
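The multi-segment and IDF notes above can be sketched as follows. This is a hedged illustration, not the design doc's code: the segment dictionaries, the names `global_idf` and `merge_top_k`, and the choice of the Lucene-style BM25 IDF formula are all assumptions.

```python
import heapq
import math

def global_idf(segments, term):
    """IDF from document frequency summed across segments (assumed layout)."""
    n = sum(seg["doc_count"] for seg in segments)
    df = sum(seg["df"].get(term, 0) for seg in segments)
    # Lucene-style BM25 IDF: log(1 + (N - df + 0.5) / (df + 0.5))
    return math.log(1 + (n - df + 0.5) / (df + 0.5))

def merge_top_k(per_segment_hits, k):
    """Merge per-segment (score, doc_id) lists into one global top-k."""
    return heapq.nlargest(k, (hit for hits in per_segment_hits for hit in hits))

segments = [
    {"doc_count": 100, "df": {"postgres": 10}},
    {"doc_count": 50, "df": {"postgres": 5, "wand": 1}},
]
idf = global_idf(segments, "postgres")       # uses N = 150, df = 15

per_segment_hits = [
    [(3.0, 1), (1.0, 2)],                    # top hits from segment 0
    [(2.5, 7), (0.5, 9)],                    # top hits from segment 1
]
merged = merge_top_k(per_segment_hits, k=3)
print(merged)
```

Computing IDF from the summed counts keeps scores comparable across segments, so the per-segment result lists can be merged by score alone.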
@tjgreen42 force-pushed the optimization-roadmap branch from f44d1ea to 3600888 on December 17, 2025 20:05
@tjgreen42 merged commit 0ca3751 into main on Dec 17, 2025
1 check passed
@tjgreen42 deleted the optimization-roadmap branch on December 17, 2025 20:06