Add public benchmark suite with MS MARCO and Wikipedia#66

Merged
tjgreen42 merged 13 commits into main from add-benchmark-suite
Dec 15, 2025

Conversation

Collaborator

@tjgreen42 tjgreen42 commented Dec 13, 2025

Summary

Adds benchmark suite with historical tracking and regression detection:

  • MS MARCO Passage Ranking: 8.8M passages with query latency benchmarks
  • Wikipedia: Configurable size (10K, 100K, 1M, or full ~6M articles)
  • Historical tracking: Performance graphs published to GitHub Pages
  • Regression alerts: PR comments and release gates

Performance Tracking

| Event | Dataset | Threshold | Action |
|---|---|---|---|
| PR | Cranfield | 150% | Comment on PR |
| Weekly | MS MARCO (full) | 150% | Alert, update baseline |
| Release | Cranfield | 120% | Block release |
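The thresholds above boil down to a ratio check of the current metric against the stored baseline. A minimal sketch, assuming a single latency metric; the variable names and values here are illustrative, not taken from the actual workflow:

```shell
#!/bin/sh
# Hypothetical regression check: fail when the current measurement
# exceeds baseline * threshold. All names/values are examples only.

baseline_ms=62        # e.g. short-query latency from the saved baseline
current_ms=70         # the same metric from this run
THRESHOLD_PCT=150     # PR/weekly threshold from the table above

# Integer arithmetic: limit = baseline * threshold / 100
limit_ms=$(( baseline_ms * THRESHOLD_PCT / 100 ))
if [ "$current_ms" -gt "$limit_ms" ]; then
    echo "REGRESSION: ${current_ms}ms exceeds ${limit_ms}ms limit"
    exit 1
else
    echo "OK: ${current_ms}ms within ${limit_ms}ms limit"
fi
```

In the real setup this comparison is done by github-action-benchmark rather than a hand-rolled script; the sketch only shows what the 150%/120% numbers mean.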

Dashboard URL (after first run): https://timescale.github.io/pg_textsearch/benchmarks/

Latest Results (full MS MARCO, 8.8M passages)

| Metric | Value |
|---|---|
| Index build | 9.1 minutes |
| Index size | 7.7 GB |
| Short query (1 word) | 62 ms |
| Medium query (3 words) | 68 ms |
| Long query (question) | 74 ms |
| Avg throughput | 14 QPS |

Version Bump

Includes version bump to 0.1.1-dev to start the next development cycle.

Files Added/Modified

  • .github/workflows/benchmark.yml - CI workflow with tracking
  • .github/workflows/release.yml - Added benchmark gate
  • benchmarks/runner/format_for_action.sh - Metrics JSON converter
  • benchmarks/datasets/msmarco/ - Download, load, and query scripts
  • benchmarks/datasets/wikipedia/ - Download, load, and query scripts
  • CONTRIBUTING.md - Documented benchmark dashboard and alerts
  • Version bump files (control, SQL, test expected outputs)
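The job of format_for_action.sh can be sketched roughly as follows. This is not the actual script: the real one reads benchmark_metrics.json, whereas here the metric triples are inlined, and the assumed target shape is the JSON array consumed by github-action-benchmark's `customSmallerIsBetter` tool:

```shell
#!/bin/sh
# Sketch only: convert name/unit/value metric triples into the JSON
# array format expected by github-action-benchmark. The metrics below
# are hard-coded examples; the real script parses them from a file.

emit_metric() {
    # $1=name  $2=unit  $3=value  -> one JSON object (no trailing newline)
    printf '  {"name": "%s", "unit": "%s", "value": %s}' "$1" "$2" "$3"
}

{
    echo '['
    emit_metric "Index build" "min" 9.1
    echo ','
    emit_metric "Short query (1 word)" "ms" 62
    echo ','
    emit_metric "Avg throughput" "qps" 14
    echo
    echo ']'
} > action_metrics.json

cat action_metrics.json
```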

Testing

  • Full MS MARCO benchmark completed in ~13 minutes
  • All regression tests pass with new version
  • Verify GitHub Pages setup after merge

One-Time Setup Required

After merging, enable GitHub Pages:

  1. Settings > Pages
  2. Source: Deploy from branch
  3. Branch: gh-pages, folder: / (root)

- MS MARCO Passage Ranking (8.8M passages, 10K queries)
- Wikipedia dataset loader (configurable: 10K to full 6M articles)
- Benchmark runner script with timing and reporting
- GitHub Actions workflow for weekly benchmarks

Initial local benchmark on release build:
- MS MARCO data load: 76 seconds
- BM25 index build (8.8M docs): 4 min 45 sec

- Add disk cleanup step to free ~25GB before running benchmarks
- Add msmarco_size input option (100K, 500K, 1M, full)
- Default to 1M passages for weekly runs (vs 8.8M full)
- Update download.sh to support subset creation

- Rewrite queries.sql to use EXPLAIN ANALYZE for proper index scan timing
- Add queries for different query types: short, medium, long, common, rare
- Add throughput benchmark (20 queries in batch)
- Add extract_metrics.sh to parse results into JSON format
- Add GitHub job summary with key metrics table
- Include benchmark_metrics.json in artifacts for historical comparison
- Fix grep patterns to extract index build and load times
- Fix RAISE NOTICE format specifiers for throughput output

@tjgreen42 tjgreen42 marked this pull request as draft December 15, 2025 01:46

- Change default msmarco_size from 1M to full (8.8M passages)
- Add Benchmarks section to CONTRIBUTING.md with:
  - On-demand benchmark commands using gh CLI
  - How to view and download results
  - Local benchmark instructions
  - Dataset descriptions

CLAassistant commented Dec 15, 2025

CLA assistant check
All committers have signed the CLA.

Benchmark tracking:
- Add github-action-benchmark integration for historical graphs
- Publish performance dashboard to GitHub Pages
- Alert on PRs when performance regresses >150%
- Add benchmark gate to releases (fails on >120% regression)
- Add format_for_action.sh to convert metrics JSON

Version bump to 0.1.1-dev:
- Update pg_textsearch.control default_version
- Add sql/pg_textsearch--0.1.1-dev.sql
- Add upgrade path sql/pg_textsearch--0.1.0--0.1.1-dev.sql
- Update test expected output files
- Update README status and CHANGELOG

The previous benchmark disabled index scans (enable_indexscan = off),
which meant it only tested sequential scan + scoring performance and
would not catch BM25 index regressions.

Changes:
- Remove enable_indexscan/enable_bitmapscan = off settings
- Add EXPLAIN ANALYZE for 5 representative queries to measure index
  scan latency (short, medium, long, common terms, rare terms)
- Add throughput test running all 225 standard Cranfield queries
- Add search quality validation (Precision@10 against relevance judgments)
- Add index statistics output

Local test results (1400 documents, 225 queries):
- Query latency: 3.5-6.6ms per query
- Throughput: 5.89 ms/query average (225 queries in 1.3s)
- Precision@10: 0.60
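The per-query latencies above come from parsing EXPLAIN ANALYZE output. A sketch of that extraction step; in the real runner the plan text comes from psql, but a captured sample line is inlined here so the parsing can be shown on its own:

```shell
#!/bin/sh
# Sketch: pull the per-query latency out of EXPLAIN ANALYZE output.
# sample_plan stands in for text that would normally come from psql.

sample_plan='Index Scan using ... (actual time=0.1..3.4 ...)
Planning Time: 0.2 ms
Execution Time: 3.5 ms'

# The "Execution Time" line is the number the benchmark records.
latency_ms=$(printf '%s\n' "$sample_plan" | awk '/Execution Time/ {print $3}')
echo "query latency: ${latency_ms} ms"
```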
The gh-pages branch doesn't exist yet, so PR benchmarks and release
gates would fail trying to fetch it. Add skip-fetch-gh-pages: true
since these jobs don't save data anyway - they only compare/alert.

The gh-pages branch doesn't exist yet and the benchmark action fails
trying to switch to it. Switch to using GitHub Actions cache to store
benchmark history:

- PR benchmarks: restore from cache, compare, don't save (PRs don't
  update baseline)
- Full benchmarks (weekly/manual): restore from cache, compare, save
  new baseline to cache
- Release gate: restore from cache, compare with strict threshold

This approach:
1. Doesn't require gh-pages branch setup
2. Keeps benchmark history in Actions cache (persists across runs)
3. Still enables regression detection and PR comments
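The restore/compare/save split described above maps onto the official actions/cache sub-actions. An illustrative workflow fragment only; step names, paths, and cache keys are assumptions, not copied from the actual benchmark.yml:

```yaml
# Sketch of the cache-based history pattern (not the real workflow).
- name: Restore benchmark history
  uses: actions/cache/restore@v4
  with:
    path: benchmark-history
    key: benchmark-history-${{ github.run_id }}   # never matches exactly
    restore-keys: benchmark-history-              # falls back to latest save

# ... run benchmarks and compare against benchmark-history ...

- name: Save updated baseline        # weekly/manual runs only, not PRs
  if: github.event_name != 'pull_request'
  uses: actions/cache/save@v4
  with:
    path: benchmark-history
    key: benchmark-history-${{ github.run_id }}
```

Because cache entries are immutable, each save uses a unique key and restores match on the `restore-keys` prefix, which returns the most recent baseline.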
@tjgreen42 tjgreen42 marked this pull request as ready for review December 15, 2025 20:30
@tjgreen42 tjgreen42 merged commit 4322267 into main Dec 15, 2025
13 checks passed
@tjgreen42 tjgreen42 deleted the add-benchmark-suite branch December 15, 2025 20:30