Add public benchmark suite with MS MARCO and Wikipedia #66
Merged
Conversation
- MS MARCO Passage Ranking (8.8M passages, 10K queries)
- Wikipedia dataset loader (configurable: 10K to full 6M articles)
- Benchmark runner script with timing and reporting
- GitHub Actions workflow for weekly benchmarks

Initial local benchmark on release build:
- MS MARCO data load: 76 seconds
- BM25 index build (8.8M docs): 4 min 45 sec
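The runner's "timing and reporting" can be sketched with plain wall-clock arithmetic; the actual runner script may structure this differently, and the `sleep` here merely stands in for a real load step:

```shell
# Minimal sketch of the runner's timing pattern (the timed command is a stand-in).
start=$(date +%s)
sleep 1                      # stand-in for e.g. a psql load step
end=$(date +%s)
elapsed=$((end - start))
echo "load time: ${elapsed}s"
```

The same pattern bracketed around the data load and index build would produce the 76 s / 4 min 45 s figures reported above.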
- Add disk cleanup step to free ~25GB before running benchmarks
- Add msmarco_size input option (100K, 500K, 1M, full)
- Default to 1M passages for weekly runs (vs 8.8M full)
- Update download.sh to support subset creation
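Since MS MARCO ships as a one-passage-per-line TSV, subset creation can be as simple as taking the first N lines. A hedged sketch (file names and the `SIZE` variable are illustrative; download.sh's actual interface may differ):

```shell
# Carve a fixed-size subset out of a passages TSV (sample data for illustration).
SIZE=3
printf 'p1\tpassage one\np2\tpassage two\np3\tpassage three\np4\tpassage four\n' > collection.tsv
head -n "$SIZE" collection.tsv > "collection.${SIZE}.tsv"
wc -l < "collection.${SIZE}.tsv"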
- Rewrite queries.sql to use EXPLAIN ANALYZE for proper index scan timing
- Add queries for different query types: short, medium, long, common, rare
- Add throughput benchmark (20 queries in batch)
- Add extract_metrics.sh to parse results into JSON format
- Add GitHub job summary with key metrics table
- Include benchmark_metrics.json in artifacts for historical comparison
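The parsing step can be sketched as grep/sed over EXPLAIN ANALYZE output; the real extract_metrics.sh likely extracts more fields and may use different patterns, so treat this as an assumption-laden outline:

```shell
# Sketch: pull the execution time out of EXPLAIN ANALYZE output and emit JSON.
cat > explain.out <<'EOF'
 Planning Time: 0.412 ms
 Execution Time: 5.891 ms
EOF
exec_ms=$(grep 'Execution Time' explain.out | sed 's/[^0-9.]*//; s/ ms//')
printf '{"execution_ms": %s}\n' "$exec_ms" > benchmark_metrics.json
cat benchmark_metrics.json
```

A JSON file of this shape is what gets attached to artifacts and fed into the job summary table.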
- Fix grep patterns to extract index build and load times
- Fix RAISE NOTICE format specifiers for throughput output
- Change default msmarco_size from 1M to full (8.8M passages)
- Add Benchmarks section to CONTRIBUTING.md with:
  - On-demand benchmark commands using gh CLI
  - How to view and download results
  - Local benchmark instructions
  - Dataset descriptions
Benchmark tracking:
- Add github-action-benchmark integration for historical graphs
- Publish performance dashboard to GitHub Pages
- Alert on PRs when performance regresses >150%
- Add benchmark gate to releases (fails on >120% regression)
- Add format_for_action.sh to convert metrics JSON

Version bump to 0.1.1-dev:
- Update pg_textsearch.control default_version
- Add sql/pg_textsearch--0.1.1-dev.sql
- Add upgrade path sql/pg_textsearch--0.1.0--0.1.1-dev.sql
- Update test expected output files
- Update README status and CHANGELOG
The previous benchmark disabled index scans (enable_indexscan = off), which meant it only tested sequential scan + scoring performance and would not catch BM25 index regressions.

Changes:
- Remove enable_indexscan/enable_bitmapscan = off settings
- Add EXPLAIN ANALYZE for 5 representative queries to measure index scan latency (short, medium, long, common terms, rare terms)
- Add throughput test running all 225 standard Cranfield queries
- Add search quality validation (Precision@10 against relevance judgments)
- Add index statistics output

Local test results (1400 documents, 225 queries):
- Query latency: 3.5-6.6ms per query
- Throughput: 5.89 ms/query average (225 queries in 1.3s)
- Precision@10: 0.60
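Precision@10 is just the fraction of a query's top 10 results that appear in the relevance judgments. A small awk sketch of that computation (file names and the two-column `query doc` format are hypothetical; the Cranfield qrels format the suite actually uses may differ):

```shell
# Toy data: qrels lists relevant docs per query, results lists ranked hits.
printf 'q1 d1\nq1 d3\n' > qrels.txt
printf 'q1 d1\nq1 d2\nq1 d3\nq1 d4\n' > results.txt
# Count, per query, how many of the first 10 ranked docs are judged relevant.
awk 'NR==FNR { rel[$1" "$2]=1; next }
     { n[$1]++; if (n[$1] <= 10 && ($1" "$2) in rel) hit[$1]++ }
     END { for (q in n) printf "%s P@10=%.2f\n", q, hit[q]/10 }' qrels.txt results.txt | tee p10.txt
```

With two of the top ten judged relevant, this prints `q1 P@10=0.20`; averaged over all 225 queries, the same calculation yields the 0.60 reported above.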
The gh-pages branch doesn't exist yet, so PR benchmarks and release gates would fail trying to fetch it. Add `skip-fetch-gh-pages: true`, since these jobs don't save data anyway; they only compare/alert.
The gh-pages branch doesn't exist yet, and the benchmark action fails trying to switch to it. Switch to using the GitHub Actions cache to store benchmark history:
- PR benchmarks: restore from cache, compare, don't save (PRs don't update the baseline)
- Full benchmarks (weekly/manual): restore from cache, compare, save the new baseline to cache
- Release gate: restore from cache, compare with the strict threshold

This approach:
1. Doesn't require gh-pages branch setup
2. Keeps benchmark history in the Actions cache (persists across runs)
3. Still enables regression detection and PR comments
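The compare step underneath all three jobs reduces to ratio-against-baseline with a threshold. A hedged sketch of that check, with hypothetical file names and a flat JSON shape (the real comparison is done by github-action-benchmark, not hand-rolled shell):

```shell
# Baseline as restored from the Actions cache, vs the current run's metrics.
echo '{"execution_ms": 4.0}' > baseline.json
echo '{"execution_ms": 5.0}' > current.json
base=$(sed 's/.*: \([0-9.]*\).*/\1/' baseline.json)
curr=$(sed 's/.*: \([0-9.]*\).*/\1/' current.json)
# Express current as a percentage of baseline; fail above the PR alert threshold.
pct=$(awk -v b="$base" -v c="$curr" 'BEGIN { printf "%d", c / b * 100 }')
echo "current is ${pct}% of baseline"
if [ "$pct" -gt 150 ]; then echo "regression detected"; exit 1; fi
```

At 125% of baseline this passes; the release gate would run the same comparison with the stricter 120% threshold.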
Summary
Adds benchmark suite with historical tracking and regression detection:
Performance Tracking
Dashboard URL (after first run): https://timescale.github.io/pg_textsearch/benchmarks/
Latest Results (full MS MARCO, 8.8M passages)
Version Bump
Includes version bump to 0.1.1-dev to start the next development cycle.
Files Added/Modified
- `.github/workflows/benchmark.yml` - CI workflow with tracking
- `.github/workflows/release.yml` - Added benchmark gate
- `benchmarks/runner/format_for_action.sh` - Metrics JSON converter
- `benchmarks/datasets/msmarco/` - Download, load, and query scripts
- `benchmarks/datasets/wikipedia/` - Download, load, and query scripts
- `CONTRIBUTING.md` - Documented benchmark dashboard and alerts

Testing
One-Time Setup Required
After merging, enable GitHub Pages: