bench: add insert benchmarks; fix insert performance regression #242

Merged: tjgreen42 merged 14 commits into main from insert-benchmarks on Mar 2, 2026

tjgreen42 (Collaborator) commented on Feb 26, 2026

Summary

  • Add benchmark variants that measure insert-into-indexed-table performance,
    complementing the existing CREATE INDEX benchmarks
  • Remove the dead tp_calculate_idf_sum() function, which was called on every
    insert and scanned the entire term dictionary; this made inserts
    O(docs * terms) and caused the MS MARCO insert run to time out after 6+ hours
  • Compare pg_textsearch, System X (ParadeDB), and GIN+tsvector across three
    load patterns:
    • CREATE INDEX on populated table (existing)
    • Single-transaction insert into indexed table (new)
    • Concurrent pgbench insert into indexed table (new)
  • Cover all three datasets (Cranfield, MS MARCO, Wikipedia)

Dead code removed

| Field / Function | Location | Why |
|---|---|---|
| tp_calculate_idf_sum() | build.c | Called per-insert, O(docs * terms); was 94% of insert time |
| idf_sum | TpSharedIndexState | Written by above function, never read |
| total_terms | TpMemtable | Written by above function, never read |

total_terms is retained in the on-disk metapage struct for layout compatibility.

New workflow jobs

| Job | Engine | What it measures |
|---|---|---|
| insert-benchmark | pg_textsearch | Single-txn + concurrent inserts, queries, validation |
| baseline-benchmark | GIN+tsvector | CREATE INDEX + insert variants, load metrics only |
| system-x-benchmark (extended) | System X | Added insert + concurrent variants |

Benchmark results (MS MARCO, 8.8M docs)

| Approach | pg_textsearch | System X | GIN |
|---|---|---|---|
| CREATE INDEX | 3m 45s | 2m 19s | 2m 20s |
| Insert (single session) | 9m 56s | 4m 37s | 11m 2s |
| Insert (concurrent) | 37m 47s | 21m 0s | 16m 30s |
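
To make the table easier to compare row by row, the durations can be converted to seconds and expressed relative to the fastest engine in each row. The sketch below only re-derives numbers from the table above; the helper names (`to_seconds`, `slowdown_vs_fastest`) are made up for illustration and are not part of the benchmark scripts.

```python
import re

# Durations copied verbatim from the MS MARCO results table.
RESULTS = {
    "CREATE INDEX": {"pg_textsearch": "3m 45s", "System X": "2m 19s", "GIN": "2m 20s"},
    "Insert (single session)": {"pg_textsearch": "9m 56s", "System X": "4m 37s", "GIN": "11m 2s"},
    "Insert (concurrent)": {"pg_textsearch": "37m 47s", "System X": "21m 0s", "GIN": "16m 30s"},
}


def to_seconds(duration):
    """Convert a string such as '9m 56s' to whole seconds."""
    match = re.fullmatch(r"(\d+)m (\d+)s", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


def slowdown_vs_fastest(row):
    """Map each engine to its time divided by the fastest time in the row."""
    secs = {engine: to_seconds(text) for engine, text in row.items()}
    fastest = min(secs.values())
    return {engine: round(s / fastest, 2) for engine, s in secs.items()}


for approach, row in RESULTS.items():
    print(approach, slowdown_vs_fastest(row))
```

Read this way, pg_textsearch's single-session insert takes about 2.15x as long as the fastest engine in that row (System X), and its concurrent insert about 2.29x as long as the fastest (GIN).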

Testing

  • Benchmark run 22510679209 completed successfully (all datasets, all jobs)
  • CI green: PG17, PG18, sanitizers, formatting all pass

Add INSERT_TIME and CONCURRENT_INSERT_TIME marker extraction to
extract_metrics.sh. Add corresponding fields to format_for_action.sh
for dashboard publishing.

Add two new jobs and extend system-x-benchmark:

- insert-benchmark: pg_textsearch single-txn inserts and concurrent
  pgbench inserts across all three datasets, with validation and
  dashboard publishing for 6 metric prefixes

- baseline-benchmark: GIN+tsvector using built-in Postgres (no
  extensions), covering CREATE INDEX on populated table, single-txn
  insert, and concurrent insert for all datasets with 9 metric
  prefixes

- system-x-benchmark: extended with insert-load and concurrent
  pgbench insert steps for all datasets, plus format/publish steps
  for the new insert and concurrent metric variants

Three bugs fixed:
- Section headers for concurrent inserts lacked the "Benchmark" keyword,
  making them invisible to the awk section filter in extract_metrics.sh
- Single-dataset runs used the lowercase dataset slug in section matching
  (e.g. "cranfield Insert" vs "Cranfield Insert"), causing a case-sensitive
  mismatch
- Cranfield insert timing captured only one INSERT's time (~15ms) instead
  of the total; now uses clock_timestamp() bookends around all 1400 inserts

Also added bm25_spill_index() before index size measurement in insert
benchmarks so pg_relation_size() reflects actual data (not just the
memtable metapage)

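
The first two bugs are both section-matching failures, and a small Python analogue may make them concrete. This is not the awk filter from extract_metrics.sh: the log lines, the `=== ... ===` header shape, and the `find_sections` helper are all hypothetical, chosen only to show how a missing keyword or a case mismatch silently drops a section.

```python
import re

# Hypothetical log excerpt; the real benchmark output format may differ.
LOG = [
    "=== Cranfield Insert Benchmark ===",
    "(timing lines would follow here)",
    "=== Cranfield Concurrent Insert ===",  # header without the keyword
    "(timing lines would follow here)",
]


def find_sections(log_lines, dataset, ignore_case=True):
    """Return header lines that name `dataset` and contain 'Benchmark'."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(rf"^=+ *{re.escape(dataset)} .*Benchmark", flags)
    return [line for line in log_lines if pattern.search(line)]


# Case-insensitive match: the lowercase slug still finds the section,
# but the header lacking "Benchmark" stays invisible to the filter.
print(find_sections(LOG, "cranfield"))

# Case-sensitive match: the lowercase slug finds nothing at all.
print(find_sections(LOG, "cranfield", ignore_case=False))
```

Either normalizing case on both sides or emitting consistently capitalized headers would avoid the second failure; the commit message does not say which approach was taken.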
tp_calculate_idf_sum() was called on every tp_insert(), scanning the
entire term hash table to recompute a sum that was never read. Profiling
showed this consumed 94% of insert time: with 100K+ terms in the
memtable, each row insert triggered a full sequential scan of the
dshash, making insert performance O(docs * terms).

The idf_sum field in TpSharedIndexState and total_terms in TpMemtable
were dead code — written but never read by any query or scoring path.
Remove the function, both fields, and all call sites (insert path,
build finalization, recovery).

The on-disk metapage total_terms field is left in place to preserve
the page layout for existing indexes.
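
The cost pattern described above can be illustrated with a toy Python model. It is not the extension's C code: the dictionary merely stands in for the shared term hash table, and every name here (`insert_with_full_scan`, `insert_without_scan`, `make_docs`) is hypothetical. It counts how many term-table entries are touched when each insert is followed by a full rescan, versus when inserts only update the terms they contain.

```python
def make_docs(n_docs, terms_per_doc=5):
    """Build documents whose terms are all distinct, so the table keeps growing."""
    return [[f"t{d}_{i}" for i in range(terms_per_doc)] for d in range(n_docs)]


def insert_with_full_scan(docs):
    """Insert docs, walking the entire term table after every insert."""
    term_counts = {}  # stand-in for the term hash table
    visits = 0        # entries touched by the per-insert rescans
    for doc in docs:
        for term in doc:
            term_counts[term] = term_counts.get(term, 0) + 1
        # Analogue of the removed recomputation: the result is never used.
        unused_sum = 0
        for count in term_counts.values():
            unused_sum += count
            visits += 1
    return term_counts, visits


def insert_without_scan(docs):
    """Insert docs, touching only the terms each document contains."""
    term_counts = {}
    updates = 0
    for doc in docs:
        for term in doc:
            term_counts[term] = term_counts.get(term, 0) + 1
            updates += 1
    return term_counts, updates


docs = make_docs(100)
_, visits = insert_with_full_scan(docs)
_, updates = insert_without_scan(docs)
print(f"rescan visits: {visits}, plain updates: {updates}")
```

For 100 documents of 5 unique terms each, the rescanning variant touches 25,250 entries while the plain variant performs 500 updates, and the gap widens as the table grows, which is the O(docs * terms) behavior the commit removes.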
tjgreen42 changed the title from "bench: add insert-based benchmark variants" to "bench: add insert benchmarks; fix insert performance regression" on Mar 1, 2026
- Mark total_terms in metapage as "unused, retained for on-disk compat"
- Increase INSERT_TIME grep window from -A 5 to -A 10 for robustness
tjgreen42 marked this pull request as ready for review March 2, 2026 22:39
tjgreen42 merged commit 1f411fd into main Mar 2, 2026
15 checks passed
tjgreen42 deleted the insert-benchmarks branch March 2, 2026 22:39