perf: stack-allocate decode buffers in tp_decompress_block#253
Merged
Replace the palloc/pfree of two temporary uint32 arrays (doc_deltas, frequencies) with fixed-size stack arrays of TP_BLOCK_SIZE (128) elements. Each array is 512 bytes (1 KiB total) and was previously allocated on every block decompression call. Because TP_BLOCK_SIZE is a compile-time constant, the stack arrays have a fixed size and are not VLAs. This eliminates allocator overhead on the hot path, where profiling shows tp_decompress_block at 6.5% of CPU.
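The shape of the change can be sketched in standalone C. This is a minimal illustration, not the code from this PR: `malloc`/`free` stand in for `palloc`/`pfree` so the example builds without PostgreSQL headers, and every `sketch_*` function, its signature, and the pass-through "decoder" are hypothetical. Only `TP_BLOCK_SIZE`, `doc_deltas`, and `frequencies` are names taken from the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TP_BLOCK_SIZE 128 /* block size stated in the PR description */

/* 128 uint32 elements: 512 bytes per buffer, 1 KiB for both. */
static_assert(sizeof(uint32_t[TP_BLOCK_SIZE]) == 512, "one buffer is 512 bytes");

/* Hypothetical stand-in for the real decoder: copies values straight through. */
static void sketch_unpack(const uint32_t *src, uint32_t *dst, int count)
{
    for (int i = 0; i < count; i++)
        dst[i] = src[i];
}

/* Turn decoded deltas into absolute doc IDs and copy frequencies out. */
static void sketch_emit(const uint32_t *doc_deltas, const uint32_t *frequencies,
                        int count, uint32_t *doc_ids_out, uint32_t *freqs_out)
{
    uint32_t doc_id = 0;

    for (int i = 0; i < count; i++)
    {
        doc_id += doc_deltas[i];
        doc_ids_out[i] = doc_id;
        freqs_out[i] = frequencies[i];
    }
}

/* Before: both scratch buffers come from the allocator on every call. */
int sketch_decode_heap(const uint32_t *packed_deltas, const uint32_t *packed_freqs,
                       int count, uint32_t *doc_ids_out, uint32_t *freqs_out)
{
    if (count < 0 || count > TP_BLOCK_SIZE)
        return -1;

    uint32_t *doc_deltas = malloc(TP_BLOCK_SIZE * sizeof(uint32_t));
    uint32_t *frequencies = malloc(TP_BLOCK_SIZE * sizeof(uint32_t));

    if (doc_deltas == NULL || frequencies == NULL)
    {
        free(doc_deltas);
        free(frequencies);
        return -1;
    }

    sketch_unpack(packed_deltas, doc_deltas, count);
    sketch_unpack(packed_freqs, frequencies, count);
    sketch_emit(doc_deltas, frequencies, count, doc_ids_out, freqs_out);

    free(doc_deltas);
    free(frequencies);
    return 0;
}

/* After: fixed-size stack arrays; the bound is a constant, so these are not VLAs. */
int sketch_decode_stack(const uint32_t *packed_deltas, const uint32_t *packed_freqs,
                        int count, uint32_t *doc_ids_out, uint32_t *freqs_out)
{
    if (count < 0 || count > TP_BLOCK_SIZE)
        return -1;

    uint32_t doc_deltas[TP_BLOCK_SIZE];
    uint32_t frequencies[TP_BLOCK_SIZE];

    sketch_unpack(packed_deltas, doc_deltas, count);
    sketch_unpack(packed_freqs, frequencies, count);
    sketch_emit(doc_deltas, frequencies, count, doc_ids_out, freqs_out);
    return 0;
}
```

Both variants produce the same output for the same input; the stack version simply has no allocation, no failure path for it, and no cleanup. The bounds check matters more in the stack version, since writing past a stack array corrupts the caller's frame.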
Benchmark Results: MS-MARCO v2 (138M passages, 691 queries, LIMIT 10)
Back-to-back runs on the same machine (16 cores, 123 GB RAM, PG17), identical query set.
Per-Bucket Latency (p50 ms)
Summary
Modest end-to-end improvement, as expected.
tjgreen42 added a commit that referenced this pull request on Mar 3, 2026
## Summary

- Update comparison page with results from benchmark run [22642807624](https://github.com/timescale/pg_textsearch/actions/runs/22642807624)
- Overall throughput improved from 2.8x to 3.2x faster than System X
- Build time gap narrowed from 2.0x to 1.6x (270s → 234s)
- Key improvements since Feb 9: SIMD bitpack decoding (#250), stack-allocated decode buffers (#253), BMW term state pointer indirection (#249), arena allocator rewrite (#231), leader-only merge (#244)

## Testing

- Numbers extracted from benchmark run on commit 1b09cc9
- gh-pages branch also needs updating (will push after merge)
Summary
- Replace `palloc`/`pfree` of two temporary `uint32` arrays (`doc_deltas`, `frequencies`) in `tp_decompress_block` with fixed-size stack arrays of `TP_BLOCK_SIZE` (128) elements
- `TP_BLOCK_SIZE` is a `#define` constant, so these are NOT VLAs

Test plan

- `make clean && make` compiles with zero new warnings
- (`make installcheck`)
- `make format-check` passes