Add benchmark for insert_string.#1956
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds a new benchmark source Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant GB as Google Benchmark
participant F as Benchmark Function (e.g., integer_hash)
participant B as insert_string_bench::Bench
participant S as deflate_state
participant IF as insert_func (e.g., ::insert_string / _roll)
GB->>F: Invoke with State(args: start_pos, count)
F->>B: Bench(State, insert_func)
Note over B,S: Per-iteration: reset hash tables and rolling state
loop Each benchmark iteration
B->>S: Reset head/prev/hash state
B->>IF: insert_func(S, start_pos, count)
IF-->>B: result (ignored/validated)
end
B-->>F: Complete
F-->>GB: Report timings (ns)
sequenceDiagram
autonumber
participant GB as Google Benchmark
participant FQ as quick_integer_hash / quick_rolling_hash
participant QB as quick_insert_string_bench::Bench
participant S as deflate_state
participant QF as quick_insert_func (quick_*)
GB->>FQ: Invoke with State(args: start_pos, iters)
FQ->>QB: Bench(State, quick_insert_func)
loop Each benchmark iteration
QB->>S: Reset head/prev/hash state
loop For N insertions
QB->>QF: quick_insert_func(S, pos++)
QF-->>QB: hash/prev updates
end
end
QB-->>FQ: Complete
FQ-->>GB: Report timings (ns)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #1956 +/- ##
===========================================
- Coverage 81.91% 0 -81.92%
===========================================
Files 162 0 -162
Lines 14013 0 -14013
Branches 3137 0 -3137
===========================================
- Hits 11479 0 -11479
+ Misses 1568 0 -1568
+ Partials 966 0 -966 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
caddb8f to
4f762c7
Compare
| ->Args({100, 256}) /* Insert 256 consecutive positions */ \ | ||
| ->Args({100, 4096}) /* Insert 4KB consecutive positions */ \ | ||
| ->Args({100, 16384}) /* Insert 16KB consecutive positions (half window) */ \ | ||
| ->Args({100, 32765}) /* Insert nearly full window */ \ |
There was a problem hiding this comment.
note: I actually don't think it is ever used for such big inserts. I stuck a printf in there, and it printed mostly 3 and 7, some 8 and sprinkled with anything from 4 to ~50-70 or something like that.
Probably data dependent, but might be worth looking into to make sure the benchmark doesn't tell us an optimization is awesome for sizes that never happen. I too assumed fill_window would do these massive inserts.
There was a problem hiding this comment.
Yeah I had some difficulty determining what sizes to put here and assumed that it must be doing inserts for a full window at some point
There was a problem hiding this comment.
I have updated the values using distribution of count from silesia.tar and silesia-small.tar:
https://gist.github.com/nmoinvaz/f6de3ca6301a43203247f4e3078cea24
Majority of calls happen with count = 3.
There was a problem hiding this comment.
That is great 👍
I do wonder whether it would be worth tracing where those short and long ones come from and possibly split it into two functions. One optimized for <10 and one for >14 or whatever values make sense.
Were there none below 3 btw? If that is a rule, then we could perhaps add a comment and assert to that effect and allow future optimizations to assume as much.
There was a problem hiding this comment.
Yeah there was nothing below 3, at least for deflate medium.
6a7ad23 to
7731a14
Compare
|
Ahh, great idea. These improvements are a bit harder to measure even in the compression benchmark. |
There was a problem hiding this comment.
Pull Request Overview
This PR adds comprehensive benchmarking for insert_string and related functions in the zlib library. The benchmark tests different hash implementations (integer and rolling) and measures their performance under various workload scenarios.
- Adds a new benchmark file
benchmark_insert_string.ccwith test fixtures for both bulk and single insert operations - Integrates the new benchmark into the CMake build system
- Tests multiple scenarios from small insertions to large batches with different starting positions
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| test/benchmarks/benchmark_insert_string.cc | New benchmark implementation testing insert_string variants with different hash algorithms and workload patterns |
| test/benchmarks/CMakeLists.txt | Adds the new benchmark file to the build configuration |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
It'd be nice if you applied these changes: ->Args({100, 255}) /* Near maximum observed values */ \
- ->Unit(benchmark::kMicrosecond);
+ ->Unit(benchmark::kNanosecond);
->Args({16000, 100}) /* 100 insertions at mid window (different hash distribution) */ \
- ->Unit(benchmark::kMicrosecond);
+ ->Unit(benchmark::kNanosecond);
Since each iteration takes around 300-600ns, this seems more appropriate :) |
7731a14 to
e80da3b
Compare
e80da3b to
74d644e
Compare
|
Done and rebased. |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
test/benchmarks/benchmark_insert_string.cc (1)
114-117: Consider replacing the hardcoded support_flag “1” with a named constantUsing a named constant (e.g., ALWAYS_SUPPORTED) or a CPU-feature predicate improves clarity at the callsite.
Example:
// constexpr bool ALWAYS_SUPPORTED = true; BENCHMARK_INSERT_STRING(integer_hash, ::insert_string, ALWAYS_SUPPORTED); BENCHMARK_INSERT_STRING(rolling_hash, ::insert_string_roll, ALWAYS_SUPPORTED); BENCHMARK_QUICK_INSERT_STRING(quick_integer_hash, ::quick_insert_string, ALWAYS_SUPPORTED); BENCHMARK_QUICK_INSERT_STRING(quick_rolling_hash, ::quick_insert_string_roll, ALWAYS_SUPPORTED);As discussed in the thread, introducing a common define across benchmarks could be a follow-up PR.
Also applies to: 163-164
🧹 Nitpick comments (4)
test/benchmarks/benchmark_insert_string.cc (4)
28-57: Defensive allocation checks in SetUp and null-safe TearDownBenchmarks should fail fast and clearly on OOM instead of segfaulting. Currently, SetUp() does not check allocations (including the deflate_state itself), and TearDown() dereferences s unconditionally.
Suggested changes within the shown ranges:
void SetUp(const ::benchmark::State& state) { - s = (deflate_state*)zng_alloc(sizeof(deflate_state)); - memset(s, 0, sizeof(deflate_state)); + s = (deflate_state*)zng_alloc(sizeof(deflate_state)); + if (!s) { + state.SkipWithError("Failed to allocate deflate_state"); + return; + } + memset(s, 0, sizeof(deflate_state)); @@ - s->window = (uint8_t*)zng_alloc(TEST_WINDOW_SIZE); + s->window = (uint8_t*)zng_alloc(TEST_WINDOW_SIZE); @@ - s->head = (Pos*)zng_alloc(HASH_SIZE * sizeof(Pos)); - s->prev = (Pos*)zng_alloc(MAX_WSIZE * sizeof(Pos)); + s->head = (Pos*)zng_alloc(HASH_SIZE * sizeof(Pos)); + s->prev = (Pos*)zng_alloc(MAX_WSIZE * sizeof(Pos)); + if (!s->window || !s->head || !s->prev) { + state.SkipWithError("Failed to allocate window/hash tables"); + return; + } @@ - for (size_t i = 0; i < TEST_WINDOW_SIZE; i++) { + for (size_t i = 0; i < TEST_WINDOW_SIZE; i++) { // Create patterns that will exercise the hash function well s->window[i] = (uint8_t)((i * 17 + (i >> 4) * 31 + (i >> 8) * 13) & 0xFF); } } @@ - void TearDown(const ::benchmark::State& state) { - zng_free(s->window); - zng_free(s->head); - zng_free(s->prev); - zng_free(s); + void TearDown(const ::benchmark::State& state) { + if (!s) return; + zng_free(s->window); + zng_free(s->head); + zng_free(s->prev); + zng_free(s); + s = nullptr; }
73-77: Replace magic “-4” with hash read margin derived from MIN_MATCH/HASH_CALC_OFFSETThe bounds check uses a literal 4. Using protocol-aware constants prevents off-by-one errors and keeps the benchmark robust if hashing macros change.
Apply this diff locally to the two checks:
- if (str_pos + count >= TEST_WINDOW_SIZE - 4) { + if (str_pos + count > TEST_WINDOW_SIZE - kHashReadMargin) { state.SkipWithError("Parameters exceed window size"); return; }- if (start_pos + count >= TEST_WINDOW_SIZE - 4) { + if (start_pos + count > TEST_WINDOW_SIZE - kHashReadMargin) { state.SkipWithError("Parameters exceed window size"); return; }And add the following near the top (after includes/defines) to define the margin once:
// Number of bytes required by hash read at position `str`. // Prefer MIN_MATCH (3) plus any architecture-specific offset the hashing macros use. static constexpr size_t kHashReadMargin = MIN_MATCH + HASH_CALC_OFFSET;If HASH_CALC_OFFSET is not visible in this TU on some platforms, fall back to MIN_MATCH or wire this value in via a small helper header shared with insert_string_tpl.h.
Also applies to: 126-130
89-91: Prevent the compiler from eliding work; use DoNotOptimize/ClobberMemoryinsert_func writes to s->head/s->prev, but capturing the return value (Pos) and hinting the optimizer makes the intent explicit and future-proof.
- // Benchmark the insert_string function - insert_func(s, str_pos, count); + // Benchmark the insert_string function + Pos last = insert_func(s, str_pos, count); + benchmark::DoNotOptimize(last); + benchmark::ClobberMemory();
102-111: Name the arguments in the benchmark for readabilityArgNames helps interpret the output without guessing what range(0) and range(1) represent.
- BENCHMARK_REGISTER_F(insert_string_bench, name) \ + BENCHMARK_REGISTER_F(insert_string_bench, name) \ + ->ArgNames({"start_pos", "count"}) \ ->Args({100, 3}) /* Most common case */ \ ->Args({100, 4}) \ ->Args({100, 5}) \ ->Args({100, 7}) \ ->Args({100, 14}) /* Mid-range cluster */ \ ->Args({100, 32}) /* Transition point */ \ ->Args({100, 127}) /* Large cluster around powers of 2 */ \ ->Args({100, 255}) /* Near maximum observed values */ \ ->Unit(benchmark::kNanosecond);- BENCHMARK_REGISTER_F(quick_insert_string_bench, name) \ + BENCHMARK_REGISTER_F(quick_insert_string_bench, name) \ + ->ArgNames({"start_pos", "count"}) \ ->Args({100, 1}) /* Single insertion (baseline) */ \ ->Args({100, 100}) /* 100 insertions (measure amortized cost) */ \ ->Args({16000, 100}) /* 100 insertions at mid window (different hash distribution) */ \ ->Unit(benchmark::kNanosecond);Also applies to: 157-161
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
test/benchmarks/CMakeLists.txt(1 hunks)test/benchmarks/benchmark_insert_string.cc(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
test/benchmarks/benchmark_insert_string.cc (2)
zutil_p.h (1)
zng_alloc(12-14)insert_string_tpl.h (1)
Pos(59-76)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (119)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows GCC Compat No Opt
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu Clang No Threads
- GitHub Check: Ubuntu Clang
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows GCC Compat No Opt
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu Clang No Threads
- GitHub Check: Ubuntu Clang
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows GCC Compat No Opt
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu Clang No Threads
- GitHub Check: Ubuntu Clang
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows GCC Compat No Opt
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu Clang No Threads
- GitHub Check: Ubuntu Clang
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows GCC Compat No Opt
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v140 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu Clang No Threads
- GitHub Check: Ubuntu Clang
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows GCC Compat No Opt
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v140 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu Clang No Threads
- GitHub Check: Ubuntu Clang
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows GCC Compat No Opt
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v140 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu Clang No Threads
- GitHub Check: Ubuntu Clang
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows GCC Compat No Opt
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v140 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC Symbol Prefix
- GitHub Check: Ubuntu Clang No Threads
- GitHub Check: Ubuntu Clang
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows GCC Compat No Opt
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v140 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC Symbol Prefix
- GitHub Check: Ubuntu Clang No Threads
- GitHub Check: Ubuntu Clang
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows GCC Compat No Opt
- GitHub Check: Windows ClangCl Win64
- GitHub Check: Windows MSVC 2022 v140 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC Symbol Prefix
- GitHub Check: Ubuntu Clang No Threads
- GitHub Check: Ubuntu Clang
🔇 Additional comments (1)
test/benchmarks/CMakeLists.txt (1)
47-47: Hook-up looks correctbenchmark_insert_string.cc is correctly added to benchmark_zlib. No further CMake changes needed.
Summary by CodeRabbit
Tests
Chores