Conversation
WalkthroughAdds a functable include and inlines I/O helpers ( Changes
Sequence Diagram(s)sequenceDiagram
participant App
participant Deflate as deflate()
participant Read as read_buf
participant FT as functable
participant Flush as flush_pending_inline
App->>Deflate: deflate(strm, flush)
Deflate->>Read: read_buf(strm, buf, size)
alt wrap == gzip
Read->>FT: FUNCTABLE_CALL(crc32_fold_copy,...)
FT-->>Read: bytes copied
else wrap == zlib
Read->>FT: FUNCTABLE_CALL(adler32_fold_copy,...)
FT-->>Read: bytes copied
else
Read-->>Deflate: memcpy (bytes copied)
end
Deflate->>Flush: flush_pending_inline(strm)
Flush-->>Deflate: pending updated / bytes flushed
Deflate-->>App: return status
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (4)
🔇 Additional comments (1)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (5)
deflate_stored.c (2)
97-101: Use the returned byte count from read_buf to update out pointersread_buf returns how many bytes were actually copied. While len is expected to be available here, using the return value makes the code robust and self-documenting.
Apply this diff:
- read_buf(s->strm, s->strm->next_out, len); - s->strm->next_out += len; - s->strm->avail_out -= len; - s->strm->total_out += len; + unsigned got = read_buf(s->strm, s->strm->next_out, len); + s->strm->next_out += got; + s->strm->avail_out -= got; + s->strm->total_out += got;
160-163: Likewise, prefer read_buf’s return value when updating window/write indicesAs above, using the actual number of bytes read avoids any future fragility if the invariants change.
Apply this diff:
- read_buf(s->strm, s->window + s->strstart, have); - s->strstart += have; - s->insert += MIN(have, s->w_size - s->insert); + unsigned got = read_buf(s->strm, s->window + s->strstart, have); + s->strstart += got; + s->insert += MIN(got, s->w_size - s->insert);deflate.c (2)
781-792: Direct use of flush_pending_inline in hot path is fine; consider consistencyUsing the inline in the tight loop removes call overhead. Elsewhere in this function we still call PREFIX(flush_pending). It’s okay to mix, but consider standardizing call sites for consistency (either all inline in hot paths, wrapper elsewhere), to reduce reader surprise.
835-841: Minor: consider using flush_pending_inline here tooNot performance-critical, but for symmetry with other header/trailer flushes you could call flush_pending_inline here as well.
Apply this diff if you prefer consistency:
- PREFIX(flush_pending)(strm); + flush_pending_inline(strm);deflate_p.h (1)
133-156: read_buf inline mirrors previous semantics; checksum handling looks solid
- Returns 0 early if no input; proper min(size, avail_in).
- Handles checksum updates:
- wrap == 2: crc32_fold_copy (guarded by GZIP)
- wrap == 1: adler32_fold_copy
- else/no checksum: memcpy
- Updates avail_in, next_in, total_in.
One tiny nit: if you expect any future call sites to rely on overlapping next_in/target buffers (unlikely for deflate), memmove would be safer than memcpy in the no-checksum branches. Current API assumes non-overlapping buffers, so leaving memcpy is fine.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
deflate.c(6 hunks)deflate.h(1 hunks)deflate_p.h(2 hunks)deflate_stored.c(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-02-18T02:53:23.621Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-18T02:53:23.621Z
Learning: In zlib-ng, when macro names and parameters are self-explanatory and scalar versions of the code provide context, additional documentation for the macros may not be necessary.
Applied to files:
deflate.c
🧬 Code Graph Analysis (3)
deflate_stored.c (1)
deflate_p.h (1)
read_buf(133-155)
deflate.h (2)
arch/s390/dfltcc_common.h (2)
dfltcc_state(75-75)dfltcc_state(83-83)arch/s390/dfltcc_inflate.c (7)
inflate_state(144-144)inflate_state(115-115)inflate_state(24-24)inflate_state(183-183)inflate_state(38-38)inflate_state(169-169)inflate_state(53-53)
deflate.c (1)
deflate_p.h (2)
flush_pending_inline(107-125)read_buf(133-155)
🔇 Additional comments (12)
deflate.c (8)
51-51: Including functable.h here is correct and necessaryThis ensures FUNCTABLE_INIT and FUNCTABLE_CALL are available before use in deflateInit2 and other paths. Inclusion order with deflate.h/deflate_p.h is sane.
743-749: Wrapper now delegates to flush_pending_inline — good refactorKeeping the exported/internal PREFIX(flush_pending) while inlining the core logic centralizes behavior and preserves ABI/visibility.
966-972: LGTM: inline flush after gzip header statesUsing flush_pending_inline here keeps the header emission path lean.
1043-1044: LGTM: inline flush for trailer emissionRight place to ensure trailer bytes are pushed without extra call overhead.
1266-1268: Switch to inlined read_buf in fill_window is correctSemantics preserved: read_buf updates avail_in/next_in/total_in and returns bytes read, which are accumulated into lookahead via n. No behavioral change except reduced call overhead.
430-436: Dictionary path: wrap suppression remains compatible with new read_bufSetting s->wrap = 0 to suppress checksum updates during dictionary insertion still works with the new read_buf logic that branches on wrap and DEFLATE_NEED_CHECKSUM.
980-985: Hook invocation preserved; no regression risk spottedThe DEFLATE_HOOK macro is retained and evaluated before selecting the strategy function, matching previous behavior while decoupling arch hooks to deflate.h.
249-251: FUNCTABLE_INIT placement is correctFunction table is initialized early in deflateInit2, before checksum folding calls; good.
deflate_p.h (3)
12-13: Including functable.h in this header is appropriate nowGiven read_buf and flush_pending_inline depend on FUNCTABLE_CALL, pulling functable.h here prevents accidental missing includes at call sites.
101-126: flush_pending_inline is correct and side-effect accounting is complete
- Ensures bit buffer is flushed before copying.
- Correctly updates next_out, total_out, avail_out, pending, and pending_out reset.
- Tracev retained.
161-169: FLUSH_BLOCK macros still call the wrapper; that’s fineThe wrapper now delegates to the inline; no behavior change.
deflate.h (1)
456-494: Good relocation and centralization of arch-specific hooks
- Moving hooks and window-alignment macros here simplifies deflate.c and improves modularity.
- Sensible defaults for non-arch builds:
- Alignment: 64-byte
- Hooks: benign no-ops
- DEFLATE_NEED_CHECKSUM defaults to 1 to keep legacy behavior
No issues spotted.
Develop Aug 14PR inline read_bufDeflatebench shows around 0.3-0.4% speedup for compression. |
|
Rebased and split up the inlining into two commits. |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
deflate_p.h (4)
101-126: flush_pending_inline: Semantics look correct; consider unifying with out-of-line wrapper to avoid divergenceThe implementation mirrors the classic flush_pending: flush bits, copy min(pending, avail_out), bump pointers/counters, and reset pending_out when empty. Looks good.
Optional: to minimize the risk of the inline and the out-of-line PREFIX(flush_pending) drifting over time, make the latter a thin wrapper around this inline helper (or have FLUSH_BLOCK call the inline). This keeps one source of truth while still enabling inlining where used.
As a follow-up, consider changing the non-inlined wrapper in deflate.c to:
/* deflate.c */ Z_INTERNAL void PREFIX(flush_pending)(PREFIX3(stream) *strm) { flush_pending_inline(strm); }And/or update the macro here to use the inline helper:
/* deflate_p.h */ #define FLUSH_BLOCK_ONLY(s, last) { \ zng_tr_flush_block(s, (s->block_start >= 0 ? \ (char *)&s->window[(unsigned)s->block_start] : \ NULL), \ (uint32_t)((int)s->strstart - s->block_start), \ (last)); \ s->block_start = (int)s->strstart; \ flush_pending_inline(s->strm); \ }
101-126: Minor type consistency nit: consider uInt for lenlen derives from MIN(s->pending, strm->avail_out). Since avail_out is uInt, using uInt for len avoids implicit conversions when updating avail_out, while keeping the same range characteristics on supported platforms.
-static inline void flush_pending_inline(PREFIX3(stream) *strm) { - uint32_t len; +static inline void flush_pending_inline(PREFIX3(stream) *strm) { + uInt len;
133-155: Micro-optimization: avoid repeated strm->state dereferencesHoisting strm->state to a local helps codegen and readability, and shortens the FUNCTABLE_CALL argument for crc/adler paths.
static inline unsigned read_buf(PREFIX3(stream) *strm, unsigned char *buf, unsigned size) { - uint32_t len = MIN(strm->avail_in, size); + uint32_t len = MIN(strm->avail_in, size); + deflate_state *s = strm->state; @@ - if (!DEFLATE_NEED_CHECKSUM(strm)) { + if (!DEFLATE_NEED_CHECKSUM(strm)) { memcpy(buf, strm->next_in, len); #ifdef GZIP - } else if (strm->state->wrap == 2) { - FUNCTABLE_CALL(crc32_fold_copy)(&strm->state->crc_fold, buf, strm->next_in, len); + } else if (s->wrap == 2) { + FUNCTABLE_CALL(crc32_fold_copy)(&s->crc_fold, buf, strm->next_in, len); #endif - } else if (strm->state->wrap == 1) { + } else if (s->wrap == 1) { strm->adler = FUNCTABLE_CALL(adler32_fold_copy)(strm->adler, buf, strm->next_in, len); } else { memcpy(buf, strm->next_in, len); }
133-155: Consider force-inlining for hot pathsIf the intent is to realize consistent perf wins across compilers/optimization levels, using the project’s force-inline attribute macro (if available, e.g., Z_FORCEINLINE) can help ensure these helpers are inlined, especially in non-LTO builds.
-static inline void flush_pending_inline(PREFIX3(stream) *strm) { +Z_FORCEINLINE static void flush_pending_inline(PREFIX3(stream) *strm) { @@ -static inline unsigned read_buf(PREFIX3(stream) *strm, unsigned char *buf, unsigned size) { +Z_FORCEINLINE static unsigned read_buf(PREFIX3(stream) *strm, unsigned char *buf, unsigned size) {If Z_FORCEINLINE isn’t available or desired, ignore this.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
deflate.c(6 hunks)deflate.h(1 hunks)deflate_p.h(2 hunks)deflate_stored.c(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- deflate.c
- deflate.h
- deflate_stored.c
🔇 Additional comments (2)
deflate_p.h (2)
12-13: Include functable.h here is correct and necessary for FUNCTABLE_CALL usageThis enables the crc/adler fold-copy dispatch below. No concerns.
127-156: No stale PREFIX(read_buf) references found; ready to approveA quick ripgrep search confirms there are no remaining calls to PREFIX(read_buf)(…). The only usages of read_buf(…) in deflate_stored.c and deflate.c will correctly bind to the new static inline implementation. LGTM.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #1952 +/- ##
===========================================
- Coverage 81.91% 80.77% -1.15%
===========================================
Files 162 162
Lines 14013 14405 +392
Branches 3137 3304 +167
===========================================
+ Hits 11479 11635 +156
- Misses 1568 1739 +171
- Partials 966 1031 +65 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
b0962c4 to
747f408
Compare
|
Redid benches after rebase, force-inline and removal of indirections in read_buf. PRDeflatebench overall 0.22% faster All in all I am pretty happy with this. I also retested inlining of read_buf and partial inlining of flush_pending independently and they both improve performance. |
|
Inlined flush_pending in DFLTCC's |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
arch/s390/dfltcc_deflate.c (1)
94-104: Missing header for flush_pending_inline leads to implicit declaration / no inlining
flush_pending_inlineis astatic inlinedefined in deflate_p.h. This file doesn’t include that header, so the call will either fail to compile (no prototype) or won't inline as intended. Include deflate_p.h after deflate.h.Apply:
#include "deflate.h" +#include "deflate_p.h" #include "trees_emit.h"
🧹 Nitpick comments (1)
deflate.c (1)
51-54: Dup include of functable.h (benign); consider relying on deflate_p.h
deflate_p.halready includesfunctable.h. Keeping both is harmless thanks to include guards, but you can remove this one to reduce redundancy.Apply:
-#include "functable.h"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
arch/s390/dfltcc_deflate.c(1 hunks)deflate.c(6 hunks)deflate_p.h(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-02-18T02:53:23.621Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-18T02:53:23.621Z
Learning: In zlib-ng, when macro names and parameters are self-explanatory and scalar versions of the code provide context, additional documentation for the macros may not be necessary.
Applied to files:
deflate.c
🧬 Code graph analysis (2)
deflate.c (4)
deflate_p.h (1)
void(107-125)arch/generic/crc32_fold_c.c (1)
void(15-18)adler32.c (3)
PREFIX(11-13)PREFIX(22-24)PREFIX(58-60)crc32.c (2)
PREFIX(21-25)PREFIX(35-37)
deflate_p.h (1)
deflate.c (1)
void(746-748)
🔇 Additional comments (8)
arch/s390/dfltcc_deflate.c (1)
93-105: Behavior preserved: send_eobs() still flushes bits then pending bytesSwitching to
flush_pending_inline(strm)correctly flushes bit buffer first (via zng_tr_flush_bits inside the inline) and copies out pending bytes. The subsequent memmove-on-remaining-pending logic remains valid, sincepending_outis only reset topending_bufwhenpending == 0. LGTM.deflate_p.h (2)
12-13: functable.h inclusion is appropriate hereRequired for FUNCTABLE_CALL used below and keeps inline helpers self-contained. LGTM.
101-126: Inline flush is correct and side-effect complete
flush_pending_inline()flushes pending bits first, respectsavail_out, updates all counters, and restorespending_outwhen drained. Matches legacy behavior while enabling inlining on hot paths.deflate.c (5)
743-748: Wrapper retained for ABI/internal call sites; implementation delegates to inlineThe wrapper now just calls
flush_pending_inline, preserving symbol presence while enabling selective inlining elsewhere. LGTM.
781-791: Hot path switch to inline flush is appropriateInlining the flush on the main deflate() path is consistent with the PR goal and should help a bit with register allocation and call overhead in tight loops.
966-972: Header/HCRC path uses inline flush — fineUsing
flush_pending_inlinehere ensures the buffer is empty before proceeding, matching prior behavior while allowing the compiler to inline when profitable.
1043-1054: Trailer path uses inline flush — fineKeeps semantics identical (trailer written once, then flush). LGTM.
1226-1227: Switched fill_window() to new read_buf helperThe inline
read_bufupdates checksum via fold-copy when needed and advances stream counters atomically. This centralizes I/O handling and is a good change.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
deflate_p.h (1)
101-126: flush_pending_inline: correct semantics; tiny branch-hint nitLogic matches the previous non-inlined helper: flush bit buffer, copy min(pending, avail_out), update pointers/counters, and reset pending_out when empty. Consider adding an unlikely-branch hint for the early return to help the predictor in hot paths.
- if (len == 0) + if (UNLIKELY(len == 0)) return;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
arch/s390/dfltcc_deflate.c(1 hunks)deflate.c(6 hunks)deflate_p.h(2 hunks)deflate_stored.c(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- arch/s390/dfltcc_deflate.c
- deflate_stored.c
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-02-18T02:53:23.621Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-18T02:53:23.621Z
Learning: In zlib-ng, when macro names and parameters are self-explanatory and scalar versions of the code provide context, additional documentation for the macros may not be necessary.
Applied to files:
deflate.c
🧬 Code graph analysis (2)
deflate_p.h (2)
deflate.c (1)
void(746-748)trees.c (1)
void(375-407)
deflate.c (5)
deflate_p.h (1)
void(107-125)arch/generic/crc32_fold_c.c (1)
void(15-18)crc32.c (2)
PREFIX(21-25)PREFIX(35-37)adler32.c (3)
PREFIX(11-13)PREFIX(22-24)PREFIX(58-60)deflate.h (2)
deflate_allocs_s(135-143)internal_state(120-120)
🔇 Additional comments (8)
deflate_p.h (2)
12-13: LGTM: functable.h include is appropriate for FUNCTABLE_CALL usageIncluding functable.h here is correct since read_buf and (gzip/adler) fold copies rely on FUNCTABLE_CALL. Include guards prevent double inclusion with deflate.c.
127-156: No stalePREFIX(read_buf)calls—read_bufis correctly inlinedA ripgrep search confirmed that there are no call sites using the removed
PREFIX(read_buf)macro, and existing calls indeflate_stored.canddeflate.cinvoke the bareread_buf, which is now provided as astatic inlineindeflate_p.h:
- deflate_stored.c: calls to
read_buf(s->strm, …)at lines 97 and 160- deflate.c: call to
n = read_buf(…)at line 1226- No hits for
PREFIX(read_buf)(…)outsidedeflate_p.hSince each translation unit including
deflate_p.hwill see the inline definition ofread_buf, there are no unresolved symbols to worry about. This comment can be resolved as no further changes are needed.deflate.c (6)
51-51: Include order is finefunctable.h is needed here because this TU directly uses FUNCTABLE_CALL (e.g., adler32/crc32/slide_hash). Keeping it above deflate.h is consistent with prior style.
743-749: Wrapper preserves ABI while enabling hot-path inliningKeeping Z_INTERNAL PREFIX(flush_pending) as a thin wrapper over flush_pending_inline retains the exported symbol and avoids code bloat where the wrapper is used in colder paths.
779-792: Good: hot-path switch to flush_pending_inlineInlining here avoids a function call in the common pending != 0 case and pairs well with the earlier bit-buffer flush in the inline. No functional issues spotted.
966-972: Header/HCRC transition: inline flush is appropriateUsing flush_pending_inline here keeps the pipeline warm as we transition states. Matches the partial inlining strategy noted in the PR description.
1043-1044: Trailer flush: inline usage is consistentInlining at the trailer write is sensible, given it’s on the finishing hot path. State updates remain identical to the wrapper.
1226-1227: read_buf callsites fully migrated and verifiedAll calls to read_buf now resolve to the new TU-local inline defined in deflate_p.h. No legacy wrapper calls remain.
• deflate.c (lines 1226–1227): calls inline read_buf as intended.
• deflate_stored.c (lines 97 and 160): also invoke the inline version from deflate_p.h.No further changes required.
|
Rebased |
This results in a small performance improvement for compression.
Summary by CodeRabbit
Refactor
New Features
Chores