Eliminate redundant bounds checks in CompositeByteBuf accessors by franz1981 · Pull Request #16525 · netty/netty

franz1981 · 2026-03-18T15:35:05Z

Motivation:

Every _getXxx/_setXxx in CompositeByteBuf delegates to the underlying buffer's public API (e.g. c.buf.getByte()), which re-checks bounds that the composite level already validated.

Modifications:

Add an abuf field to Component that caches (AbstractByteBuf) buf at construction time, or null for non-AbstractByteBuf wrappers
Use abuf._getXxx() (no bounds check) when available, fall back to buf.getXxx() otherwise
Add sequentialReadBytes and sequentialGetBytes JMH benchmarks

Result:

Redundant bounds checks eliminated on the hot path.
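
A minimal sketch of the pattern described above, using toy stand-ins rather than Netty's real classes. `Buf`, `AbstractBuf`, `HeapBuf`, `WrappedBuf` and `getByteUnchecked` are hypothetical names invented for illustration; only the idea of caching the downcast in an `abuf` field and falling back to the checked public accessor comes from the description.

```java
import java.util.Objects;

/** Toy model, not Netty code: cache a downcast once to reach an unchecked accessor. */
final class UncheckedAccessSketch {

    /** Stand-in for ByteBuf: only the bounds-checked accessor is part of the interface. */
    interface Buf {
        int capacity();
        byte getByte(int index);
    }

    /** Stand-in for AbstractByteBuf: getByte checks bounds, _getByte does not. */
    abstract static class AbstractBuf implements Buf {
        int boundsChecks; // counts checks so the saving is observable

        @Override
        public final byte getByte(int index) {
            boundsChecks++;
            Objects.checkIndex(index, capacity());
            return _getByte(index);
        }

        protected abstract byte _getByte(int index);
    }

    static final class HeapBuf extends AbstractBuf {
        private final byte[] data;
        HeapBuf(byte[] data) { this.data = data; }
        @Override public int capacity() { return data.length; }
        @Override protected byte _getByte(int index) { return data[index]; }
    }

    /** A wrapper that does not extend AbstractBuf, so only the checked path is available. */
    static final class WrappedBuf implements Buf {
        private final Buf delegate;
        WrappedBuf(Buf delegate) { this.delegate = delegate; }
        @Override public int capacity() { return delegate.capacity(); }
        @Override public byte getByte(int index) { return delegate.getByte(index); }
    }

    /** Stand-in for a composite component. */
    static final class Component {
        final Buf buf;
        final AbstractBuf abuf; // cached downcast, or null for other Buf implementations

        Component(Buf buf) {
            this.buf = buf;
            this.abuf = buf instanceof AbstractBuf ? (AbstractBuf) buf : null;
        }

        /** The caller is assumed to have validated idx at the composite level already. */
        byte getByteUnchecked(int idx) {
            AbstractBuf a = abuf;
            return a != null ? a._getByte(idx) : buf.getByte(idx);
        }
    }

    public static void main(String[] args) {
        HeapBuf heap = new HeapBuf(new byte[] {10, 20, 30});
        Component fast = new Component(heap);
        System.out.println(fast.getByteUnchecked(1) + " checks=" + heap.boundsChecks);
        Component slow = new Component(new WrappedBuf(heap));
        System.out.println(slow.getByteUnchecked(2) + " checks=" + heap.boundsChecks);
    }
}
```

The null check on the cached field replaces a per-access `instanceof` test, and the fallback keeps wrappers working at the cost of the second bounds check.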

franz1981 · 2026-03-18T15:53:11Z

@normanmaurer

  Benchmark                              (bufferType)  (size)  Mode  Cnt      Baseline (±err)       Commit1 (±err)   Ratio

  === Sequential ===
  sequentialGetBytes                     SMALL_CHUNKS   10240  thrpt   5   23563.32 ±  861.85   24996.74 ± 2834.19   1.06x
  sequentialGetBytes                     LARGE_CHUNKS   10240  thrpt   5   48116.85 ± 2396.69   56202.34 ±  674.51   1.17x
  sequentialGetBytes              SMALL_CHUNKS_DIRECT   10240  thrpt   5   20882.80 ± 3327.20   21529.90 ± 3040.57   1.03x
  sequentialGetBytes              LARGE_CHUNKS_DIRECT   10240  thrpt   5   38989.21 ±  771.70   44600.55 ±  546.00   1.14x

  sequentialReadBytes                    SMALL_CHUNKS   10240  thrpt   5   23030.90 ± 1182.36   24422.75 ±  859.11   1.06x
  sequentialReadBytes                    LARGE_CHUNKS   10240  thrpt   5   42214.83 ±  190.90   47075.50 ±  642.57   1.12x
  sequentialReadBytes             SMALL_CHUNKS_DIRECT   10240  thrpt   5   20359.31 ±  981.67   21145.33 ± 1250.80   1.04x
  sequentialReadBytes             LARGE_CHUNKS_DIRECT   10240  thrpt   5   34738.97 ±  263.33   37685.10 ±  313.38   1.08x

  === Random Access ===
  setGetLong                             SMALL_CHUNKS   10240  thrpt   5  3318081.9 ± 360816.0  3072150.6 ± 360034.8   0.93x
  setGetLong                             LARGE_CHUNKS   10240  thrpt   5 43629810.9 ± 433517.2 43819842.9 ± 240070.8   1.00x
  setLong                                SMALL_CHUNKS   10240  thrpt   5  5604152.6 ±  24458.0  6636585.7 ±  24754.9   1.18x
  setLong                                LARGE_CHUNKS   10240  thrpt   5 45805877.1 ± 116882.6 47834277.2 ± 431502.9   1.04x

Here we go:

LARGE_CHUNKS benefit most (+8-17%): fewer component crossings mean more time is spent in the accessor itself
SMALL_CHUNKS benefit less (+3-6%): binary search dominates, so the bounds-check savings are diluted
Direct is consistently slower than heap (~15-20%): that's the JDK 25 MemorySegment overhead on direct buffer access

franz1981 · 2026-03-18T16:07:30Z

@yawkat It's a tiny improvement, but basically for free ^^
I'm checking the assembly for other low-hanging fruit

Motivation:

Every _getXxx/_setXxx in CompositeByteBuf delegates to the underlying buffer's public API (e.g. c.buf.getByte()), which re-checks bounds that the composite level already validated. Additionally, sequential readByte() calls pay a binary search cost on every component boundary crossing, since findComponent0 only caches the last accessed Component reference without its array index.

Modifications:

- Add an `abuf` field to Component that caches `(AbstractByteBuf) buf` at construction time, or null for non-AbstractByteBuf wrappers
- Use abuf._getXxx() (no bounds check) when available, fall back to buf.getXxx() otherwise
- Add sequentialReadBytes and sequentialGetBytes JMH benchmarks

Result:

Redundant bounds checks eliminated on the hot path.

franz1981 · 2026-03-18T17:38:31Z

Here we go, another round of improvements - this time for readByte only TBH:

  Benchmark                              (bufferType)  (size)    Baseline    Commit1   Commit1+2   C1+2/BL

  sequentialReadBytes                    SMALL_CHUNKS   10240    23031       24423      37012      1.61x  !!!
  sequentialReadBytes                    LARGE_CHUNKS   10240    42215       47076      48356      1.15x
  sequentialReadBytes             SMALL_CHUNKS_DIRECT   10240    20359       21145      31411      1.54x  !!!
  sequentialReadBytes             LARGE_CHUNKS_DIRECT   10240    34739       37685      38768      1.12x

  sequentialGetBytes                     SMALL_CHUNKS   10240    23563       24997      24215      1.03x  (unchanged, expected)
  sequentialGetBytes                     LARGE_CHUNKS   10240    48117       56202      56520      1.17x  (unchanged, expected)

Multi-byte accessors would require a bit more work, so they are probably not worth it.

franz1981 · 2026-03-20T08:15:55Z

buffer/src/main/java/io/netty/buffer/CompositeByteBuf.java


    // weak cache - check it first when looking for component
    private Component lastAccessed;
+    private int lastAccessedIndex;


JOL says this will fit into existing wasted space, so basically free ^^

franz1981 · 2026-03-20T08:16:30Z

buffer/src/main/java/io/netty/buffer/CompositeByteBuf.java

            this.srcBuf = srcBuf;
            this.srcAdjustment = srcOffset - offset;
            this.buf = buf;
+            this.abuf = buf instanceof AbstractByteBuf ? (AbstractByteBuf) buf : null;


This is going to use the most optimized, bounds-check-free version of each getXYZ access whenever buf is an AbstractByteBuf.

franz1981 · 2026-03-20T08:17:43Z

@normanmaurer ready to go from my side: I don't plan to put much more effort in here for now, since minimal changes already give a measurable improvement, so good ROI

Motivation:

Sequential readByte() calls inherited from AbstractByteBuf pay a binary search cost on every component boundary crossing. For composites with many small components (e.g. decompressor output), this is the dominant cost: async-profiler shows 43% of CPU in findIt().

Modifications:

- Add lastAccessedIndex field to track the array position of the cached component
- Override readByte() to use lastAccessed cache directly, skipping the findComponent0 indirection
- Add findComponentForRead() that advances to the next non-empty component in O(1) instead of binary search, falling back to findIt() only on random jumps
- Add direct buffer variants to the sequential benchmark

Result:

Sequential readByte() throughput improved by up to 61% for small-chunk composites (SMALL_CHUNKS 10240, JDK 25).
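
The lookup strategy in this commit message can be sketched roughly as follows. This is a toy model over plain byte arrays, not the actual CompositeByteBuf code: the class name, `endOffsets`, `findForRead`, and the `binarySearches` counter are all assumptions made for illustration. It only shows the idea of remembering the array index of the last component, trying the next non-empty one on a boundary crossing, and falling back to binary search on random jumps.

```java
/** Toy model, not Netty code: sequential reads that advance a cached component index. */
final class SequentialLookupSketch {
    private final byte[][] chunks;  // component payloads; empty chunks are allowed
    private final int[] endOffsets; // endOffsets[i] = exclusive end of chunk i in the composite
    private int lastIndex = -1;     // array position of the last accessed chunk, -1 if none
    private int readerIndex;
    int binarySearches;             // counts fallbacks so the effect is observable

    SequentialLookupSketch(byte[]... chunks) {
        this.chunks = chunks;
        this.endOffsets = new int[chunks.length];
        int end = 0;
        for (int i = 0; i < chunks.length; i++) {
            end += chunks[i].length;
            endOffsets[i] = end;
        }
    }

    int capacity() { return endOffsets.length == 0 ? 0 : endOffsets[endOffsets.length - 1]; }

    void readerIndex(int newIndex) { readerIndex = newIndex; }

    private int start(int i) { return i == 0 ? 0 : endOffsets[i - 1]; }

    private boolean contains(int i, int offset) {
        return offset >= start(i) && offset < endOffsets[i];
    }

    /** Cached chunk first, then the next non-empty chunk, then binary search. */
    private int findForRead(int offset) {
        int i = lastIndex;
        if (i >= 0) {
            if (contains(i, offset)) {
                return i;
            }
            int next = i + 1;
            while (next < chunks.length && chunks[next].length == 0) {
                next++; // skip empty chunks
            }
            if (next < chunks.length && contains(next, offset)) {
                return next;
            }
        }
        return binarySearch(offset);
    }

    /** Finds the first chunk whose end offset lies beyond the given offset. */
    private int binarySearch(int offset) {
        binarySearches++;
        int lo = 0;
        int hi = chunks.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (endOffsets[mid] > offset) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    byte readByte() {
        if (readerIndex >= capacity()) {
            throw new IndexOutOfBoundsException("readerIndex: " + readerIndex);
        }
        int i = findForRead(readerIndex);
        lastIndex = i;
        byte value = chunks[i][readerIndex - start(i)];
        readerIndex++;
        return value;
    }

    public static void main(String[] args) {
        SequentialLookupSketch s = new SequentialLookupSketch(
                new byte[] {1, 2}, new byte[0], new byte[] {3}, new byte[] {4, 5, 6});
        while (s.readerIndex < s.capacity()) {
            System.out.print(s.readByte() + " ");
        }
        System.out.println("binarySearches=" + s.binarySearches);
    }
}
```

In this model a purely sequential scan pays for one binary search (the very first lookup) no matter how many chunk boundaries it crosses; moving the reader index backwards triggers the fallback again.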

franz1981 · 2026-03-23T07:51:17Z

PTAL @normanmaurer this should be enough for now. Mostly low hanging fruits ;)

Motivation:

Every _getXxx/_setXxx in CompositeByteBuf delegates to the underlying buffer's public API (e.g. c.buf.getByte()), which re-checks bounds that the composite level already validated.

Modifications:

- Add an `abuf` field to Component that caches `(AbstractByteBuf) buf` at construction time, or null for non-AbstractByteBuf wrappers
- Use abuf._getXxx() (no bounds check) when available, fall back to buf.getXxx() otherwise
- Add sequentialReadBytes and sequentialGetBytes JMH benchmarks

Result:

Redundant bounds checks eliminated on the hot path.

(cherry picked from commit 6001499)

netty-project-bot · 2026-03-24T07:10:30Z

Auto-port PR for 5.0: #16534

…accessors (#16534)

Auto-port of #16525 to 5.0
Cherry-picked commit: 6001499

---

Motivation:

Every _getXxx/_setXxx in CompositeByteBuf delegates to the underlying buffer's public API (e.g. c.buf.getByte()), which re-checks bounds that the composite level already validated.

Modifications:

- Add an `abuf` field to Component that caches `(AbstractByteBuf) buf` at construction time, or null for non-AbstractByteBuf wrappers
- Use abuf._getXxx() (no bounds check) when available, fall back to buf.getXxx() otherwise
- Add sequentialReadBytes and sequentialGetBytes JMH benchmarks

Result:

Redundant bounds checks eliminated on the hot path.

Co-authored-by: Francesco Nigro <nigro.fra@gmail.com>

franz1981 force-pushed the composite_opt branch from a621787 to 8847e11 on March 18, 2026 15:35

franz1981 requested review from chrisvest and normanmaurer March 18, 2026 15:40

franz1981 force-pushed the composite_opt branch from 8847e11 to b45c2a5 on March 18, 2026 17:37

franz1981 force-pushed the composite_opt branch from b45c2a5 to 84ceadc on March 23, 2026 07:49

chrisvest added this to the 4.2.11.Final milestone Mar 24, 2026

chrisvest added the needs-cherry-pick-5.0 label (This PR should be cherry-picked to 5.0 once merged.) Mar 24, 2026

chrisvest approved these changes Mar 24, 2026

normanmaurer merged commit 6001499 into netty:4.2 Mar 24, 2026
33 of 37 checks passed

netty-project-bot mentioned this pull request Mar 24, 2026

Auto-port 5.0: Eliminate redundant bounds checks in CompositeByteBuf accessors #16534

Merged

github-actions bot removed the needs-cherry-pick-5.0 label Mar 24, 2026

chrisvest added a commit that referenced this pull request Mar 25, 2026

Revert "Eliminate redundant bounds checks in CompositeByteBuf accesso…

f79c0c0

…rs (#16525)" This reverts commit 6001499.

chrisvest mentioned this pull request Mar 25, 2026

Revert "Eliminate redundant bounds checks in CompositeByteBuf accessors" #16550

Merged

chrisvest added a commit that referenced this pull request Mar 25, 2026

Revert "Eliminate redundant bounds checks in CompositeByteBuf accesso…

7074624

…rs" (#16550) Reverts #16525

normanmaurer merged 2 commits into netty:4.2 from franz1981:composite_opt
