Skip to content

Replace chunk extension Match[] state machine with flat transition table#16542

Open
franz1981 wants to merge 3 commits into
netty:4.2from
franz1981:4.2_16540
Open

Replace chunk extension Match[] state machine with flat transition table#16542
franz1981 wants to merge 3 commits into
netty:4.2from
franz1981:4.2_16540

Conversation

@franz1981

Copy link
Copy Markdown
Contributor

Motivation:
The existing enum State with Match[] (BitSet) arrays uses multiple
pointer-chasing loads per byte. It also has two bugs: ';' was not
excluded from ChunkExtValToken causing valid multi-extension lines
to be rejected, and finish() did not accept CHUNK_EXT_VAL_TOKEN as
a valid final state.

Modification:
Replace with a flat byte[7*256] transition table populated at class
load time. Fix both missing cases. Add regression tests and benchmarks.

Result:
Fixes #16540

@franz1981

franz1981 commented Mar 25, 2026

Copy link
Copy Markdown
Contributor Author

Benchmark Results

CPU: see lscpu, NUMA node 0 pinned via numactl --localalloc -N 0
JDK: Oracle 25.0.2 (runtime), Temurin 21.0.8 (build)

1. End-to-end HTTP chunked request/response

HttpChunkedRequestResponseBenchmark — 16 chunks with randomized extensions (name-only, token, quoted, mixed).

Metric Baseline Optimized Delta
Throughput (ops/s) 10,916,175 11,063,757 +1.4%
Instructions/op 1,171.5 1,156.9 -1.2%
Branches/op 248.3 244.2 -1.7%

Validation is ~3% of total work - you were right @chrisvest (it is a drop in the ocean ^^).
Hot path is channel pipeline machinery instead...

2. Isolated process() loop

HttpChunkExtValidationBenchmark — no allocation, reuses processor instance, 64 pre-generated inputs.

Length enumBitSet (ops/μs) flatTable (ops/μs) Speedup
64 2.416 9.094 3.76x
256 0.587 2.115 3.60x
1024 0.193 0.520 2.69x

Per-byte breakdown at length=256:

Metric enumBitSet flatTable Ratio
cycles/byte 28.6 8.0 3.6x fewer
instructions/byte 67.1 10.7 6.3x fewer
branches/byte 11.3 2.3 4.9x fewer
branch-misses/op 4.0 0.2 17x fewer
IPC 2.35 1.34

@franz1981

franz1981 commented Mar 30, 2026

Copy link
Copy Markdown
Contributor Author

@chrisvest @normanmaurer any chance you had taken a look at the additional test?

@chrisvest

Copy link
Copy Markdown
Member

I think it'd be better to separate bug fixes and performance improvements into separate PRs.

I like the execution speed of the table approach, but I prefer describing the grammar in terms of the enum and match clauses. Can we keep the enum and have it compile into the table? Perhaps we can make a small internal DSL out of this, for the various small-scale parsing needs we have.

chrisvest added a commit to chrisvest/netty that referenced this pull request Mar 30, 2026
Motivation:
The chunk extension parsing/validation logic did not correctly account for extensions with multiple key-value pairs.

Modification:
- Adapt parsing logic to accept the repeatability of extension key-value pairs, and that chunk extensions can end on unquoted value tokens.
- Add tests to capture these cases.

Result:
More correct HTTP chunk extension parsing.

The tests are lifted from netty#16542
@chrisvest

Copy link
Copy Markdown
Member

@franz1981 I extract the parsing fix to a separate PR: #16579

@franz1981

Copy link
Copy Markdown
Contributor Author

Can we keep the enum and have it compile into the table?

I can see why; my concern is to make the static init to be intensive (GC and CPU wise) for no apparent reason. But the benefit of a declarative API, for us humans, is undeniable, is agree

Let me check if I can make this some sort of builder, without any GC root so, once CDS or Layden Is used, the table is the only thing which would be retained at runtime

normanmaurer added a commit that referenced this pull request Mar 31, 2026
Motivation:
The chunk extension parsing/validation logic did not correctly account
for extensions with multiple key-value pairs.

Modification:
- Adapt parsing logic to accept the repeatability of extension key-value
pairs, and that chunk extensions can end on unquoted value tokens.
- Add tests to capture these cases.

Result:
More correct HTTP chunk extension parsing.

The tests are lifted from #16542

Co-authored-by: Norman Maurer <norman_maurer@apple.com>
netty-project-bot pushed a commit that referenced this pull request Mar 31, 2026
Motivation:
The chunk extension parsing/validation logic did not correctly account
for extensions with multiple key-value pairs.

Modification:
- Adapt parsing logic to accept the repeatability of extension key-value
pairs, and that chunk extensions can end on unquoted value tokens.
- Add tests to capture these cases.

Result:
More correct HTTP chunk extension parsing.

The tests are lifted from #16542

Co-authored-by: Norman Maurer <norman_maurer@apple.com>
(cherry picked from commit b2c0dfd)
netty-project-bot pushed a commit that referenced this pull request Mar 31, 2026
Motivation:
The chunk extension parsing/validation logic did not correctly account
for extensions with multiple key-value pairs.

Modification:
- Adapt parsing logic to accept the repeatability of extension key-value
pairs, and that chunk extensions can end on unquoted value tokens.
- Add tests to capture these cases.

Result:
More correct HTTP chunk extension parsing.

The tests are lifted from #16542

Co-authored-by: Norman Maurer <norman_maurer@apple.com>
(cherry picked from commit b2c0dfd)
@chrisvest

Copy link
Copy Markdown
Member

I can see why; my concern is to make the static init to be intensive (GC and CPU wise) for no apparent reason. But the benefit of a declarative API, for us humans, is undeniable, is agree

Yeah, I know the init cost is there, but I think it's worth it in the worst case. I don't know how much a builder would help. It's hard to quantify the early init performance. In the other extreme end, we could code-gen from an annotation processor or something, but that would also be a lot of work.

@normanmaurer

Copy link
Copy Markdown
Member

I can see why; my concern is to make the static init to be intensive (GC and CPU wise) for no apparent reason. But the benefit of a declarative API, for us humans, is undeniable, is agree

Yeah, I know the init cost is there, but I think it's worth it in the worst case. I don't know how much a builder would help. It's hard to quantify the early init performance. In the other extreme end, we could code-gen from an annotation processor or something, but that would also be a lot of work.

Please don't over-optimize. I am strongly in favor of not adding even more complexity that would be a nightmare to maintain (like code-gen)

chrisvest added a commit that referenced this pull request Apr 3, 2026
Auto-port of #16579 to 4.1
Cherry-picked commit: b2c0dfd

---
Motivation:
The chunk extension parsing/validation logic did not correctly account
for extensions with multiple key-value pairs.

Modification:
- Adapt parsing logic to accept the repeatability of extension key-value
pairs, and that chunk extensions can end on unquoted value tokens.
- Add tests to capture these cases.

Result:
More correct HTTP chunk extension parsing.

The tests are lifted from #16542

Co-authored-by: Chris Vest <christianvest_hansen@apple.com>
Co-authored-by: Norman Maurer <norman_maurer@apple.com>
@franz1981

Copy link
Copy Markdown
Contributor Author

As soon as I am back from holidays I will finish this one 😄

chrisvest added a commit that referenced this pull request Apr 9, 2026
Auto-port of #16579 to 5.0
Cherry-picked commit: b2c0dfd

---
Motivation:
The chunk extension parsing/validation logic did not correctly account
for extensions with multiple key-value pairs.

Modification:
- Adapt parsing logic to accept the repeatability of extension key-value
pairs, and that chunk extensions can end on unquoted value tokens.
- Add tests to capture these cases.

Result:
More correct HTTP chunk extension parsing.

The tests are lifted from #16542

Co-authored-by: Chris Vest <christianvest_hansen@apple.com>
Co-authored-by: Norman Maurer <norman_maurer@apple.com>
Motivation:
The existing enum State with Match[] (BitSet) arrays uses multiple
pointer-chasing loads per byte in the process() hot path.

Modification:
Compile the declarative enum + Match specification into a flat
byte[7*256] transition table at class load time. Reorder state
constants so that invalid-at-finish states are contiguous, enabling
finish() to use a single range check instead of a switch - this
benefits native image AOT compilation which has no branch profiling.

Result:
Fixes netty#16540
@franz1981

Copy link
Copy Markdown
Contributor Author

PTAL @chrisvest and @normanmaurer
I have slightly changed the finish logic to become a single comparison ;)
Nothing wow, but since often we parse 6-8 chars...

@franz1981

Copy link
Copy Markdown
Contributor Author

Ptal again @chrisvest @normanmaurer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve stricter HTTP/1.1 chunk extension parsing

3 participants