Replace chunk extension Match[] state machine with flat transition table#16542
Replace chunk extension Match[] state machine with flat transition table#16542franz1981 wants to merge 3 commits into
Conversation
Benchmark ResultsCPU: see 1. End-to-end HTTP chunked request/response
Validation is ~3% of total work - you were right @chrisvest (it is a drop in the ocean ^^). 2. Isolated process() loop
Per-byte breakdown at length=256:
|
|
@chrisvest @normanmaurer any chance you had taken a look at the additional test? |
|
I think it'd be better to separate bug fixes and performance improvements into separate PRs. I like the execution speed of the table approach, but I prefer describing the grammar in terms of the enum and match clauses. Can we keep the enum and have it compile into the table? Perhaps we can make a small internal DSL out of this, for the various small-scale parsing needs we have. |
Motivation: The chunk extension parsing/validation logic did not correctly account for extensions with multiple key-value pairs. Modification: - Adapt parsing logic to accept the repeatability of extension key-value pairs, and that chunk extensions can end on unquoted value tokens. - Add tests to capture these cases. Result: More correct HTTP chunk extension parsing. The tests are lifted from netty#16542
|
@franz1981 I extract the parsing fix to a separate PR: #16579 |
I can see why; my concern is to make the static init to be intensive (GC and CPU wise) for no apparent reason. But the benefit of a declarative API, for us humans, is undeniable, is agree Let me check if I can make this some sort of builder, without any GC root so, once CDS or Layden Is used, the table is the only thing which would be retained at runtime |
Motivation: The chunk extension parsing/validation logic did not correctly account for extensions with multiple key-value pairs. Modification: - Adapt parsing logic to accept the repeatability of extension key-value pairs, and that chunk extensions can end on unquoted value tokens. - Add tests to capture these cases. Result: More correct HTTP chunk extension parsing. The tests are lifted from #16542 Co-authored-by: Norman Maurer <norman_maurer@apple.com>
Motivation: The chunk extension parsing/validation logic did not correctly account for extensions with multiple key-value pairs. Modification: - Adapt parsing logic to accept the repeatability of extension key-value pairs, and that chunk extensions can end on unquoted value tokens. - Add tests to capture these cases. Result: More correct HTTP chunk extension parsing. The tests are lifted from #16542 Co-authored-by: Norman Maurer <norman_maurer@apple.com> (cherry picked from commit b2c0dfd)
Motivation: The chunk extension parsing/validation logic did not correctly account for extensions with multiple key-value pairs. Modification: - Adapt parsing logic to accept the repeatability of extension key-value pairs, and that chunk extensions can end on unquoted value tokens. - Add tests to capture these cases. Result: More correct HTTP chunk extension parsing. The tests are lifted from #16542 Co-authored-by: Norman Maurer <norman_maurer@apple.com> (cherry picked from commit b2c0dfd)
Yeah, I know the init cost is there, but I think it's worth it in the worst case. I don't know how much a builder would help. It's hard to quantify the early init performance. In the other extreme end, we could code-gen from an annotation processor or something, but that would also be a lot of work. |
Please don't over-optimize. I am strongly in favor of not adding even more complexity that would be a nightmare to maintain (like code-gen) |
Auto-port of #16579 to 4.1 Cherry-picked commit: b2c0dfd --- Motivation: The chunk extension parsing/validation logic did not correctly account for extensions with multiple key-value pairs. Modification: - Adapt parsing logic to accept the repeatability of extension key-value pairs, and that chunk extensions can end on unquoted value tokens. - Add tests to capture these cases. Result: More correct HTTP chunk extension parsing. The tests are lifted from #16542 Co-authored-by: Chris Vest <christianvest_hansen@apple.com> Co-authored-by: Norman Maurer <norman_maurer@apple.com>
|
As soon as I am back from holidays I will finish this one 😄 |
Auto-port of #16579 to 5.0 Cherry-picked commit: b2c0dfd --- Motivation: The chunk extension parsing/validation logic did not correctly account for extensions with multiple key-value pairs. Modification: - Adapt parsing logic to accept the repeatability of extension key-value pairs, and that chunk extensions can end on unquoted value tokens. - Add tests to capture these cases. Result: More correct HTTP chunk extension parsing. The tests are lifted from #16542 Co-authored-by: Chris Vest <christianvest_hansen@apple.com> Co-authored-by: Norman Maurer <norman_maurer@apple.com>
Motivation: The existing enum State with Match[] (BitSet) arrays uses multiple pointer-chasing loads per byte in the process() hot path. Modification: Compile the declarative enum + Match specification into a flat byte[7*256] transition table at class load time. Reorder state constants so that invalid-at-finish states are contiguous, enabling finish() to use a single range check instead of a switch - this benefits native image AOT compilation which has no branch profiling. Result: Fixes netty#16540
|
PTAL @chrisvest and @normanmaurer |
|
Ptal again @chrisvest @normanmaurer |
Motivation:
The existing enum State with Match[] (BitSet) arrays uses multiple
pointer-chasing loads per byte. It also has two bugs: ';' was not
excluded from ChunkExtValToken causing valid multi-extension lines
to be rejected, and finish() did not accept CHUNK_EXT_VAL_TOKEN as
a valid final state.
Modification:
Replace with a flat byte[7*256] transition table populated at class
load time. Fix both missing cases. Add regression tests and benchmarks.
Result:
Fixes #16540