Skip to content

Stricter HTTP/1.1 chunk extension parsing#16537

Merged
chrisvest merged 1 commit into
netty:4.1from
chrisvest:4.1-merge
Mar 24, 2026
Merged

Stricter HTTP/1.1 chunk extension parsing#16537
chrisvest merged 1 commit into
netty:4.1from
chrisvest:4.1-merge

Conversation

@chrisvest

Copy link
Copy Markdown
Member

Motivation:
Chunk extensions can include quoted string values, which themselves can include linebreaks and escapes for quotes. We need to parse these properly to ensure we find the correct start of the chunk data.

Modification:

  • Implement full RFC 9112 HTTP/1.1 compliant parsing of chunk start lines.
  • Add test cases from the Funky Chunks research: https://w4ke.info/2025/10/29/funky-chunks-2.html
  • This inclues chunk extensions with quoted strings that have linebreaks in them, and quoted strings that use escape codes.
  • Remove a test case that asserted support for control characters in the middle of chunk start lines, including after a naked chunk length field. Such control characters are not permitted by the standard.

Result:
Prevents HTTP message smuggling through carefully crafted chunk extensions.

  • Revert the ByteProcessor changes

  • Add a benchmark for HTTP/1.1 chunk decoding

  • Fix chunk initial line decoding

The initial line was not correctly truncated at its line break and ended up including some of the chunk contents.

  • Failing to parse chunk size must throw NumberFormatException

  • Line breaks are completely disallowed within chunk extensions

Change the chunk parsing back to its original code, because we know that line breaks are not supposed to occur within chunk extensions at all. This means doing the SWAR search should be suitable.

Modify the byte processor and add it as a validation step of the parsed chunk start line.

Update the tests to match.

  • Fix checkstyle

(cherry picked from commit 3b76df1)

* Stricter HTTP/1.1 chunk extension parsing

Motivation:
Chunk extensions can include quoted string values, which themselves can include linebreaks and escapes for quotes.
We need to parse these properly to ensure we find the correct start of the chunk data.

Modification:
- Implement full RFC 9112 HTTP/1.1 compliant parsing of chunk start lines.
- Add test cases from the Funky Chunks research: https://w4ke.info/2025/10/29/funky-chunks-2.html
- This inclues chunk extensions with quoted strings that have linebreaks in them, and quoted strings that use escape codes.
 - Remove a test case that asserted support for control characters in the middle of chunk start lines, including after a naked chunk length field. Such control characters are not permitted by the standard.

 Result:
 Prevents HTTP message smuggling through carefully crafted chunk extensions.

* Revert the ByteProcessor changes

* Add a benchmark for HTTP/1.1 chunk decoding

* Fix chunk initial line decoding

The initial line was not correctly truncated at its line break and ended up including some of the chunk contents.

* Failing to parse chunk size must throw NumberFormatException

* Line breaks are completely disallowed within chunk extensions

Change the chunk parsing back to its original code, because we know that line breaks are not supposed to occur within chunk extensions at all.
This means doing the SWAR search should be suitable.

Modify the byte processor and add it as a validation step of the parsed chunk start line.

Update the tests to match.

* Fix checkstyle

(cherry picked from commit 3b76df1)
@chrisvest chrisvest added this to the 4.1.132.Final milestone Mar 24, 2026
@franz1981

franz1981 commented Mar 24, 2026

Copy link
Copy Markdown
Contributor

Sorry for been late @chrisvest
Here a suggestion to move to a single table lookup..
The per-byte Match[] scan has two costs: iterating N matchers per byte, and a double dependent load (match.get(value)STATES_BY_ORDINAL[next]) where each load can't start until the previous resolves — worst case 3–5 serialised dependent loads on states with multiple matchers.

Both can be eliminated by flattening all transitions into a single byte[N_STATES * 256] and tracking state as a plain int. Hot path: one shift, one OR, one array load — no loop, no second dependent load. Footprint: 7 × 256 = 1,792 bytes in a single contiguous array vs ~1,320 bytes scattered across 30 heap objects (15 Match + 15 long[] backing arrays) with pointer chasing.

There is a correctness fix bundled here: the current matchers in ChunkExtValStart and ChunkExtValToken overlap on specific bytes (" at 0x22 and ; at 0x3B respectively). The original code is safe because it is first-match-wins. A naïve table compilation would be last-write-wins and silently produce wrong transitions for those bytes. The fix is to make the matchers explicitly disjoint by adding '"' and ';' to the relevant exclusion lists, and to enforce disjointness unconditionally at build time:

private static final byte[] TRANSITIONS = compile();

static byte[] compile() {
    Match[][] stateMatchers = new Match[N_STATES][];

    stateMatchers[SIZE] = new Match[]{
            new Match(SIZE).chars("0123456789abcdefABCDEF \t"),
            new Match(CHUNK_EXT_NAME).chars(";")
    };
    stateMatchers[CHUNK_EXT_NAME] = new Match[]{
            new Match(CHUNK_EXT_NAME)
                    .range(0x21, 0x7E)
                    .chars(" \t")
                    .chars("(),/:<=>?@[\\]{}", false),
            new Match(CHUNK_EXT_VAL_START).chars("=")
    };
    stateMatchers[CHUNK_EXT_VAL_START] = new Match[]{
            new Match(CHUNK_EXT_VAL_START).chars(" \t"),
            new Match(CHUNK_EXT_VAL_QUOTED).chars("\""),
            // explicitly exclude '"' (0x22) — disjoint with the matcher above
            new Match(CHUNK_EXT_VAL_TOKEN)
                    .range(0x21, 0x7E)
                    .chars("(),/:<=>?@[\\]{}\"", false)
    };
    stateMatchers[CHUNK_EXT_VAL_QUOTED] = new Match[]{
            new Match(CHUNK_EXT_VAL_QUOTED_ESCAPE).chars("\\"),
            new Match(CHUNK_EXT_VAL_QUOTED_END).chars("\""),
            // 0x23-0x5B excludes '"' (0x22), 0x5D-0x7E excludes '\' (0x5C) — disjoint by range
            new Match(CHUNK_EXT_VAL_QUOTED)
                    .chars("\t !")
                    .range(0x23, 0x5B)
                    .range(0x5D, 0x7E)
                    .range(0x80, 0xFF)
    };
    stateMatchers[CHUNK_EXT_VAL_QUOTED_ESCAPE] = new Match[]{
            new Match(CHUNK_EXT_VAL_QUOTED)
                    .chars("\t ")
                    .range(0x21, 0x7E)
                    .range(0x80, 0xFF)
    };
    stateMatchers[CHUNK_EXT_VAL_QUOTED_END] = new Match[]{
            new Match(CHUNK_EXT_VAL_QUOTED_END).chars("\t "),
            new Match(CHUNK_EXT_NAME).chars(";")
    };
    stateMatchers[CHUNK_EXT_VAL_TOKEN] = new Match[]{
            // explicitly exclude ';' (0x3B) — disjoint with the matcher below
            new Match(CHUNK_EXT_VAL_TOKEN)
                    .range(0x21, 0x7E)
                    .chars("(),/:<=>?@[\\]{};", false),
            new Match(CHUNK_EXT_NAME).chars(";")
    };

    byte[] table = new byte[N_STATES * 256];
    Arrays.fill(table, (byte) -1);
    for (int state = 0; state < N_STATES; state++) {
        int base = state << 8;
        for (Match m : stateMatchers[state]) {
            for (int i = m.nextSetBit(0); i >= 0; i = m.nextSetBit(i + 1)) {
                if (table[base | i] != -1) {
                    throw new IllegalStateException(
                            "overlapping matchers at byte 0x" + Integer.toHexString(i) +
                            " in state " + state);
                }
                table[base | i] = (byte) m.then;
            }
        }
    }
    return table;
}
private int state = SIZE;

@Override
public boolean process(byte value) {
    int next = TRANSITIONS[state << 8 | (value & 0xFF)];
    if (next < 0) {
        if (state == SIZE) {
            throw new NumberFormatException("Invalid chunk size");
        }
        throw new InvalidChunkExtensionException("Invalid chunk extension");
    }
    state = next;
    return true;
}
  • & 0xFF on value is required since byte is signed and used as an array index.
  • Match objects and stateMatchers are GC'd after static init — zero runtime footprint.
  • The IllegalStateException in compile() is unconditional (not an assert) so overlap bugs are caught at class load regardless of JVM flags.

@chrisvest

Copy link
Copy Markdown
Member Author

@franz1981 Let's look at that in a follow-up PR. I'm in the middle of cutting a release, and will have to include this one as is.

@chrisvest chrisvest merged commit 60e53c9 into netty:4.1 Mar 24, 2026
17 of 18 checks passed
@chrisvest chrisvest deleted the 4.1-merge branch March 24, 2026 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants