Stricter HTTP/1.1 chunk extension parsing#16537
Conversation
* Stricter HTTP/1.1 chunk extension parsing Motivation: Chunk extensions can include quoted string values, which themselves can include linebreaks and escapes for quotes. We need to parse these properly to ensure we find the correct start of the chunk data. Modification: - Implement full RFC 9112 HTTP/1.1 compliant parsing of chunk start lines. - Add test cases from the Funky Chunks research: https://w4ke.info/2025/10/29/funky-chunks-2.html - This inclues chunk extensions with quoted strings that have linebreaks in them, and quoted strings that use escape codes. - Remove a test case that asserted support for control characters in the middle of chunk start lines, including after a naked chunk length field. Such control characters are not permitted by the standard. Result: Prevents HTTP message smuggling through carefully crafted chunk extensions. * Revert the ByteProcessor changes * Add a benchmark for HTTP/1.1 chunk decoding * Fix chunk initial line decoding The initial line was not correctly truncated at its line break and ended up including some of the chunk contents. * Failing to parse chunk size must throw NumberFormatException * Line breaks are completely disallowed within chunk extensions Change the chunk parsing back to its original code, because we know that line breaks are not supposed to occur within chunk extensions at all. This means doing the SWAR search should be suitable. Modify the byte processor and add it as a validation step of the parsed chunk start line. Update the tests to match. * Fix checkstyle (cherry picked from commit 3b76df1)
|
Sorry for been late @chrisvest Both can be eliminated by flattening all transitions into a single There is a correctness fix bundled here: the current matchers in private static final byte[] TRANSITIONS = compile();
static byte[] compile() {
Match[][] stateMatchers = new Match[N_STATES][];
stateMatchers[SIZE] = new Match[]{
new Match(SIZE).chars("0123456789abcdefABCDEF \t"),
new Match(CHUNK_EXT_NAME).chars(";")
};
stateMatchers[CHUNK_EXT_NAME] = new Match[]{
new Match(CHUNK_EXT_NAME)
.range(0x21, 0x7E)
.chars(" \t")
.chars("(),/:<=>?@[\\]{}", false),
new Match(CHUNK_EXT_VAL_START).chars("=")
};
stateMatchers[CHUNK_EXT_VAL_START] = new Match[]{
new Match(CHUNK_EXT_VAL_START).chars(" \t"),
new Match(CHUNK_EXT_VAL_QUOTED).chars("\""),
// explicitly exclude '"' (0x22) — disjoint with the matcher above
new Match(CHUNK_EXT_VAL_TOKEN)
.range(0x21, 0x7E)
.chars("(),/:<=>?@[\\]{}\"", false)
};
stateMatchers[CHUNK_EXT_VAL_QUOTED] = new Match[]{
new Match(CHUNK_EXT_VAL_QUOTED_ESCAPE).chars("\\"),
new Match(CHUNK_EXT_VAL_QUOTED_END).chars("\""),
// 0x23-0x5B excludes '"' (0x22), 0x5D-0x7E excludes '\' (0x5C) — disjoint by range
new Match(CHUNK_EXT_VAL_QUOTED)
.chars("\t !")
.range(0x23, 0x5B)
.range(0x5D, 0x7E)
.range(0x80, 0xFF)
};
stateMatchers[CHUNK_EXT_VAL_QUOTED_ESCAPE] = new Match[]{
new Match(CHUNK_EXT_VAL_QUOTED)
.chars("\t ")
.range(0x21, 0x7E)
.range(0x80, 0xFF)
};
stateMatchers[CHUNK_EXT_VAL_QUOTED_END] = new Match[]{
new Match(CHUNK_EXT_VAL_QUOTED_END).chars("\t "),
new Match(CHUNK_EXT_NAME).chars(";")
};
stateMatchers[CHUNK_EXT_VAL_TOKEN] = new Match[]{
// explicitly exclude ';' (0x3B) — disjoint with the matcher below
new Match(CHUNK_EXT_VAL_TOKEN)
.range(0x21, 0x7E)
.chars("(),/:<=>?@[\\]{};", false),
new Match(CHUNK_EXT_NAME).chars(";")
};
byte[] table = new byte[N_STATES * 256];
Arrays.fill(table, (byte) -1);
for (int state = 0; state < N_STATES; state++) {
int base = state << 8;
for (Match m : stateMatchers[state]) {
for (int i = m.nextSetBit(0); i >= 0; i = m.nextSetBit(i + 1)) {
if (table[base | i] != -1) {
throw new IllegalStateException(
"overlapping matchers at byte 0x" + Integer.toHexString(i) +
" in state " + state);
}
table[base | i] = (byte) m.then;
}
}
}
return table;
}private int state = SIZE;
@Override
public boolean process(byte value) {
int next = TRANSITIONS[state << 8 | (value & 0xFF)];
if (next < 0) {
if (state == SIZE) {
throw new NumberFormatException("Invalid chunk size");
}
throw new InvalidChunkExtensionException("Invalid chunk extension");
}
state = next;
return true;
}
|
|
@franz1981 Let's look at that in a follow-up PR. I'm in the middle of cutting a release, and will have to include this one as is. |
Motivation:
Chunk extensions can include quoted string values, which themselves can include linebreaks and escapes for quotes. We need to parse these properly to ensure we find the correct start of the chunk data.
Modification:
Result:
Prevents HTTP message smuggling through carefully crafted chunk extensions.
Revert the ByteProcessor changes
Add a benchmark for HTTP/1.1 chunk decoding
Fix chunk initial line decoding
The initial line was not correctly truncated at its line break and ended up including some of the chunk contents.
Failing to parse chunk size must throw NumberFormatException
Line breaks are completely disallowed within chunk extensions
Change the chunk parsing back to its original code, because we know that line breaks are not supposed to occur within chunk extensions at all. This means doing the SWAR search should be suitable.
Modify the byte processor and add it as a validation step of the parsed chunk start line.
Update the tests to match.
(cherry picked from commit 3b76df1)