chore: Updated content-type header parsing#6414
Conversation
c6e17ba to
817bc34
Compare
gurgunday
left a comment
There was a problem hiding this comment.
I think this is hard to land in v5
Also, we should check how much it affects the performance
Unfortunately we removed the benchmark CI, someone should run it locally (I'll do it when I review)
|
It like a light weight version of content-type parser? How the performance after the change? |
93f288a to
b620bfc
Compare
In our code base, "content-type parser" refers to the thing that collects and manages the functions used to parse HTTP POST bodies. Prior to this PR, the indicator used to denote what format those POST bodies is in was inspected in various ways that are very fragile. This PR introduces a dedicated identifier parser.
I don't know. Likely slower than the previous fragile checks, but likely not the slowest possible. Unfortunately, proper handling of the |
b620bfc to
bd67b41
Compare
mcollina
left a comment
There was a problem hiding this comment.
Are you sure that there are no content types without a /? I think a user could create one like that, and I feel we should kind of support it.
Please review https://httpwg.org/specs/rfc9110.html#media.type The required format is |
|
@mcollina I have written the benchmark below to test the performance of splitting out only the Benchmark'use strict'
const assert = require('node:assert')
const { Suite } = require('bench-node')
const suite = new Suite({
reporter
})
const basicMediaType = 'application/json'
const paramsMediaType = 'application/json; charset=utf-8; foo=bar'
suite.add('iife split (basic)', () => {
let parts = (() => {
const [first, ...rest] = basicMediaType.split('/')
if (rest.length === 0) return [first]
return [first, rest.join('/')]
})()
assert.equal(parts[0], 'application')
assert.equal(parts[1], 'json')
})
suite.add('iife split (params)', () => {
let parts = (() => {
const [first, ...rest] = paramsMediaType.split('/')
if (rest.length === 0) return [first]
return [first, rest.join('/')]
})()
const type = parts[0]
parts = parts[1].split(';')
const subtype = parts[0]
assert.equal(type, 'application')
assert.equal(subtype, 'json')
})
suite.add('long split (basic)', () => {
let parts = basicMediaType.split('/')
assert.equal(parts[0], 'application')
assert.equal(parts[1], 'json')
})
suite.add('long split (params)', () => {
let parts = paramsMediaType.split('/')
const type = parts[0]
parts = parts[1].split(';')
const subtype = parts[0]
assert.equal(type, 'application')
assert.equal(subtype, 'json')
})
const tokensReg = /^([\w!#$%&'*+.^`|~-]+)\/([\w!#$%&'*+.^`|~-]+)\s*;?/
suite.add('regex split (basic)', () => {
const matches = tokensReg.exec(basicMediaType)
const type = matches[1]
const subtype = matches[2]
assert.equal(type, 'application')
assert.equal(subtype, 'json')
})
suite.add('regex split (params)', () => {
const matches = tokensReg.exec(paramsMediaType)
const type = matches[1]
const subtype = matches[2]
assert.equal(type, 'application')
assert.equal(subtype, 'json')
})
suite.run()
const { format } = new Intl.NumberFormat('en-US', {
useGrouping: true
})
function reporter (results) {
for (const result of results) {
const ops = format(result.opsSec)
console.log(`${result.name}: ${ops} ops/sec`)
}
} |
|
I'll be getting back to this soon. |
4523a2b to
126463b
Compare
|
I've analyzed the blocker regarding the performance vs. accuracy of the header parsing. Since Regex is proving difficult for strictness and split has overhead, the most performant way to handle this in V8 is usually a direct indexOf slice. Here is a proposed solution that fixes the vulnerability (adding the missing else check) and includes a high-performance parser that passes the tests without Regex complexity.
The Logic Fix (Inside validate) This ensures we don't "Fail Open" if the content type is missing from the schema map. This approach should satisfy the 415 test expectation and resolve the performance regression. |
23c9425 to
7bbaad3
Compare
Co-authored-by: Manuel Spigolon <behemoth89@gmail.com> Signed-off-by: James Sumners <321201+jsumners@users.noreply.github.com>
|
Here's some benchmark results for the algorithm in commit 7bbaad3: Note that the I await the group's decision. Benchmark script:'use strict'
const assert = require('node:assert')
const { Suite } = require('bench-node')
const suite = new Suite({
reporter
})
const basicMediaType = 'application/json'
const paramsMediaType = 'application/json; charset=utf-8; foo="bar"'
const typeNameReg = /^[\w!#$%&'*+.^`|~-]+$/
const subtypeNameReg = /^[\w!#$%&'*+.^`|~-]+\s*/
const keyValuePairsReg = /([\w!#$%&'*+.^`|~-]+)=([^;]*)/gm
suite.add('iife split (basic)', () => {
let parts = (() => {
const [first, ...rest] = basicMediaType.split('/')
if (rest.length === 0) return [first]
return [first, rest.join('/')]
})()
assert.equal(parts[0], 'application')
assert.equal(parts[1], 'json')
})
suite.add('iife split (params)', () => {
let parts = (() => {
const [first, ...rest] = paramsMediaType.split('/')
if (rest.length === 0) return [first]
return [first, rest.join('/')]
})()
const type = parts[0]
parts = parts[1].split(';')
const subtype = parts[0]
assert.equal(type, 'application')
assert.equal(subtype, 'json')
})
suite.add('long split (basic)', () => {
let parts = basicMediaType.split('/')
assert.equal(parts[0], 'application')
assert.equal(parts[1], 'json')
})
suite.add('long split (params)', () => {
let parts = paramsMediaType.split('/')
const type = parts[0]
parts = parts[1].split(';')
const subtype = parts[0]
assert.equal(type, 'application')
assert.equal(subtype, 'json')
})
const tokensReg = /^([\w!#$%&'*+.^`|~-]+)\/([\w!#$%&'*+.^`|~-]+)\s*;?/
suite.add('regex split (basic)', () => {
const matches = tokensReg.exec(basicMediaType)
const type = matches[1]
const subtype = matches[2]
assert.equal(type, 'application')
assert.equal(subtype, 'json')
})
suite.add('regex split (params)', () => {
const matches = tokensReg.exec(paramsMediaType)
const type = matches[1]
const subtype = matches[2]
assert.equal(type, 'application')
assert.equal(subtype, 'json')
})
suite.add('indexOf based (basic)', () => {
let sepIdx = basicMediaType.indexOf(';')
if (sepIdx === -1) {
sepIdx = basicMediaType.indexOf('/')
if (sepIdx === -1) return
const type = basicMediaType.slice(0, sepIdx).trimStart().toLowerCase()
const subtype = basicMediaType.slice(sepIdx + 1).trimEnd().toLowerCase()
if (typeNameReg.test(type) === true && subtypeNameReg.test(subtype) === true) {
assert.equal(type, 'application')
assert.equal(subtype, 'json')
}
return
}
})
suite.add('indexOf based (params)', () => {
let sepIdx = paramsMediaType.indexOf(';')
if (sepIdx === -1) {
sepIdx = paramsMediaType.indexOf('/')
if (sepIdx === -1) return
const type = paramsMediaType.slice(0, sepIdx).trimStart().toLowerCase()
const subtype = paramsMediaType.slice(sepIdx + 1).trimEnd().toLowerCase()
if (typeNameReg.test(type) === true && subtypeNameReg.test(subtype) === true) {
assert.equal(type, 'application')
assert.equal(subtype, 'json')
}
return
}
const mediaType = paramsMediaType.slice(0, sepIdx).toLowerCase()
const paramsList = paramsMediaType.slice(sepIdx + 1).trim()
sepIdx = mediaType.indexOf('/')
if (sepIdx === -1) return
const type = mediaType.slice(0, sepIdx).trimStart()
const subtype = mediaType.slice(sepIdx + 1).trimEnd()
if (typeNameReg.test(type) === false || subtypeNameReg.test(subtype) === false) {
return
}
assert.equal(type, 'application')
assert.equal(subtype, 'json')
const parameters = new Map()
let matches = keyValuePairsReg.exec(paramsList)
while (matches) {
const key = matches[1]
const value = matches[2]
if (value[0] === '"') {
if (value.at(-1) !== '"') {
parameters.set(key, 'invalid quoted string')
matches = keyValuePairsReg.exec(paramsList)
continue
}
parameters.set(key, value.slice(1, value.length - 1))
} else {
parameters.set(key, value)
}
matches = keyValuePairsReg.exec(paramsList)
}
assert.equal(parameters.size, 2)
})
suite.run()
const { format } = new Intl.NumberFormat('en-US', {
useGrouping: true
})
function reporter (results) {
for (const result of results) {
const ops = format(result.opsSec)
console.log(`${result.name}: ${ops} ops/sec`)
}
} |
|
@Eomm do you have any further concern? |
jean-michelet
left a comment
There was a problem hiding this comment.
Nice!
Let's close this after merge:
#6332
This is a bug fix. The current content-type parser algorithm allows several invalid values, many of which allow for schema validation to be incorrectly applied. You can see this in the tests I added, updated, and removed. I'll merge and release this on 2026-01-26 if we don't have any objections by then. attn: @fastify/leads |
|
I personally don't have objections, let's see users' reaction |
Resolves #6332.
This is a follow-on from #6303. This introduces a more robust content-type header value parsing algorithm.
IMPORTANT: this results in much stricter content-type handling. Prior to this PR, we were accepting, and handling, headers like
content-type: a-simple-string. That is not valid. There must be a type and a subtype present. So a "valid" form would becontent-type: a-simple-string/foo.This change also enforces normalization of the content-type value. For example, if the request has
application/json; charset=utf-8as the value, it will be normalized toapplication/json; charset="utf-8".