Skip to content

chore: Updated content-type header parsing#6414

Merged
jsumners merged 5 commits intomainfrom
parse-content-type
Jan 26, 2026
Merged

chore: Updated content-type header parsing#6414
jsumners merged 5 commits intomainfrom
parse-content-type

Conversation

@jsumners
Copy link
Member

@jsumners jsumners commented Dec 14, 2025

Resolves #6332.

This is a follow-on from #6303. This introduces a more robust content-type header value parsing algorithm.

IMPORTANT: this results in much stricter content-type handling. Prior to this PR, we were accepting, and handling, headers like content-type: a-simple-string. That is not valid. There must be a type and a subtype present. So a "valid" form would be content-type: a-simple-string/foo.

This change also enforces normalization of the content-type value. For example, if the request has application/json; charset=utf-8 as the value, it will be normalized to application/json; charset="utf-8".

@jsumners jsumners force-pushed the parse-content-type branch 3 times, most recently from c6e17ba to 817bc34 Compare December 14, 2025 18:41
@jsumners jsumners marked this pull request as ready for review December 14, 2025 18:43
@jsumners jsumners requested review from a team and mcollina December 14, 2025 18:47
Copy link
Member

@gurgunday gurgunday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is hard to land in v5

Also, we should check how much it affects the performance

Unfortunately we removed the benchmark CI, someone should run it locally (I'll do it when I review)

@jsumners jsumners marked this pull request as draft December 14, 2025 22:42
@jsumners jsumners marked this pull request as ready for review December 15, 2025 00:20
@climba03003
Copy link
Member

It like a light weight version of content-type parser?
Removed the parameters parts, it is ok on our case because we never use it.

How the performance after the change?

@jsumners jsumners force-pushed the parse-content-type branch 2 times, most recently from 93f288a to b620bfc Compare December 15, 2025 11:35
@jsumners
Copy link
Member Author

It like a light weight version of content-type parser? Removed the parameters parts, it is ok on our case because we never use it.

In our code base, "content-type parser" refers to the thing that collects and manages the functions used to parse HTTP POST bodies. Prior to this PR, the indicator used to denote what format those POST bodies is in was inspected in various ways that are very fragile. This PR introduces a dedicated identifier parser.

How the performance after the change?

I don't know. Likely slower than the previous fragile checks, but likely not the slowest possible. Unfortunately, proper handling of the content-type header value is an imperative. We cannot continue using the previous fragile inspections.

@jsumners jsumners added the bugfix Issue or PR that should land as semver patch label Dec 15, 2025
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that there are no content types without a /? I think a user could create one like that, and I feel we should kind of support it.

@jsumners
Copy link
Member Author

Are you sure that there are no content types without a /? I think a user could create one like that, and I feel we should kind of support it.

Please review https://httpwg.org/specs/rfc9110.html#media.type

The required format is type/subtype. Anything written as type is not spec compliant. We opt to adhere to the specs: https://github.com/fastify/fastify/blob/main/docs/Reference/Principles.md#specification-adherence

@jsumners
Copy link
Member Author

jsumners commented Dec 15, 2025

@mcollina I have written the benchmark below to test the performance of splitting out only the type and subtype parts. I agree that the IIFE is too expensive. It looks like the regular expression version becomes worth it as we start dealing with the parameters.

Benchmark
'use strict'

const assert = require('node:assert')
const { Suite } = require('bench-node')
const suite = new Suite({
  reporter
})

const basicMediaType = 'application/json'
const paramsMediaType = 'application/json; charset=utf-8; foo=bar'

suite.add('iife split (basic)', () => {
  let parts = (() => {
    const [first, ...rest] = basicMediaType.split('/')
    if (rest.length === 0) return [first]
    return [first, rest.join('/')]
  })()
  assert.equal(parts[0], 'application')
  assert.equal(parts[1], 'json')
})

suite.add('iife split (params)', () => {
  let parts = (() => {
    const [first, ...rest] = paramsMediaType.split('/')
    if (rest.length === 0) return [first]
    return [first, rest.join('/')]
  })()

  const type = parts[0]
  parts = parts[1].split(';')
  const subtype = parts[0]
  
  assert.equal(type, 'application')
  assert.equal(subtype, 'json')
})

suite.add('long split (basic)', () => {
  let parts = basicMediaType.split('/')
  assert.equal(parts[0], 'application')
  assert.equal(parts[1], 'json')
})

suite.add('long split (params)', () => {
  let parts = paramsMediaType.split('/')
  const type = parts[0]

  parts = parts[1].split(';')
  const subtype = parts[0]

  assert.equal(type, 'application')
  assert.equal(subtype, 'json')
})

const tokensReg = /^([\w!#$%&'*+.^`|~-]+)\/([\w!#$%&'*+.^`|~-]+)\s*;?/
suite.add('regex split (basic)', () => {
  const matches = tokensReg.exec(basicMediaType)
  const type = matches[1]
  const subtype = matches[2]
  assert.equal(type, 'application')
  assert.equal(subtype, 'json')
})

suite.add('regex split (params)', () => {
  const matches = tokensReg.exec(paramsMediaType)
  const type = matches[1]
  const subtype = matches[2]
  assert.equal(type, 'application')
  assert.equal(subtype, 'json')
})

suite.run()

const { format } = new Intl.NumberFormat('en-US', {
  useGrouping: true
})
function reporter (results) {

  for (const result of results) {
    const ops = format(result.opsSec)
    console.log(`${result.name}: ${ops} ops/sec`)
  }
}
❯ node --allow-natives-syntax index.js
iife split (basic): 9,655,587.18 ops/sec
iife split (params): 6,261,203.487 ops/sec
long split (basic): 15,462,605.433 ops/sec
long split (params): 8,142,902.648 ops/sec
regex split (basic): 15,020,238.273 ops/sec
regex split (params): 14,836,393.537 ops/sec

Copy link
Member

@Eomm Eomm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closes #6077

Minor suggestions

@jsumners
Copy link
Member Author

I'll be getting back to this soon.

@jsumners jsumners force-pushed the parse-content-type branch from 4523a2b to 126463b Compare January 1, 2026 18:58
@maneeshaking
Copy link

Hi @jsumners and @mcollina ,

I've analyzed the blocker regarding the performance vs. accuracy of the header parsing. Since Regex is proving difficult for strictness and split has overhead, the most performant way to handle this in V8 is usually a direct indexOf slice.

Here is a proposed solution that fixes the vulnerability (adding the missing else check) and includes a high-performance parser that passes the tests without Regex complexity.

  1. Optimized Parser (Replaces getEssenceMediaType) This handles application/json; charset=utf-8 correctly without allocating arrays.
function getEssenceMediaType (header) {
  if (!header) return ''
  let end = header.indexOf(';')
  if (end === -1) end = header.length
  return header.slice(0, end).trim().toLowerCase()
}

The Logic Fix (Inside validate) This ensures we don't "Fail Open" if the content type is missing from the schema map.

// ... inside validate() ...
    } else if (context[bodySchema]) {
      const contentType = getEssenceMediaType(request.headers['content-type'])
      const contentSchema = context[bodySchema][contentType]
      
      if (contentSchema) {
        validatorFunction = contentSchema
      } else {
        // Fix: Fail closed if Content-Type is present but not defined in schema
        const error = new Error('Content-Type not supported')
        error.statusCode = 415
        error.code = 'FST_ERR_VALIDATION'
        return error
      }
    }

This approach should satisfy the 415 test expectation and resolve the performance regression.

@jsumners jsumners requested a review from mcollina January 19, 2026 13:41
Co-authored-by: Manuel Spigolon <behemoth89@gmail.com>
Signed-off-by: James Sumners <321201+jsumners@users.noreply.github.com>
@jsumners jsumners requested a review from Eomm January 19, 2026 13:42
@jsumners
Copy link
Member Author

Here's some benchmark results for the algorithm in commit 7bbaad3:

❯ node --allow-natives-syntax index.js
iife split (basic): 9,570,374.43 ops/sec
iife split (params): 6,220,149.478 ops/sec
long split (basic): 15,476,672.638 ops/sec
long split (params): 8,291,173.241 ops/sec
regex split (basic): 15,846,585.338 ops/sec
regex split (params): 15,680,046.717 ops/sec
indexOf based (basic): 10,012,107.29 ops/sec
indexOf based (params): 2,958,473.828 ops/sec

Note that the long split (params) only benchmarks the split operations between the media type field and the parameters list. It does not cover the parsing of those parameters. Feel free to update and run locally to get a better approximation.

I await the group's decision.

Benchmark script:
'use strict'

const assert = require('node:assert')
const { Suite } = require('bench-node')
const suite = new Suite({
  reporter
})

const basicMediaType = 'application/json'
const paramsMediaType = 'application/json; charset=utf-8; foo="bar"'

const typeNameReg = /^[\w!#$%&'*+.^`|~-]+$/
const subtypeNameReg = /^[\w!#$%&'*+.^`|~-]+\s*/
const keyValuePairsReg = /([\w!#$%&'*+.^`|~-]+)=([^;]*)/gm

suite.add('iife split (basic)', () => {
  let parts = (() => {
    const [first, ...rest] = basicMediaType.split('/')
    if (rest.length === 0) return [first]
    return [first, rest.join('/')]
  })()
  assert.equal(parts[0], 'application')
  assert.equal(parts[1], 'json')
})

suite.add('iife split (params)', () => {
  let parts = (() => {
    const [first, ...rest] = paramsMediaType.split('/')
    if (rest.length === 0) return [first]
    return [first, rest.join('/')]
  })()

  const type = parts[0]
  parts = parts[1].split(';')
  const subtype = parts[0]
  
  assert.equal(type, 'application')
  assert.equal(subtype, 'json')
})

suite.add('long split (basic)', () => {
  let parts = basicMediaType.split('/')
  assert.equal(parts[0], 'application')
  assert.equal(parts[1], 'json')
})

suite.add('long split (params)', () => {
  let parts = paramsMediaType.split('/')
  const type = parts[0]

  parts = parts[1].split(';')
  const subtype = parts[0]

  assert.equal(type, 'application')
  assert.equal(subtype, 'json')
})

const tokensReg = /^([\w!#$%&'*+.^`|~-]+)\/([\w!#$%&'*+.^`|~-]+)\s*;?/
suite.add('regex split (basic)', () => {
  const matches = tokensReg.exec(basicMediaType)
  const type = matches[1]
  const subtype = matches[2]
  assert.equal(type, 'application')
  assert.equal(subtype, 'json')
})

suite.add('regex split (params)', () => {
  const matches = tokensReg.exec(paramsMediaType)
  const type = matches[1]
  const subtype = matches[2]
  assert.equal(type, 'application')
  assert.equal(subtype, 'json')
})

suite.add('indexOf based (basic)', () => {
  let sepIdx = basicMediaType.indexOf(';')
  if (sepIdx === -1) {
    sepIdx = basicMediaType.indexOf('/')
    if (sepIdx === -1) return
    const type = basicMediaType.slice(0, sepIdx).trimStart().toLowerCase()
    const subtype = basicMediaType.slice(sepIdx + 1).trimEnd().toLowerCase()

    if (typeNameReg.test(type) === true && subtypeNameReg.test(subtype) === true) {
      assert.equal(type, 'application')
      assert.equal(subtype, 'json')
    }

    return
  }
})

suite.add('indexOf based (params)', () => {
  let sepIdx = paramsMediaType.indexOf(';')
  if (sepIdx === -1) {
    sepIdx = paramsMediaType.indexOf('/')
    if (sepIdx === -1) return
    const type = paramsMediaType.slice(0, sepIdx).trimStart().toLowerCase()
    const subtype = paramsMediaType.slice(sepIdx + 1).trimEnd().toLowerCase()

    if (typeNameReg.test(type) === true && subtypeNameReg.test(subtype) === true) {
      assert.equal(type, 'application')
      assert.equal(subtype, 'json')
    }

    return
  }

  const mediaType = paramsMediaType.slice(0, sepIdx).toLowerCase()
  const paramsList = paramsMediaType.slice(sepIdx + 1).trim()

  sepIdx = mediaType.indexOf('/')
  if (sepIdx === -1) return
  const type = mediaType.slice(0, sepIdx).trimStart()
  const subtype = mediaType.slice(sepIdx + 1).trimEnd()

  if (typeNameReg.test(type) === false || subtypeNameReg.test(subtype) === false) {
    return
  }
  assert.equal(type, 'application')
  assert.equal(subtype, 'json')

  const parameters = new Map()
  let matches = keyValuePairsReg.exec(paramsList)
  while (matches) {
    const key = matches[1]
    const value = matches[2]
    if (value[0] === '"') {
      if (value.at(-1) !== '"') {
        parameters.set(key, 'invalid quoted string')
        matches = keyValuePairsReg.exec(paramsList)
        continue
      }
      parameters.set(key, value.slice(1, value.length - 1))
    } else {
      parameters.set(key, value)
    }
    matches = keyValuePairsReg.exec(paramsList)
  }
  assert.equal(parameters.size, 2)
})

suite.run()

const { format } = new Intl.NumberFormat('en-US', {
  useGrouping: true
})
function reporter (results) {

  for (const result of results) {
    const ops = format(result.opsSec)
    console.log(`${result.name}: ${ops} ops/sec`)
  }
}

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jsumners
Copy link
Member Author

@Eomm do you have any further concern?

Copy link
Member

@jean-michelet jean-michelet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Let's close this after merge:
#6332

Copy link
Member

@gurgunday gurgunday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we landing this as a bug fix? Wouldn't SemVer major be more appropriate?

@jsumners
Copy link
Member Author

Are we landing this as a bug fix? Wouldn't SemVer major be more appropriate?

This is a bug fix. The current content-type parser algorithm allows several invalid values, many of which allow for schema validation to be incorrectly applied. You can see this in the tests I added, updated, and removed.

I'll merge and release this on 2026-01-26 if we don't have any objections by then.

attn: @fastify/leads

@jsumners jsumners merged commit 32d7b6a into main Jan 26, 2026
35 checks passed
@jsumners jsumners deleted the parse-content-type branch January 26, 2026 11:08
@gurgunday
Copy link
Member

I personally don't have objections, let's see users' reaction

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Issue or PR that should land as semver patch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Content-Type parsing is incorrect

8 participants