Skip to content

fix: make content-type case-insensitive#5742

Merged
mcollina merged 3 commits into
mainfrom
lowercase
Oct 16, 2024
Merged

fix: make content-type case-insensitive#5742
mcollina merged 3 commits into
mainfrom
lowercase

Conversation

@gurgunday

@gurgunday gurgunday commented Oct 15, 2024

Copy link
Copy Markdown
Member

so this actually correctly fixes #5740

standards say Content-Type is case insensitive, which is why we prevent the registration of text/html and text/HTML at the same time, but currently don't treat text/html and text/HTML as the same

@gurgunday gurgunday added the benchmark Label to run benchmark against PR and main branch label Oct 15, 2024
@github-actions

Copy link
Copy Markdown

Node: 20
PR: [1] 1565k requests in 30.05s, 294 MB read
MAIN: [1] 1546k requests in 30.04s, 290 MB read


Node: 22
PR: [1] 1546k requests in 30.04s, 290 MB read
MAIN: [1] 1560k requests in 30.04s, 293 MB read

@github-actions

Copy link
Copy Markdown

Node: 20
PR: [1] 1367k requests in 30.05s, 257 MB read
MAIN: [1] 1357k requests in 30.04s, 255 MB read


Node: 22
PR: [1] 1380k requests in 30.04s, 259 MB read
MAIN: [1] 1392k requests in 30.05s, 261 MB read

@github-actions github-actions Bot removed the benchmark Label to run benchmark against PR and main branch label Oct 15, 2024
@gurgunday

Copy link
Copy Markdown
Member Author

We currently don't have this:

For example, the following media types are equivalent in describing HTML text data encoded in the UTF-8 character encoding scheme, but the first is preferred for consistency (the "charset" parameter value is defined as being case-insensitive in [RFC2046], Section 4.1.2):

text/html;charset=utf-8
Text/HTML;Charset="utf-8"
text/html; charset="utf-8"
text/html;charset=UTF-8

@gurgunday gurgunday requested review from a team and mcollina October 15, 2024 20:55
@Uzlopak

Uzlopak commented Oct 15, 2024

Copy link
Copy Markdown
Contributor

cant we avoid somehow toLowerCase?

@gurgunday

This comment was marked as outdated.

@gurgunday

This comment was marked as outdated.

@gurgunday

This comment was marked as outdated.

@L2jLiga

L2jLiga commented Oct 16, 2024

Copy link
Copy Markdown
Contributor

cant we avoid somehow toLowerCase?

We already call toLowerCase in add and hasParser, I think its ok to have it in run as well

@climba03003

climba03003 commented Oct 16, 2024

Copy link
Copy Markdown
Member

I would say we should do something like this. The intention is that we cache both any-case header and lowercase header.
So, most of the time it hit the cache with original header and check only once by lowercase header.

Please be aware multipart/form-data always pollute the cache (spammed with all useless cache record) since the boundary part may not be the same for all time.
But most of the case, the full lookup should only run one time.

ContentTypeParser.prototype.getParser = function (contentType) {
  // fast-path for remembered content-type (incoming as lowercase)
  let parser = this.customParsers.get(contentType)
  if (parser !== undefined) return parser
  parser = this.cache.get(contentType)
  if (parser !== undefined) return parser
  // case-incensitive cache check
  const lowercaseContentType = contentType.toLowerCase()
  let parser = this.customParsers.get(lowercaseContentType)
  if (parser !== undefined) return parser
  parser = this.cache.get(lowercaseContentType)
  if (parser !== undefined) return parser


  // eslint-disable-next-line no-var
  for (var i = 0; i !== this.parserList.length; ++i) {
    const parserListItem = this.parserList[i]
    if (
      lowercaseContentType.slice(0, parserListItem.length) === parserListItem &&
      (lowercaseContentType.length === parserListItem.length || lowercaseContentType.charCodeAt(parserListItem.length) === 59 /* `;` */ || lowercaseContentType.charCodeAt(parserListItem.length) === 32 /* ` ` */)
    ) {
      parser = this.customParsers.get(parserListItem)
      this.cache.set(contentType, parser)
      // cache for both lowercase and original header
      this.cache.set(lowercaseContentType, parser)
      return parser
    }
  }
  // eslint-disable-next-line no-var
  for (var j = 0; j !== this.parserRegExpList.length; ++j) {
    const parserRegExp = this.parserRegExpList[j]
    // regexp should use the original header in all case
    // it allows the user to choose whether they want case-sensitive or case-insensitive
    if (parserRegExp.test(contentType)) {
      parser = this.customParsers.get(parserRegExp.toString())
      this.cache.set(contentType, parser)
      // cache for both lowercase and original header
      this.cache.set(lowercaseContentType, parser)
      return parser
    }
  }
  return this.customParsers.get('')
}

@metcoder95 metcoder95 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fine, the benchmarks doesn't show a big difference and should be fine.

Nevertheless, the suggestion of caching from @climba03003 seems quite interesting; adding a disclosure of the multipart/form-data limitations might be sufficient.

@gurgunday

gurgunday commented Oct 16, 2024

Copy link
Copy Markdown
Member Author

@climba03003 wdyt? I like this approach!

Every string Content-Type parser is lowercase , so:

If we miss the original cache checks, then we lowercase and try to match manually like before

If we match there, then we save the original version to the cache for future matches

@gurgunday

Copy link
Copy Markdown
Member Author

Note that this only applies to string content-type parsers

Like you said, RegExp offers more freedom in how a user would want to match

@gurgunday gurgunday requested a review from a team October 16, 2024 10:04
@gurgunday gurgunday requested a review from metcoder95 October 16, 2024 13:11

@mcollina mcollina left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mcollina mcollina merged commit c1789df into main Oct 16, 2024
@mcollina mcollina deleted the lowercase branch October 16, 2024 14:57

@metcoder95 metcoder95 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@github-actions

Copy link
Copy Markdown

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Oct 17, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

addContentTypeParser() no longer case-sensitive

6 participants