Fix malformed Content-Type headers #2081

Merged
dentarg merged 1 commit into sinatra:main from stanhu:sh-fix-malformed-content-type on Feb 9, 2025
@stanhu (Contributor) commented Feb 3, 2025

This commit ensures that all parameters of `Content-Type` are separated with semicolons (`;`) instead of commas (`,`).

[RFC 7231](https://datatracker.ietf.org/doc/html/rfc7231#section-3.1.1.5) says:

     Content-Type = media-type

   Media types are defined in [Section 3.1.1.1](https://datatracker.ietf.org/doc/html/rfc7231#section-3.1.1.1).
   An example of the field is

     Content-Type: text/html; charset=ISO-8859-4

[RFC 7231 3.1.1.1](https://datatracker.ietf.org/doc/html/rfc7231#section-3.1.1.1) says:

   Media types define both a data format and various processing models:
   how to process that data in accordance with each context in which it
   is received.

     media-type = type "/" subtype *( OWS ";" OWS parameter )
     type       = token
     subtype    = token

   The type/subtype MAY be followed by parameters in the form of
   name=value pairs.

     parameter      = token "=" ( token / quoted-string )

   The type, subtype, and parameter name tokens are case-insensitive.
   Parameter values might or might not be case-sensitive, depending on
   the semantics of the parameter name.  The presence or absence of a
   parameter might be significant to the processing of a media-type,
   depending on its definition within the media type registry.

   A parameter value that matches the token production can be
   transmitted either as a token or within a quoted-string.  The quoted
   and unquoted values are equivalent.  For example, the following
   examples are all equivalent, but the first is preferred for
   consistency:

     text/html;charset=utf-8
     text/html;charset=UTF-8
     Text/HTML;Charset="utf-8"
     text/html; charset="utf-8"
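
To make the grammar quoted above concrete, here is a minimal sketch of a strict parser for it. `parse_media_type` and the constants are hypothetical names used only for illustration; they are not part of Sinatra or Rack, and real clients may be more lenient than this.

```ruby
# Hypothetical helper for illustration; not part of Sinatra or Rack.
# token: one or more "tchar" characters (RFC 7230, Section 3.2.6).
TOKEN = /[!\#$%&'*+\-.^_`|~0-9A-Za-z]+/
# quoted-string: double quotes around text, with backslash escapes.
QUOTED = /"(?:[^"\\]|\\.)*"/
# *( OWS ";" OWS parameter ), where parameter = token "=" ( token / quoted-string )
PARAMETER = /[ \t]*;[ \t]*(#{TOKEN})=(#{TOKEN}|#{QUOTED})/

# Returns [media_type, params] when the value matches the grammar, else nil.
def parse_media_type(value)
  head = %r{\A(#{TOKEN})/(#{TOKEN})}.match(value)
  return nil unless head

  rest = head.post_match
  params = {}
  until rest.empty?
    param = /\A#{PARAMETER}/.match(rest)
    return nil unless param # anything other than ";" param is rejected

    name = param[1].downcase
    val = param[2]
    # Unwrap quoted-string values so quoted and unquoted forms compare equal.
    val = val[1..-2].gsub(/\\(.)/, '\1') if val.start_with?('"')
    params[name] = val
    rest = param.post_match
  end
  ["#{head[1]}/#{head[2]}".downcase, params]
end
```

Under this sketch, `text/html; charset=ISO-8859-4` parses cleanly, while a value whose second parameter is introduced by `,` returns `nil`, because `,` is neither a token character nor a parameter separator in the grammar.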

According to [this Stack Overflow answer](https://stackoverflow.com/a/35879320), early RFCs appear to have been inconsistent on this point. It appears `,` was a mistake, and that `;` should always be used.

Most people probably haven't run into this because two conditions must both hold to trigger the bug:

  1. The Sinatra app needs to include a `;` in the call to `content_type` (such as `content_type "text/plain; version=0.0.4"`). If the value passed to `content_type` contains no `;`, the resulting header is fine.

  2. The client that talks to the app needs to reject `,` in the `Content-Type`. Go's `mime.ParseMediaType` appears to reject `Content-Type` values that contain `,`, and since prometheus/prometheus#15136 ("scrape: provide a fallback format"), Prometheus v3 fails hard when this occurs.
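
The two conditions can be illustrated with a small sketch. Both helpers below are hypothetical and are not Sinatra's actual implementation; the sketch also assumes the extra parameter being appended is a default `charset`, which the description above does not state.

```ruby
# Illustrative only: hypothetical helpers, not Sinatra's code.

# Mimics the malformed output: once the base value already contains a ";",
# further parameters are appended with ",".
def append_params_with_comma(base, params)
  separator = base.include?(';') ? ', ' : ';'
  base + separator + params.map { |key, val| "#{key}=#{val}" }.join(', ')
end

# Appends every parameter with ";", as the RFC 7231 grammar requires.
def append_params_with_semicolon(base, params)
  ([base] + params.map { |key, val| "#{key}=#{val}" }).join(';')
end

base = 'text/plain; version=0.0.4'
malformed = append_params_with_comma(base, { charset: 'utf-8' })
well_formed = append_params_with_semicolon(base, { charset: 'utf-8' })

puts malformed   # text/plain; version=0.0.4, charset=utf-8
puts well_formed # text/plain; version=0.0.4;charset=utf-8
```

With a base value that has no `;` (for example `text/plain`), both helpers produce `text/plain;charset=utf-8`, which matches the observation that omitting the `;` avoids the problem.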

Closes #2076

@stanhu stanhu force-pushed the sh-fix-malformed-content-type branch from cda26bc to 13c477a Compare February 3, 2025 17:22
@dentarg dentarg added the bug label Feb 9, 2025
@dentarg dentarg merged commit 025e8c5 into sinatra:main Feb 9, 2025
26 checks passed
zzak pushed a commit to zzak/sinatra that referenced this pull request May 23, 2025
zzak pushed a commit that referenced this pull request May 23, 2025
Development

Successfully merging this pull request may close these issues.

Built content_type are malformed

2 participants