[XML] properly handle normalizedString & token #1098

@jkowalleck

CycloneDX uses http://www.w3.org/2001/XMLSchema, which defines normalizedString as follows:

<xs:simpleType name="normalizedString" id="normalizedString">
  <xs:annotation>
    <xs:documentation source="http://www.w3.org/TR/xmlschema-2/#normalizedString"/>
  </xs:annotation>
  <xs:restriction base="xs:string">
    <xs:whiteSpace value="replace" id="normalizedString.whiteSpace"/>
  </xs:restriction>
</xs:simpleType>

normalizedString represents white space normalized strings. The ·value space· of normalizedString is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters. The ·lexical space· of normalizedString is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters. The ·base type· of normalizedString is string.


CycloneDX uses http://www.w3.org/2001/XMLSchema, which defines token as follows:

<xs:simpleType name="token" id="token">
  <xs:annotation>
    <xs:documentation source="http://www.w3.org/TR/xmlschema-2/#token"/>
  </xs:annotation>
  <xs:restriction base="xs:normalizedString">
    <xs:whiteSpace value="collapse" id="token.whiteSpace"/>
  </xs:restriction>
</xs:simpleType>

token represents tokenized strings. The ·value space· of token is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters, that have no leading or trailing spaces (#x20) and that have no internal sequences of two or more spaces. The ·lexical space· of token is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters, that have no leading or trailing spaces (#x20) and that have no internal sequences of two or more spaces. The ·base type· of token is normalizedString.


Therefore, during XML normalization of a normalizedString, each of the following characters must be replaced by a space (#x20):

  • carriage return: \r (#xD)
  • line feed: \n (#xA)
  • tab: \t (#x9)
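
The replacement rule above can be sketched as follows. This is a minimal illustration in Python, not code from any CycloneDX library; the function name `normalize_string` is hypothetical. Note that each character is replaced one-for-one, so a `\r\n` pair becomes two spaces.

```python
import re

# Characters that XSD whiteSpace="replace" turns into a space: tab, LF, CR.
_REPLACE_RE = re.compile(r"[\t\n\r]")


def normalize_string(value: str) -> str:
    """Hypothetical helper: replace each tab, line feed and carriage
    return with exactly one space (#x20). Nothing is collapsed or trimmed."""
    return _REPLACE_RE.sub(" ", value)


if __name__ == "__main__":
    # "\r\n" yields two spaces because replacement is per character.
    print(repr(normalize_string("foo\tbar\r\nbaz")))  # 'foo bar  baz'
```

Leading, trailing and repeated spaces are left alone here; only the token rules remove them.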

Likewise, during XML normalization of a token, the following must apply:

  • all of the above
  • consecutive spaces are collapsed into a single space
  • leading and trailing spaces are removed
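
The token rules can be sketched the same way. Again this is only an illustrative Python sketch under the assumptions stated in this issue; `normalize_token` is a hypothetical name. It applies the three steps in order: replace, collapse, trim. Only the space character (#x20) is collapsed and trimmed, so other Unicode whitespace such as a non-breaking space is left as-is.

```python
import re


def normalize_token(value: str) -> str:
    """Hypothetical helper implementing XSD whiteSpace="collapse"."""
    # Step 1: same replacement as for normalizedString.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    # Step 2: collapse runs of two or more spaces into one space.
    collapsed = re.sub(r" {2,}", " ", replaced)
    # Step 3: remove leading and trailing spaces (#x20 only).
    return collapsed.strip(" ")


if __name__ == "__main__":
    print(repr(normalize_token("  foo\t\tbar \r\n baz  ")))  # 'foo bar baz'
```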

Only fields that are defined as normalizedString or token in the XML schema are affected.
Other fields MUST NOT be affected!
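
One possible way to keep other fields untouched is to select the normalizer by the field's declared XSD type at serialization time. The sketch below assumes the serializer knows that type for each field; the names `NORMALIZERS` and `serialize_text` are hypothetical, and which fields carry which type must be read from the CycloneDX XSD itself.

```python
import re
from typing import Callable, Dict


def _replace(value: str) -> str:
    # whiteSpace="replace": tab, LF, CR each become one space.
    return re.sub(r"[\t\n\r]", " ", value)


def _collapse(value: str) -> str:
    # whiteSpace="collapse": replace, then collapse runs, then trim spaces.
    return re.sub(r" {2,}", " ", _replace(value)).strip(" ")


# Hypothetical lookup from XSD type name to its normalizer.
NORMALIZERS: Dict[str, Callable[[str], str]] = {
    "normalizedString": _replace,
    "token": _collapse,
}


def serialize_text(value: str, xsd_type: str) -> str:
    """Normalize only if the declared type asks for it; any other
    type (for example plain xs:string) is passed through unchanged."""
    normalizer = NORMALIZERS.get(xsd_type)
    return normalizer(value) if normalizer else value
```

With this shape, a value typed as plain `string` keeps its tabs and line breaks, while the same value typed as `token` is collapsed.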


checklist:

  • have implementation
  • have tests


Labels

bug, good first issue, help wanted