[XML] properly handle `normalizedString` & `token` #1098

CycloneDX uses http://www.w3.org/2001/XMLSchema - which [defines `normalizedString`](http://www.w3.org/TR/xmlschema-2/#normalizedString) as follows:

```xml
<xs:simpleType name="normalizedString" id="normalizedString">
  <xs:annotation>
    <xs:documentation source="http://www.w3.org/TR/xmlschema-2/#normalizedString"/>
  </xs:annotation>
  <xs:restriction base="xs:string">
    <xs:whiteSpace value="replace" id="normalizedString.whiteSpace"/>
  </xs:restriction>
</xs:simpleType>
```

> normalizedString represents white space normalized strings. The [·value space·](https://www.w3.org/TR/xmlschema-2/#dt-value-space) of normalizedString is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters. The [·lexical space·](https://www.w3.org/TR/xmlschema-2/#dt-lexical-space) of normalizedString is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters. The [·base type·](https://www.w3.org/TR/xmlschema-2/#dt-basetype) of normalizedString is [string](https://www.w3.org/TR/xmlschema-2/#string).

----


CycloneDX uses http://www.w3.org/2001/XMLSchema - which [defines `token`](http://www.w3.org/TR/xmlschema-2/#token) as follows:

```xml
<xs:simpleType name="token" id="token">
  <xs:annotation>
    <xs:documentation source="http://www.w3.org/TR/xmlschema-2/#token"/>
  </xs:annotation>
  <xs:restriction base="xs:normalizedString">
    <xs:whiteSpace value="collapse" id="token.whiteSpace"/>
  </xs:restriction>
</xs:simpleType>
```

> token represents tokenized strings. The [·value space·](https://www.w3.org/TR/xmlschema-2/#dt-value-space) of token is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters, that have no leading or trailing spaces (#x20) and that have no internal sequences of two or more spaces. The [·lexical space·](https://www.w3.org/TR/xmlschema-2/#dt-lexical-space) of token is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters, that have no leading or trailing spaces (#x20) and that have no internal sequences of two or more spaces. The [·base type·](https://www.w3.org/TR/xmlschema-2/#dt-basetype) of token is [normalizedString](https://www.w3.org/TR/xmlschema-2/#normalizedString).

----

Therefore, on XML normalization for `normalizedString`, each of the following characters must be replaced by a space (` `):
- carriage return: `\r` (#xD)
- line feed: `\n` (#xA)
- tab: `\t` (#x9)
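
The replacement rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `normalize_string` is hypothetical and not taken from any CycloneDX library.

```python
# Hypothetical helper illustrating whiteSpace="replace" for normalizedString.
# Each CR, LF, and tab is replaced by exactly one space; nothing is collapsed.
_REPLACE_TABLE = str.maketrans({"\r": " ", "\n": " ", "\t": " "})


def normalize_string(value: str) -> str:
    """Return `value` with every #xD, #xA, and #x9 replaced by #x20."""
    return value.translate(_REPLACE_TABLE)
```

Note that the length of the string is unchanged: `"a\r\nb"` becomes `a`, two spaces, `b`, because each character is replaced individually.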

Therefore, on XML normalization for `token`, the following must apply:
- all of the above
- consecutive spaces are collapsed to one space
- leading and trailing spaces are removed
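
The three `token` steps could look roughly like this in Python. It is a minimal sketch, and the name `normalize_token` is hypothetical.

```python
import re

# Hypothetical helper illustrating whiteSpace="collapse" for token.
_REPLACE = re.compile(r"[\r\n\t]")  # #xD, #xA, #x9
_SPACE_RUNS = re.compile(r" {2,}")  # two or more consecutive #x20


def normalize_token(value: str) -> str:
    """Replace CR/LF/tab by space, collapse space runs, trim spaces."""
    replaced = _REPLACE.sub(" ", value)
    collapsed = _SPACE_RUNS.sub(" ", replaced)
    # Strip only #x20, not every character Python considers whitespace.
    return collapsed.strip(" ")
```

The sketch deliberately avoids `" ".join(value.split())`, because `str.split()` without arguments also treats other Unicode whitespace (for example a no-break space) as a separator, which goes beyond the four characters named by XML Schema.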

Only fields that are defined as `normalizedString` or `token` in the XML schema are affected!
Other fields MUST NOT be affected!
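
One way to keep other fields untouched is to make the declared schema type an explicit input to the normalization step. The following is a sketch under assumptions: `XsdStringType` and `normalize_for_type` are hypothetical names, and how a serializer learns the declared type of each field is left open here.

```python
import re
from enum import Enum


class XsdStringType(Enum):
    """Hypothetical marker for the declared XSD type of a field."""
    STRING = "string"
    NORMALIZED_STRING = "normalizedString"
    TOKEN = "token"


def normalize_for_type(value: str, xsd_type: XsdStringType) -> str:
    """Normalize `value` according to its declared type only."""
    if xsd_type is XsdStringType.STRING:
        return value  # plain xs:string must not be altered
    replaced = re.sub(r"[\r\n\t]", " ", value)
    if xsd_type is XsdStringType.NORMALIZED_STRING:
        return replaced
    # TOKEN: additionally collapse space runs and trim spaces
    return re.sub(r" {2,}", " ", replaced).strip(" ")
```

With this shape, a value such as `" a\tb "` stays as is for `STRING`, becomes `" a b "` for `NORMALIZED_STRING`, and becomes `"a b"` for `TOKEN`.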

----

checklist:
- [ ] have implementation 
- [ ] have tests
