gh-90949: add Expat API to prevent XML deadly allocations #139234

picnixz · 2025-09-22T12:53:14Z

@hartwork I observed that the maximum allocation factor is only checked if we exceed the activation threshold. The online docs for Expat didn't explicitly mention it so I mentioned it in our docs.

I eventually decided to redo the float-check in case of an error so that we can give a better message to the user (as it relates to security practices, I think it's good to explain why the API call failed as much as we can).
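The "redo the float-check on error" pattern can be sketched in Python as follows. This is a minimal sketch, not the PR's C code. `set_max_amplification` and `fake_setter` are hypothetical names. `fake_setter` stands in for the C-level XML_SetAllocTrackerMaximumAmplification call, which reports success or failure but not the reason. The NaN and ">= 1.0" rules come from the docstring under review.

```python
import math
from typing import Callable


def set_max_amplification(expat_setter: Callable[[float], bool], max_factor: float) -> None:
    """Call a boolean-returning, Expat-style setter and, only if it fails,
    re-check the value so that the user gets a precise error message.

    `expat_setter` is a hypothetical stand-in for the C-level call, which
    only tells us whether the value was accepted.
    """
    if expat_setter(max_factor):
        return
    # Failure path: redo the float check to explain *why* the call failed.
    if math.isnan(max_factor):
        raise ValueError("max_factor must not be NaN")
    if max_factor < 1.0:
        raise ValueError(
            f"max_factor must be greater than or equal to 1.0, got {max_factor!r}"
        )
    # The value itself looks valid, so the failure has some other cause.
    raise RuntimeError("the parser rejected the maximum amplification factor")


def fake_setter(factor: float) -> bool:
    """Hypothetical model of the validation: reject NaN and values below 1.0."""
    return not math.isnan(factor) and factor >= 1.0


# Usage: valid values pass silently, invalid ones get a specific message.
set_max_amplification(fake_setter, 100.0)
```

Checking only on the failure path keeps the success path a single call, while still turning a bare "failed" into an actionable message.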

Issue: expose expat XML billion laughs attack mitigation APIs #90949

📚 Documentation preview 📚: https://cpython-previews--139234.org.readthedocs.build/

picnixz · 2025-09-22T16:25:11Z

I plan to also expose the billion laughs mitigation API, but in a separate PR.


hartwork · 2025-09-22T16:49:06Z

@picnixz thanks for taking on this project — very much appreciated! 🙏

@hartwork I observed that the maximum allocation factor is only checked if we exceed the activation threshold. The online docs for Expat didn't explicitly mention it so I mentioned it in our docs.

@picnixz there is mention of the activation threshold in the final note of https://libexpat.github.io/doc/api/latest/#XML_SetAllocTrackerMaximumAmplification, but I'm happy if the topic gets more attention in CPython docs.

I already started review and spotted a few things. I will likely submit a first round of review some time later today.

picnixz · 2025-09-22T17:05:04Z

there is mention of the activation threshold in the final note of

I assume you are actually talking about this:

So if you do reduce the maximum allowed amplification, please make sure that the activation threshold is still big enough to not end up with undesired false positives (i.e. benign files being rejected).

I actually understood it only as "if you use a smaller factor then in practice, you might want to change the activation threshold" but I didn't understand it as "the amplification factor is only used if we exceed the threshold in the first place". I assumed from the note on XML_SetAllocTrackerActivationThreshold that the two values were totally independent and could have two different protections:

Note: For types of allocations that intentionally bypass tracking and limiting, please see XML_SetAllocTrackerMaximumAmplification above.
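To make the interplay concrete, here is a simplified Python model of the behaviour described above: the amplification factor is only consulted once allocations reach the activation threshold. It is not Expat's actual code. The function name is hypothetical, and the exact formula and comparison operators (`>=` vs `>`) are assumptions. The defaults (100.0 and 64 MiB) are the ones documented in this PR.

```python
MIB = 1024 * 1024


def alloc_tracker_rejects(
    direct_input_bytes: int,
    allocated_bytes: int,
    max_factor: float = 100.0,
    activation_threshold: int = 64 * MIB,
) -> bool:
    """Simplified model (assumption, not Expat's code) of the allocation tracker.

    Below the activation threshold, any amplification is tolerated; only
    above it is the ratio of allocated bytes to direct input compared
    against the maximum amplification factor.
    """
    if allocated_bytes < activation_threshold:
        return False  # threshold not reached: the factor is never checked
    # Guard against division by zero for empty input (modelling choice).
    amplification = allocated_bytes / max(direct_input_bytes, 1)
    return amplification > max_factor


# 1 KiB of input allocating 10 MiB is a factor of ~10240, yet it is accepted
# because 10 MiB is below the 64 MiB activation threshold.
print(alloc_tracker_rejects(1024, 10 * MIB))
```

This is why lowering only the factor has no visible effect on small documents, and why the test sets the activation threshold to 0 to exercise the factor directly.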

picnixz · 2025-09-22T17:05:35Z

I already started review and spotted a few things. I will likely submit a first round of review some time later today.

Thank you! I'll hold off on the billion laughs API, as it'll be easier to add it once this one is merged.

hartwork

@picnixz here's everything I found so far. Happy to learn what I missed in these findings 🍻

Doc/library/pyexpat.rst

Lib/test/test_pyexpat.py

hartwork · 2025-09-22T17:18:44Z

Modules/pyexpat.c

 }
 #endif

+#if XML_COMBINED_VERSION >= 20702


I understand that Expat >=2.7.2 is needed to offer this functionality, but maybe the Python API should be available in any case and raise an exception saying that the compile-time Expat was not recent enough to support this feature. Is that what's happening? I remember we had this very question when introducing SetReparseDeferralEnabled.

Argument Clinic will handle this automatically by omitting the function entirely, so an AttributeError will be raised. For SetReparseDeferralEnabled, it appears that the API call is a no-op otherwise.

Maybe it's better to raise NotImplementedError instead of the default AttributeError, as we usually do in ssl when the libssl backend is not the latest.

@picnixz I'd like to vote for NotImplementedError or a no-op rather than AttributeError for now. I'll need to sleep on this and potentially dig up past communication on the reparse deferral topic. There was a past rationale for this somewhere.

I would prefer a NotImplementedError over a no-op, as otherwise people might assume that they are protected against some attacks while they are not.

@picnixz I should note that protection is active by default (or it would not have fixed the vulnerability). So a no-op in the tuning function would leave them protected under default settings, not unprotected.

Yes, but not necessarily for everything they want. If the default setting is too lax for them (for whatever reason), they would effectively be unprotected. I prefer that users know explicitly whether the protection they chose is in effect.
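From the caller's side, failing explicitly lets code detect whether the protection it asked for is really in effect. A minimal sketch, assuming that the NotImplementedError behaviour from commit b01e53d is what ships, and that Python versions predating this PR simply lack the method (hence the `getattr` check). `try_set_max_amplification` is a hypothetical helper, not part of pyexpat.

```python
from xml.parsers import expat


def try_set_max_amplification(parser, max_factor: float) -> bool:
    """Return True if the parser accepted the setting, False if this
    Python/Expat combination does not offer it (hypothetical helper)."""
    setter = getattr(parser, "SetAllocTrackerMaximumAmplification", None)
    if setter is None:
        # Python build predates the API: the method does not exist at all.
        return False
    try:
        setter(max_factor)
    except NotImplementedError:
        # Method exists, but the linked Expat is too old (assumed < 2.7.2).
        return False
    # Invalid values (e.g. NaN) are not caught here and propagate as errors.
    return True


parser = expat.ParserCreate()
if not try_set_max_amplification(parser, 50.0):
    # The caller decides: refuse untrusted input, or accept Expat's defaults.
    print("stricter amplification limit unavailable; default limits apply")
```

A silent no-op would make the `False` branch impossible to observe, which is the concern raised above.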


hartwork · 2025-09-22T17:25:36Z

Lib/test/test_pyexpat.py

+        p.SetAllocTrackerActivationThreshold(0)
+        # At runtime, the peak amplification factor is 101.71,
+        # which is above the default threshold (100.0).
+        msg = re.escape("out of memory: line 3, column 15")


I would like to note that billion laughs protection (in its default configuration) kicks in before the allocation limiter, e.g. for the example payload from Wikipedia. As a result, checking for "out of memory" is vital (and likely only succeeds because of the p.SetAllocTrackerActivationThreshold(0) further up). So good call 👍
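A minimal sketch of a Wikipedia-style payload being rejected under default settings, assuming pyexpat is linked against Expat >= 2.4.0 (the release that introduced billion laughs protection). It only checks that the error is not "out of memory", i.e. that the allocation limiter was not what stopped the parse. `billion_laughs` and `parse_error_code` are hypothetical helpers.

```python
from xml.parsers import expat
from xml.parsers.expat import errors


def billion_laughs(levels: int = 9) -> bytes:
    """Build the classic payload: each entity expands to ten copies of the
    previous one, so level 9 expands to 10**9 copies of "lol"."""
    lines = ['<?xml version="1.0"?>', "<!DOCTYPE lolz [", ' <!ENTITY lol "lol">']
    prev = "lol"
    for i in range(1, levels + 1):
        name = f"lol{i}"
        refs = f"&{prev};" * 10
        lines.append(f' <!ENTITY {name} "{refs}">')
        prev = name
    lines.append("]>")
    lines.append(f"<lolz>&{prev};</lolz>")
    return "\n".join(lines).encode()


def parse_error_code(payload: bytes):
    """Parse with default settings; return the Expat error code, or None."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(payload, True)
    except expat.ExpatError as err:
        return err.code
    return None


if expat.version_info >= (2, 4, 0):
    code = parse_error_code(billion_laughs())
    out_of_memory = errors.codes[errors.XML_ERROR_NO_MEMORY]
    print("rejected:", code is not None, "| by allocation limiter:", code == out_of_memory)
```

The version guard matters: without billion laughs protection, the full expansion would be several gigabytes.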


hartwork · 2025-09-22T18:11:16Z

I assume you are actually talking about this:

So if you do reduce the maximum allowed amplification, please make sure that the activation threshold is still big enough to not end up with undesired false positives (i.e. benign files being rejected).

@picnixz I confirm, yes, exactly 👍

I actually understood it only as "if you use a smaller factor then in practice, you might want to change the activation threshold" but I didn't understand it as "the amplification factor is only used if we exceed the threshold in the first place". I assumed from the note on XML_SetAllocTrackerActivationThreshold that the two values were totally independent and could have two different protections:

I see. I think that means the upstream docs should be adjusted to make it clearer that the two settings are interconnected. How about I open a related pull request upstream and you take the review seat there?

I'll hold off on the billion laughs API, as it'll be easier to add it once this one is merged.

Understood, good plan 👍

picnixz · 2025-09-22T19:19:30Z

I think I'm done with the changes. Thanks for the review! I'm going offline now, but I'll continue tomorrow.

I see. I think that means the upstream docs should be adjusted to make it clearer that the two settings are interconnected. How about I open a related pull request upstream and you take the review seat there?

Feel free to ping me!

hartwork · 2025-09-22T22:05:17Z

Modules/clinic/pyexpat.c.h

+"hierarchy.\n"
+"\n"
+"The \'max_factor\' value must be a non-NaN floating point value greater than\n"
+"or equal to 1.0. Amplifications factors greater than 100 can been observed\n"


Suggested change

"or equal to 1.0. Amplifications factors greater than 100 can been observed\n"

"or equal to 1.0. Amplifications factors greater than 100.0 can been observed\n"

(for consistency?)

hartwork · 2025-09-22T22:05:46Z

Modules/pyexpat.c

+hierarchy.
+
+The 'max_factor' value must be a non-NaN floating point value greater than
+or equal to 1.0. Amplifications factors greater than 100 can been observed


Suggested change

or equal to 1.0. Amplifications factors greater than 100 can been observed

or equal to 1.0. Amplifications factors greater than 100.0 can been observed

(for consistency?)

hartwork · 2025-09-22T22:06:05Z

Doc/library/pyexpat.rst

+   of bytes of dynamic memory allocated in the parser hierarchy.
+
+   The *max_factor* value must be a non-NaN :class:`float` value greater than
+   or equal to 1.0. Amplifications factors greater than 100 can been observed


Suggested change

or equal to 1.0. Amplifications factors greater than 100 can been observed

or equal to 1.0. Amplifications factors greater than 100.0 can been observed

(for consistency?)

hartwork · 2025-09-22T22:12:10Z

Lib/test/test_pyexpat.py

+        self.assertIsNone(p.SetAllocTrackerMaximumAmplification(10_000))
+        self.assertIsNotNone(p.Parse(payload, True))
+
+    def test_set_alloc_tracker_maximum_amplification_infty(self):


Suggested change

def test_set_alloc_tracker_maximum_amplification_infty(self):

def test_set_alloc_tracker_maximum_amplification_infinity(self):

(just an idea, and because git grep infty did not return any matches)

hartwork · 2025-09-23T00:33:32Z

Modules/pyexpat.c

+
+Sets the maximum amplification factor between direct input and bytes of dynamic memory allocated.
+
+By default, parsers objects have a maximum amplification factor of 100.


Suggested change

By default, parsers objects have a maximum amplification factor of 100.

By default, parser objects have a maximum amplification factor of 100.0.

(singular + for consistency?)

hartwork · 2025-09-23T00:33:53Z

Doc/library/pyexpat.rst

+   Sets the maximum amplification factor between direct input and bytes
+   of dynamic memory allocated.
+
+   By default, parsers objects have a maximum amplification factor of 100.


Suggested change

By default, parsers objects have a maximum amplification factor of 100.

By default, parser objects have a maximum amplification factor of 100.0.

(singular + for consistency?)

hartwork · 2025-09-23T00:34:53Z

Doc/library/pyexpat.rst

+   Sets the number of allocated bytes of dynamic memory needed to activate
+   protection against disproportionate use of RAM.
+
+   By default, parsers objects have an allocation activation threshold of 64 MiB,


Suggested change

By default, parsers objects have an allocation activation threshold of 64 MiB,

By default, parser objects have an allocation activation threshold of 64 MiB,

(singular)

hartwork · 2025-09-23T00:35:22Z

Modules/pyexpat.c

+
+Sets the number of allocated bytes of dynamic memory needed to activate protection against disproportionate use of RAM.
+
+By default, parsers objects have an allocation activation threshold of 64 MiB.


Suggested change

By default, parsers objects have an allocation activation threshold of 64 MiB.

By default, parser objects have an allocation activation threshold of 64 MiB.

(singular)

picnixz added 4 commits September 22, 2025 14:51

Expose XML Expat 2.7.2 mitigation APIs (c1c23fb)

add tests (12bef9c)

docs (1d7e599)

NEWS (3dcd9bd)

picnixz requested a review from AA-Turner as a code owner September 22, 2025 12:53

picnixz added the topic-XML label Sep 22, 2025

bedevere-app bot added the awaiting core review label Sep 22, 2025

bedevere-app bot mentioned this pull request Sep 22, 2025

expose expat XML billion laughs attack mitigation APIs #90949

Open

picnixz added 4 commits September 22, 2025 17:04

Merge branch 'main' into feat/xml/mitigation-api-90949 (192fe08)

fix docs (0ecbd55)

fix tests (07445ad)

regen SBOM (9c7371f)

picnixz force-pushed the feat/xml/mitigation-api-90949 branch from 3fa5b10 to 9c7371f September 22, 2025 15:23

picnixz requested a review from sethmlarson as a code owner September 22, 2025 15:23

picnixz added 3 commits September 22, 2025 18:43

remove unused include (1085584)

fix possible error handling (c10fe91)

undef macro after usage (18d175f)

picnixz commented Sep 22, 2025

Lib/test/test_pyexpat.py (outdated, resolved)

Update Lib/test/test_pyexpat.py (911b2b7)

update comments (d636685)

picnixz force-pushed the feat/xml/mitigation-api-90949 branch from 4ba478d to d636685 September 22, 2025 17:12

hartwork reviewed Sep 22, 2025


hartwork mentioned this pull request Sep 22, 2025

xmlwf: Resolve use of functions XML_GetErrorLineNumber and XML_GetErrorColumnNumber libexpat/libexpat#1053

Merged

picnixz added 2 commits September 22, 2025 20:37

use better test names (b951065)

simplify roles usage (e11bf14)

picnixz added 4 commits September 22, 2025 20:39

prevent reparse deferral of Expat to blow up (3e45613)

test better numeric values (fb83fb5)

update docs (7f91f2e)

avoid deprecated XML_GetError{Line,Column}Number (64af05c)

picnixz force-pushed the feat/xml/mitigation-api-90949 branch from c5a1590 to 64af05c September 22, 2025 19:12

raise NotImplementedError for unavailable mitigation APIs (b01e53d)

picnixz force-pushed the feat/xml/mitigation-api-90949 branch from 2e61140 to b01e53d September 22, 2025 19:17

hartwork reviewed Sep 22, 2025


hartwork reviewed Sep 23, 2025
