gh-90949: add Expat API to prevent XML deadly allocations #139234
Conversation
Force-pushed 3fa5b10 to 9c7371f
I plan to also expose the billion laughs mitigation API, but in a separate PR.
@picnixz thanks for taking on this project — very much appreciated! 🙏
@picnixz there is mention of the activation threshold in the final note of https://libexpat.github.io/doc/api/latest/#XML_SetAllocTrackerMaximumAmplification but I'm happy if the topic gets more attention in the CPython docs. I already started reviewing and spotted a few things; I will likely submit a first round of review later today.
I assume you are actually talking about this:
I actually understood it only as "if you use a smaller factor, then in practice you might want to change the activation threshold", but I didn't understand it as "the amplification factor is only used if we exceed the threshold in the first place". I assumed from the note on
Thank you! I'll hold the work on the billion laughs API as it'll be easier to add that API once this one is merged.
Force-pushed 4ba478d to d636685
@picnixz here's everything I found so far. Happy to learn what I missed in these findings 🍻
Modules/pyexpat.c (Outdated)
}
#endif

#if XML_COMBINED_VERSION >= 20702
I understand that Expat >= 2.7.2 is needed to offer this functionality, but maybe the Python API should be available in any case and raise an exception saying that the compile-time Expat was not recent enough to support this feature. Is that what's happening? I remember we had this very question when introducing SetReparseDeferralEnabled.
Argument Clinic will handle this automatically by "not" having the function (an AttributeError will be raised). For SetReparseDeferralEnabled, it appears that the API call will be a no-op otherwise.
Maybe it's better to raise NotImplementedError instead of the default AttributeError, as we usually do in ssl when the libssl backend is not the latest.
@picnixz I'd like to vote for NotImplementedError or a no-op rather than AttributeError for now. I'll need to sleep over this and potentially dig up past communication on the reparse deferral thing. There was past rationale on this somewhere.
I would prefer a NotImplementedError over a no-op, as otherwise people might assume that they are protected against some attacks while they are not.
@picnixz I should note that protection is active by default (or it would not have fixed the vulnerability). So a no-op in the tuning function would leave them protected under default settings, not unprotected.
Yes, but not necessarily against everything they want. If the default setting is too lax (for whatever reason), then they would still be unprotected. I prefer that users know explicitly that whatever protection they chose is in effect.
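To make the trade-off concrete, here is a minimal sketch of the explicit-failure behaviour being argued for. This is not CPython's actual implementation; `HAVE_ALLOC_TRACKER` is a hypothetical flag standing in for "built against Expat >= 2.7.2":

```python
# Hedged sketch, not CPython's actual code: one way a binding can
# surface missing backend support explicitly rather than silently.
HAVE_ALLOC_TRACKER = False  # hypothetical compile-time feature flag


def SetAllocTrackerMaximumAmplification(max_factor):
    if not HAVE_ALLOC_TRACKER:
        # Raising (instead of a silent no-op) makes it explicit that
        # the requested, non-default protection level is NOT in effect.
        raise NotImplementedError(
            "the underlying Expat library does not support "
            "allocation tracking (requires Expat >= 2.7.2)"
        )
    # ... with a recent Expat, forward max_factor to the C level here.
```

With a no-op instead, a caller asking for a stricter limit than the default would never learn that their request was ignored.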
Lib/test/test_pyexpat.py (Outdated)
p.SetAllocTrackerActivationThreshold(0)
# At runtime, the peak amplification factor is 101.71,
# which is above the default threshold (100.0).
msg = re.escape("out of memory: line 3, column 15")
I would like to note that billion laughs protection (in its default configuration) hits before the allocation limiter, e.g. for the example payload from Wikipedia. As a result, checking for "out of memory" is vital (and likely only succeeds because of the p.SetAllocTrackerActivationThreshold(0) further up). So good call 👍
@picnixz I confirm, yes, exactly 👍
I see. I think that means the upstream docs should be adjusted somehow to make it easier to read these settings as interconnected. How about I try a related pull request upstream and you take the review seat there?
Understood, good plan 👍
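As an aside, the amplification being discussed is easy to observe with the stdlib parser alone. The sketch below is my own illustration (not the test from this PR): it builds a deliberately tiny billion-laughs-style payload and measures how far Expat's default internal-entity expansion inflates it, staying far below any built-in protection thresholds:

```python
import xml.parsers.expat


def measure_amplification(levels, fanout=10):
    """Parse a tiny billion-laughs-style payload and report how many
    bytes of character data it expands to.  Internal general entities
    are expanded by Expat by default."""
    ents = ['<!ENTITY e0 "lol">']
    for i in range(1, levels + 1):
        # Each entity references `fanout` copies of the previous one.
        refs = ("&e%d;" % (i - 1)) * fanout
        ents.append('<!ENTITY e%d "%s">' % (i, refs))
    payload = '<!DOCTYPE r [%s]><r>&e%d;</r>' % ("".join(ents), levels)

    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(payload, True)
    expanded = sum(len(c) for c in chunks)
    return len(payload), expanded


# Kept deliberately small so it stays well below Expat's built-in
# billion laughs limits: 3 levels of 10 refs each expand "lol" to
# 10**3 * 3 = 3000 bytes of character data.
inp, out = measure_amplification(3)
```

Scaling `levels` up is exactly what turns this from a curiosity into an attack, which is what the amplification limits guard against.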
Force-pushed c5a1590 to 64af05c
Force-pushed 2e61140 to b01e53d
I think I'm done with the changes. Thanks for the review! I'm going offline now, but I'll continue tomorrow.
Feel free to ping me!
"hierarchy.\n"
"\n"
"The \'max_factor\' value must be a non-NaN floating point value greater than\n"
"or equal to 1.0. Amplifications factors greater than 100 can been observed\n"
- "or equal to 1.0. Amplifications factors greater than 100 can been observed\n"
+ "or equal to 1.0. Amplifications factors greater than 100.0 can been observed\n"

(for consistency?)
hierarchy.
The 'max_factor' value must be a non-NaN floating point value greater than
or equal to 1.0. Amplifications factors greater than 100 can been observed
- or equal to 1.0. Amplifications factors greater than 100 can been observed
+ or equal to 1.0. Amplifications factors greater than 100.0 can been observed

(for consistency?)
of bytes of dynamic memory allocated in the parser hierarchy.

The *max_factor* value must be a non-NaN :class:`float` value greater than
or equal to 1.0. Amplifications factors greater than 100 can been observed
- or equal to 1.0. Amplifications factors greater than 100 can been observed
+ or equal to 1.0. Amplifications factors greater than 100.0 can been observed

(for consistency?)
self.assertIsNone(p.SetAllocTrackerMaximumAmplification(10_000))
self.assertIsNotNone(p.Parse(payload, True))

def test_set_alloc_tracker_maximum_amplification_infty(self):
- def test_set_alloc_tracker_maximum_amplification_infty(self):
+ def test_set_alloc_tracker_maximum_amplification_infinity(self):

(just an idea, and because `git grep infty` did not return any matches)
Sets the maximum amplification factor between direct input and bytes of dynamic memory allocated.
By default, parsers objects have a maximum amplification factor of 100.
- By default, parsers objects have a maximum amplification factor of 100.
+ By default, parser objects have a maximum amplification factor of 100.0.

(singular + for consistency?)
Sets the maximum amplification factor between direct input and bytes
of dynamic memory allocated.

By default, parsers objects have a maximum amplification factor of 100.
- By default, parsers objects have a maximum amplification factor of 100.
+ By default, parser objects have a maximum amplification factor of 100.0.

(singular + for consistency?)
Sets the number of allocated bytes of dynamic memory needed to activate
protection against disproportionate use of RAM.

By default, parsers objects have an allocation activation threshold of 64 MiB,
- By default, parsers objects have an allocation activation threshold of 64 MiB,
+ By default, parser objects have an allocation activation threshold of 64 MiB,

(singular)
Sets the number of allocated bytes of dynamic memory needed to activate protection against disproportionate use of RAM.
By default, parsers objects have an allocation activation threshold of 64 MiB.
- By default, parsers objects have an allocation activation threshold of 64 MiB.
+ By default, parser objects have an allocation activation threshold of 64 MiB.

(singular)
@hartwork I observed that the maximum amplification factor is only checked once we exceed the activation threshold. The online docs for Expat didn't explicitly mention this, so I mentioned it in our docs.
I eventually decided to redo the float check in case of an error so that we can give a better message to the user (as this relates to security practices, I think it's good to explain why the API call failed as well as we can).
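For readers skimming the thread, the interplay between the two settings can be modelled in a few lines. This is a hedged model based on the observation above, not Expat's actual source; the defaults (a 100.0 amplification factor and a 64 MiB activation threshold) are the ones quoted in the docs under review:

```python
# Hedged model of the allocation tracker's decision (an assumption
# drawn from this discussion, not Expat's real implementation).
DEFAULT_MAX_FACTOR = 100.0             # default maximum amplification
DEFAULT_THRESHOLD = 64 * 1024 * 1024   # default activation threshold (64 MiB)


def allocation_allowed(allocated, direct_input,
                       max_factor=DEFAULT_MAX_FACTOR,
                       threshold=DEFAULT_THRESHOLD):
    """Return True if `allocated` bytes of dynamic memory are tolerated
    for `direct_input` bytes of raw parser input."""
    if allocated <= threshold:
        # Below the activation threshold the factor is never consulted,
        # so small documents are unaffected regardless of max_factor.
        return True
    return allocated <= direct_input * max_factor
```

This also explains the note in the upstream docs: if you lower `max_factor` substantially, you may want to lower the activation threshold too, since the factor has no effect at all below the threshold.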
📚 Documentation preview 📚: https://cpython-previews--139234.org.readthedocs.build/