filed: fix python plugin crash on python <3.10 #1889
Merged
BareosBot merged 4 commits into bareos:master (Aug 8, 2024)
I am not sure whether this is necessary, but the documentation says:
```
The global interpreter lock need not be held, but may be held if it is necessary to serialize calls to this function.
```
It does not elaborate on *when* it is necessary to serialize those calls. As such, I will assume that it is necessary here as well and use the GIL to do so.
While it should technically be no problem to call the bareosfd C API without holding the GIL, we have to keep in mind that we are still modifying shared global state, so those calls need to be serialized. (Ab)using the GIL for this is the easiest way to fix these problems. I am not 100% sure how running multiple plugin jobs in parallel never ran into issues, considering that handlePluginEvent was constantly clobbering the global "plugin context" in bareosfd. It is possible that I am overlooking something.
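The concern can be sketched at the Python level with an explicit lock. This is an illustration only: in the C code the GIL itself plays the lock's role, and `set_plugin_context`/`_plugin_context` are made-up names, not the real bareosfd API.

```python
# Illustration: serialize all updates to shared module-global state.  In
# the plugin's C code the GIL plays the role of this explicit lock; the
# names below are made up and not the real bareosfd API.
import threading

_lock = threading.Lock()
_plugin_context = None  # stand-in for bareosfd's global "plugin context"

def set_plugin_context(ctx):
    """Clobber the shared context, but only while holding the lock."""
    global _plugin_context
    with _lock:
        _plugin_context = ctx

def get_plugin_context():
    with _lock:
        return _plugin_context
```

Without such serialization, two plugin jobs running in parallel could interleave their updates and each observe the other's context.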
Python uses a (Type, Name) -> Object cache for member lookups inside
of types. This cache was shared globally by all interpreters up until
3.10, where it was made to be per-interpreter instead:
```
// typeobject.c
static struct method_cache_entry method_cache[1 << MCACHE_SIZE_EXP];
```
The cache does sanity checking to make sure to only return the correct
answers:
```
if (method_cache[h].version == type->tp_version_tag &&
    method_cache[h].name == name) {
    method_cache_hits++;
    return method_cache[h].value;
}
```
but the problem is that this does not work correctly with static
types: their version tag (type->tp_version_tag) is the same in all
interpreters, since they all reuse the same type object. This means that if
you ask for the same members in two different interpreters and those
member names are allocated *at the same address*, then you will get
the same object pointer back. This object pointer may not belong to
one of those interpreters though and in fact can be a dangling pointer
if the other interpreter was already killed.
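A toy model of that failure mode (a deliberate simplification; the tag and address values below are invented, not CPython's real ones):

```python
# Toy model of the pre-3.10 shared method cache: entries are keyed only by
# (version tag, name address), and nothing ties a cached value to the
# interpreter that created it.
cache = {}

def store(version_tag, name_addr, value):
    cache[(version_tag, name_addr)] = value

def lookup(version_tag, name_addr):
    # Mirrors the C check: version and name identity only, no owner check.
    return cache.get((version_tag, name_addr))

# "Interpreter B" caches a member of a static type.  The version tag is
# shared (static types reuse one type object), and the interned name "max"
# happens to live at the same address in both interpreters.
DATETIME_TAG = 7            # invented version tag
MAX_NAME_ADDR = 0x7F00AA00  # invented address of the interned "max" string
store(DATETIME_TAG, MAX_NAME_ADDR, "object owned by interpreter B")

# Interpreter B is destroyed and its objects are freed, but the entry
# survives, so "interpreter A" now receives B's dangling pointer:
assert lookup(DATETIME_TAG, MAX_NAME_ADDR) == "object owned by interpreter B"
```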
This actually happened at a customer site: when accessing
`datetime.datetime.max`, the `Lookup(datetime, max)` call
returned a cached result from a previous interpreter. This
may have to do with string interning (which caused the 'max' name to
have the same address), but it could also have been a pure
coincidence.
Thankfully, Python offers a stable-API call, PyType_ClearCache(), to clear
this cache. The bug can be worked around by calling that function every
time we destroy an interpreter, so that the cache never contains dangling
pointers. This obviously does not fix the problem completely, but it
should at least fix the cases where no concurrent Python jobs are running.
It is possible that there are more surprises like this in the Python code
base.
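The workaround can be sketched as follows. This is an illustration, not the plugin's actual code: `end_interpreter` is a stand-in for the C-level `Py_EndInterpreter()` call, and `sys._clear_type_cache()` is the Python-level wrapper around the same stable-API `PyType_ClearCache()` function the C code would call.

```python
# Sketch of the workaround: whenever an interpreter is torn down, clear the
# type lookup cache so it cannot hand out dangling pointers afterwards.
import sys

def destroy_interpreter(end_interpreter):
    """end_interpreter is a stand-in for the C-level Py_EndInterpreter()."""
    end_interpreter()
    # Python-level wrapper around PyType_ClearCache(): drops every cached
    # (type, name) -> object entry, including entries owned by the
    # interpreter that was just destroyed.
    sys._clear_type_cache()
```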
Python's internal cache may contain dangling pointers to objects of already-destroyed interpreters. This is thankfully very unlikely, but it does still happen. As such, we need to make sure to clear the cache whenever we destroy a subinterpreter.
For more details see the commit messages.
Thank you for contributing to the Bareos Project!
Please check
If you have any questions or problems, please give a comment in the PR.
Helpful documentation and best practices
Checklist for the reviewer of the PR (will be processed by the Bareos team)
Make sure you check/merge the PR using
devtools/pr-tool to have some simple automated checks run and a proper changelog record added.
General
Source code quality