ENH some steps to make cloudpickle dynamic function/classes more deterministic#524
ENH some steps to make cloudpickle dynamic function/classes more deterministic#524ogrisel merged 20 commits intocloudpipe:masterfrom
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #524 +/- ##
==========================================
+ Coverage 96.08% 96.12% +0.04%
==========================================
Files 3 3
Lines 562 568 +6
Branches 116 123 +7
==========================================
+ Hits 540 546 +6
Misses 11 11
Partials 11 11 ☔ View full report in Codecov by Sentry. |
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
ogrisel
left a comment
There was a problem hiding this comment.
Here is a pass of feedback. I included some comments on my analysis of the xfailed test about string literals in class attributes. Interning is probably not a safe solution in this case...
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
|
I found a better solution to string interning: prevent memoization for problematic strings that might be interned or not. For the use of |
ogrisel
left a comment
There was a problem hiding this comment.
More feedback on the new strategy.
| # or not. This is not guaranteed for reconstructed dynamic function, so we make | ||
| # sure it is never interned. | ||
| "__name__": "".join(func.__name__), | ||
| "__qualname__": func.__qualname__, |
There was a problem hiding this comment.
Any idea why we don't need this treatment for the __qualname__ value nor for the keys of func.__dict__?
There was a problem hiding this comment.
Actually, at least for the latter, there are still things to solver. I will push a slight change to the test to show what I mean.
There was a problem hiding this comment.
The issue with string interning is mostly due to collision between the string defined in two different places: typically method names that are interned as class attributes, vs the dynamic func name.
So I would say this won't happen for __qualname__ as this is only present here but maybe I am wrong.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
ogrisel
left a comment
There was a problem hiding this comment.
Can you please add a changelog entry?
There seems to be a leftover call to sys.intern (see below).
Also, I would rather avoid duplicating variations of the same inline comment all over the place and instead make them all point to the inline comment of _class_setstate to limit redundancy and make it more maintainable.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
ogrisel
left a comment
There was a problem hiding this comment.
LGTM with the missing changelog entry.
|
Thanks @tomMoral! |
cloudpicklestream, leads to the same pickle file:One case seems unsolvable: when updating the class state, all
attrnamefor a class are interned. This is why we are also interning the dynamic func names, to be consistent between their name and the method names (all interned).But if some values in the pickle stream (anywhere) have the same value as one
attrname, then these two strings which are the same in the original pickle file become different in the unpickled objects. Typically, pickling the following original dynamic class results in memorization ofjoinbetween the method name and the argument value, while pickling the unpickled class leads to no memorization:arg_1 = "join"value is not interned. In contrast, thejoinmethod name is.Not sure if there is a solution for this one.
Note that this test can be run locally with
pytest -vv --runxfail -k determinist_subworker_str_interningRelated issues: #510, #453.