What happened?
If an object of a dynamic class is serialized and deserialized multiple times through cloudpickle's dump and load, each subsequent deserialization overwrites the class state of the class shared by the already deserialized objects.
This behavior was confirmed with pickle_dump.py and pickle_load.py. The dump script serializes an object of a dynamic DoFn class defined within a function. The load script then deserializes these bytes twice, and prints the original function information of the method on_window_end_timer for the first object.
The result shows that the methods in the class were changed.
I believe this is because cloudpickle reuses the class at _lookup_class_or_track for the same class tracker id, but _class_setstate always updates the class state even when the class is a reused one. Note that _class_setstate is returned as the 6th tuple item of reducer_override, and it is called at pickle loading instead of __setstate__. See reducer_override and __reduce__.
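To make the suspected mechanism concrete, here is a minimal toy model in plain Python. It does not use cloudpickle and is not its actual code: `lookup_class_or_track`, `class_setstate`, `make_method`, and `load` are hypothetical stand-ins that only mimic the reuse-then-overwrite pattern described above, under the assumption that each load rebuilds the method as a new function object.

```python
# Toy model of the behaviour described above. All names are hypothetical
# stand-ins; this is NOT cloudpickle's implementation.

_tracked_classes = {}  # tracker id -> class registered on first load


def lookup_class_or_track(tracker_id, skeleton):
    """Return the class already registered for tracker_id, else register skeleton."""
    return _tracked_classes.setdefault(tracker_id, skeleton)


def class_setstate(cls, state):
    """Unconditionally copy state onto cls, even when cls was reused."""
    for name, value in state.items():
        setattr(cls, name, value)
    return cls


def make_method():
    """Stand-in for unpickling a method: every call yields a new function object."""
    def on_window_end_timer(self):
        return "fired"
    return on_window_end_timer


def load(tracker_id):
    """Simulate one deserialization of an instance of the dynamic class."""
    skeleton = type("DynamicDoFn", (), {})
    cls = lookup_class_or_track(tracker_id, skeleton)
    class_setstate(cls, {"on_window_end_timer": make_method()})
    return cls()


first = load("tracker-1")
method_before = type(first).on_window_end_timer

second = load("tracker-1")  # same tracker id, so the class is reused
method_after = type(first).on_window_end_timer

same_class = type(first) is type(second)
method_replaced = method_before is not method_after
print(same_class, method_replaced)
```

In this sketch both objects share one class, yet the method seen through the first object is a different function object after the second load. Anything that recorded the earlier function object would no longer match the one on the class.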
IIUC, this issue can cause unexpected KeyError or ValueError with TimerSpec in Dataflow Python jobs with Apache Beam 2.65.0.
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components