[Data] Make All Preprocessors Implement SerializablePreprocessorBase#61341
Merged
bveeramani merged 55 commits into ray-project:master on Mar 4, 2026
Signed-off-by: Sirui Huang <ray.huang@anyscale.com>
Pull request overview
This PR implements serialization support for preprocessors that were not yet inheriting from SerializablePreprocessorBase. The changes add the required abstract methods _get_serializable_fields() and _set_serializable_fields() to enable CloudPickle-based serialization for all preprocessors.
Changes:
- Migrated 9 preprocessor classes to inherit from `SerializablePreprocessorBase` and implement serialization methods
- Added comprehensive serialization tests for all migrated preprocessors
- Updated imports and added `@SerializablePreprocessor` decorator with version and identifier metadata
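
The pattern these changes describe can be sketched without Ray installed. Everything below is a hypothetical stand-in: `SerializableBaseSketch`, `serializable_sketch`, the registry, and `ToyScaler` only mirror the names mentioned in this overview, and Ray's actual signatures and storage format may differ. The sketch uses the standard-library `pickle` instead of CloudPickle so that it runs with no extra dependencies.

```python
import pickle
from abc import ABC, abstractmethod
from typing import Any, Dict, Type

# Hypothetical registry keyed by identifier; not Ray's actual implementation.
_REGISTRY: Dict[str, Type["SerializableBaseSketch"]] = {}


def serializable_sketch(version: int, identifier: str):
    """Stand-in decorator that attaches version and identifier metadata."""

    def wrap(cls):
        cls.VERSION = version
        cls.IDENTIFIER = identifier
        _REGISTRY[identifier] = cls
        return cls

    return wrap


class SerializableBaseSketch(ABC):
    """Stand-in base class declaring the two field hooks named in the PR."""

    @abstractmethod
    def _get_serializable_fields(self) -> Dict[str, Any]:
        ...

    @abstractmethod
    def _set_serializable_fields(self, fields: Dict[str, Any], version: int) -> None:
        ...

    def serialize(self) -> bytes:
        # Store metadata next to the fields so the loader can find the class.
        payload = {
            "identifier": self.IDENTIFIER,
            "version": self.VERSION,
            "fields": self._get_serializable_fields(),
        }
        return pickle.dumps(payload)

    @staticmethod
    def deserialize(data: bytes) -> "SerializableBaseSketch":
        payload = pickle.loads(data)
        cls = _REGISTRY[payload["identifier"]]
        obj = cls.__new__(cls)  # skip __init__; state comes from the fields
        obj._set_serializable_fields(payload["fields"], payload["version"])
        return obj


@serializable_sketch(version=1, identifier="sketch.ToyScaler")
class ToyScaler(SerializableBaseSketch):
    def __init__(self, columns, factor=1.0):
        self.columns = list(columns)
        self.factor = factor

    def _get_serializable_fields(self):
        return {"columns": self.columns, "factor": self.factor}

    def _set_serializable_fields(self, fields, version):
        self.columns = fields["columns"]
        self.factor = fields["factor"]


restored = SerializableBaseSketch.deserialize(
    ToyScaler(["a", "b"], factor=2.0).serialize()
)
```

Passing the stored version into `_set_serializable_fields` is one way to let a class migrate fields written by an older release; whether Ray uses the version this way is not stated in this thread.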
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| python/ray/data/preprocessors/vectorizer.py | Added serialization support for HashingVectorizer and CountVectorizer |
| python/ray/data/preprocessors/transformer.py | Added serialization support for PowerTransformer |
| python/ray/data/preprocessors/torch.py | Added serialization support for TorchVisionPreprocessor |
| python/ray/data/preprocessors/tokenizer.py | Added serialization support for Tokenizer |
| python/ray/data/preprocessors/normalizer.py | Added serialization support for Normalizer |
| python/ray/data/preprocessors/hasher.py | Added serialization support for FeatureHasher |
| python/ray/data/preprocessors/discretizer.py | Added serialization support for CustomKBinsDiscretizer and UniformKBinsDiscretizer |
| python/ray/data/preprocessors/concatenator.py | Added serialization support for Concatenator |
| python/ray/data/preprocessors/chain.py | Added serialization support for Chain preprocessor |
| python/ray/data/tests/preprocessors/test_vectorizer.py | Added serialization tests for vectorizers |
| python/ray/data/tests/preprocessors/test_transformer.py | Added serialization test for PowerTransformer |
| python/ray/data/tests/preprocessors/test_torch.py | Added serialization test for TorchVisionPreprocessor |
| python/ray/data/tests/preprocessors/test_tokenizer.py | Added serialization test for Tokenizer |
| python/ray/data/tests/preprocessors/test_normalizer.py | Added serialization test for Normalizer |
| python/ray/data/tests/preprocessors/test_hasher.py | Added serialization test for FeatureHasher |
| python/ray/data/tests/preprocessors/test_discretizer.py | Added serialization tests for discretizers |
| python/ray/data/tests/preprocessors/test_concatenator.py | Added serialization test for Concatenator |
| python/ray/data/tests/preprocessors/test_chain.py | Added serialization test for Chain |
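
The serialization tests listed in the table plausibly follow a round-trip shape: build a preprocessor, extract its fields, restore them into a fresh instance, and compare both the fields and the behavior. The sketch below illustrates that shape with a hypothetical `ToyTokenizer` and a hypothetical `check_round_trip` helper; it does not reproduce the tests added in this PR, which may assert different things.

```python
from typing import Any, Dict, List


class ToyTokenizer:
    """Hypothetical stand-in; only mirrors the two hook names from the PR."""

    def __init__(self, columns, lowercase=True):
        self.columns = list(columns)
        self.lowercase = lowercase

    def _get_serializable_fields(self) -> Dict[str, Any]:
        return {"columns": self.columns, "lowercase": self.lowercase}

    def _set_serializable_fields(self, fields: Dict[str, Any], version: int) -> None:
        self.columns = fields["columns"]
        self.lowercase = fields["lowercase"]

    def transform(self, row: Dict[str, str]) -> Dict[str, List[str]]:
        # Split each configured column on whitespace, optionally lowercasing.
        out = {}
        for col in self.columns:
            text = row[col].lower() if self.lowercase else row[col]
            out[col] = text.split()
        return out


def check_round_trip(original: ToyTokenizer, sample: Dict[str, str]) -> ToyTokenizer:
    """Restore a fresh instance from the fields and compare it to the original."""
    fields = original._get_serializable_fields()
    restored = ToyTokenizer.__new__(ToyTokenizer)  # bypass __init__ on purpose
    restored._set_serializable_fields(fields, version=1)
    assert restored._get_serializable_fields() == fields
    assert restored.transform(sample) == original.transform(sample)
    return restored


restored_tokenizer = check_round_trip(
    ToyTokenizer(["text"]), {"text": "Hello Ray Data"}
)
```

Comparing transformed output as well as raw fields catches the case where a field is saved but never restored onto the instance.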
goutamvenkat-anyscale approved these changes Mar 4, 2026
manhld0206 pushed a commit to manhld0206/ray that referenced this pull request Mar 5, 2026

…ay-project#61341)

## Description
The `SerializablePreprocessorBase` abstract class declares functions for saving and loading preprocessor states and should be implemented by all preprocessors. This PR implements the abstract methods for all preprocessors that are not yet inheriting from this base class.

## Related issues
Related to ray-project#61028, which implemented backwards compatibility for legacy pickling layer. The `__setstate__` functions should be removed along with the deprecated `Predictor`, while `_get_serializable_fields` and `_set_serializable_fields` should be used instead for saving and loading preprocessor states in future iterations.

## Additional information
Accompanied by new field serializing tests of all preprocessors involved.

Signed-off-by: Sirui Huang <ray.huang@anyscale.com>
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
bittoby pushed a commit to bittoby/ray that referenced this pull request Mar 6, 2026
ParagEkbote pushed a commit to ParagEkbote/ray that referenced this pull request Mar 10, 2026
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Mar 13, 2026
Description

The `SerializablePreprocessorBase` abstract class declares functions for saving and loading preprocessor state and should be implemented by all preprocessors. This PR implements the abstract methods for all preprocessors that do not yet inherit from this base class.

Related issues

Related to #61028, which implemented backwards compatibility for the legacy pickling layer. The `__setstate__` functions should be removed along with the deprecated `Predictor`, while `_get_serializable_fields` and `_set_serializable_fields` should be used instead for saving and loading preprocessor state in future iterations.

Additional information

Accompanied by new field-serialization tests for all preprocessors involved.
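
To illustrate how a legacy `__setstate__` path and the field-based hooks might coexist until the former is removed, here is a minimal sketch. It is not the code from #61028 or from this PR: `ToyNormalizer` is hypothetical, and the old attribute name `_norm` is an assumption invented purely to show a key migration.

```python
import pickle
from typing import Any, Dict


class ToyNormalizer:
    """Hypothetical class; not Ray's Normalizer."""

    def __init__(self, columns, norm="l2"):
        self.columns = list(columns)
        self.norm = norm

    # Field-based hooks named in the PR description.
    def _get_serializable_fields(self) -> Dict[str, Any]:
        return {"columns": self.columns, "norm": self.norm}

    def _set_serializable_fields(self, fields: Dict[str, Any], version: int) -> None:
        self.columns = fields["columns"]
        self.norm = fields.get("norm", "l2")

    # Legacy shim: pickle calls this with the saved attribute dict on load.
    def __setstate__(self, state: Dict[str, Any]) -> None:
        legacy = dict(state)
        # Assumption for illustration: an older release stored the norm as "_norm".
        if "_norm" in legacy:
            legacy["norm"] = legacy.pop("_norm")
        self._set_serializable_fields(legacy, version=0)


# Simulate state written by an older release.
from_legacy = ToyNormalizer.__new__(ToyNormalizer)
from_legacy.__setstate__({"columns": ["x"], "_norm": "l1"})

# A plain pickle round trip goes through the same shim.
from_pickle = pickle.loads(pickle.dumps(ToyNormalizer(["y"], norm="max")))
```

Routing `__setstate__` through `_set_serializable_fields` keeps a single place where state is assigned, so deleting the shim later leaves the field-based path unchanged.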