@KlaudiuszRydzy
Contributor

@KlaudiuszRydzy KlaudiuszRydzy commented Jul 10, 2024

Description

Enhanced the serialization class with third-party serializer support for numpy, pandas, and polars types.
Updated serializer_test.py to cover the new serializers, and updated the dev imports so that the new
serializers are available during development.

Fixes

Type of Change

  • Breaking Change (fix or enhancement which changes existing semantics of the public interface)
  • Enhancement (new features or improvements to existing functionality)
  • Bug (fixes for a bug or issue)
  • Internal (refactoring, style changes, testing, optimizations)
  • Documentation update (changes to documentation or examples)
  • Package (dependencies, versions, package metadata)
  • Development (CI workflows, pre-commit, linters, templates)
  • Security (security related changes)

Testing

Serialization tests have been expanded, and we benchmarked the new serializers here: KlaudiuszRydzy/ProxyStore-Serialization@56718a7.

With 1 GB objects, numpy serialization is ~40% faster, pandas serialization is ~50% faster, and polars serialization is ~40% faster. General data types that fall back to pickle perform the same as before.

Pull Request Checklist

Please confirm the PR meets the following requirements.

  • Tags added to PR (e.g., breaking, bug, enhancement, internal, documentation, package, development, security).
  • Code changes pass pre-commit (e.g., mypy, ruff, etc.).
  • Tests have been added to show the fix is effective or that the new feature works.
  • New and existing unit tests pass locally with the changes.
  • Docs have been updated and reviewed if relevant.

@KlaudiuszRydzy
Contributor Author

On the main branch:

(proxyenv) klaudiuszrydzy@Klaudiuszs-MacBook-Air proxystore % python -m timeit -n 10 -s "import numpy; from proxystore.serialize import serialize; x = numpy.random.rand(10000, 10000)" "serialize(x)"
10 loops, best of 5: 411 msec per loop
(proxyenv) klaudiuszrydzy@Klaudiuszs-MacBook-Air proxystore % python -m timeit -n 10 -s "import io, numpy; x = numpy.random.rand(10000, 10000)" "output = io.BytesIO(); numpy.save(output, x); output.getbuffer()"
10 loops, best of 5: 193 msec per loop

On the new branch:

(proxyenv) klaudiuszrydzy@Klaudiuszs-MacBook-Air proxystore % python -m timeit -n 10 -s "import numpy; from proxystore.serialize import serialize; x = numpy.random.rand(10000, 10000)" "serialize(x)"
10 loops, best of 5: 359 msec per loop
(proxyenv) klaudiuszrydzy@Klaudiuszs-MacBook-Air proxystore % python -m timeit -n 10 -s "import io, numpy; x = numpy.random.rand(10000, 10000)" "output = io.BytesIO(); numpy.save(output, x); output.getbuffer()"
10 loops, best of 5: 206 msec per loop
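
For context, the second command in each pair times the raw numpy.save path into an in-memory buffer, which serves as a baseline for serialize(x). A minimal sketch of that round trip follows, assuming a small array so it runs quickly. The helper names dump_array and load_array are hypothetical and are not part of ProxyStore.

```python
import io

import numpy


def dump_array(x: numpy.ndarray) -> bytes:
    """Write an array in .npy format to an in-memory buffer."""
    buffer = io.BytesIO()
    # allow_pickle=False keeps the payload a plain .npy file.
    numpy.save(buffer, x, allow_pickle=False)
    # getvalue() copies the buffer contents into a bytes object.
    return buffer.getvalue()


def load_array(data: bytes) -> numpy.ndarray:
    """Read an array back from .npy-formatted bytes."""
    return numpy.load(io.BytesIO(data), allow_pickle=False)


x = numpy.random.rand(100, 100)
restored = load_array(dump_array(x))
```

Note that the timed command calls output.getbuffer(), which returns a memoryview over the buffer without copying, whereas getvalue() in this sketch makes one extra copy for convenience.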

@gpauloski gpauloski changed the title Serialization update Add custom serialization for NumPy, Pandas, and Polars types Jul 10, 2024
@gpauloski gpauloski added the enhancement New features or improvements to existing functionality label Jul 10, 2024
KlaudiuszRydzy and others added 3 commits July 15, 2024 12:26
Switched to reading/writing from io.BytesIO buffers to reduce copies
when appending/partitioning identifiers. Added custom serializers for
numpy, pandas, and polars types to improve performance where possible.

Related to proxystore#591
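
The buffer approach described in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: the identifier values and the three-way dispatch are hypothetical, and only standard-library types are handled. The point is that the identifier and the payload are both written into one io.BytesIO, and the identifier is split off with readline() instead of slicing and concatenating bytes objects.

```python
import io
import pickle
from typing import Any

# Hypothetical one-line identifiers; the values used by ProxyStore may differ.
BYTES_ID = b'01\n'
STR_ID = b'02\n'
PICKLE_ID = b'03\n'


def serialize(obj: Any) -> bytes:
    """Serialize obj with a leading identifier line."""
    buffer = io.BytesIO()
    if isinstance(obj, bytes):
        buffer.write(BYTES_ID)
        buffer.write(obj)
    elif isinstance(obj, str):
        buffer.write(STR_ID)
        buffer.write(obj.encode())
    else:
        buffer.write(PICKLE_ID)
        # pickle.dump writes directly into the buffer, so the pickled
        # payload is never concatenated with the identifier separately.
        pickle.dump(obj, buffer)
    return buffer.getvalue()


def deserialize(data: bytes) -> Any:
    """Invert serialize() by dispatching on the identifier line."""
    buffer = io.BytesIO(data)
    # readline() consumes the identifier up to and including b'\n'.
    identifier = buffer.readline()
    if identifier == BYTES_ID:
        return buffer.read()
    if identifier == STR_ID:
        return buffer.read().decode()
    if identifier == PICKLE_ID:
        # pickle.load continues from the current buffer position.
        return pickle.load(buffer)
    raise ValueError(f'Unknown identifier {identifier!r}')
```

A custom serializer for a third-party type would slot in as one more identifier and branch, for example writing with numpy.save(buffer, obj) and reading with numpy.load(buffer).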
@gpauloski gpauloski force-pushed the serialization-update branch from f0457a3 to 60bc619 Compare July 15, 2024 17:27
@gpauloski gpauloski changed the title Add custom serialization for NumPy, Pandas, and Polars types Improve serialization performance with buffers and custom type support Jul 15, 2024
@gpauloski gpauloski merged commit 8f7b401 into proxystore:main Jul 15, 2024
Development

Successfully merging this pull request may close these issues.

Improve serialization of common known third-party types