Pickle is significantly slower than a memory copy #7544

@mrocklin

My machine copies memory at about 5 GB/s:

In [1]: b = b'0' * 1000000000

In [2]: %time len(b[1:])
CPU times: user 139 ms, sys: 63.3 ms, total: 202 ms
Wall time: 202 ms
Out[2]: 999999999

But NumPy arrays only serialize at about 2.5 GB/s:

In [4]: import numpy as np

In [5]: x = np.random.randint(0, 255, dtype='u1', size=1000000000)  # 1GB

In [6]: import pickle

In [7]: %time len(pickle.dumps(x, protocol=-1))
CPU times: user 309 ms, sys: 96.2 ms, total: 405 ms
Wall time: 404 ms
Out[7]: 1000000161
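
For reference, the rates quoted above follow from the timings: 999999999 bytes in 202 ms is roughly 4.95 GB/s, and 1000000161 bytes in 404 ms is roughly 2.48 GB/s. The two measurements could also be reproduced outside IPython with something like the sketch below. The helper names (`throughput_gb_per_s`, `time_call`, `compare`) are made up for illustration, and `time.perf_counter` stands in for `%time`, so the numbers will not match the session exactly.

```python
import pickle
import time

import numpy as np


def throughput_gb_per_s(nbytes, seconds):
    """Convert a byte count and elapsed wall time to GB/s (1 GB = 1e9 bytes)."""
    return nbytes / seconds / 1e9


def time_call(func):
    """Run func once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def compare(nbytes):
    """Time a bytes slice copy and a pickle.dumps of a uint8 array of nbytes."""
    b = b"0" * nbytes
    x = np.random.randint(0, 255, dtype="u1", size=nbytes)

    # Slicing a bytes object allocates a new object and copies the data.
    copied, t_copy = time_call(lambda: b[1:])
    # protocol=-1 selects the highest pickle protocol available.
    dumped, t_pickle = time_call(lambda: pickle.dumps(x, protocol=-1))

    return {
        "copy_len": len(copied),
        "pickle_len": len(dumped),
        "copy_gb_s": throughput_gb_per_s(len(copied), t_copy),
        "pickle_gb_s": throughput_gb_per_s(len(dumped), t_pickle),
    }
```

Calling `compare(1000000000)` matches the sizes used in the session; it needs several GB of free memory, so a smaller size may be more practical for a quick check.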

Why the extra time?

Versions

Python 3.4, Linux, NumPy 1.11.0
