ENH: Improve performance of np.save for small arrays
#18657
Conversation
Remove the call to `_filter_header` from `_write_array_header` to speed up `np.save` for small arrays
Thanks! Since this seems worth it... isn't there a version number that we can rely on to guarantee stripped headers when reading? I am just curious whether we could skip this even on reading, most of the time. IIRC we do have different version numbers, but we only use the new one in the rare event that the old version is not able to write the array correctly? If that is the case, then it seems like it's not worth skipping the header cleanup on read.
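For reference, the `.npy` format does record a version in its magic prefix, which `numpy.lib.format.read_magic` exposes; a quick sketch, assuming a current NumPy where `np.save` picks the minimal version `(1, 0)` for a simple array:

```python
import io

import numpy as np
from numpy.lib import format as npformat

# Write a tiny array and inspect the format version stored in the file.
buf = io.BytesIO()
np.save(buf, np.arange(3))
buf.seek(0)

# read_magic parses the b"\x93NUMPY" prefix plus the two version bytes.
major, minor = npformat.read_magic(buf)
print((major, minor))  # a simple array uses the minimal version, (1, 0)
```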
eric-wieser
left a comment
I agree, this line was pointless on Python 3. I haven't checked whether the CI failure is real.
Hi @seberg 👋
This could use a release note. See
Force-pushed from 4cc5340 to 26ec1c7
@seberg @eric-wieser Ping.
seberg
left a comment
Happy with putting it in. Found something to nitpick, but there always is :).
We could also name improvement.rst performance.rst, since we actually have a "performance" category in the release notes now.
numpy/lib/format.py
Outdated
OK, I doubt it is very useful, but it's simple enough. If you are enthusiastic, maybe add a comment explaining why we only do this for older "versions", though?
(Would be happy to just skip it as well.)
It's useful because it lets me pair a `write_array(..., version=(3, 0))` with `read_array(..., version=(3, 0))` and get the nice perf boost 😎
Hah, fair enough :). Thanks @ohadravid!
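A minimal sketch of the pairing described above. Assumptions: `numpy.lib.format.write_array` accepts an explicit `version` in current NumPy; whether `read_array` takes a matching `version` argument depends on this PR's changes, so this sketch only sets it on the write side:

```python
import io

import numpy as np
from numpy.lib import format as npformat

arr = np.arange(3)

# Write with an explicit 3.0 header, which is known to be Python-3 clean,
# so a reader could skip the _filter_header cleanup for it.
buf = io.BytesIO()
npformat.write_array(buf, arr, version=(3, 0))

buf.seek(0)
out = npformat.read_array(buf)
assert np.array_equal(out, arr)

# The two bytes after the b"\x93NUMPY" magic record the requested version.
assert buf.getvalue()[6:8] == bytes((3, 0))
```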
numpy/lib/format.py
Outdated
Removing the sorted above looks correct to me. But I think d.keys() now prints slightly less nicely? OTOH, this is de facto an internal error, so I am OK with that.
Added the sorted back when generating the error :)
Improve performance of `np.save` by removing the call to `_filter_header` when writing the header, as its work is known to be done already in Python 3.
Improve performance of `np.load` for arrays with version >= (3, 0) by removing the call to `_filter_header`, as its work is known to be done already in Python 3.
Force-pushed from 3942cdf to 6fc8d44
Improve performance of `np.load`.
Improve performance of `np.save` for small arrays by removing the call to `_filter_header` when writing the header, as it is known to be done in Python 3.
We use `np.save` to save a lot of small arrays (which are 3D points, so are just 3 ints/floats), and we found out that `np.save` is about 20 times slower than `pickle.dumps`. Profiling shows that most of the time is spent in `_filter_header`, which (as I understand it) is not really needed anymore.
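For context, `_filter_header` exists to sanitize headers written by Python 2, whose `repr()` appended an `L` suffix to long integers. A simplified illustration of the idea (numpy's real implementation walks the header with the `tokenize` module so it cannot touch string literals; this regex sketch is only an approximation):

```python
import ast
import re


def filter_header_sketch(header):
    """Drop Python 2 long-integer suffixes, e.g. "(3L, 4L)" -> "(3, 4)".

    Simplified sketch, not numpy's actual `_filter_header`: a regex is
    enough to show the idea on typical headers.
    """
    return re.sub(r"(\d)L\b", r"\1", header)


# A header as a Python 2 writer might have produced it.
py2_header = "{'descr': '<i8', 'fortran_order': False, 'shape': (3L, 4L)}"

# After filtering, the header is valid Python 3 and can be literal_eval'd.
print(ast.literal_eval(filter_header_sketch(py2_header)))
```

On Python 3 the writer never emits the `L` suffix in the first place, which is why the cleanup is pure overhead there.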
The result is about a 10x improvement on my machine.

Before:
`140 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)`

After:
`14.4 µs ± 497 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)`

For reference, using `pickle.dumps`:
`6.27 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)`

Edit: This PR also changes reading an array to only call this function for older versions, which might have been created by Python 2 before this function was added in 2014 (and which are still written by default),
resulting in a nice improvement as well:
Reading a v1 array (same as today):
`141 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)`

Reading a v3 array now:
`56 µs ± 13.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)`
`37.3 µs ± 1.11 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)` (after more little improvements)
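The measurements above can be reproduced with a sketch along these lines (the exact array and loop count are assumptions; absolute timings will differ by machine and NumPy version):

```python
import io
import pickle
import timeit

import numpy as np

# A small 3D point, as in the PR description.
arr = np.array([1.0, 2.0, 3.0])


def save_numpy():
    # Serialize to an in-memory buffer so only np.save itself is timed.
    buf = io.BytesIO()
    np.save(buf, arr)
    return buf.getvalue()


def save_pickle():
    return pickle.dumps(arr)


n = 5_000
t_np = timeit.timeit(save_numpy, number=n) / n
t_pk = timeit.timeit(save_pickle, number=n) / n
print(f"np.save:      {t_np * 1e6:.1f} µs per call")
print(f"pickle.dumps: {t_pk * 1e6:.1f} µs per call")
```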