WIP, RFC: Optimize setitem with chunk equal to fill_value#367
WIP, RFC: Optimize setitem with chunk equal to fill_value#367jakirkham wants to merge 2 commits intozarr-developers:masterfrom jakirkham:opt_setitem_fill_value
Conversation
Matches how these lines are written in `_set_basic_selection_zd`.
Add a simple check to see if the key-value pair is just being set with a chunk equal to the fill value. If so, simply delete the key-value pair instead of storing a chunk that only contains the fill value. The Array will behave the same externally. However this will cutdown on the space require to store the Array. Also will make sure that copying one Array to another Array won't dramatically effect the storage size.
|
I have a very large, relatively sparse (lots of empty chunks) array. I would really like to avoid writing all those empty chunks to disk. So 👍 to this! |
| chunk[selection] = value | ||
|
|
||
| # clear chunk if it only contains the fill value | ||
| if np.all(np.equal(chunk, self._fill_value)): |
There was a problem hiding this comment.
This seems like it could sometimes be an expensive operation. So it should be optional.
There was a problem hiding this comment.
How large are typical chunk sizes for you?
There was a problem hiding this comment.
regardless of chunk size, in many cases the user knows in advance whether this check will ever be useful. For dense imaging data, this conditional will almost never evaluate to true; for sparse data, this conditional will often evaluate to true. So the user should have some discretion over whether this check occurs.
|
Thanks! Yeah that was the idea behind it. There was funkiness with Can give this another look 🙂 |
|
I actually assumed recently that this was the default. |
|
Nope, but I think we discussed recently how we might get this to work for |
|
I was also bitten by thinking that was the default! Would love to see this merged. |
|
@jakirkham I would be happy to move this PR forward, but I don't think I have push access to your fork. I made some changes (added a kwarg to the |
|
Sounds good. In that case feel free to send a PR to zarr-developers. Am going to close this out. |
Fixes #366
If a chunk only contains the
fill_value, delete the key-value pair for that chunk instead of storing the array. Should cutdown on storage space require for theArray. Also should improve the performance of copying oneArrayto another.TODO:
tox -e docs)