Avoid large chunks from getitem with lists#6514
Conversation
mrocklin
left a comment
There was a problem hiding this comment.
I took a brief look. I don't understand this code well enough any more to have anything substantive to say though :/
|
Thanks for the quick review. Just to clarify one thing: is
referring to the changes here, or the slicing code in general. I'm happy to spend time cleaning up these changes to make things simpler if desired. |
|
Slicing code in general
…On Mon, Aug 17, 2020 at 6:21 AM Tom Augspurger ***@***.***> wrote:
Thanks for the quick review. Just to clarify one thing: is
I don't understand this code well enough
referring to the changes here, or the slicing code in general. I'm happy
to spend time cleaning up these changes to make things simpler if desired.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#6514 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTD4N2I6PXPF5ZGWTFDSBEVGPANCNFSM4P7P2OXQ>
.
|
|
@jrbourbeau would you have a chance to glance through here? |
|
This is also the sort of thing that @jcrist might find reasonable to do
some hour when things are quiet at the cabin. He's also probably more
familiar with this code than most (at least to the extent that anyone is :)
)
…On Tue, Aug 18, 2020 at 11:56 AM Tom Augspurger ***@***.***> wrote:
@jrbourbeau <https://github.com/jrbourbeau> would you have a chance to
glance through here?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#6514 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTGPWP7R2QCJANQJ273SBLFG7ANCNFSM4P7P2OXQ>
.
|
|
Fixed the merge conflict. cc @jcrist if you have a chance to review this (no rush). |
e06395f to
ddba6d3
Compare
jcrist
left a comment
There was a problem hiding this comment.
Overall this looks good to me, just one small question.
| # Check for chunks from the plan that would violate the user's | ||
| # configured chunk size. | ||
| nbytes = utils.parse_bytes(config.get("array.chunk-size")) | ||
| other_chunks = [chunks[i] for i in range(len(chunks)) if i != axis] |
There was a problem hiding this comment.
This code path might never be hit with unknown chunk sizes, but if it is, are unknown chunks properly handled (and if so, could you add a test for this case)?
There was a problem hiding this comment.
Thanks. It's not handled "correctly" in the sense that we avoid creating a large block, since we don't actually know what the size in bytes is. But it does at least compute, as long as you aren't slicing on an axis with unknown chunk sizes (that case hits #6586).
I've added a test to ensure that we at least cover this case.
|
I think that Jim's comment has been addressed. cc @dask/maintenance. |
|
Sorry for the long delay in response here @TomAugspurger . Let's go ahead and merge and hope for the best :) |
This partially reverts the changes made in dask#6514. It restores the old behavior with a warning that large chunks (10x array.chunk-size) are being produced. Additionally, it adds a new config option to control the behavior (`array.slicing.split-large-chunks`). Setting that to `False` silences the warnings and keeps the "old" behavior (one output block per input block touched, even if this makes a large output). Setting that to `True` silences the warning and restores the Dask 2.26 behavior of splitting. Closes dask#6646
This partially reverts the changes made in #6514. It restores the old behavior with a warning that large chunks (10x array.chunk-size) are being produced. Additionally, it adds a new config option to control the behavior (`array.slicing.split-large-chunks`). Setting that to `False` silences the warnings and keeps the "old" behavior (one output block per input block touched, even if this makes a large output). Setting that to `True` silences the warning and restores the Dask 2.26 behavior of splitting. Closes #6646
This partially reverts the changes made in dask#6514. It restores the old behavior with a warning that large chunks (10x array.chunk-size) are being produced. Additionally, it adds a new config option to control the behavior (`array.slicing.split-large-chunks`). Setting that to `False` silences the warnings and keeps the "old" behavior (one output block per input block touched, even if this makes a large output). Setting that to `True` silences the warning and restores the Dask 2.26 behavior of splitting. Closes dask#6646
This changes
Array[list_of_indices]to ensure that we don't generate a task graph that contains too large chunks (larger thanarray.chunk-size). The basic idea is toarray.chunk-size, and split thoseIf people familiar with slicing (like @jakirkham) have time I'd appreciate their thoughts on where I did this. It could possibly be done in either
takeorslicing_plan, buttakeseemed best since it has access to all the necessary information (the slices generated byslicing_plan, and the sizes of all the other chunks.Closes #6270