Array slicing HighLevelGraph layer #7655
Conversation

---

It's not super clear what I should do next: work on "bubbling" the high level graph layer up? Find a specific use case to benchmark? Cases like #5918 (comment) are unaffected by the changes here, so should I try to target something like that?

---

For some context, I've added these methods:
I can't really test things well without all four methods in place. Also note: finding the length and/or HTML representation of the dask computation will currently materialize the full graph; I haven't changed this part yet. The good news is that it should be a lot simpler to calculate the length without materializing the task dictionary than it is for our overlapping array work.
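To illustrate why the length should be cheap to compute, here's a minimal sketch of my own (not the PR's code; the variable name `new_blockdims` is borrowed from the review discussion and may not match the implementation): the number of tasks in a slice layer is just the product of the output chunk counts per axis, which is available from chunk metadata without building any task dictionary.

```python
import math

# Hypothetical output chunk sizes for a sliced 2-D array:
# two output chunks along each axis.
new_blockdims = ((3, 2), (3, 2))

# One task per output chunk, so the layer's length is the product of
# the per-axis chunk counts; no task dictionary is materialized.
num_tasks = math.prod(len(bd) for bd in new_blockdims)
print(num_tasks)  # 4
```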

---

I have a mistake in my implementation that means any slice covering only part of the array causes an error. E.g. with this PR branch, I can do this:

```python
import dask.array as da

arr = da.ones((6, 6), chunks=(3, 3))
result = arr[:6, :6]
result.compute()
```

but this gives me an error:

```python
import dask.array as da

arr = da.ones((6, 6), chunks=(3, 3))
result = arr[:5, :5]  # only part of the whole array is sliced
result.compute()  # KeyError: ('ones-9321b8a9f4b4fe7e5d4e0edb7273ca60', (0, 1))
```

---

Update:
Here's a little experiment I did, putting a breakpoint in before this line (Line 121 in afb180f).

**Dataframe shuffle example (working)**

This is the dataframe shuffle example; it works fine:

```python
from dask.datasets import timeseries

ddf = timeseries().shuffle("id", shuffle="tasks")
ddf.compute()
```

**Array slicing example (broken)**

This is the array slicing example; it's broken, and I need to figure out what went wrong:

```python
import dask.array as da

x = da.ones((9, 9), chunks=(3, 3))
y = x[5:, 5:]
y.compute()
```

**Possible leads**

It seems pretty fishy that there is one item in `args` for the working dataframe example, but two items in `args` for the array example. Perhaps that has something to do with why it's broken.

---

The getitem syntax we need to achieve is:

```python
from operator import getitem
import dask.array as da

arr = da.ones((6, 6), chunks=(3, 3))
getitem(arr, (slice(5, None, None), slice(5, None, None))).compute()
# array([[1.]])
```

which is equivalent to:

```python
import dask.array as da

arr = da.ones((6, 6), chunks=(3, 3))
arr[5:, 5:].compute()
# array([[1.]])
```
Another debugging experiment: comparing the dask HighLevelGraphs.

The high level graph layer dependencies are the same for both (but that doesn't rule out me making a mistake when I'm trying to use them to generate the fully materialized task graph):

```python
>>> hlg.dependencies
{'array-d52eb27cbf828b7e8f8f94b1a6890e54': set(),
 'getitem-bfd48e62fd5701fbd00c89a4c4b68984': {'array-d52eb27cbf828b7e8f8f94b1a6890e54'}}
```

Also, the reported dependencies on the layers look correct (and are the same for both):

```python
>>> hlg.keys()
dict_keys([('array-d52eb27cbf828b7e8f8f94b1a6890e54', 0, 0), ('array-d52eb27cbf828b7e8f8f94b1a6890e54', 0, 1), ('array-d52eb27cbf828b7e8f8f94b1a6890e54', 1, 0), ('array-d52eb27cbf828b7e8f8f94b1a6890e54', 1, 1), ('getitem-bfd48e62fd5701fbd00c89a4c4b68984', 0, 0)])
>>> key = ('getitem-bfd48e62fd5701fbd00c89a4c4b68984', 0, 0)
>>> layer1.get_dependencies(key, hlg.keys())
{('array-d52eb27cbf828b7e8f8f94b1a6890e54', 1, 1)}
```

I'm a bit stumped, and would appreciate any suggestions for things to try next.
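For what it's worth, the mapping the layer has to get right can be shown in one dimension with a small helper. This is a sketch of my own for illustration, not the PR's `_get_block_slices`: given the chunk sizes along an axis and a slice over that axis, find which input blocks the slice touches.

```python
def input_blocks_1d(chunks, index):
    """Return the input block ids along one axis that `index` touches.

    `chunks` are the per-block sizes along the axis, e.g. (3, 3, 3);
    `index` is a slice over that axis.
    """
    # Block id of every element along the axis: [0, 0, 0, 1, 1, 1, ...]
    block_of = [b for b, size in enumerate(chunks) for _ in range(size)]
    return sorted(set(block_of[index]))

# arr[5:] on an axis with chunks (3, 3, 3) needs input blocks 1 and 2
print(input_blocks_1d((3, 3, 3), slice(5, None)))  # [1, 2]
```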

---

Progress update: I have changed the cull dependencies parts, so this example now works:

```python
import numpy as np
import dask.array as da

x = da.from_array(np.arange(16).reshape((4, 4)), chunks=(2, 2))
y = x[3:, 3:]
y.compute()
```

TODO next:

---

@ian-r-rose & @gjoseph92 I think this is the wrong approach. Here's why I think this:
I think the problem we want to solve is in a different place. I'm going to test something out and then write a short summary on that.

---

I ran the test; unfortunately I don't have anything good to report here. We'll talk about it at our meeting tomorrow (or today, by the time you read this 🙂).
ian-r-rose left a comment:
This is looking very good.
dask/layers.py (Outdated)

```python
out_name,
in_name,
blockdims,
new_blockdims,
```
This argument isn't currently used, though it could be used to implement `get_output_keys()`, I think.
Or `parts_out` could do it, which may be simpler. That would allow you to get rid of `new_blockdims` here.
dask/layers.py (Outdated)

```python
        This method does not require graph materialization.
        """
        deps = defaultdict(set)
        parts_out = parts_out or self._keys_to_parts(keys)
```
Suggested change:

```diff
- parts_out = parts_out or self._keys_to_parts(keys)
+ parts_out = parts_out or self._keys_to_output_parts(keys)
```
Since this code path isn't really used, we could also just make the arguments non-optional. In general I prefer non-nullable arguments unless there is a compelling reason for them.
dask/layers.py (Outdated)

```python
def _input_parts(self):
    """Simple utility to get input chunk indices."""
    parts_in = set()
    block_slices = self._get_block_slices(self.shape, self.blockdims, self.index)
    _part = []
    for i in block_slices:
        _part += sorted(i.keys())
    parts_in.add(tuple(_part))
    return parts_in
```
Sadly, no. This is the bit that's currently broken, and I can't see a way to fix it without using the itertools product. (I know I said I fixed it; I was wrong. It only happened to work for the one specific example I used as a test case.)
I might have explained myself poorly today, so perhaps this summary helps:

- You can see how this implementation breaks things by choosing any slicing example that returns values from more than one input chunk. E.g.:

```python
import numpy as np
import dask.array as da

x = da.from_array(np.arange(16).reshape((4, 4)), chunks=(2, 2))
y = x[1:, 1:]
y.compute()  # fails with this implementation
```

- The itertools `product` function is needed so we list all of the necessary input chunks (all the combinations from our 1d chunk results).
- This means there's not a lot of logic left that only happens in the `_construct_graph` function. That's why I was concerned about how useful this is (but I hadn't been considering your point about the benefits for nested slicing).
I've just pushed a commit that replaces

```python
def _input_parts(self):
    """Simple utility to get input chunk indices."""
    parts_in = set()
    block_slices = self._get_block_slices(self.shape, self.blockdims, self.index)
    _part = []
    for i in block_slices:
        _part += sorted(i.keys())
    parts_in.add(tuple(_part))
    return parts_in
```

with this instead:

```python
def _input_parts(self):
    """Simple utility to get input chunk indices."""
    block_slices = self._get_block_slices(self.shape, self.blockdims, self.index)
    parts_in = set(product(*[sorted(i.keys()) for i in block_slices]))
    return parts_in
```

Now I can make slices over multiple chunks again (yay, less broken), but it's not a great sign for the performance improvements we were hoping to get out of this. I haven't benchmarked anything yet, though.
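To make the role of `product` concrete, here's a tiny standalone illustration of my own, with the per-axis results hard-coded rather than taken from the real `_get_block_slices`: the per-dimension input block indices have to be crossed to enumerate every input chunk the slice can touch.

```python
from itertools import product

# For x[1:, 1:] on a (4, 4) array with (2, 2) chunks, the slice
# touches blocks 0 and 1 along each axis (hard-coded here for
# illustration).
blocks_per_dim = [[0, 1], [0, 1]]

# Every combination of the 1-d results is an input chunk we need.
parts_in = set(product(*blocks_per_dim))
print(sorted(parts_in))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```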

---

So Python sets/dicts do not preserve order. Oops. Mucking that up is a large part of why I've twisted myself into knots and wasted a lot of time. Now I can focus on the optimization. Currently there is one failing test, which I think checks whether a nested slice of slices has its task graph optimized correctly. Obviously, that's the bit I need to do next.

---

BTW, I played around a little with how to combine multiple nested slices into a single slice operation. (Note: there's still an error here that you'll see if you stride forwards and backwards with the slicing; I'll need to take the array length and remainder into account to get this part correct.) https://gist.github.com/GenevieveBuckley/f913cef861a918c2e1c94862e7bdbc60
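For the simple case of positive steps, the combination can be sketched like this. `compose_slices` is a hypothetical helper of my own, not the gist's code, and it deliberately leaves out the negative-stride and length/remainder handling mentioned above:

```python
def compose_slices(outer, inner, length):
    """Return one slice equivalent to arr[outer][inner] for a 1-d
    sequence of the given length. Assumes positive steps only."""
    # Normalise the outer slice against the real array length.
    o_start, o_stop, o_step = outer.indices(length)
    outer_len = max(0, (o_stop - o_start + o_step - 1) // o_step)
    # Normalise the inner slice against the outer slice's length.
    i_start, i_stop, i_step = inner.indices(outer_len)
    # Map the inner coordinates back into the original array.
    return slice(
        o_start + i_start * o_step,
        o_start + i_stop * o_step,
        o_step * i_step,
    )

x = list(range(100))
combined = compose_slices(slice(10, 90, 2), slice(5, 20, 3), len(x))
assert x[10:90:2][5:20:3] == x[combined]
```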
Sets don't, but dictionaries do: it's mandated by the language spec since Python 3.7 (and was an implementation detail in 3.6). https://docs.python.org/3.7/library/stdtypes.html#dict.values
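A quick demonstration of the distinction (a throwaway snippet, not from the PR):

```python
# Dict insertion order is guaranteed by the language since Python 3.7.
d = {"b": 1, "a": 2, "c": 3}
print(list(d))  # ['b', 'a', 'c']

# Set iteration order is arbitrary and hash-dependent, so an explicit
# sort is needed whenever a stable order matters.
s = {"b", "a", "c"}
print(sorted(s))  # ['a', 'b', 'c']
```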

---

Thanks @dhirschfeld. I remember not being able to assume ordered dictionaries while working on a project with Python 3.6, and hadn't realized it had changed since then. That's good to hear!

---

I'm not seeing any kind of speed improvement, and I imagine I would be seeing some kind of change by now if what we're doing here were an effective approach.

```python
import dask.array as da

a = da.ones(1_000, chunks=1)
%timeit a[10:20].compute()

a = da.ones(10_000, chunks=1)
%timeit a[10:20].compute()

a = da.ones(100_000, chunks=1)
%timeit a[10:20].compute()
# and so on...
```

Last time we spoke, @ian-r-rose and @gjoseph92 suggested this work might eventually be useful for a nested "slice into a slice" situation. There'd need to be a bit more work done for that to happen; see #7655. But the reason I started this was because we wanted to make

---

Quick comment, probably wrong. I'm not tracking this PR well enough to speak with any sort of authority, but this caught my eye:

> Just so people are aware, implementing all of the corner cases for numpy-style indexing is very complex. I would recommend that people here not start walking in this direction. Instead there is a big pile of code in slicing.py which, although inscrutable, appears to be pretty much correct. I would love for us to find a way to reuse or wrap this code without having to reimplement it.
Particularly since this is implemented in the "reuse or wrap this code without having to reimplement it" fashion, I'm not really surprised that end-to-end performance is the same. However, I would hope that `%timeit a[10:20]` is slightly faster than on main. And I'd expect the
Even if we don't see immediate performance improvements, I think there's value in a high-level slicing layer. I don't see how we can one day turn off low-level optimizations for Array without moving slicing into HighLevelGraph layers. In particular, we'll need
I haven't been following the details of this PR, but just wanted to chip in to say that this would be pretty valuable for us (as well as doing slice-of-slice optimisations at the high level). We currently rely on some hacks, like cached properties that construct slices lazily, so that we can avoid paying the construction cost if a sliced array never actually gets used.
Ah, of course, that was a silly thing for me to do; thanks for pointing it out. Here we go, looking at just the graph construction stage. (Just to double check my understanding: this is graph construction, but not optimization or computation, do I have that right?)

```python
%timeit a[10:20]
```

And even though we haven't specifically done anything special for slices of slices in this PR, I thought it might be fun to look at those times too:

```python
%timeit a[100:200][10:20]
```

... and a slice of a slice of a slice:

```python
%timeit a[100:200][0:50][10:20]
```

---

@bmerry is there a specific example you'd like us to look at or benchmark, just to keep your use case in mind while I work on this?

---

Also, I've just been playing with Emil's benchmark script from here, comparing the main branch against the slicing-HLG branch. It's a difference of 24.7 milliseconds, which is approximately 5.7% faster.
I'm not sure there is really anything specific - I don't have a standalone benchmark. I guess one aspect of our use case is that we have a multi-dimensional array, and if the graph construction can be made to take O(sum(num_chunks)) rather than O(product(num_chunks)) then I'd expect it to be a game-changer. While any performance improvement is appreciated, 5-10% improvements aren't going to remove our need to delay slicing until it's needed. |
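To put rough numbers on that distinction (my own back-of-envelope arithmetic, not a measurement from this PR): for a hypothetical 4-d array with 50 chunks per axis, per-axis reasoning touches a few hundred values, while full materialization enumerates millions of tasks.

```python
import math

# Chunks per axis of a hypothetical 4-d array.
num_chunks = (50, 50, 50, 50)

per_axis_work = sum(num_chunks)       # O(sum): 200 values to consider
materialized = math.prod(num_chunks)  # O(product): 6,250,000 tasks
print(per_axis_work, materialized)  # 200 6250000
```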





FYI @ian-r-rose & @gjoseph92
I've made a start, inserting a HighLevelGraph layer into the `slice_slices_and_integers` function of `dask/array/slicing.py`. The next step is to somehow propagate this upwards through the rest of the slicing functions, so that the graph is only materialized if absolutely necessary (e.g. to do fancy indexing, etc.).