Add an rewrite rule for handling (getitem, (elemwise, x), i) by shoyer · Pull Request #410 · dask/dask

shoyer · 2015-07-11T07:06:19Z

These should be rewritten from (getitem, (elemwise, x), i) to (elemwise, (getitem, x, i)). In many cases, this translates into a significant reduction in IO.

This is particularly relevant for computed coordinates, which are ubiquitous in the fields of meteorology and occeanography. These case typically be calculated in a straightforward way from ufuncs, and we would like to support these using dask in xray (see pydata/xarray#462).

I'll note that this PR is not entirely complete yet -- there are a few other dask array operations that will require wrapping. Also, we would like to optimize the tasks resulting from stacking or concatenation followed by indexing for the same reasons (e.g., so we can apply a coordinate transformation after loading a multi-file dataset).

mrocklin · 2015-07-11T12:13:05Z

It's cool to see other people pick up the rewrite rule framework. This came up before when we first added it, but can we verify that adding these rules doesn't introduce substantial overhead in the common case?

I think that the way @jcrist implemented this it should be quite robust to matching against many rules but it'd be good to verify.

jcrist · 2015-07-11T13:41:14Z

Rewriting is O(n) in the length of the first pattern to be matched. If no match, it's a little harder to figure out due to backtracking (non-deterministic algorithm). I have another version that is deterministic, which would bound failing to find a match to O(n) longest pattern that could be matched. Either way, there is negligible overhead for many-to-one matching. In this case, there actually is no overhead, as the variation is in the functions instead of the args.

shoyer · 2015-07-11T14:38:04Z

I'll note that it would very nice to extend the matching system to automatically handle even unrecognized NumPy ufuncs (or ufunc like objects like those provided by Numba's vectorize). These have a nin attributre that specifies how many input arguments they take.

shoyer · 2015-07-12T22:59:39Z

@jcrist are you working on isinstance checks so I can register all ufuncs? or should I finish this up?

jcrist · 2015-07-12T23:03:25Z

I probably won't get to the pattern matcher improvements until later this week. We can always push a new PR cleaning up the rules you add here, or you can wait until this process is simplified.

These should be rewritten from `(getitem, (elemwise, x), i)` to `(elemwise, (getitem, x, i))`. In many cases, this translates into a significant reduction in IO. This is particularly relevant for computed coordinates, which are ubiquitous in the fields of meteorology and occeanography. These case typically be calculated in a straightforward way from ufuncs, and we would like to support these using dask in xray (see pydata/xarray#462).

shoyer · 2015-07-13T01:35:57Z

Another limitation of the current pattern matcher is that it has no way to recognize partial arguments. This means that it can't match in the presence of optional keyword arguments (usually written with partial, at least in the current version of dask).

It also occurs to me that my current logic is flawed, because It presumes that I want to apply getitem to all ufunc or binop arguments. In fact, these arguments only need to be broadcast compatible and could even include scalars (which cannot be indexed). So it seems we will need to figure out some tricks so that we can move the getitem inside elemwise in these cases.

mrocklin · 2015-07-13T21:36:12Z

I've resolved what was causing the python 3 errors in travis. The pains of developing in multiple git repositories at once.

Things should be smoother after a rebase/merge with master

shoyer · 2015-08-31T00:02:52Z

I'd like revisit this, to follow up on my refactor of elemwise in #622.

My plan is to broadcast all arrays before calling atop (e.g., using broadcast_to). Then, this pattern matching rule can move the indexing inside the element-wise operations. I don't plan to turn broadcast scalar arguments, because this would cause a discrepancy with NumPy's dtype promotion rules, which special case scalars (see #622 for more details).

@jcrist To make this work well, it would be very helpful to have support for rules that match based on types. In particular, I would like to be able to match all tasks with a ufunc or partial_by_order object as the first element. Otherwise, the only way to properly handle the array + scalar case would be to update matching rules at runtime, which seems non-ideal.

shoyer · 2015-08-31T00:09:28Z

Actually, it looks like using isinstance checks for ufuncs alone won't suffice. I'm also need to verify that the ufunc signature is None. Otherwise, it could be a gufunc, for which this rewrite rule would not apply.

mrocklin · 2015-09-21T16:34:00Z

@shoyer can I close this as stale?

shoyer · 2015-09-22T21:08:26Z

Fair enough, I'll make another issue just so we don't forget about it

mrocklin · 2015-09-22T21:09:00Z

Lovely. Getting close to zero-lingering-pull-requests.

shoyer force-pushed the elemwise-rewrite branch from 983d4d4 to fc1dcb0 Compare July 13, 2015 00:19

shoyer closed this Sep 22, 2015

shoyer mentioned this pull request Sep 22, 2015

Add optimizations to dask.array to remap ufunc(data)[key] to ufunc(data[key]) #746

Open

phofl added a commit to phofl/dask that referenced this pull request Dec 23, 2024

Fix from_pandas for empty df (dask#410)

c7b051f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add an rewrite rule for handling (getitem, (elemwise, x), i)#410

Add an rewrite rule for handling (getitem, (elemwise, x), i)#410
shoyer wants to merge 1 commit intodask:masterfrom
shoyer:elemwise-rewrite

shoyer commented Jul 11, 2015

Uh oh!

mrocklin commented Jul 11, 2015

Uh oh!

jcrist commented Jul 11, 2015

Uh oh!

shoyer commented Jul 11, 2015

Uh oh!

shoyer commented Jul 12, 2015

Uh oh!

jcrist commented Jul 12, 2015

Uh oh!

shoyer commented Jul 13, 2015

Uh oh!

mrocklin commented Jul 13, 2015

Uh oh!

shoyer commented Aug 31, 2015

Uh oh!

shoyer commented Aug 31, 2015

Uh oh!

mrocklin commented Sep 21, 2015

Uh oh!

shoyer commented Sep 22, 2015

Uh oh!

mrocklin commented Sep 22, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

shoyer commented Jul 11, 2015

Uh oh!

mrocklin commented Jul 11, 2015

Uh oh!

jcrist commented Jul 11, 2015

Uh oh!

shoyer commented Jul 11, 2015

Uh oh!

shoyer commented Jul 12, 2015

Uh oh!

jcrist commented Jul 12, 2015

Uh oh!

shoyer commented Jul 13, 2015

Uh oh!

mrocklin commented Jul 13, 2015

Uh oh!

shoyer commented Aug 31, 2015

Uh oh!

shoyer commented Aug 31, 2015

Uh oh!

mrocklin commented Sep 21, 2015

Uh oh!

shoyer commented Sep 22, 2015

Uh oh!

mrocklin commented Sep 22, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants