Add numpy functions tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from by Illviljan · Pull Request #6997 · dask/dask

Illviljan · 2020-12-19T19:01:58Z

Add the numpy functions tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from.

Closes some of #2911.

[ x] Tests added / passed
[ x] Passes black dask / flake8 dask

da.triu and da.tril got simplified as well and looks very similar to the numpy version now. It was just an old implementation, right? Or was there a clever reason for all the loops?
Chunking in da.triu and da.tril is a bit different now because because they use da.tri which uses da.greater_equal.outer, not sure if this is better. Would love some thoughts on that.

I had to move around a few functions as well to avoid circular dependencies.

numpy assumes N and M always comes first and second.

Illviljan · 2020-12-20T21:18:10Z

I gave a try calculating where triu_indices would be algebraically in hopes that it would be faster because

it is quite easy to visualize where the triangular arrays will be. Implementing it on the other hand wasn't as straightforward...
should not require to trigger compute_chunksize as all the sizes are already known.

But unfortunately I couldn't get it up to the same speed as the numpy way at all. It is also pretty tricky fitting in the non-triangular arrays when k is negative. Rectangular arrays is also tricky to fit in.

Algebraic triu_indices

import dask.array as da

def _triu_indices(n, k=0, m=None):
    # https://stackoverflow.com/questions/242711/algorithm-for-index-numbers-of-triangular-matrix-coefficients

    if m is None:
        m = n

    if k < 0 or m != n:
        # If k is negative or M does not match n then use normal method,
        # could not figure out how to calculate those algebraically:
        return da.nonzero(~da.tri(n, m, k=k - 1, dtype=bool))

    def _triangular_number(n, k=0):
        """Get the triangular number."""
        if k >= 0:
            T_n = (n - k) * (n - k + 1) * 0.5
        else:
            # When k is negative add triangular numbers from the other
            # side instead:
            k = -1 * k + 1
            T_n = (n - 0) * (n - 0 + 1) * 0.5
            T_n += (n - 1) * (n - 1 + 1) * 0.5
            T_n -= (n - k) * (n - k + 1) * 0.5

        return T_n

    def _triangular_root(T_n, k=0):
        """Solve _triangular_number for n instead."""
        n = (da.sqrt(8*T_n + (1 - 2*k)*(1 - 2*k) + 4*k - 4*k*k) - (1 - 2*k)) * 0.5
        return n

    # Number of triangular points:
    T_n = int(_triangular_number(n=n, k=k))
    T_n_range = da.arange(T_n, dtype=int)

    #
    ii = T_n - 1 - T_n_range

    # Triangular root:
    K = da.floor(_triangular_root(ii, k=k))

    #
    jj = ii - _triangular_number(n=K, k=k)

    #
    i = n - 1 - K
    j = m - 1 - jj

    return (i, j)

Illviljan · 2021-01-10T14:12:03Z

I think this is ready for review. @jsignell maybe?

If anyone has an idea how to avoid the (not so) unknown chunk sizes in triu_indices that would be helpful:

da.triu_indices(2)
Out[70]: 
(dask.array<getitem, shape=(nan,), dtype=int64, chunksize=(nan,), chunktype=numpy.ndarray>,
 dask.array<getitem, shape=(nan,), dtype=int64, chunksize=(nan,), chunktype=numpy.ndarray>)

jsignell

This looks good to me. Sorry it took me so long to review.

* upstream/master: (43 commits) bump version to 2021.03.0 Bump minimum version of distributed (dask#7328) Fix `percentiles_summary` with `dask_cudf` (dask#7325) Temporarily revert recent Array.__setitem__ updates (dask#7326) Blockwise.clone (dask#7312) NEP-35 duck array update (dask#7321) Don't allow setting `.name` for array (dask#7222) Use nearest interpolation for creating percentiles of integer input (dask#7305) Test `exp` with CuPy arrays (dask#7322) Check that computed chunks have right size and dtype (dask#7277) pytest.mark.flaky (dask#7319) Contributing docs: add note to pull the latest git tags before pip installing Dask (dask#7308) Support for Python 3.9 (dask#7289) Add broadcast-based merge implementation (dask#7143) Add split_every to graph_manipulation (dask#7282) Typo in optimize docs (dask#7306) dask.graph_manipulation support for xarray.Dataset (dask#7276) Add plot width and height support for Bokeh 2.3.0 (dask#7297) Add numpy functions tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from (dask#6997) Remove "cleanup" task in dataframe on-disk shuffle. The partd directory (dask#7260) ...

Illviljan added 17 commits December 19, 2020 18:54

Add tri and use tri in tril and triu

834053e

Update creation.py

eb4bf8a

Update __init__.py

f68ea67

move around triu/l to avoid circular deps

2704871

move triu/l here to avoid circular deps

a34ce81

Update creation.py

efea4f3

Update test_creation.py

e4395e8

Update routines.py

1d454ba

move arg to match numpy

e3a92b2

numpy assumes N and M always comes first and second.

Seems like they support multiple dims now

bdb4de3

formatting

629c162

formatting

53199ea

formatting

2d5e296

use the min_int func

b31e32e

Add triu/l_indices and from

807bbd9

move tril/u tests to routines for consistency

debe494

Update test_creation.py

3b33e7d

Illviljan changed the title ~~Add tri()~~ Add tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from Dec 20, 2020

Update __init__.py

31e67cb

Illviljan changed the title ~~Add tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from~~ Add numpy functions tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from Dec 20, 2020

Illviljan marked this pull request as ready for review December 20, 2020 21:22

Illviljan added 8 commits December 21, 2020 12:50

allow unsymmetric chunks

7485d32

better tri test

0b7b73d

black

d2970e2

blacks doesn't agree

36f4757

Update test_creation.py

9b47e31

follow normalize_chunks

672598f

use same chunks as the input array

9ad70bf

chunks should look the same as shape

d896fab

Illviljan closed this Dec 22, 2020

Illviljan reopened this Dec 22, 2020

Illviljan added 2 commits January 10, 2021 10:52

add chunks in indices

c659f6c

Cleaner test format

983ec6c

Illviljan mentioned this pull request Feb 27, 2021

Check that computed chunks have right size and dtype #7277

Merged

2 tasks

bmerry mentioned this pull request Feb 27, 2021

triu and tril have wrong dtypes for some chunks in old version of numpy #7287

Closed

jsignell approved these changes Mar 1, 2021

View reviewed changes

jsignell merged commit ff63ae7 into dask:master Mar 1, 2021

pentschev mentioned this pull request Mar 5, 2021

Failing CuPy tests #7324

Closed

This was referenced Mar 23, 2021

dask.array.triu does not works with cupy array #7452

Closed

Fix __array_function__ dispatching for tril/triu #7457

Merged

quasiben mentioned this pull request Mar 24, 2021

GPU CI dask/community#138

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add numpy functions tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from#6997

Add numpy functions tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from#6997
jsignell merged 28 commits intodask:masterfrom
Illviljan:Illviljan-tri_triu_tril

Illviljan commented Dec 19, 2020 •

edited

Loading

Uh oh!

Illviljan commented Dec 20, 2020

Uh oh!

Illviljan commented Jan 10, 2021

Uh oh!

jsignell left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Illviljan commented Dec 19, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Illviljan commented Dec 20, 2020

Uh oh!

Illviljan commented Jan 10, 2021

Uh oh!

jsignell left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Illviljan commented Dec 19, 2020 •

edited

Loading