Skip to content

Add numpy functions tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from#6997

Merged
jsignell merged 28 commits intodask:masterfrom
Illviljan:Illviljan-tri_triu_tril
Mar 1, 2021
Merged

Add numpy functions tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from#6997
jsignell merged 28 commits intodask:masterfrom
Illviljan:Illviljan-tri_triu_tril

Conversation

@Illviljan
Copy link
Contributor

@Illviljan Illviljan commented Dec 19, 2020

Add the numpy functions tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from.

Closes some of #2911.

  • [ x] Tests added / passed
  • [ x] Passes black dask / flake8 dask

da.triu and da.tril got simplified as well and looks very similar to the numpy version now. It was just an old implementation, right? Or was there a clever reason for all the loops?
Chunking in da.triu and da.tril is a bit different now because because they use da.tri which uses da.greater_equal.outer, not sure if this is better. Would love some thoughts on that.

I had to move around a few functions as well to avoid circular dependencies.

@Illviljan Illviljan changed the title Add tri() Add tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from Dec 20, 2020
@Illviljan
Copy link
Contributor Author

I gave a try calculating where triu_indices would be algebraically in hopes that it would be faster because

  • it is quite easy to visualize where the triangular arrays will be. Implementing it on the other hand wasn't as straightforward...
  • should not require to trigger compute_chunksize as all the sizes are already known.

But unfortunately I couldn't get it up to the same speed as the numpy way at all. It is also pretty tricky fitting in the non-triangular arrays when k is negative. Rectangular arrays is also tricky to fit in.

Algebraic triu_indices
import dask.array as da

def _triu_indices(n, k=0, m=None):
    # https://stackoverflow.com/questions/242711/algorithm-for-index-numbers-of-triangular-matrix-coefficients

    if m is None:
        m = n

    if k < 0 or m != n:
        # If k is negative or M does not match n then use normal method,
        # could not figure out how to calculate those algebraically:
        return da.nonzero(~da.tri(n, m, k=k - 1, dtype=bool))

    def _triangular_number(n, k=0):
        """Get the triangular number."""
        if k >= 0:
            T_n = (n - k) * (n - k + 1) * 0.5
        else:
            # When k is negative add triangular numbers from the other
            # side instead:
            k = -1 * k + 1
            T_n = (n - 0) * (n - 0 + 1) * 0.5
            T_n += (n - 1) * (n - 1 + 1) * 0.5
            T_n -= (n - k) * (n - k + 1) * 0.5

        return T_n

    def _triangular_root(T_n, k=0):
        """Solve _triangular_number for n instead."""
        n = (da.sqrt(8*T_n + (1 - 2*k)*(1 - 2*k) + 4*k - 4*k*k) - (1 - 2*k)) * 0.5
        return n

    # Number of triangular points:
    T_n = int(_triangular_number(n=n, k=k))
    T_n_range = da.arange(T_n, dtype=int)

    #
    ii = T_n - 1 - T_n_range

    # Triangular root:
    K = da.floor(_triangular_root(ii, k=k))

    #
    jj = ii - _triangular_number(n=K, k=k)

    #
    i = n - 1 - K
    j = m - 1 - jj

    return (i, j)

@Illviljan Illviljan changed the title Add tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from Add numpy functions tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from Dec 20, 2020
@Illviljan Illviljan marked this pull request as ready for review December 20, 2020 21:22
@Illviljan Illviljan closed this Dec 22, 2020
@Illviljan Illviljan reopened this Dec 22, 2020
@Illviljan
Copy link
Contributor Author

I think this is ready for review. @jsignell maybe?

If anyone has an idea how to avoid the (not so) unknown chunk sizes in triu_indices that would be helpful:

da.triu_indices(2)
Out[70]: 
(dask.array<getitem, shape=(nan,), dtype=int64, chunksize=(nan,), chunktype=numpy.ndarray>,
 dask.array<getitem, shape=(nan,), dtype=int64, chunksize=(nan,), chunktype=numpy.ndarray>)

Copy link
Member

@jsignell jsignell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Sorry it took me so long to review.

@jsignell jsignell merged commit ff63ae7 into dask:master Mar 1, 2021
@pentschev pentschev mentioned this pull request Mar 5, 2021
dcherian added a commit to dcherian/dask that referenced this pull request Mar 8, 2021
* upstream/master: (43 commits)
  bump version to 2021.03.0
  Bump minimum version of distributed (dask#7328)
  Fix `percentiles_summary` with `dask_cudf` (dask#7325)
  Temporarily revert recent Array.__setitem__ updates (dask#7326)
  Blockwise.clone (dask#7312)
  NEP-35 duck array update (dask#7321)
  Don't allow setting `.name` for array (dask#7222)
  Use nearest interpolation for creating percentiles of integer input (dask#7305)
  Test `exp` with CuPy arrays (dask#7322)
  Check that computed chunks have right size and dtype (dask#7277)
  pytest.mark.flaky (dask#7319)
  Contributing docs: add note to pull the latest git tags before pip installing Dask (dask#7308)
  Support for Python 3.9 (dask#7289)
  Add broadcast-based merge implementation (dask#7143)
  Add split_every to graph_manipulation (dask#7282)
  Typo in optimize docs (dask#7306)
  dask.graph_manipulation support for xarray.Dataset (dask#7276)
  Add plot width and height support for Bokeh 2.3.0 (dask#7297)
  Add numpy functions tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from (dask#6997)
  Remove "cleanup" task in dataframe on-disk shuffle. The partd directory (dask#7260)
  ...
@quasiben quasiben mentioned this pull request Mar 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants