Add numpy functions tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from#6997
Merged
jsignell merged 28 commits intodask:masterfrom Mar 1, 2021
Merged
Conversation
numpy assumes N and M always comes first and second.
Contributor
Author
|
I gave a try calculating where triu_indices would be algebraically in hopes that it would be faster because
But unfortunately I couldn't get it up to the same speed as the numpy way at all. It is also pretty tricky fitting in the non-triangular arrays when k is negative. Rectangular arrays is also tricky to fit in. Algebraic triu_indicesimport dask.array as da
def _triu_indices(n, k=0, m=None):
# https://stackoverflow.com/questions/242711/algorithm-for-index-numbers-of-triangular-matrix-coefficients
if m is None:
m = n
if k < 0 or m != n:
# If k is negative or M does not match n then use normal method,
# could not figure out how to calculate those algebraically:
return da.nonzero(~da.tri(n, m, k=k - 1, dtype=bool))
def _triangular_number(n, k=0):
"""Get the triangular number."""
if k >= 0:
T_n = (n - k) * (n - k + 1) * 0.5
else:
# When k is negative add triangular numbers from the other
# side instead:
k = -1 * k + 1
T_n = (n - 0) * (n - 0 + 1) * 0.5
T_n += (n - 1) * (n - 1 + 1) * 0.5
T_n -= (n - k) * (n - k + 1) * 0.5
return T_n
def _triangular_root(T_n, k=0):
"""Solve _triangular_number for n instead."""
n = (da.sqrt(8*T_n + (1 - 2*k)*(1 - 2*k) + 4*k - 4*k*k) - (1 - 2*k)) * 0.5
return n
# Number of triangular points:
T_n = int(_triangular_number(n=n, k=k))
T_n_range = da.arange(T_n, dtype=int)
#
ii = T_n - 1 - T_n_range
# Triangular root:
K = da.floor(_triangular_root(ii, k=k))
#
jj = ii - _triangular_number(n=K, k=k)
#
i = n - 1 - K
j = m - 1 - jj
return (i, j) |
Contributor
Author
|
I think this is ready for review. @jsignell maybe? If anyone has an idea how to avoid the (not so) unknown chunk sizes in triu_indices that would be helpful: da.triu_indices(2)
Out[70]:
(dask.array<getitem, shape=(nan,), dtype=int64, chunksize=(nan,), chunktype=numpy.ndarray>,
dask.array<getitem, shape=(nan,), dtype=int64, chunksize=(nan,), chunktype=numpy.ndarray>) |
2 tasks
jsignell
approved these changes
Mar 1, 2021
Member
jsignell
left a comment
There was a problem hiding this comment.
This looks good to me. Sorry it took me so long to review.
Closed
dcherian
added a commit
to dcherian/dask
that referenced
this pull request
Mar 8, 2021
* upstream/master: (43 commits) bump version to 2021.03.0 Bump minimum version of distributed (dask#7328) Fix `percentiles_summary` with `dask_cudf` (dask#7325) Temporarily revert recent Array.__setitem__ updates (dask#7326) Blockwise.clone (dask#7312) NEP-35 duck array update (dask#7321) Don't allow setting `.name` for array (dask#7222) Use nearest interpolation for creating percentiles of integer input (dask#7305) Test `exp` with CuPy arrays (dask#7322) Check that computed chunks have right size and dtype (dask#7277) pytest.mark.flaky (dask#7319) Contributing docs: add note to pull the latest git tags before pip installing Dask (dask#7308) Support for Python 3.9 (dask#7289) Add broadcast-based merge implementation (dask#7143) Add split_every to graph_manipulation (dask#7282) Typo in optimize docs (dask#7306) dask.graph_manipulation support for xarray.Dataset (dask#7276) Add plot width and height support for Bokeh 2.3.0 (dask#7297) Add numpy functions tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from (dask#6997) Remove "cleanup" task in dataframe on-disk shuffle. The partd directory (dask#7260) ...
This was referenced Mar 23, 2021
Closed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Add the numpy functions
tri,triu_indices,triu_indices_from,tril_indices,tril_indices_from.Closes some of #2911.
black dask/flake8 daskda.triuandda.trilgot simplified as well and looks very similar to the numpy version now. It was just an old implementation, right? Or was there a clever reason for all the loops?Chunking in
da.triuandda.trilis a bit different now because because they useda.triwhich usesda.greater_equal.outer, not sure if this is better. Would love some thoughts on that.I had to move around a few functions as well to avoid circular dependencies.