ENH: Add LU decomposition by sinhrks · Pull Request #885 · dask/dask

sinhrks · 2015-12-18T12:00:04Z

It is still under work. I'm going to built scipy.linalg.lu compat API using blocked gaussian elimination.

A = np.array([[7, 3, -1, 2], [3, 8, 1, -4], [-1, 1, 4, -1], [2, -4, -1, 6] ])
A
# array([[ 7,  3, -1,  2],
#        [ 3,  8,  1, -4],
#        [-1,  1,  4, -1],
#        [ 2, -4, -1,  6]])

dA = da.from_array(A, chunks=(2, 2))
dA
# dask.array<from-ar..., shape=(4, 4), dtype=int64, chunksize=(2, 2)>

p, l, u = da.linalg.lu(dA)
l
# dask.array<lu-l-6d..., shape=(4, 4), dtype=int64, chunksize=(2, 2)>

p.compute()
# array([[ 1.,  0.,  0.,  0.],
#        [ 0.,  1.,  0.,  0.],
#        [ 0.,  0.,  1.,  0.],
#        [ 0.,  0.,  0.,  1.]])

l.compute()
# array([[ 1.        ,  0.        ,  0.        ,  0.        ],
#        [ 0.42857143,  1.        ,  0.        ,  0.        ],
#        [-0.14285714,  0.21276596,  1.        ,  0.        ],
#        [ 0.28571429, -0.72340426,  0.08982036,  1.        ]])

u.compute()
# array([[ 7.        ,  3.        , -1.        ,  2.        ],
#        [ 0.        ,  6.71428571,  1.42857143, -4.85714286],
#        [ 0.        ,  0.        ,  3.55319149,  0.31914894],
#        [ 0.        ,  0.        ,  0.        ,  1.88622754]])

scipy results

p, l, u = scipy.linalg.lu(A)
p
# array([[ 1.,  0.,  0.,  0.],
#        [ 0.,  1.,  0.,  0.],
#        [ 0.,  0.,  1.,  0.],
#        [ 0.,  0.,  0.,  1.]])

l
# array([[ 1.        ,  0.        ,  0.        ,  0.        ],
#        [ 0.42857143,  1.        ,  0.        ,  0.        ],
#        [-0.14285714,  0.21276596,  1.        ,  0.        ],
#        [ 0.28571429, -0.72340426,  0.08982036,  1.        ]])

u
# array([[ 7.        ,  3.        , -1.        ,  2.        ],
#        [ 0.        ,  6.71428571,  1.42857143, -4.85714286],
#        [ 0.        ,  0.        ,  3.55319149,  0.31914894],
#        [ 0.        ,  0.        ,  0.        ,  1.88622754]])

mrocklin · 2015-12-18T14:14:57Z

I saw your tweets about this and was hoping that dask.array would get a PR :)

cc @jcrist

mrocklin · 2015-12-18T14:21:01Z

dask/array/linalg.py

Is np.linalg.inv efficient in the triangular case? I did a quick benchmark and I think that scipy.linalg.solve_triangular might be faster here.

In [1]: import numpy as np In [2]: import scipy.linalg In [3]: x = np.random.random((2000, 2000)) In [4]: y = np.random.random((2000, 2000)) In [5]: p, l, u = scipy.linalg.lu(x) In [6]: %time np.linalg.inv(l).dot(y) CPU times: user 7.05 s, sys: 295 ms, total: 7.35 s Wall time: 1.86 s Out[6]: array([[ 0.47500179, 0.49931231, 0.98423146, ..., 0.32664585, 0.6498253 , 0.36445447], [ 0.20188943, 0.42700747, 0.60685892, ..., 0.0924809 , 0.6496587 , 0.09958131], [ 0.88167014, 0.66419102, 0.75093161, ..., 0.21974697, 0.19297681, 0.59164822], ..., [ 7.13030456, -0.52782879, 0.53777843, ..., 4.07304968, 1.4036643 , 4.20918249], [ 5.25830631, 1.38309588, -3.68227833, ..., 11.42730652, 1.06180677, -0.80925082], [ 11.57086192, 9.53704602, -3.79445665, ..., 8.64013982, -2.69850912, 10.15152519]]) In [7]: %time scipy.linalg.solve_triangular(l, y, lower=True) CPU times: user 743 ms, sys: 7.88 ms, total: 751 ms Wall time: 755 ms Out[7]: array([[ 0.47500179, 0.49931231, 0.98423146, ..., 0.32664585, 0.6498253 , 0.36445447], [ 0.20188943, 0.42700747, 0.60685892, ..., 0.0924809 , 0.6496587 , 0.09958131], [ 0.88167014, 0.66419102, 0.75093161, ..., 0.21974697, 0.19297681, 0.59164822], ..., [ 7.13030456, -0.52782879, 0.53777843, ..., 4.07304968, 1.4036643 , 4.20918249], [ 5.25830631, 1.38309588, -3.68227833, ..., 11.42730652, 1.06180677, -0.80925082], [ 11.57086192, 9.53704602, -3.79445665, ..., 8.64013982, -2.69850912, 10.15152519]])

mrocklin · 2015-12-18T14:23:32Z

I'm curious about the permutation matrix part of this computation. How out-of-core is this algorithm? How does the size of the input array affect the memory requirements of the algorithm?

sinhrks · 2015-12-19T02:42:50Z

I may misunderstand what you're asking, but I think it's difficult to perform permutation (pivotting) over the dask chunks and results in error if permutation is needed.

If we do not permute between blocks, the algorithm perform each blocks from left-top to right-bottom in out-of-core. Assuming 3x3 chunks:

compute (0, 0) chunk
compute (0, 1), (0, 2), (1, 0), (2, 0) chunks in parallel/out-of-core
compute (1, 1) chunk
compute (2, 1) (1, 2) chunks in parallel/out-of-core
compute (2, 2) chunk
- http://homepages.math.uic.edu/~jan/mcs572/parallelLU.pdf

My main interest is to use LU as intermediate results to get inverse matrix.

mrocklin · 2015-12-19T03:49:38Z

What is the maximum number of chunks that we must have in memory at once?

For example, in the following computation

x = da.random.random((n, n), chunks=(n/k, n/k))
P, L, U = da.linalg.lu(x)
(L.sum() + U.sum()).compute()

What is maximum amount of memory used over time? At most this number is n by n but hopefully it is smaller. I'll take a look at the LU algorithm paper linked.

sinhrks · 2015-12-25T03:25:25Z

Because each computation is performed using blocks of left-side and top-side, maximum number is 2 * (k - 1) + 1 chunks at right-bottom chunk.

I found a bug of current impl when input has 3 or more chunks. Will fix after #886 (as #886 makes current impl little complex).

mrocklin · 2015-12-25T04:03:33Z

dask/array/linalg.py

You should be able to replace this with just (name_lu, i, j). The scheduler accepts direct name aliases.

In [1]: import dask In [2]: inc = lambda x: x + 1 In [3]: dsk = {'x': (inc, 1), 'y': 'x'} In [4]: dask.get(dsk, 'y') Out[4]: 2

Also, you don't need the parentheses when adding an element with a tuple key to a dict -- Python adds that automatically.

@mrocklin Yes your example should work, and #886 intends to the case like below. Because calculated L and U is referred by intermediate calculation.

dsk = {('x', 0): np.array([1, 2]), ('x', 1): ('x', 0)} darr = da.Array(dsk, 'x', chunks=(2, ), shape=(4, )) darr.compute() # KeyError: ('x', 0)

@shoyer Thanks, will fix that.

mrocklin · 2015-12-25T04:08:01Z

This looks pretty cool so far. Image from current state (after removing _dummy_identity)

In [1]: import dask.array as da
In [2]: x = da.random.random((16, 16), chunks=(4, 4))
In [3]: p, l, u = da.linalg.lu(x)
In [4]: from dask import visualize
In [6]: visualize(l, u, filename='dask.png')
Out[6]: <IPython.core.display.Image object>

The image corroborates your reasoning about out-of-core. The algorithm looks like it will be easy to compute with little memory.

@jcrist was working on a Cholesky decomposition a while ago. This might interest him.

shoyer · 2015-12-25T15:47:41Z

dask/array/linalg.py

Is this necessary? @mrocklin any thoughts on if it's dangerous to share a task dict between multiple arrays?

It's probably not necessary, so far all operations within dask are pure. However, it's also very cheap; we copy large dicts all the time.

In [1]: import dask.array as da In [2]: x = da.random.random((10000, 10000), chunks=(100, 100)) In [3]: %time y = ((x + 1)**2).sum(axis=0).mean() CPU times: user 473 ms, sys: 4.31 ms, total: 477 ms Wall time: 477 ms In [4]: %time _ = y.dask.copy() CPU times: user 6.56 ms, sys: 0 ns, total: 6.56 ms Wall time: 6 ms In [5]: len(y.dask) Out[5]: 40605

shoyer · 2015-12-25T15:48:54Z

It's pretty awesome that this works (mostly) out of core -- I didn't realize that that was possible! I would definitely document the memory requirements in the doc string for this function.

shoyer · 2015-12-27T12:33:02Z

dask/array/linalg.py

@jcrist Is there an easy way to plug into your new tree-reduction code here? Or would that be pointless?

Is there a reason why we're not using sum directly here regardless?

e.g. prevs = (sum, prevs)

Although, it's nice to break apart large computations like this into multiple keys. That helps debugging and diagnostic tools to properly classify what's going on. I wonder if each element in prevs should be a separate key in the graph followed by a final key for (sum, prev_keys).

@mrocklin I think the reason not to use sum directly is that it lets you handle the sum in a streaming fashion? Otherwise you'll need to load each of prevs into memory together.

At the moment these are all in the same task, so they'll all be loaded in at once. If we want to do streaming work then each computation will need to be a separate key-value pair in the dictionary.

mrocklin · 2016-01-04T00:36:18Z

Is there anything I can do to help with this PR?

sinhrks · 2016-01-04T14:27:29Z

Thanks, I still meet the KeyError and will check the reason.

mrocklin · 2016-01-04T17:54:43Z

I think that I see the problem.

mrocklin · 2016-01-04T18:31:14Z

Resolved in #903 I think

mrocklin · 2016-01-04T19:13:26Z

Tests pass

mrocklin · 2016-01-04T19:15:52Z

dask/array/linalg.py

I think that this should either be prevs = (sum, prevs) or it should be many small tasks within the graph.

sinhrks · 2016-01-05T21:56:11Z

@mrocklin Thanks, #903 works. I'll fix it in this weekend to:

use solve_triangular and np.sum
add more test cases with larger blocks

sinhrks · 2016-01-17T04:47:45Z

Sorry to take a long. Once updated to cover more larger matrices/chunks and looks OK. Could you review?

np.random.seed(10)
A = np.random.random_integers(0, 10, (100, 100))
A.shape
# (100, 100)

import dask.array as da
dA = da.from_array(A, chunks=(20, 20))
p, l, u = da.linalg.lu(dA)
np.allclose(p.dot(l.dot(u)).compute(), A)
# True

l.visualize()

mrocklin · 2016-01-17T18:07:27Z

@ahmadia this PR may interest you.

mrocklin · 2016-01-17T18:23:55Z

Profile plots: https://rawgit.com/mrocklin/f73b2967b224b41a62d2/raw/9057bb64edea67f66ecd50a305e614265d03df57/lu.html

In [1]: import dask.array as da

In [2]: x = da.random.random((10000, 1000), chunks=(1000, 1000))
In [3]: y = x.dot(x.T)
In [4]: p, l, u = da.linalg.lu(y)

In [5]: from dask.diagnostics import ResourceProfiler, Profiler, visualize
In [6]: with ResourceProfiler() as rprof, Profiler() as prof:
    l.sum().compute()
   ...:     

In [7]: visualize([rprof, prof], 'lu.html')
Out[7]: <bokeh.models.plots.GridPlot at 0x7f3290076780>

ahmadia · 2016-01-17T18:36:15Z

Indeed. Thanks for the ping :)
On Sun, Jan 17, 2016 at 1:24 PM Matthew Rocklin notifications@github.com
wrote:

Profile plots:
https://rawgit.com/mrocklin/f73b2967b224b41a62d2/raw/9057bb64edea67f66ecd50a305e614265d03df57/lu.html

In [1]: import dask.array as da

In [2]: x = da.random.random((10000, 1000), chunks=(1000, 1000))
In [3]: y = x.dot(x.T)
In [4]: p, l, u = da.linalg.lu(y)

In [5]: from dask.diagnostics import ResourceProfiler, Profiler, visualize
In [6]: with ResourceProfiler() as rprof, Profiler() as prof:
l.sum().compute()
...:

In [7]: visualize([rprof, prof], 'lu.html')
Out[7]: <bokeh.models.plots.GridPlot at 0x7f3290076780>

—
Reply to this email directly or view it on GitHub
#885 (comment).

ahmadia · 2016-01-18T03:59:58Z

From a parallel linear algebra standpoint, this is probably most interesting as an algorithm for solving dense (input) LU on low-memory compute nodes (not actually that uncommon of a situation) with the out-of-core pieces of the matrix stored somewhere with relatively high access cost compared to memory. This has implications for Hadoop-style clusters where the matrix to be solved for is not necessarily predistributed across the entire cluster.

It would be worth doing a performance comparison here to see how the algorithm performs against a completely in-memory solution using just scipy.linalg.lu and for different approaches to chunking/threading.

On a single-core machine, the approach seems straightforward. On a multi-core machine, one may tweak both the number of BLAS/LAPACK threads as well as the number of parallel processes. My intuition is that it will be fastest to use the largest possible blocks that would still fit in memory. I also would like to see some indication of the performance overhead to solve LU's out of core, so to speak, and where the costs are, if that's possible.

ahmadia · 2016-01-18T14:39:35Z

This conference paper provides more details on out-of-core parallel solver frameworks and is relatively recent: http://conferences.computer.org/sc/2012/papers/1000a062.pdf

mrocklin · 2016-01-18T15:17:59Z

This looks good to me. Merging in 24 hours if no comments.

sinhrks · 2016-01-20T13:11:21Z

Ah sorry, I've removed it because solve(lower=True) is meaningless without specifying sym_pos=True (am I misunderstood yet?)

http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.linalg.solve.html

For efficient calculation, we should use solve_triangular as initially pointed out. I found the previous LinAlgError error is caused by a permutation from the triangular to non-triangular. I've fixed the logic and there is no LinAlgError in test cases.

I've defined _solve_triangular_lower as module func as it is clear than including bool in tuples.

Could you take a look once again?

mrocklin · 2016-01-20T15:35:04Z

Ah, I incorrectly assumed that scipy.linalg.solve(..., lower=True) was scipy.linalg.solve_triangular(...). What you have here seems correct to me. Merging.

This is exciting work!

ENH: Add LU decomposition

shoyer · 2016-01-20T16:47:50Z

Nice work @sinhrks!
On Wed, Jan 20, 2016 at 7:35 AM Matthew Rocklin notifications@github.com
wrote:

Ah, I incorrectly assumed that scipy.linalg.solve(..., lower=True) was
scipy.linalg.solve_triangular(...). What you have here seems correct to
me. Merging.

This is exciting work!

—
Reply to this email directly or view it on GitHub
#885 (comment).

poulson · 2016-02-02T17:30:01Z

This is very impressive! Are there any (preliminary) GFlop/s tests before and after going out of core? I have always wondered what the performance would be for running a 100 GB LU off of a 1 TB external hard disk.

mrocklin · 2016-02-03T12:26:08Z

No numbers yet. I agree that this would be interesting. @jcrist any interest in benchmarking this?

* Make column projections stricter (dask#881) * Simplify again after lowering (dask#884) * Visual EXPLAIN (dask#885) * Fix merge predicate pushdowns with weird predicates (dask#888) * Handle futures that are put into map_partitions (dask#892) * Remove eager divisions from indexing (dask#891) * Add shuffle if objects are not aligned and partitions are unknown in assign (dask#887) Co-authored-by: Hendrik Makait <hendrik@makait.com> * Add support for dd.Aggregation (dask#893) * Fix random_split for series (dask#894) * Update dask version * Use Aggregation from dask/dask (dask#895) * Fix meta calculation error in groupby (dask#897) * Revert "Use Aggregation from dask/dask" (dask#898) * Parquet reader using Pyarrow FileSystem (dask#882) Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Fix assign for empty indexer (dask#901) * Add dask.dataframe import at start (dask#903) * Add indicator support to merge (dask#902) * Fix detection of parquet filter pushdown (dask#910) * Speedup init of `ReadParquetPyarrowFS` (dask#909) * Don't rely on sets in are_co_aligned (dask#908) * Implement more efficient GroupBy.mean (dask#906) * Refactor GroupByReduction (dask#920) * Implement array inference in new_collection (dask#922) * Add support for convert string option (dask#912) * P2P shuffle drops partitioning column early (dask#899) * Avoid culling for SetIndexBlockwise with divisions (dask#925) * Re-run versioneer install to fix version number (tag_prefix) (dask#926) * Sort if split_out=1 in value_counts (dask#924) * Wrap fragments (dask#911) * Ensure that columns are copied in projection (dask#927) * Raise in map if pandas < 2.1 (dask#929) * Add _repr_html_ and updated __repr__ for FrameBase (dask#930) * Support token for map_partitions (dask#931) * Fix Copy-on-Write related bug in groupby.transform (dask#933) * Fix to_dask_dataframe test after switching to dask-expr by default (dask#935) * Use multi-column assign in groupby apply (dask#934) * Enable copy on write by default (dask#932) Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Avoid fusing from_pandas ops to avoid duplicating data (dask#938) * Adjust automatic split_out parameter (dask#940) * Revert "Add _repr_html_ and updated __repr__ for FrameBase (dask#930)" (dask#941) * Remove repartition from P2P shuffle (dask#942) * [Parquet] Calculate divisions from statistics (dask#917) * Accept user arguments for arrow_to_pandas (dask#936) * Add _repr_html_ and prettier __repr__ w/o graph materialization (dask#943) * Add dask tokenize for fragment wrapper (dask#948) * Warn if annotations are ignored (dask#947) * Require `pyarrow>=7` (dask#949) * Implement string conversion for from_array (dask#950) * Add dtype and columns type check for shuffle (dask#951) * Concat arrow tables before converting to pandas (dask#928) * MINOR: Avoid confusion around shuffle method (dask#956) Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Set pa cpu count (dask#954) Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Update for pandas nighlies (dask#953) * Fix bug with split_out in groupby aggregate (dask#957) * Fix default observed value (dask#960) * Ensure that we respect shuffle in context manager (dask#958) Co-authored-by: Hendrik Makait <hendrik@makait.com> * Fix 'Empty' prefix to non-empty Series repr (dask#963) * Update README.md (dask#964) * Adjust split_out values to be consistent with other methods (dask#961) * bump version to 1.0 * Raise an error if the optimizer cannot terminate (dask#966) * Fix non-converging optimizer (dask#967) * Fixup filter pushdown through merges with ands and column reuse (dask#969) * Fix unique with shuffle and strings (dask#971) * Implement custom reductions (dask#970) * Fixup set_index with one partition but more divisions by user (dask#972) * Fixup predicate pushdown for query 19 (dask#973) Co-authored-by: Miles <miles59923@gmail.com> * Revert enabling pandas cow (dask#974) * Update changelog for 1.0.2 * Fix set-index preserving divisions for presorted (dask#977) * Fixup reduction with split_every=False (dask#978) * Release for dask 2024.3.1 * Raise better error for repartition on divisions with unknown divisions (dask#980) * Fix concat of series objects with column projection (dask#981) * Fix some reset_index optimization issues (dask#982) * Remove keys() (dask#983) * Ensure wrapping an array when comparing to Series works if columns are empty (dask#984) * Version v1.0.4 * Visual ANALYZE (dask#889) Co-authored-by: fjetter <fjetter@users.noreply.github.com> * Support ``prefix`` argument in ``from_delayed`` (dask#991) * Ensure drop matches column names exactly (dask#992) * Fix SettingWithCopyWarning in _merge.py (dask#990) * Update pyproject.toml (dask#994) * Allow passing of boolean index for column index in loc (dask#995) * Ensure that repr doesn't raise if an operand is a pandas object (dask#996) * Version v1.0.5 * Reduce coverage target a little bit (dask#999) * Nicer read_parquet prefix (dask#998) Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Set divisions with divisions already known (dask#997) * Start building and publishing conda nightlies (dask#986) * Fix zero division error when reading index from parquet (dask#1000) * Rename overloaded `to/from_dask_dataframe` API (dask#987) * Register json and orc APIs for "pandas" dispatch (dask#1004) * Fix pyarrow fs reads for list of directories (dask#1006) * Release for dask 2024.4.0 * Fix meta caclulation in drop_duplicates (dask#1007) * Release 1.0.7 * Support named aggregations in groupby.aggregate (dask#1009) * Make release 1.0.9 * Adjust version number in changes * Make setattr work (dask#1011) * Release for dask 2024.4.1 * Fix head for npartitions=-1 and optimizer step (dask#1014) * Deprecate ``to/from_dask_dataframe`` API (dask#1001) * Fix projection for rename if projection isn't renamed (dask#1016) * Fix unique with numeric columns (dask#1017) * Add changes for new version * Fix column projections in merge when suffixes are relevant (dask#1019) * Simplify dtype casting logic for shuffle (dask#1012) * Use implicit knowledge about divisions for efficient grouping (dask#946) Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> Co-authored-by: Hendrik Makait <hendrik@makait.com> * Fix assign after set index incorrect projections (dask#1020) * Fix read_parquet if directory is empty (dask#1023) * Rename uniuqe_partition_mapping property and add docs (dask#1022) * Add docs for usefule optimizer methods (dask#1025) * Fix doc build error (dask#1026) * Fix error in analyze for scalar (dask#1027) * Add nr of columns to explain output for projection (dask#1030) Co-authored-by: Hendrik Makait <hendrik@makait.com> * Fuse more aggressively if parquet files are tiny (dask#1029) * Move IO docstrings over (dask#1033) * Release for dask 2024.4.2 * Add cudf support to ``to_datetime`` and ``_maybe_from_pandas`` (dask#1035) * Fix backend dispatching for `read_csv` (dask#1028) * Fix loc accessing index for element wise op (dask#1037) * Fix loc slicing with Datetime Index (dask#1039) * Fix shuffle after set_index from 1 partition df (dask#1040) * Bugfix release * Fix bug in ``Series`` reductions (dask#1041) * Fix shape returning integer (dask#1043) * Fix xarray integration with scalar columns (dask#1046) * Fix None min/max statistics and missing statistics generally (dask#1045) * Fix drop with set (dask#1047) * Fix delayed in fusing with multipled dependencies (dask#1038) * Add bugfix release * Optimize when from-delayed is called (dask#1048) * Fix default name conversion in `ToFrame` (dask#1044) Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Add support for ``DataFrame.melt`` (dask#1049) * Fixup failing test (dask#1052) * Generalize ``get_dummies`` (dask#1053) * reduce pickle size of parquet fragments (dask#1050) * Add a bunch of docs (dask#1051) Co-authored-by: Hendrik Makait <hendrik@makait.com> * Release for dask 2024.5.0 * Fix to_parquet in append mode (dask#1057) * Fix sort_values for unordered categories (dask#1058) * Fix dropna before merge (dask#1062) * Fix non-integer divisions in FusedIO (dask#1063) * Add cache argument to ``lower_once`` (dask#1059) * Use ensure_deterministic kwarg instead of config (dask#1064) * Fix isin with strings (dask#1067) * Fix isin for head computation (dask#1068) * Fix read_csv with positional usecols (dask#1069) * Release for dask 2024.5.1 * Use `is_categorical_dtype` dispatch for `sort_values` (dask#1070) * Fix meta for string accessors (dask#1071) * Fix projection to empty from_pandas (dask#1072) * Release for dask 2024.5.2 * Fix categorize if columns are dropped (dask#1074) * Fix resample divisions propagation (dask#1075) * Release for dask 2024.6.0 * Fix get_group for multiple keys (dask#1080) * Skip distributed tests (dask#1081) * Fix cumulative aggregations for empty partitions (dask#1082) * Move another test to distributed folder (dask#1085) * Release 1.1.4 * Release for dask 2024.6.2 * Add minimal subset of interchange protocol (dask#1087) * Add from_map docstring (dask#1088) * Ensure 1 task group per from_delayed (dask#1084) * Advise against using from_delayed (dask#1089) * Refactor shuffle method to handle invalid columns (dask#1091) * Fix freq behavior on ci (dask#1092) * Add first array draft (dask#1090) * Fix array import stuff (dask#1094) * Add asarray (dask#1095) * Implement arange (dask#1097) * Implement linspace (dask#1098) * Implement zeros and ones (dask#1099) * Remvoe pandas 2 checks (dask#1100) * Add unify-chunks draft to arrays (dask#1101) Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Release for dask 2024.7.0 * Skip test if optional xarray cannot be imported (dask#1104) * Fix deepcopying FromPandas class (dask#1105) * Fix from_pandas with chunksize and empty df (dask#1106) * Link fix in readme (dask#1107) * Fix shuffle blowing up the task graph (dask#1108) Co-authored-by: Hendrik Makait <hendrik@makait.com> * Release for dask 2024.7.1 * Fix some things for pandas 3 (dask#1110) * Fixup remaining upstream failures (dask#1111) * Release for dask 2024.8.0 * Drop support for Python 3.9 (dask#1109) Co-authored-by: James Bourbeau <jrbourbeau@gmail.com> * Fix tuples as on argument in merge (dask#1117) * Fix merging when index name in meta missmatches actual name (dask#1119) Co-authored-by: Hendrik Makait <hendrik@makait.com> * Register `read_parquet` and `read_csv` as "dispatchable" (dask#1114) * Fix projection for Index class in read_parquet (dask#1120) * Fix result index of merge (dask#1121) * Introduce `ToBackend` expression (dask#1115) * Avoid calling ``array`` attribute on ``cudf.Series`` (dask#1122) * Make split_out for categorical default smarter (dask#1124) * Release for dask 2024.8.1 * Fix scalar detection of columns coming from sql (dask#1125) * Bump `pyarrow>=14.0.1` minimum versions (dask#1127) Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Fix concat axis 1 bug in divisions (dask#1128) * Release for dask 2024.8.2 * Use task-based rechunking as default (dask#1131) * Improve performance of `DelayedsExpr` through caching (dask#1132) * Import from tokenize (dask#1133) * Release for dask 2024.9.0 * Add concatenate flag to .compute() (dask#1138) * Release for dask 2024.9.1 * Fix displaying timestamp scalar (dask#1141) * Fix alignment issue with groupby index accessors (dask#1142) * Improve handling of optional dependencies in `analyze` and `explain` (dask#1146) * Switch from mambaforge to miniforge in CI (dask#1147) * Fix merge_asof for single partition (dask#1145) * Raise exception when calculating divisons (dask#1149) * Fix binary operations with scalar on the left (dask#1150) * Explicitly list setuptools as a build dependency in conda recipe (dask#1151) * Version v1.1.16 * Fix ``Merge`` divisions after filtering partitions (dask#1152) * Fix meta calculation for to_datetime (dask#1153) * Internal cleanup of P2P code (dask#1154) * Migrate P2P shuffle and merge to TaskSpec (dask#1155) * Improve Aggregation docstring explicitly mentionning SeriesGroupBy (dask#1156) * Migrate shuffle and merge to `P2PBarrierTask` (dask#1157) * Migrate Blockwise to use taskspec (dask#1159) * Add support for Python 3.13 (dask#1160) * Release for dask 2024.11.0 * Fix fusion calling things multiple times (dask#1161) * Version 1.1.18 * Version 1.1.19 * Fix orphaned dependencies in Fused expression (dask#1163) * Use Taskspec fuse implementation (dask#1162) Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> * Introduce more caching when walking the expression (dask#1165) * Avoid exponentially growing graph for Assign-Projection combinations (dask#1164) * Remove ``from_dask_dataframe`` (dask#1167) * Deprecated and remove from_legacy_dataframe usage (dask#1168) Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com> * Remove recursion in task spec (dask#1158) * Fix value_counts with split_out != 1 (dask#1170) * Release 2024.12.0 * Use new blockwise unpack collection in array (dask#1173) * Propagate group_keys in DataFrameGroupBy (dask#1174) * Fix assign optimization when overwriting columns (dask#1176) * Remove custom read-csv stuff (dask#1178) * Fixup install paths (dask#1179) * Version 1.1.21 * Remove legacy conversion functions (dask#1177) * Remove duplicated files * Move repository * Clean up docs and imports * Clean up docs and imports --------- Co-authored-by: Hendrik Makait <hendrik@makait.com> Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com> Co-authored-by: Miles <miles59923@gmail.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com> Co-authored-by: Charles Blackmon-Luca <20627856+charlesbluca@users.noreply.github.com> Co-authored-by: James Bourbeau <jrbourbeau@gmail.com> Co-authored-by: alex-rakowski <alexrakowski90@gmail.com> Co-authored-by: Matthew Rocklin <mrocklin@gmail.com> Co-authored-by: Sandro <shfu29r4bu@liamekaens.com> Co-authored-by: Ben <55319792+benrutter@users.noreply.github.com> Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com> Co-authored-by: Guillaume Eynard-Bontemps <g.eynard.bontemps@gmail.com> Co-authored-by: Tom Augspurger <tom.augspurger88@gmail.com>

sinhrks mentioned this pull request Dec 18, 2015

API: KeyError raised when dask graph refers to its final output #886

Closed

mrocklin reviewed Dec 18, 2015
View reviewed changes

sinhrks force-pushed the lu branch from 8098684 to fe3c471 Compare December 25, 2015 03:25

mrocklin reviewed Dec 25, 2015
View reviewed changes

shoyer reviewed Dec 25, 2015
View reviewed changes

sinhrks force-pushed the lu branch 5 times, most recently from 260a385 to b6a1d51 Compare December 27, 2015 04:28

shoyer reviewed Dec 27, 2015
View reviewed changes

sinhrks force-pushed the lu branch from b6a1d51 to 8efe9e7 Compare January 4, 2016 14:17

mrocklin mentioned this pull request Jan 4, 2016

Respect output keys in optimizations #903

Merged

mrocklin reviewed Jan 4, 2016
View reviewed changes

sinhrks force-pushed the lu branch from 8efe9e7 to 8924a44 Compare January 5, 2016 13:34

sinhrks force-pushed the lu branch from 8924a44 to 12fddb4 Compare January 17, 2016 04:39

sinhrks force-pushed the lu branch 5 times, most recently from e6bf853 to 4ce7f35 Compare January 18, 2016 11:05

sinhrks mentioned this pull request Jan 19, 2016

ENH: Add linalg.solve_triangular #927

Merged

sinhrks force-pushed the lu branch 2 times, most recently from 0252b1d to 4ef5b32 Compare January 20, 2016 13:03

ENH: Add LU decomposition

25a0726

sinhrks force-pushed the lu branch from 4ef5b32 to 25a0726 Compare January 20, 2016 13:10

mrocklin added a commit that referenced this pull request Jan 20, 2016

Merge pull request #885 from sinhrks/lu

87cf3d4

ENH: Add LU decomposition

mrocklin merged commit 87cf3d4 into dask:master Jan 20, 2016

sinhrks deleted the lu branch January 20, 2016 20:20

sinhrks mentioned this pull request Jan 21, 2016

Fixes lu/tri array dtypes #933

Merged

jcrist mentioned this pull request Feb 2, 2016

Solving linear systems with symmetric positive definite matrices #476

Closed

sinhrks added this to the 0.8.0 milestone Feb 27, 2016

sinhrks mentioned this pull request Feb 13, 2017

Add a memory-efficient LU factorization #1978

Open

phofl pushed a commit to phofl/dask that referenced this pull request Dec 23, 2024

Visual EXPLAIN (dask#885)

07c78dc

Uh oh!

Conversation

sinhrks commented Dec 18, 2015

scipy results

Uh oh!

mrocklin commented Dec 18, 2015

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mrocklin commented Dec 18, 2015

Uh oh!

sinhrks commented Dec 19, 2015

Uh oh!

mrocklin commented Dec 19, 2015

Uh oh!

sinhrks commented Dec 25, 2015

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mrocklin commented Dec 25, 2015

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shoyer commented Dec 25, 2015

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mrocklin commented Jan 4, 2016

Uh oh!

sinhrks commented Jan 4, 2016

Uh oh!

mrocklin commented Jan 4, 2016

Uh oh!

mrocklin commented Jan 4, 2016

Uh oh!

mrocklin commented Jan 4, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sinhrks commented Jan 5, 2016

Uh oh!

sinhrks commented Jan 17, 2016

Uh oh!

mrocklin commented Jan 17, 2016

Uh oh!

mrocklin commented Jan 17, 2016

Uh oh!

ahmadia commented Jan 17, 2016

Uh oh!

ahmadia commented Jan 18, 2016

Uh oh!

ahmadia commented Jan 18, 2016

Uh oh!

mrocklin commented Jan 18, 2016

Uh oh!

sinhrks commented Jan 20, 2016

Uh oh!

mrocklin commented Jan 20, 2016

Uh oh!

shoyer commented Jan 20, 2016

Uh oh!

poulson commented Feb 2, 2016

Uh oh!

mrocklin commented Feb 3, 2016