Fix reify to handle sparse arrays and other objects without __len__#12103
jacobtomlinson merged 8 commits into dask:main
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
9 files ±0, 9 suites ±0, 3h 15m 8s ⏱️ +10m 13s
Results for commit 3ccdaef. ± Comparison against base commit 0fe7582.
jacobtomlinson
left a comment
Given that our CI environment has scipy installed could you add some tests that use some actual sparse arrays, rather than mocking everything?
jacobtomlinson
left a comment
This seems fine to me, but I'd appreciate an additional review from another maintainer before merging this.
cc @TomAugspurger @jrbourbeau @quasiben if you're around
The latest test failures are unrelated to this PR btw
TomAugspurger
left a comment
I'm always looking for ways to reduce the amount of guessing / duck typing we do, but I'm not sure if there's a way around that here. Just one comment about a couple of checks in is_empty that might fail. Otherwise I think this is fine.
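For readers following the duck-typing concern, here is a minimal sketch of the kind of emptiness check being discussed. It is not the code from this PR: `is_empty_sketch` and the stand-in class `NoLen` are hypothetical names, and the fallback order (try `len()`, then a `shape` attribute, then assume non-empty) is an assumption made for illustration only.

```python
def is_empty_sketch(obj):
    """Best-effort emptiness check for objects that may not support len().

    Hypothetical helper for illustration; not dask's actual is_empty.
    """
    try:
        return len(obj) == 0
    except TypeError:
        # No usable __len__ (scipy sparse arrays raise TypeError from len()).
        pass
    # Fall back to a shape attribute if the object exposes one.
    shape = getattr(obj, "shape", None)
    if shape is not None:
        try:
            return any(dim == 0 for dim in shape)
        except TypeError:
            # shape exists but is not iterable; give up on it.
            pass
    # Unknown object: assume non-empty rather than guessing further.
    return False


class NoLen:
    """Stand-in for an array-like whose __len__ raises, as a sparse array's does."""

    def __init__(self, shape):
        self.shape = shape

    def __len__(self):
        raise TypeError("length is ambiguous")


print(is_empty_sketch([]))
print(is_empty_sketch(NoLen((10, 10))))
print(is_empty_sketch(NoLen((0, 10))))
```

Catching only `TypeError` keeps the guessing narrow: any other exception from an unusual container would still surface instead of being silently treated as "empty".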
…simplify exception handling
Tests for the bags module pass:
Tests for the utils module pass:
pre-commit run --all-files

How did I verify that the fix works:
I recreated the test mentioned in the issue above. Here's the code:
if __name__ == "__main__":
    import dask.bag as db
    import numpy as np
    from dask import delayed
    from scipy.sparse import csr_array

    def add(x, y):
        return x + y

    @delayed
    def create_sparse_array_delayed():
        return csr_array(np.random.random((10, 10)))

    @delayed
    def create_array_delayed():
        return np.random.random((10, 10))

    # works with sparse arrays when created from sequence
    db.from_sequence(
        [csr_array(np.random.random((10, 10))), csr_array(np.random.random((10, 10)))]
    ).fold(add).compute()

    # works with numpy arrays
    db.from_delayed([create_array_delayed(), create_array_delayed()]).fold(add).compute()

    print(
        db.from_delayed(
            [create_sparse_array_delayed(), create_sparse_array_delayed()]
        ).fold(add).compute()
    )

This now returns the result instead of the bug: