
Error using db.from_delayed with sparse arrays #11643

@joshua-gould

Description

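Calling fold on a bag built with db.from_delayed raises a TypeError when the delayed functions return SciPy sparse arrays. The end of the traceback:

  File "python3.11/site-packages/dask/bag/core.py", line 1881, in reify
    if len(seq) and isinstance(seq[0], Iterator):
      ^^^^^^^^^^^^^^^^^
  File "python3.11/site-packages/scipy/sparse/_base.py", line 425, in __len__
    raise TypeError("sparse array length is ambiguous; use getnnz()"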

Minimal Complete Verifiable Example:

import dask.bag as db
import numpy as np
from dask import delayed
from scipy.sparse import csr_array


def add(x, y):
    return x + y


@delayed
def create_sparse_array_delayed():
    return csr_array(np.random.random((10, 10)))


@delayed
def create_array_delayed():
    return np.random.random((10, 10))


# Works: sparse arrays passed directly to from_sequence
db.from_sequence(
    [csr_array(np.random.random((10, 10))), csr_array(np.random.random((10, 10)))]
).fold(add).compute()

# Works: dense NumPy arrays returned by delayed functions
db.from_delayed([create_array_delayed(), create_array_delayed()]).fold(add).compute()

# Fails: sparse arrays returned by delayed functions raise the TypeError above
db.from_delayed([create_sparse_array_delayed(), create_sparse_array_delayed()]).fold(add).compute()
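
One possible reading of the traceback, sketched here with standard-library stand-ins only: db.from_delayed treats each delayed result as a whole partition (a sequence of elements), and reify calls len() on that partition. A bare sparse array returned from a delayed function is therefore passed to len(), which SciPy sparse arrays refuse. In the sketch below, `AmbiguousLen`, `reify_like`, and `try_reify` are hypothetical names, not dask or SciPy code; only the `len(seq) and isinstance(seq[0], Iterator)` condition is taken from the traceback.

```python
from collections.abc import Iterator


class AmbiguousLen:
    """Hypothetical stand-in for a sparse array: len() raises TypeError."""

    def __len__(self):
        raise TypeError("length is ambiguous")


def reify_like(seq):
    """Hypothetical simplification of the check shown in the traceback."""
    if isinstance(seq, Iterator):
        seq = list(seq)
    # Same condition as the failing line reported in dask/bag/core.py
    if len(seq) and isinstance(seq[0], Iterator):
        seq = list(map(list, seq))
    return seq


def try_reify(partition):
    """Return the reified partition, or the TypeError message if len() fails."""
    try:
        return reify_like(partition)
    except TypeError as exc:
        return f"TypeError: {exc}"


item = AmbiguousLen()
bare = try_reify(item)       # partition is the object itself: len() raises
wrapped = try_reify([item])  # partition is a one-element list: len() is 1
print(bare)
print(wrapped == [item])
```

Under that reading, a workaround worth trying is to have each delayed function return a one-element list, for example `return [csr_array(np.random.random((10, 10)))]`, so that the partition is a concrete sequence. This has not been verified against the Dask version listed below. It would also suggest that the NumPy case may only appear to work because a dense array supports len() and its rows end up being treated as bag elements.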

Environment:

  • Dask version: 2024.12.0
  • Python version: 3.11
  • Operating System: Mac
  • Install: pip
