
BUG: dia_array(numpy_data).tocoo() uses np.int64 indices while coo_array(numpy_data) uses np.int32 indices #19246

@ogrisel

Description


Describe your issue.

This might be a surprising, unintended consequence of #18509.

The .tocoo() method of dia_array calls np.arange() without a dtype argument, so the resulting coo_array uses np.int64 indices. Creating the same COO array directly from the small NumPy array would instead have used the more memory-efficient np.int32 indices.
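To make the dtype difference concrete, here is a minimal sketch of the np.arange() behavior and of one possible user-side workaround. The helper name tocoo_int32 is hypothetical (it is not part of SciPy), and it assumes all indices fit in np.int32 without checking. The default integer dtype printed on the first line depends on the platform and NumPy version.

```python
import numpy as np
from scipy.sparse import coo_array, dia_array

# np.arange without a dtype falls back to NumPy's default integer type,
# which is platform dependent (commonly int64 on 64-bit Linux and macOS).
default_idx = np.arange(3)
explicit_idx = np.arange(3, dtype=np.int32)
print(default_idx.dtype, explicit_idx.dtype)


def tocoo_int32(dia):
    """Hypothetical helper: convert to COO and force int32 indices.

    Assumes every index fits in int32; this is not checked here.
    """
    coo = dia.tocoo()
    row = coo.row.astype(np.int32, copy=False)
    col = coo.col.astype(np.int32, copy=False)
    # Rebuild the COO array from int32 index arrays.
    return coo_array((coo.data, (row, col)), shape=coo.shape)


small_array = np.ones(shape=(2, 3))
fixed = tocoo_int32(dia_array(small_array))
print(fixed.row.dtype, fixed.col.dtype)
```

This only illustrates the symptom and a stopgap on the caller's side; it does not show how the conversion inside SciPy should be changed.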

This was discovered in scikit-learn/scikit-learn#27240 (comment) while investigating unexpected test failures that appeared when adding sparse array support to scikit-learn.

See the reproducer below:

Reproducing Code Example

>>> import numpy as np
>>> from scipy.sparse import coo_array, dia_array
>>> from pprint import pprint
>>>
>>> small_array = np.ones(shape=(2, 3))
>>> pprint(dia_array(small_array).__dict__)
{'_shape': (2, 3),
 'data': array([[1., 0., 0.],
       [1., 1., 0.],
       [0., 1., 1.],
       [0., 0., 1.]]),
 'maxprint': 50,
 'offsets': array([-1,  0,  1,  2], dtype=int32)}
>>> pprint(coo_array(small_array).__dict__)
{'_shape': (2, 3),
 'col': array([0, 1, 2, 0, 1, 2], dtype=int32),
 'data': array([1., 1., 1., 1., 1., 1.]),
 'has_canonical_format': True,
 'maxprint': 50,
 'row': array([0, 0, 0, 1, 1, 1], dtype=int32)}

>>> pprint(dia_array(small_array).tocoo().__dict__)
{'_shape': (2, 3),
 'col': array([0, 0, 1, 1, 2, 2]),
 'data': array([1., 1., 1., 1., 1., 1.]),
 'has_canonical_format': False,
 'maxprint': 50,
 'row': array([1, 0, 1, 0, 1, 0])}

Error message

No error message.

SciPy/NumPy/Python version and system information

1.11.1 1.25.0 sys.version_info(major=3, minor=11, micro=0, releaselevel='final', serial=0)
Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /Users/ogrisel/mambaforge/envs/dev/include
    lib directory: /Users/ogrisel/mambaforge/envs/dev/lib
    name: blas
    openblas configuration: unknown
    pc file directory: /Users/ogrisel/mambaforge/envs/dev/lib/pkgconfig
    version: 3.9.0
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /Users/ogrisel/mambaforge/envs/dev/include
    lib directory: /Users/ogrisel/mambaforge/envs/dev/lib
    name: lapack
    openblas configuration: unknown
    pc file directory: /Users/ogrisel/mambaforge/envs/dev/lib/pkgconfig
    version: 3.9.0
  pybind11:
    detection method: pkgconfig
    include directory: /Users/ogrisel/mambaforge/envs/dev/include
    name: pybind11
    version: 2.10.4
Compilers:
  c:
    commands: arm64-apple-darwin20.0.0-clang
    linker: ld64
    name: clang
    version: 15.0.7
  c++:
    commands: arm64-apple-darwin20.0.0-clang++
    linker: ld64
    name: clang
    version: 15.0.7
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 0.29.35
  fortran:
    commands: /Users/runner/miniforge3/conda-bld/scipy-split_1687995684757/_build_env/bin/arm64-apple-darwin20.0.0-gfortran
    linker: ld64
    name: gcc
    version: 12.2.0
  pythran:
    include directory: ../../../_build_env/venv/lib/python3.11/site-packages/pythran
    version: 0.13.1
Machine Information:
  build:
    cpu: x86_64
    endian: little
    family: x86_64
    system: darwin
  cross-compiled: true
  host:
    cpu: arm64
    endian: little
    family: aarch64
    system: darwin
Python Information:
  path: /Users/ogrisel/mambaforge/envs/dev/bin/python
  version: '3.11'


Labels: defect, scipy.sparse
