Describe your issue.
This might be a surprising, unintended consequence of #18509.
The .tocoo() method of dia_array calls np.arange() without a dtype argument, so the resulting coo_array uses np.int64 indices, whereas creating the same COO array directly from the small NumPy array uses the more memory-efficient np.int32 indices.
This was discovered in scikit-learn/scikit-learn#27240 (comment) while investigating unexpected test failures that appeared when adding sparse array support to scikit-learn.
See the reproducer below:
Reproducing Code Example
>>> import numpy as np
>>> from scipy.sparse import coo_array, dia_array
>>> from pprint import pprint
>>>
>>> small_array = np.ones(shape=(2, 3))
>>> pprint(dia_array(small_array).__dict__)
{'_shape': (2, 3),
 'data': array([[1., 0., 0.],
       [1., 1., 0.],
       [0., 1., 1.],
       [0., 0., 1.]]),
 'maxprint': 50,
 'offsets': array([-1,  0,  1,  2], dtype=int32)}
>>> pprint(coo_array(small_array).__dict__)
{'_shape': (2, 3),
 'col': array([0, 1, 2, 0, 1, 2], dtype=int32),
 'data': array([1., 1., 1., 1., 1., 1.]),
 'has_canonical_format': True,
 'maxprint': 50,
 'row': array([0, 0, 0, 1, 1, 1], dtype=int32)}
>>> pprint(dia_array(small_array).tocoo().__dict__)
{'_shape': (2, 3),
 'col': array([0, 0, 1, 1, 2, 2]),
 'data': array([1., 1., 1., 1., 1., 1.]),
 'has_canonical_format': False,
 'maxprint': 50,
 'row': array([1, 0, 1, 0, 1, 0])}
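For downstream code that depends on int32 indices, a possible workaround is to rebuild the COO array with explicitly downcast indices. This is only a minimal sketch: with_compact_indices is a hypothetical helper, not part of SciPy, and it assumes that comparing the largest dimension against the int32 maximum is a sufficient check.

```python
import numpy as np
from scipy.sparse import coo_array, dia_array


def with_compact_indices(coo):
    """Return a COO array with int32 indices when the shape allows it.

    Hypothetical helper for illustration; not part of SciPy.
    """
    if max(coo.shape) > np.iinfo(np.int32).max:
        # Indices genuinely need 64 bits, so leave the array untouched.
        return coo
    # astype(copy=False) avoids a copy when the indices are already int32.
    row = coo.row.astype(np.int32, copy=False)
    col = coo.col.astype(np.int32, copy=False)
    return coo_array((coo.data, (row, col)), shape=coo.shape)


small_array = np.ones(shape=(2, 3))
converted = dia_array(small_array).tocoo()
compact = with_compact_indices(converted)
print(converted.row.dtype, "->", compact.row.dtype)
```

The values are unchanged; only the index dtype differs, so this does not address the has_canonical_format difference shown above.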
Error message
SciPy/NumPy/Python version and system information
1.11.1 1.25.0 sys.version_info(major=3, minor=11, micro=0, releaselevel='final', serial=0)
Build Dependencies:
blas:
detection method: pkgconfig
found: true
include directory: /Users/ogrisel/mambaforge/envs/dev/include
lib directory: /Users/ogrisel/mambaforge/envs/dev/lib
name: blas
openblas configuration: unknown
pc file directory: /Users/ogrisel/mambaforge/envs/dev/lib/pkgconfig
version: 3.9.0
lapack:
detection method: pkgconfig
found: true
include directory: /Users/ogrisel/mambaforge/envs/dev/include
lib directory: /Users/ogrisel/mambaforge/envs/dev/lib
name: lapack
openblas configuration: unknown
pc file directory: /Users/ogrisel/mambaforge/envs/dev/lib/pkgconfig
version: 3.9.0
pybind11:
detection method: pkgconfig
include directory: /Users/ogrisel/mambaforge/envs/dev/include
name: pybind11
version: 2.10.4
Compilers:
c:
commands: arm64-apple-darwin20.0.0-clang
linker: ld64
name: clang
version: 15.0.7
c++:
commands: arm64-apple-darwin20.0.0-clang++
linker: ld64
name: clang
version: 15.0.7
cython:
commands: cython
linker: cython
name: cython
version: 0.29.35
fortran:
commands: /Users/runner/miniforge3/conda-bld/scipy-split_1687995684757/_build_env/bin/arm64-apple-darwin20.0.0-gfortran
linker: ld64
name: gcc
version: 12.2.0
pythran:
include directory: ../../../_build_env/venv/lib/python3.11/site-packages/pythran
version: 0.13.1
Machine Information:
build:
cpu: x86_64
endian: little
family: x86_64
system: darwin
cross-compiled: true
host:
cpu: arm64
endian: little
family: aarch64
system: darwin
Python Information:
path: /Users/ogrisel/mambaforge/envs/dev/bin/python
version: '3.11'