Skip to content

BUG: sparse.linalg: Cast index arrays to intc before calling SuperLU functions#18644

Merged
tylerjereddy merged 7 commits intoscipy:mainfrom
perimosocordiae:linsolve-intc
Jul 2, 2023
Merged

BUG: sparse.linalg: Cast index arrays to intc before calling SuperLU functions#18644
tylerjereddy merged 7 commits intoscipy:mainfrom
perimosocordiae:linsolve-intc

Conversation

@perimosocordiae
Copy link
Copy Markdown
Member

Reference issue

Fixes gh-18603.

What does this implement/fix?

The SuperLU code expects sparse matrix index arrays to have type intc, so this PR adds a cast prior to those calls. See the linked issue for context.

@perimosocordiae perimosocordiae added defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.sparse and removed scipy.sparse labels Jun 7, 2023
@perimosocordiae
Copy link
Copy Markdown
Member Author

Test failure seems to be unrelated, only appearing on "Linux Meson tests / Python-debug & ATLAS":

FAILED ../linalg/tests/test_decomp_cossin.py::test_cossin_separate[float64] - AssertionError: 
Not equal to tolerance rtol=0, atol=2.22045e-15

Mismatched elements: 3897 / 6400 (60.9%)
Max absolute difference: 0.72137716
Max relative difference: 2.
 x: array([[ 0.107602,  0.008004,  0.163714, ..., -0.094327, -0.021723,
        -0.074046],
       [ 0.044608, -0.063854,  0.031058, ...,  0.085616,  0.000242,...
 y: array([[ 0.107602,  0.008004,  0.163714, ..., -0.094327,  0.021723,
        -0.074046],
       [ 0.044608, -0.063854,  0.031058, ...,  0.085616, -0.000242,...
= 1 failed, 42037 passed, 2802 skipped, 147 xfailed, 8 xpassed in 1030.29s (0:17:10) =

@ilayn
Copy link
Copy Markdown
Member

ilayn commented Jun 7, 2023

For some reason that test is flaky on that run. I've restarted it. Typically it passes on the second run but I'll check if the lapack version is the issue

@WarrenWeckesser
Copy link
Copy Markdown
Member

From the little that is visible in the error log, it looks like a column of one of the results might have a flipped sign. Is this another case of the solution not being unique?

Comment thread scipy/sparse/linalg/_dsolve/linsolve.py
@ilayn
Copy link
Copy Markdown
Member

ilayn commented Jun 11, 2023

From the little that is visible in the error log, it looks like a column of one of the results might have a flipped sign. Is this another case of the solution not being unique?

I though the same but only this job is consistantly flaking. Hence I'm not sure what is happening there.

@dschult
Copy link
Copy Markdown
Contributor

dschult commented Jun 15, 2023

Looks like casting='safe' doesn't allow downcasting. I think we either want the default casting rule or to use casting=same_kind. (My reading of the astype docstring is that casting doesn't check values when casting. It only checks that the types are related according to the rule requested. And the casting='safe' rule doesn't allow int64 -> int32 because that might not preserve values. If we want to check the values to see whether preserving values is in danger, we'd have to do that ourselves.

I'd suggest we remove the casting='safe' argument in this PR (revert the latest commit?)

@stefanv
Copy link
Copy Markdown
Member

stefanv commented Jun 15, 2023

Looks like casting='safe' doesn't allow downcasting. I think we either want the default casting rule or to use casting=same_kind. (My reading of the astype docstring is that casting doesn't check values when casting. It only checks that the types are related according to the rule requested. And the casting='safe' rule doesn't allow int64 -> int32 because that might not preserve values. If we want to check the values to see whether preserving values is in danger, we'd have to do that ourselves.

I'd suggest we remove the casting='safe' argument in this PR (revert the latest commit?)

The problem this is trying to address:

In [3]: np.int64(123123123123123123).astype(np.int32)
Out[3]: 23327667

Indeed, in my initial tests, I tested on scalars and not arrays, which behave differently. Scalar behavior, desired:

In [4]: np.int64(123123123123123123).astype(np.int32, casting='safe')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 1
----> 1 np.int64(123123123123123123).astype(np.int32, casting='safe')

TypeError: Cannot cast scalar from dtype('int64') to dtype('int32') according to the rule 'safe'`
In [5]: np.int64(12312312).astype(np.int32, casting='safe')
Out[5]: 12312312

Array behavior, not desired:

In [9]: np.array([12312312,], dtype=np.int64).astype(np.int32, casting='safe')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 np.array([12312312,], dtype=np.int64).astype(np.int32, casting='safe')

TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'

I'm not sure what the idiomatic way is of spelling "cast when the values fit" as it does in the scalar case. Do we need an explicit value check? Something like:

ii = np.iinfo(np.int32)
np.all(x <= ii.max && x >= ii.min)

@dschult
Copy link
Copy Markdown
Contributor

dschult commented Jun 15, 2023

I think this PR isn't trying to fix the problem of downcasting. That problem already existed for this library. At least that is my understanding for this comment from #18603)
It "aggressively downcast" which I assume to mean downcast without checking.

I think this PR is trying to get back to the state before #18509 ... where the index is aggressively downcast so it can work with SPLU. That is, it is a quick fix. The slower fix would be to make SPLU work for int64. I guess the above suggestion about checking values before downcasting is a "middle-speed" fix.

What's the performance hit for

ii = np.iinfo(np.int32)
np.all(x <= ii.max && x >= ii.min)

Is it negligible compared to the SPLU computations?

Am I understanding the conversation correctly?

@stefanv
Copy link
Copy Markdown
Member

stefanv commented Jun 15, 2023

My concern is that downcasting aggressively will return some result, but it won't be a correct result for the case where int64 indices were needed (i.e., large sparse matrix). SuperLU cannot handle those arrays, so we should warn the users. But it is probably true that this incorrect behavior was in SciPy for many versions.

This reverts commit 59e7578.

Safe casting does not behave the way I thought, i.e. it does not take
values into consideration. Reverting this patch will cast large
indices to a wrapped around int32 version, which is incorrect, but
which has also been the status quo for a long time.

A future patch should put in place safeguards against incorrect index
casting.
@stefanv
Copy link
Copy Markdown
Member

stefanv commented Jun 15, 2023

I've reverted the patch anyway, since that was not the correct solution to the problem.

@perimosocordiae
Copy link
Copy Markdown
Member Author

Using casting='same_kind' would probably be the right thing here, but we know that sparse containers will only have int32 or int64 index arrays so it's not really that useful in practice.

Re: the scalar vs array behavior difference, that looks related to numpy/numpy#8987

If we want to check for overflow explicitly, we could do some fast checks up front before the slow check:

  1. indptr[-1] <= int32max because we know that indptr is always sorted
  2. max(*shape) <= int32max as a cheap way to avoid checking the full indices array
  3. indices.max() <= int32max only in cases where (1) was true and (2) was false

@seberg
Copy link
Copy Markdown
Contributor

seberg commented Jun 19, 2023

There is unfortunately no clear idiomatic way to spell "cast if fits". For advanced indexing (with arrays), NumPy even just uses "same kind" and gets away with it, although there we are talking about indices that are by definition impossibly large to be a valid index.

There was talk about adding an asindex() or so ufunc for the spefic purprose of doing this type of casts (I think also allowing floats if they are exact integers). I don't mind doing that, but I am not sure if that helps you.
In some cases np.int32(int(...)) might actually be a valid work-around (or operator.index() first, since int() is agressive).

Whether (or how) to apply cast safety especially to Python scalars is something I haven't quite figured out to be honest. (I am leaning to just considering them "safe", but raising anyway when it doesn't fit; of course the NumPy integers overflow on the same thing though :()

Sorry for probably too much details. Introducing a cast mode (or changing "same kind") that checks values may be a way forward. That does require explicit loops for that purprose, although unless we wish to support float -> int in this manner as well (which pandas might like I think), that shouldn't be super hard.

manually check for safe downcasting
Comment thread scipy/sparse/linalg/_dsolve/linsolve.py Outdated
@tupui tupui added the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label Jun 26, 2023
@tupui tupui added this to the 1.11.1 milestone Jun 26, 2023
@tupui
Copy link
Copy Markdown
Member

tupui commented Jun 26, 2023

AFAIU this would fix some new bugs in 1.11? I put the backport-candidate label so this can get in as soon as possible.

@tupui tupui linked an issue Jun 26, 2023 that may be closed by this pull request
@perimosocordiae
Copy link
Copy Markdown
Member Author

I just pushed a fix commit. Pending CI results, I think this should now be in good shape.

Note: The CircleCI build_scipy action failed with a suspicious message: no such option: --install-option (for the pip command).

@tupui
Copy link
Copy Markdown
Member

tupui commented Jun 28, 2023

For the issue in the CI, you need to merge main into your PR. CircleCI is not good with that 😅

@dschult
Copy link
Copy Markdown
Contributor

dschult commented Jun 28, 2023

Then it looks like the test failures are not indicating issues with this PR.
I agree with @perimosocordiae that this is in good shape.

@tylerjereddy tylerjereddy modified the milestones: 1.11.1, 1.11.2 Jun 28, 2023
Copy link
Copy Markdown
Contributor

@tylerjereddy tylerjereddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like folks agree this should go in and get backported. The CI failures don't look related and when I rebase this branch locally on main it passes the full testsuite on x86_64 Linux.

@tylerjereddy tylerjereddy merged commit 8501b7c into scipy:main Jul 2, 2023
loganbvh added a commit to loganbvh/py-tdgl that referenced this pull request Jul 19, 2023
loganbvh added a commit to loganbvh/py-tdgl that referenced this pull request Jul 20, 2023
* Add current_through_paths

* Require scipy<1.11

Due to these bugs scipy/scipy#18644, scipy/scipy#18603

* Update test_solution.py

* Run notebooks
@tylerjereddy tylerjereddy removed the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label Aug 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.sparse.linalg scipy.sparse

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: test failures in NetworkX related to sparse.linalg.splu BUG: Floating point CSC with int64 indices doesn't work with linalg

9 participants