BUG: Fix integer overflow in in1d for mixed integer dtypes #22877 by MilesCranmer · Pull Request #22878 · numpy/numpy

MilesCranmer · 2022-12-23T16:29:32Z

This fixes #22877 raised by @TonyXiang8787. The bug, introduced by #12065, results in integer overflows occurring in the following line:

numpy/numpy/lib/arraysetops.py

Lines 683 to 684 in 2b9851b

    
    outgoing_array[basic_mask] = isin_helper_ar[ar1[basic_mask] -
                                                ar2_min]

when mixed dtype input was passed to in1d.
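To illustrate the kind of wraparound involved, here is a simplified sketch. It is not the reproduction from #22877: both operands are deliberately int8 arrays with made-up values, so the result does not depend on NumPy's mixed-dtype promotion rules.

```python
import numpy as np

# Illustrative values only; both operands are int8 on purpose.
ar1_vals = np.array([100], dtype=np.int8)
ar2_min = np.array([-100], dtype=np.int8)

# The true difference is 200, which does not fit in int8 (max 127),
# so the fixed-width subtraction wraps around silently.
wrapped = ar1_vals - ar2_min

# The same difference computed with arbitrary-precision Python ints.
exact = int(ar1_vals[0]) - int(ar2_min[0])

print(wrapped, exact)
```

A wrapped value used as an index into isin_helper_ar would address the wrong element, which is consistent with the incorrect results reported in the issue.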

The fix is simply to test for these overflows before the kind='table' method is used:

        #  2. Check overflows for (ar2 - ar2_min); dtype=ar2.dtype
        range_safe_from_overflow = ar2_range <= np.iinfo(ar2.dtype).max
        #  3. Check overflows for (ar1 - ar2_min); dtype=ar1.dtype
        range_safe_from_overflow &= int(ar1_max) - int(ar2_min) <= np.iinfo(ar1.dtype).max
        range_safe_from_overflow &= int(ar1_min) - int(ar2_min) >= np.iinfo(ar1.dtype).min

I also added some unit tests to evaluate this behavior.
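As a rough sketch of why the checks cast to int before subtracting: Python integers have arbitrary precision, so the difference can be compared against np.iinfo bounds without wrapping itself. The extreme values below are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical extremes, not taken from the PR's tests.
ar1_max = np.int8(127)
ar2_min = np.int8(-128)

# Python ints cannot overflow, so this is the exact difference.
exact_diff = int(ar1_max) - int(ar2_min)

# The subtraction is only safe if the exact result fits in ar1's dtype.
fits_in_int8 = exact_diff <= np.iinfo(np.int8).max

print(exact_diff, fits_in_int8)
```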

cc @seberg

MilesCranmer · 2022-12-23T17:02:50Z

Local tests pass. Ready for review @seberg

MilesCranmer · 2022-12-23T17:08:38Z

numpy/lib/arraysetops.py

+        #  1. Assert memory usage is not too large
         below_memory_constraint = ar2_range <= 6 * (ar1.size + ar2.size)
+        #  2. Check overflows for (ar2 - ar2_min); dtype=ar2.dtype
+        range_safe_from_overflow = ar2_range <= np.iinfo(ar2.dtype).max


This PR also corrects the bounds of the overflow check. It should have really been <= in #12065, rather than <. I noticed this after adding some new tests.
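A small sketch of the boundary case, assuming a uint8 array that spans its full range (the array is made up for illustration): the range equals np.iinfo(...).max exactly, and the offsets are still representable, so the inclusive comparison is the right one.

```python
import numpy as np

# Illustrative array covering the whole uint8 range.
ar2 = np.array([0, 255], dtype=np.uint8)

# Exact range, computed with Python ints.
ar2_range = int(ar2.max()) - int(ar2.min())
dtype_max = np.iinfo(ar2.dtype).max

strict_ok = ar2_range < dtype_max      # the old, overly strict bound
inclusive_ok = ar2_range <= dtype_max  # the corrected bound

# The offsets (ar2 - ar2_min) still fit in uint8 at the boundary.
offsets = ar2 - ar2.min()

print(strict_ok, inclusive_ok, offsets)
```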

seberg · 2022-12-23T18:11:24Z

This is the last thing that would be great to have some fix for in 1.24.1, considering that it returns bad results. I expect this is good, but I need fresher eyes. If it looks good to others, maybe we should get it in, and I can look at it again later and follow up if I feel a different approach is better.

MilesCranmer · 2022-12-23T19:40:01Z

I'm thinking about this proposed change now and while it does fix the problem, it is more conservative than necessary. Recall we are looking for overflows in this calculation:

basic_mask = (ar1 <= ar2_max) & (ar1 >= ar2_min)
outgoing_array[basic_mask] = isin_helper_ar[ar1[basic_mask] - ar2_min]

basic_mask will trim ar1 to only the elements within the range of ar2. Thus, technically we only need to consider min(ar1_max, ar2_max) and max(ar1_min, ar2_min) in these calculations, i.e., the following:

# After masking, the range of ar1 is guaranteed to be
# within the range of ar2:
ar1_upper = min(int(ar1_max), int(ar2_max))
ar1_lower = max(int(ar1_min), int(ar2_min))

range_safe_from_overflow &= all((
    ar1_upper - int(ar2_min) <= np.iinfo(ar1.dtype).max,
    ar1_lower - int(ar2_min) >= np.iinfo(ar1.dtype).min
))

Does that make sense?

Edit: pushed this change.
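The clipped check above can be sketched as a standalone function. This is a minimal illustration, not the code merged into arraysetops.py: the helper name and the example arrays are hypothetical, and it assumes both inputs are non-empty integer arrays.

```python
import numpy as np

def offsets_fit_in_ar1_dtype(ar1, ar2):
    """Hypothetical helper: True if (ar1[basic_mask] - ar2_min) fits in ar1.dtype.

    Assumes ar1 and ar2 are non-empty integer arrays.
    """
    ar1_min, ar1_max = int(ar1.min()), int(ar1.max())
    ar2_min, ar2_max = int(ar2.min()), int(ar2.max())

    # After masking, the surviving ar1 values lie within ar2's range.
    ar1_upper = min(ar1_max, ar2_max)
    ar1_lower = max(ar1_min, ar2_min)

    info = np.iinfo(ar1.dtype)
    return (ar1_upper - ar2_min <= info.max
            and ar1_lower - ar2_min >= info.min)

ar1 = np.array([-128, 0, 127], dtype=np.int8)

# The unclipped check would reject this: 127 - (-100) = 227 > 127.
# After clipping, the largest offset is -90 - (-100) = 10.
narrow = np.array([-100, -90], dtype=np.int16)

# Here even the clipped offset overflows: 100 - (-100) = 200 > 127.
wide = np.array([-100, 100], dtype=np.int16)

print(offsets_fit_in_ar1_dtype(ar1, narrow))
print(offsets_fit_in_ar1_dtype(ar1, wide))
```

In the first case the unclipped check would have ruled out the table method even though no masked element can overflow, which is the sense in which the earlier version was more conservative than necessary.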

charris · 2022-12-24T22:28:26Z

I wonder why integers larger than uint16 are not tested. Are they too big?

charris · 2022-12-25T18:44:30Z

Thanks @MilesCranmer. @seberg If you see anything that bothers you we can make another PR.

charris · 2022-12-25T18:44:39Z

Thanks @MilesCranmer. @seberg If you see anything that bothers you we can make another PR.

numpy#22878)
* TST: Mixed integer types for in1d
* BUG: Fix mixed dtype overflows for in1d (numpy#22877)
* BUG: Type conversion for integer overflow check
* MAINT: Fix linting issues in in1d
* MAINT: ar1 overflow check only for non-empty array
* MAINT: Expand bounds of overflow check
* TST: Fix integer overflow in mixed boolean test
* TST: Include test for overflow on mixed dtypes
* MAINT: Less conservative overflow checks

MilesCranmer added 2 commits December 23, 2022 11:21

TST: Mixed integer types for in1d

54aa5bc

BUG: Fix mixed dtype overflows for in1d (numpy#22877)

dbfdcbd

MilesCranmer mentioned this pull request Dec 23, 2022

BUG: numpy.isin does not function correctly with two arrays with different integer type #22877

Closed

MilesCranmer changed the title ~~[WIP] Fix integer overflow in in1d for mixed integer dtypes #22877~~ BUG: [WIP] Fix integer overflow in in1d for mixed integer dtypes #22877 Dec 23, 2022

BUG: Type conversion for integer overflow check

f7a1439

MilesCranmer force-pushed the isin-fix-dtype branch from 83d5b2b to f7a1439 Compare December 23, 2022 16:42

MilesCranmer added 4 commits December 23, 2022 11:47

MAINT: Fix linting issues in in1d

c688977

MAINT: ar1 overflow check only for non-empty array

1d09d09

MAINT: Expand bounds of overflow check

b7f1701

TST: Fix integer overflow in mixed boolean test

96cd847

MilesCranmer changed the title ~~BUG: [WIP] Fix integer overflow in in1d for mixed integer dtypes #22877~~ BUG: Fix integer overflow in in1d for mixed integer dtypes #22877 Dec 23, 2022

charris added 00 - Bug 09 - Backport-Candidate PRs tagged should be backported labels Dec 23, 2022

charris added this to the 1.24.1 release milestone Dec 23, 2022

TST: Include test for overflow on mixed dtypes

c8299cb

MAINT: Less conservative overflow checks

c8499c6

charris approved these changes Dec 24, 2022

charris merged commit 235dbe1 into numpy:main Dec 25, 2022

charris mentioned this pull request Dec 25, 2022

BUG: Fix integer overflow in in1d for mixed integer dtypes #22877 #22884

Merged

charris removed the 09 - Backport-Candidate PRs tagged should be backported label Dec 25, 2022

charris removed this from the 1.24.1 release milestone Dec 25, 2022

MilesCranmer deleted the isin-fix-dtype branch December 25, 2022 19:34

charris merged 9 commits into numpy:main from MilesCranmer:isin-fix-dtype
