Unexpected numpy.unique behavior with return_inverse=True on uint8 dtype #8664

@justinhx

numpy.unique returns erroneous results when the value 63 is encountered in a masked uint8 array and return_index=True is passed:

>>> src_ds = '/Users/histo/S2_10_T_DN_2016_7_27_0_4328_repro.tif'
>>> src_ds = gdal.Open(src_ds)
>>> src = src_ds.GetRasterBand(1).ReadAsArray()
>>> src = np.ma.masked_equal(src, 0)
>>> src = src.ravel()
>>> s_values, s_idx, s_counts = np.unique(src, return_index=True, return_inverse=True)
>>> s_values
masked_array(data = [3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
 55 56 57 58 59 60 61 62 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63
 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 --
 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63
 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 --
 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63 -- 63
 -- 63 -- 63 -- 63 -- 63 -- ...

This is reproducible:

import numpy as np

# Case 1: the repeated value is 62; the result is as expected.
x = np.array([64, 0, 1, 2, 3, 62, 62, 0, 0, 0, 1, 2, 0, 62, 0], dtype='uint8')
y = np.ma.masked_equal(x, 0)
v, i, c = np.unique(y, return_index=True, return_counts=True)
print(v)
# [1 2 3 62 -- 64]

# Case 2: the repeated value is 63; 63 and the masked entry are duplicated.
x = np.array([64, 0, 1, 2, 3, 63, 63, 0, 0, 0, 1, 2, 0, 63, 0], dtype='uint8')
y = np.ma.masked_equal(x, 0)
v, i, c = np.unique(y, return_index=True, return_counts=True)
print(v)
# [1 2 3 -- 63 -- 63 -- 64]

# Case 3: same data as case 2 without the extra return values; the result is correct.
x = np.array([64, 0, 1, 2, 3, 63, 63, 0, 0, 0, 1, 2, 0, 63, 0], dtype='uint8')
y = np.ma.masked_equal(x, 0)
v = np.unique(y)
print(v)
# [1 2 3 63 64 --]

A workaround suggested on Stack Overflow is to change the dtype to int16:

x = np.array([64, 0, 1, 2, 3, 63, 63, 0, 0, 0, 1, 2, 0, 63, 0], dtype='uint8')
y = np.ma.masked_equal(x.astype('int16'), 0)
v, i, c = np.unique(y, return_index=True, return_counts=True)
print(v)
# [1 2 3 63 64 --]

This works, but it doubles the memory footprint of the data (two bytes per element instead of one), which is undesirable.
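
For comparison, here is a rough sketch of another possible workaround that stays in uint8. It is not a fix for the underlying behavior. It assumes the masked entry is not needed in the result, and the name `valid_pos` is introduced here only for illustration. The idea is to hand np.unique a plain ndarray of the unmasked values, so the mask never comes into play, and then map the first-occurrence indices back to positions in the original array:

```python
import numpy as np

x = np.array([64, 0, 1, 2, 3, 63, 63, 0, 0, 0, 1, 2, 0, 63, 0], dtype='uint8')
y = np.ma.masked_equal(x, 0)

# Positions of the unmasked elements in the original (flattened) array.
# `valid_pos` is a hypothetical helper name, not part of NumPy.
valid_pos = np.flatnonzero(~np.ma.getmaskarray(y))

# compressed() returns a plain 1-D uint8 ndarray with only the unmasked values.
v, i, c = np.unique(y.compressed(), return_index=True, return_counts=True)

# The indices from np.unique refer to the compressed array; map them back
# to positions in the original array.
i = valid_pos[i]

print(v.tolist())  # unique unmasked values
print(i.tolist())  # first occurrence of each value in x
print(c.tolist())  # number of occurrences of each value
```

This still copies the unmasked values once, but the copy keeps the uint8 dtype, so it should cost less memory than the int16 conversion. The masked entry does not appear in the output, which may or may not be acceptable depending on the use case.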
