-
-
Notifications
You must be signed in to change notification settings - Fork 12.1k
Description
The following test case will fail on main:
import numpy as np
# strings long enough to live in the arena
string_list = ["abc", "def", "ghi" * 10, "A¢☃€ 😊" * 100, "Abc" * 1000, "DEF"]
arr = np.array(string_list, dtype=np.dtypes.StringDType())
arr_rev = np.array(string_list[::-1], dtype=np.dtypes.StringDType())
arr_view = np.array(arr, copy=None, dtype=arr_rev.dtype)
np.testing.assert_array_equal(arr, arr_view)
This is happening because of the code path here:
numpy/numpy/_core/src/multiarray/multiarraymodule.c
Lines 1612 to 1636 in 6d4a0f7
| if (PyArray_EquivTypes(oldtype, dtype)) { | |
| if (copy != NPY_COPY_ALWAYS && STRIDING_OK(oparr, order)) { | |
| if (oldtype == dtype) { | |
| Py_INCREF(op); | |
| ret = oparr; | |
| } | |
| else { | |
| /* Create a new PyArrayObject from the caller's | |
| * PyArray_Descr. Use the reference `op` as the base | |
| * object. */ | |
| Py_INCREF(dtype); | |
| ret = (PyArrayObject *)PyArray_NewFromDescrAndBase( | |
| Py_TYPE(op), | |
| dtype, | |
| PyArray_NDIM(oparr), | |
| PyArray_DIMS(oparr), | |
| PyArray_STRIDES(oparr), | |
| PyArray_DATA(oparr), | |
| PyArray_FLAGS(oparr), | |
| op, | |
| op | |
| ); | |
| } | |
| goto finish; | |
| } |
In the test case, oldtype != dtype, so we enter the path that calls NewFromDescrAndBase with the user-provided dtype. The problem is the user-provided dtype doesn't know anything about the arena that lives on the original dtype, so blindly accepting the user-provided dtype and passing it to NewFromDescrAndBase corrupts the arena.
I think a way to fix this would be to call finalize_descr on the user-provided DType to get a fresh descriptor if its already owned by the array, and then force the user-provided dtype to share allocators with the original dtype. We could also copy the arena, but that goes against the user desire to not have a copy.
Another way to handle this is to make stringdtype always copy in this situation, and error if someone passes copy=False.
Originally posted by @ngoldbaum in #26082 (comment)