fix: VirtualArray's deep copy + accidental .data access#3599
fix: VirtualArray's deep copy + accidental .data access#3599ianna merged 5 commits intoscikit-hep:mainfrom
VirtualArray's deep copy + accidental .data access#3599Conversation
.data access in ak.merge_union_of_records.data attribute access in ak.merge_union_of_records
.data attribute access in ak.merge_union_of_recordsVirtualArray correctly by removing accidental .data access in ak.merge_union_of_records
VirtualArray correctly by removing accidental .data access in ak.merge_union_of_recordsVirtualArray correctly by removing accidental .data access in ak.merge_union_of_records
|
Added tests but ci will fail until #3600 is merged |
Uhhmm I think this makes it less clear. This PR just removes a |
I see your point, but I actually think keeping these changes together is clearer in this case. The Let’s keep this as a single PR that cleanly fixes the root issue and demonstrates it with tests. |
VirtualArray correctly by removing accidental .data access in ak.merge_union_of_recordsVirtualArray's .copy() should be a deep copy + handle VirtualArray correctly by removing accidental .data access in ak.merge_union_of_records`
VirtualArray's .copy() should be a deep copy + handle VirtualArray correctly by removing accidental .data access in ak.merge_union_of_records`VirtualArray's .copy() should be a deep copy + handle VirtualArray correctly by removing accidental .data access in ak.merge_union_of_records
|
I put everything here and closed #3600 |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files
🚀 New features to boost your workflow:
|
|
This is what the copy change fixes btw: |
There is already a deep copy solution for that: >>> v
VirtualArray(array=[0 1 2 3 4 5 6 7 8 9], dtype('int32'), shape=(10,))
>>> v2 = ak.copy(v)
>>> v2
VirtualArray(array=[0 1 2 3 4 5 6 7 8 9], dtype('int32'), shape=(10,))
>>> v[0]=10
>>> v2
VirtualArray(array=[0 1 2 3 4 5 6 7 8 9], dtype('int32'), shape=(10,))
>>> v
VirtualArray(array=[10 1 2 3 4 5 6 7 8 9], dtype('int32'), shape=(10,))The bigger issue is the following. Since Awkward Arrays are immutable by design, explicit copying is only necessary when you want to ensure no shared underlying memory (e.g. copying from a NumPy-backed array that may be mutated externally). I'm not convinced that we should allow users to mutate awkward arrays (even if they are virtual). Also this is kind of bad behavior: >>> v
VirtualArray(array=UNMATERIALIZED, dtype('int32'), shape=(awkward._nplikes.shape.unknown_length,))
>>> v.materialize()
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
>>> v[0]
np.int32(0)
>>> v[0] = 1.1
>>> v
VirtualArray(array=[1 1 2 3 4 5 6 7 8 9], dtype('int32'), shape=(10,)) |
|
|
I think this not about a highlevel interface for users to copy things. This fix is needed because awkward-array itself does inplace mutation and copies. We have to make sure that VirtualArrays behave in these operations the same as numpy arrays, and this PR goes in this direction / does that. |
This is not our choice. This is standard numpy behavior In [3]: x = np.arange(10, dtype=np.int32)
In [4]: x
Out[4]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
In [5]: x[0] = 1.1
In [6]: x
Out[6]: array([1, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32) |
This is our choice: >>> array = ak.Array([1,2,3,4,5])
>>> array
<Array [1, 2, 3, 4, 5] type='5 * int64'>
>>> array[0]=0
Traceback (most recent call last):
File "<python-input-30>", line 1, in <module>
array[0]=0
~~~~~^^^
File "/Users/yana/Projects/refactoring/original/awkward/src/awkward/highlevel.py", line 1193, in __setitem__
with ak._errors.OperationErrorContext(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
"ak.Array.__setitem__",
^^^^^^^^^^^^^^^^^^^^^^^
(self,),
^^^^^^^^
{"where": where, "what": what},
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
):
^
File "/Users/yana/Projects/refactoring/original/awkward/src/awkward/_errors.py", line 80, in __exit__
raise self.decorate_exception(exception_type, exception_value)
File "/Users/yana/Projects/refactoring/original/awkward/src/awkward/highlevel.py", line 1202, in __setitem__
raise TypeError("only fields may be assigned in-place (by field name)")
TypeError: only fields may be assigned in-place (by field name)
This error occurred while calling
ak.Array.__setitem__(
<Array [1, 2, 3, 4, 5] type='5 * int64'>
where = 0
what = 0
) |
|
I think there is some confusion here in the whole discussion, a |
No, a |
VirtualArray's .copy() should be a deep copy + handle VirtualArray correctly by removing accidental .data access in ak.merge_union_of_recordsVirtualArray's deep copy + accidental .data access
|
Fix in #3631 |
.copy()of anndarrayis a deep copy. It should be like that forVirtualArraytoo.Also
ak.merge_union_of_recordsasks for.datain two lines when the objects it asks for the.dataattribute are already buffers (notak.contents.Contentorak.index.Index). This accidentally returns amemoryviewobject which numpy and cupy can handle inasarray, but It doesn't look intended. Also the buffers of other backends don't implement a.dataproperty which means this won't work for virtual and typetracer arrays.