fix: VirtualArray's .copy() method should be a deep copy#3600
fix: VirtualArray's .copy() method should be a deep copy#3600ikrommyd wants to merge 1 commit intoscikit-hep:mainfrom
VirtualArray's .copy() method should be a deep copy#3600Conversation
VirtualArray's .copy() should be a deep copy
VirtualArray's .copy() should be a deep copyVirtualArray's .copy() method should be a deep copy
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files
🚀 New features to boost your workflow:
|
ianna
left a comment
There was a problem hiding this comment.
@ikrommyd - Thanks for catching the bug! I'm a bit concerned that the tests are being adjusted after the fact. Let's aim to batch small VirtualArray-related changes into fewer, more complete PRs—unless it's something blocking or critical. Thanks!
Yes the original tests are wrong. When I wrote them I probably thought that |
|
Closing to add everything in #3599 |
For Since It’s worth profiling specific use cases to see if this impacts performance noticeably in your context. Another point: moving to a deep copy could invalidate This could lead to subtle bugs or performance issues in JIT-compiled code relying on those pointers. It’s important to consider this when changing from shallow to deep copies, especially in performance-critical sections using Numba. |
|
Well we could potentially implement In [1]: import awkward as ak
...: import numpy as np
...:
...: from awkward._nplikes.virtual import VirtualArray, materialize_if_virtual
...: from awkward._nplikes.numpy import Numpy
...: from awkward._nplikes.shape import unknown_length
...:
...: numpy = Numpy.instance()
...:
...: def shape_generator():
...: print("CALLING SHAPE GENERATOR!")
...: return (10,)
...:
...: def data_generator():
...: print("CALLING DATA GENERATOR!")
...: return np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=np.int32)
...:
...: v = VirtualArray(numpy, (unknown_length,), np.int32, data_generator, shape_generator)
In [2]: v.materialize()
CALLING DATA GENERATOR!
Out[2]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
In [3]: v2 = v.copy()
In [4]: v2[0] = 111
In [5]: v
Out[5]: VirtualArray(array=[111 1 2 3 4 5 6 7 8 9], dtype('int32'), shape=(10,))
In [6]: v2
Out[6]: VirtualArray(array=[111 1 2 3 4 5 6 7 8 9], dtype('int32'), shape=(10,))Regarding numba, we do not let virtual arrays exists in numba compiled function and ask the user to materialize the array before passing it in. |
While adding tests for #3599, I found this bug/typo.
.copy()of anndarrayis a deep copy. It should be like that forVirtualArraytoo. This is semi-critical too. It's probably not being hit often but I just hit it while writing tests and was getting wrong results.