Say I have a NumPy array:
>>> X = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
>>> X
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
and an array of indexes that I want to select for each row:
>>> ixs = np.array([[1, 3], [0, 1], [1, 2]])
>>> ixs
array([[1, 3],
       [0, 1],
       [1, 2]])
How do I index the array X so that for every row in X I select the two indices specified in ixs?
So in this case, I want to select elements 1 and 3 from the first row, elements 0 and 1 from the second row, and so on. The output should be:
array([[ 2,  4],
       [ 5,  6],
       [10, 11]])
A slow solution would be something like this:
output = np.array([row[ix] for row, ix in zip(X, ixs)])
However, this can get slow for extremely long arrays. Is there a faster way to do this in NumPy without a Python-level loop?
EDIT: Some very approximate speed tests on a 2.5K * 1M array (10GB):
np.array([row[ix] for row, ix in zip(X, ixs)]) 0.16s
X[np.arange(len(ixs)), ixs.T].T 0.175s
X.take(ixs+np.arange(0, X.shape[0]*X.shape[1], X.shape[1])[:,None]) 33s
np.fromiter((X[i, j] for i, row in enumerate(ixs) for j in row), dtype=X.dtype).reshape(ixs.shape) 2.4s
Solution:
You can use this:
X[np.arange(len(ixs)), ixs.T].T
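To see why this works, it may help to look at the shapes. `np.arange(len(ixs))` has shape (3,) and `ixs.T` has shape (2, 3), so the two index arrays broadcast to (2, 3) and element [j, i] of the result is `X[i, ixs[i, j]]`. The trailing `.T` turns that back into (3, 2). A minimal sketch using the arrays from the question (the names `rows`, `out_t` and `out_col` are just for illustration); it also shows an equivalent form that avoids both transposes:

```python
import numpy as np

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
ixs = np.array([[1, 3], [0, 1], [1, 2]])

rows = np.arange(len(ixs))  # shape (3,): one row index per row of X

# rows (3,) broadcasts against ixs.T (2, 3), giving a (2, 3) result;
# the trailing .T turns it back into (3, 2).
out_t = X[rows, ixs.T].T

# Same selection without transposes: rows[:, None] has shape (3, 1)
# and broadcasts directly against ixs, which has shape (3, 2).
out_col = X[rows[:, None], ixs]

print(out_col)
```

Both forms select the same elements; which one is faster on a given array is something you would have to measure.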
See the NumPy documentation on advanced (integer array) indexing for the reference.
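As an alternative that is not part of the answer above: if you are on NumPy 1.15 or later, `np.take_along_axis` expresses the same per-row selection directly. A small sketch with the question's arrays; I have not timed it against the other approaches:

```python
import numpy as np

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
ixs = np.array([[1, 3], [0, 1], [1, 2]])

# For each row i, pick the columns listed in ixs[i].
# ixs must have the same number of dimensions as X.
out = np.take_along_axis(X, ixs, axis=1)

print(out)
```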