Selecting Random Windows from Multidimensional Numpy Array Rows

I have a large array where each row is a time series and thus needs to stay in order.

I want to select a random window of a given size for each row.

Example:

>>>import numpy as np
>>>arr = np.array(range(42)).reshape(6,7)
>>>arr
array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39, 40, 41]])
>>># What I want to do:
>>>select_random_windows(arr, window_size=3)
array([[ 1,  2,  3],
       [11, 12, 13],
       [14, 15, 16],
       [22, 23, 24],
       [38, 39, 40]])

What an ideal solution would look like to me:

def select_random_windows(arr, window_size):
    offsets = np.random.randint(0, arr.shape[0] - window_size, size = arr.shape[1])
    return arr[:, offsets: offsets + window_size]

But unfortunately this does not work

What I’m going with right now is terribly slow:

def select_random_windows(arr, wndow_size):
    result = []
    offsets = np.random.randint(0, arr.shape[0]-window_size, size = arr.shape[1])
    for row, offset in enumerate(start_indices):
        result.append(arr[row][offset: offset + window_size])
    return np.array(result)

Sure, I could do the same with a list comprehension (and get a minimal speed boost), but I was wondering wether there is some super smart numpy vectorized way to do this.

Solution:

Here’s one leveraging np.lib.stride_tricks.as_strided

def random_windows_per_row_strided(arr, W=3):
    idx = np.random.randint(0,arr.shape[1]-W+1, arr.shape[0])
    strided = np.lib.stride_tricks.as_strided 
    m,n = arr.shape
    s0,s1 = arr.strides
    windows = strided(arr, shape=(m,n-W+1,W), strides=(s0,s1,s1))
    return windows[np.arange(len(idx)), idx]

Runtime test on bigger array with 10,000 rows –

In [469]: arr = np.random.rand(100000,100)

# @Psidom's soln
In [470]: %timeit select_random_windows(arr, window_size=3)
100 loops, best of 3: 7.41 ms per loop

In [471]: %timeit random_windows_per_row_strided(arr, W=3)
100 loops, best of 3: 6.84 ms per loop

# @Psidom's soln
In [472]: %timeit select_random_windows(arr, window_size=30)
10 loops, best of 3: 26.8 ms per loop

In [473]: %timeit random_windows_per_row_strided(arr, W=30)
100 loops, best of 3: 9.65 ms per loop

# @Psidom's soln
In [474]: %timeit select_random_windows(arr, window_size=50)
10 loops, best of 3: 41.8 ms per loop

In [475]: %timeit random_windows_per_row_strided(arr, W=50)
100 loops, best of 3: 10 ms per loop

Cycling Slicing in Python

I’ve come up with this question while trying to apply a Cesar Cipher to a matrix with different shift values for each row, i.e. given a matrix X

array([[1, 0, 8],
   [5, 1, 4],
   [2, 1, 1]])

with shift values of S = array([0, 1, 1]), the output needs to be

array([[1, 0, 8],
   [1, 4, 5],
   [1, 1, 2]])

This is easy to implement by the following code:

Y = []
for i in range(X.shape[0]):
    if (S[i] > 0):
        Y.append( X[i,S[i]::].tolist() + X[i,:S[i]:].tolist() )
    else:
        Y.append(X[i,:].tolist())
Y = np.array(Y)

This is a left-cycle-shift. I wonder how to do this in a more efficient way using numpy arrays?

Update: This example applies the shift to the columns of a matrix. Suppose that we have a 3D array

array([[[8, 1, 8],
        [8, 6, 2],
        [5, 3, 7]],

       [[4, 1, 0],
        [5, 9, 5],
        [5, 1, 7]],

       [[9, 8, 6],
        [5, 1, 0],
        [5, 5, 4]]])

Then, the cyclic right shift of S = array([0, 0, 1]) over the columns leads to

array([[[8, 1, 7],
        [8, 6, 8],
        [5, 3, 2]],

       [[4, 1, 7],
        [5, 9, 0],
        [5, 1, 5]],

       [[9, 8, 4],
        [5, 1, 6],
        [5, 5, 0]]])

Solution:

Approach #1 : Use modulus to implement the cyclic pattern and get the new column indices and then simply use advanced-indexing to extract the elements, giving us a vectorized solution, like so –

def cyclic_slice(X, S):
    m,n = X.shape
    idx = np.mod(np.arange(n) + S[:,None],n)
    return X[np.arange(m)[:,None], idx]

Approach #2 : We can also leverage the power of strides for further speedup. The idea would be to concatenate the sliced off portion from the start and append it at the end, then create sliding windows of lengths same as the number of cols and finally index into the appropriate window numbers to get the same rolled over effect. The implementation would be like so –

def cyclic_slice_strided(X, S):
    X2 = np.column_stack((X,X[:,:-1]))
    s0,s1 = X2.strides
    strided = np.lib.stride_tricks.as_strided 

    m,n1 = X.shape
    n2 = X2.shape[1]
    X2_3D = strided(X2, shape=(m,n2-n1+1,n1), strides=(s0,s1,s1))
    return X2_3D[np.arange(len(S)),S]

Sample run –

In [34]: X
Out[34]: 
array([[1, 0, 8],
       [5, 1, 4],
       [2, 1, 1]])

In [35]: S
Out[35]: array([0, 1, 1])

In [36]: cyclic_slice(X, S)
Out[36]: 
array([[1, 0, 8],
       [1, 4, 5],
       [1, 1, 2]])

Runtime test –

In [75]: X = np.random.rand(10000,100)
    ...: S = np.random.randint(0,100,(10000))

# @Moses Koledoye's soln
In [76]: %%timeit
    ...: Y = []
    ...: for i, x in zip(S, X):
    ...:     Y.append(np.roll(x, -i))
10 loops, best of 3: 108 ms per loop

In [77]: %timeit cyclic_slice(X, S)
100 loops, best of 3: 14.1 ms per loop

In [78]: %timeit cyclic_slice_strided(X, S)
100 loops, best of 3: 4.3 ms per loop

Adaption for 3D case

Adapting approach #1 for the 3D case, we would have –

shift = 'left'
axis = 1 # axis along which S is to be used (axis=1 for rows)
n = X.shape[axis]
if shift == 'left':
    Sa = S
else:
    Sa = -S    

# For rows
idx = np.mod(np.arange(n)[:,None] + Sa,n)
out = X[:,idx, np.arange(len(S))]

# For columns
idx = np.mod(Sa[:,None] + np.arange(n),n)
out = X[:,np.arange(len(S))[:,None], idx]

# For axis=0
idx = np.mod(np.arange(n)[:,None] + Sa,n)
out = X[idx, np.arange(len(S))]

There could be a way to have a generic solution for a generic axis, but I will keep it to this point.