On NumPy 1.14.2, I get the following:
import numpy as np
A = np.random.rand(1024, 256, 256, 3) * 255  # similar to a tensor of 1024 images of size 256x256
print(np.mean(A, axis=(0, 1, 2)))  # 64 bit works fine
print(np.mean(A.astype(np.float32), axis=(0, 1, 2)))  # 32 bit fails
print(np.mean(A.astype(np.float32)))  # 32 bit works fine without axis selection
This results in:
[127.50656009 127.49165182 127.51390158]
[64. 64. 64.]
127.50413
Even allowing for float32 precision, this kind of failure seems odd, especially since the mean of the entire array is computed correctly.
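For comparison, here is a minimal sketch that requests a float64 accumulator through the `dtype` argument of `np.mean`. The NumPy documentation notes that the mean of single-precision input can be inaccurate and that a higher-precision accumulator can help. The smaller array shape and the fixed seed are my own assumptions to keep memory use modest, so the float32 line may not show the same failure at this size.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, chosen only for repeatability

# Smaller stand-in for the image tensor (assumed shape, not the original one)
A = rng.rand(64, 64, 64, 3) * 255
A32 = A.astype(np.float32)

# float64 data, float64 accumulator (reference)
ref = np.mean(A, axis=(0, 1, 2))
# float32 data, default float32 accumulator
m32 = np.mean(A32, axis=(0, 1, 2))
# float32 data, but accumulate in float64
m64 = np.mean(A32, axis=(0, 1, 2), dtype=np.float64)

print(ref)
print(m32)
print(m64)
```

If the float64-accumulator result matches the reference while the plain float32 result does not, that would point at accumulation precision along the reduced axes rather than at the data itself.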