When bytes are stored in a numpy array, they cannot be recovered consistently, e.g., for use as the 'bytes' argument to UUID. I assume this is a casualty of the 2to3 transition and that some things still need to be sorted out.
The following is on Fedora 19 with numpy 1.7.1 (latest stable) and Python 3.3.2:
In [25]: x = np.array(b'\0' * 16)
In [26]: x
Out[26]: array(b'', dtype='|S16')
In [27]: bytes(x)
Out[27]: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
In [38]: x = np.array([b'\0' * 16])
In [39]: x[0]
Out[39]: b''
Obviously, the result is the same no matter how many \x00 bytes I store in it:
In [40]: x = np.array([b'\0' * 16] * 5)
In [41]: x
Out[41]: array([b'', b'', b'', b'', b''], dtype='|S16')
In [42]: x.tolist()
Out[42]: [b'', b'', b'', b'', b'']
You can try other examples yourself; numpy drops all trailing zero bytes:
In [43]: x = np.array([b'\0' * 5 + b'1' + b'\0' * 10])
In [44]: x[0]
Out[44]: b'\x00\x00\x00\x00\x001'
Even more confusing is this:
In [49]: x = np.array([b'\0' * 5 + b'1' + b'\0' * 10])
In [50]: bytes(x[0])
Out[50]: b'\x00\x00\x00\x00\x001'
In [51]: bytes(x)
Out[51]: b'\x00\x00\x00\x00\x001\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
It's obvious why it does that, but it should not. Possibly I am overlooking some functionality, but how can I recover all the bytes, including trailing \x00, in each array location using the array interface?
I think numpy's current behavior for bytes would be OK for strings, but for bytes it should return the full data. This is similar to what you can do with "void", except that it should transparently return all the bytes:
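Since the raw buffer evidently still holds every byte, one possible workaround is to slice that buffer by the dtype's item size. This is only a minimal sketch: `full_items` is a hypothetical helper name, not part of numpy, and it assumes a numpy release that provides `ndarray.tobytes()` (older releases spell it `tostring()`).

```python
import numpy as np

def full_items(arr):
    """Return each element of a fixed-width bytes array with trailing NULs kept.

    Hypothetical helper for illustration; not a numpy function.
    """
    size = arr.dtype.itemsize
    # Copy the whole buffer once, then cut it into itemsize-wide chunks.
    buf = np.ascontiguousarray(arr).tobytes()
    return [buf[i:i + size] for i in range(0, len(buf), size)]

raw = b'\0' * 5 + b'1' + b'\0' * 10
x = np.array([raw, b'\0' * 16])   # dtype '|S16'

items = full_items(x)             # every item is 16 bytes long
stripped = x.tolist()             # element access drops the trailing NULs
```

Here `items` keeps all 16 bytes per element, while `stripped` shows the truncated values that normal element access returns.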
In [52]: x = np.array([b'\0' * 5 + b'1' + b'\0' * 10], dtype=np.void)
In [53]: x
Out[53]: array([[ 0 0 0 0 0 49 0 0 0 0 0 0 0 0 0 0]], dtype='|V16')
In [54]: bytes(x)
Out[54]: b'\x00\x00\x00\x00\x001\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
In [55]: bytes(x[0])
Out[55]: b'\x00\x00\x00\x00\x001\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
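Building on the void example, an existing '|S16' array could presumably be reinterpreted as void without rebuilding it, which would make the full 16 bytes available for UUID. The following is a sketch under assumptions: it relies on `ndarray.view` with a void dtype of the same item size and on the scalar `tobytes()` method of recent numpy releases, and I have not checked it against 1.7.1.

```python
import uuid
import numpy as np

raw = b'\0' * 5 + b'1' + b'\0' * 10
x = np.array([raw] * 3)                      # dtype '|S16'

# Reinterpret the same memory as opaque 16-byte records (dtype '|V16').
v = x.view(np.dtype((np.void, x.dtype.itemsize)))

# Each void scalar exposes all of its bytes, trailing NULs included.
ids = [uuid.UUID(bytes=item.tobytes()) for item in v]
```

The view shares memory with `x`, so no data is copied until `tobytes()` is called on each element.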
-Alexander