
information loss with bytes type in array elements when accessing through array interface #3878

@2sn

Description


When bytes are stored in a NumPy array, they cannot be recovered consistently, e.g., for use as the 'bytes' argument to UUID. I assume this is a casualty of the 2to3 transition and some things still need to be sorted out.

The following is on Fedora 19, numpy 1.7.1 (last stable), python 3.3.2

In [25]: x = np.array(b'\0' * 16)

In [26]: x
Out[26]: array(b'', dtype='|S16')

In [27]: bytes(x)
Out[27]: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

In [38]: x = np.array([b'\0' * 16])
In [39]: x[0]
Out[39]: b''

Obviously, it returns the same result no matter how many \x00 bytes I store in it:

In [40]: x = np.array([b'\0' * 16] * 5)

In [41]: x
Out[41]: array([b'', b'', b'', b'', b''], dtype='|S16')

In [42]: x.tolist()
Out[42]: [b'', b'', b'', b'', b'']

You can try various examples yourself; numpy drops all trailing zero bytes.

In [43]: x = np.array([b'\0' * 5 + b'1' + b'\0' * 10])

In [44]: x[0]
Out[44]: b'\x00\x00\x00\x00\x001'

Hence, even more confusing is:

In [49]: x = np.array([b'\0' * 5 + b'1' + b'\0' * 10])

In [50]: bytes(x[0])
Out[50]: b'\x00\x00\x00\x00\x001'

In [51]: bytes(x)
Out[51]: b'\x00\x00\x00\x00\x001\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
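The mismatch above can be checked in a few lines. This is a minimal sketch, assuming a recent NumPy where `ndarray.tobytes()` is available (on 1.7.1 the equivalent would be `tostring()`); the variable names are illustrative only.

```python
import numpy as np

# 5 NUL bytes, the character '1', then 10 trailing NUL bytes (16 bytes total)
raw = b"\0" * 5 + b"1" + b"\0" * 10

x = np.array([raw])          # inferred dtype is a 16-byte string type

elem = bytes(x[0])           # element access strips the trailing NUL bytes
buf = x.tobytes()            # the underlying buffer still holds all 16 bytes

print(x.dtype.itemsize, len(elem), len(buf))
```

The data is not lost in storage; only the element accessor truncates it.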

It's obvious why it does that, but it should not. Possibly I am overlooking some functionality, but how can I recover all the bytes, including trailing \x00, in each array location using the array interface?

I think the behavior numpy now has for bytes would be OK for strings, but for bytes it should return the full data. This is similar to what you can do with "void", but it should transparently return all the bytes.

In [52]: x = np.array([b'\0' * 5 + b'1' + b'\0' * 10], dtype=np.void)

In [53]: x
Out[53]: array([[ 0 0 0 0 0 49 0 0 0 0 0 0 0 0 0 0]], dtype='|V16')

In [54]: bytes(x)
Out[54]: b'\x00\x00\x00\x00\x001\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

In [55]: bytes(x[0])
Out[55]: b'\x00\x00\x00\x00\x001\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
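As a possible workaround sketch (not a fix for the behavior itself), an existing bytes array can be reinterpreted as a void array of the same item size, so that each element yields its full contents. This assumes a recent NumPy where `view` and the scalar `tobytes()` method are available; `full_element_bytes` is a hypothetical helper name, not part of numpy.

```python
import uuid
import numpy as np

raw = b"\0" * 5 + b"1" + b"\0" * 10
x = np.array([raw, b"\0" * 16])          # dtype 'S16', two elements


def full_element_bytes(arr, i):
    """Hypothetical helper: return every byte of arr[i], trailing NULs included."""
    # Reinterpret the same memory as raw void items of identical size (no copy)
    as_void = arr.view(np.dtype((np.void, arr.dtype.itemsize)))
    return as_void[i].tobytes()


first = full_element_bytes(x, 0)
second = full_element_bytes(x, 1)

# Both values are exactly 16 bytes long, so they are accepted by UUID(bytes=...)
u = uuid.UUID(bytes=second)
print(len(first), len(second), u)
```

The view shares memory with the original array, so the array can stay in its bytes dtype and only be viewed as void where the full contents are needed.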

-Alexander
