Inconsistency in the length of buffers of unicode scalars (Trac #525) #1123

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/525 on 2007-05-22 by @FrancescAlted, assigned to unknown.

I think there is an inconsistency in the buffer length that unicode scalars report on UCS2 builds of Python:

>>> import sys
>>> import numpy
>>> sys.maxunicode
65535  # using python with UCS2 here
>>> u=numpy.unicode_('popo')
>>> u.data
<read-only buffer for 0x82276e0, size 16, offset 0 at 0xb7dc3780>
>>> len(u.data)
8  # should be 16!
>>> u=numpy.array(u'popo')
>>> u.data
<read-write buffer for 0x82454d0, size 16, offset 0 at 0xb7dc3780>
>>> len(u.data)
16    # This works fine for 0-dim arrays

This prevents things like:

>>> numpy.ndarray(buffer=numpy.unicode_('popo'), dtype='uint32', shape=4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: buffer is too small for requested array

to work correctly.
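
The TypeError comes down to a size check: four uint32 elements need 4 × 4 = 16 bytes, so a buffer that reports only 8 bytes is rejected. A minimal sketch of that check, using plain `bytes` objects as stand-in buffers instead of a unicode scalar (the helper name `fits` is made up for illustration and is not part of NumPy):

```python
import numpy as np

def fits(nbytes, dtype="uint32", shape=4):
    """Return True if a buffer of `nbytes` bytes can back the requested array."""
    try:
        # bytes(nbytes) is a zero-filled, read-only buffer of that length
        np.ndarray(shape=shape, dtype=dtype, buffer=bytes(nbytes))
        return True
    except TypeError:
        # NumPy raises TypeError when the buffer is too small
        return False

print(fits(16))  # buffer length the scalar should report
print(fits(8))   # buffer length the UCS2 scalar actually reports
```

If this reading is right, the scalar's buffer only needs to report its full 16-byte length for the `numpy.ndarray(buffer=...)` call above to succeed.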

However, the above works well in UCS4 Python interpreters:

>>> import sys
>>> sys.maxunicode
1114111
>>> import numpy
>>> u=numpy.unicode_('popo')
>>> u.data
<read-only buffer for 0x8203b60, size 16, offset 0 at 0xb7283640>
>>> len(u.data)
16
>>> numpy.ndarray(buffer=u, dtype='uint32', shape=4)
array([112, 111, 112, 111], dtype=uint32)   # Works fine
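
For reference, the expected size of 16 follows from NumPy storing each unicode character as a 4-byte UCS4 code point, independent of the interpreter build. A small sketch, assuming a current NumPy on Python 3; it goes through `tobytes()` rather than the scalar buffer interface discussed in this ticket:

```python
import numpy as np

a = np.array('popo')                         # 0-dim unicode array, 4 characters
raw = a.tobytes()                            # copy of the underlying storage
codes = np.frombuffer(raw, dtype=np.uint32)  # reinterpret the bytes as code points

print(a.dtype.itemsize, len(raw))            # bytes per element, buffer length
print(codes.tolist())                        # one integer per character
print(''.join(chr(c) for c in codes.tolist()))
```

Both sizes come out as 4 characters × 4 bytes = 16, and the code points match the `array([112, 111, 112, 111], dtype=uint32)` result shown above.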

Thanks
