Python 3: ctypes.c_wchar has a fixed size of 2 bytes whereas py3 strings have variable character length

_Originally posted in https://github.com/nvaccess/nvda/pull/8953#issuecomment-441386821_

In Python 3 strings, a character outside the Basic Multilingual Plane, which needs two 16-bit code units (a surrogate pair) in UTF-16, is treated as one character.

## Steps to reproduce
Python 2:
```
>>> len(u"👍👍👍")
6
```

Python 3:
```
>>> len(u"👍👍👍")
3
```
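
To make the difference visible from Python 3 alone, the following minimal sketch counts both code points and UTF-16 code units for the same string. The variable names are only illustrative, and the emoji is written as an escape so the snippet does not depend on the source file encoding.

```python
# Three "thumbs up" emoji (U+1F44D), each outside the Basic Multilingual Plane.
text = "\U0001F44D" * 3

# Python 3 counts code points.
code_points = len(text)

# Windows wide-character APIs count 16-bit code units; each emoji needs two.
utf16_units = len(text.encode("utf-16-le")) // 2

print(code_points, utf16_units)  # prints "3 6"
```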

## Expected problems
It is likely that code involving offsets-based textInfos will break in a major way on Python 3, especially in cases where getTextRange fetches the requested offsets from storyText, such as in simple edit controls. In the example above, storyText will be 3 characters long, whereas storyLength will be at least 6. This will result in broken behavior when reading through text in Notepad.
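
As a rough illustration of the mismatch, the sketch below converts between Python 3 string offsets and UTF-16 code unit offsets. Both helper names are hypothetical and are not part of NVDA; they only assume that the control reports offsets in UTF-16 code units while the Python string is indexed by code point.

```python
def str_offset_to_utf16(text, offset):
    """Return the number of UTF-16 code units occupied by text[:offset]."""
    # Code points above U+FFFF need a surrogate pair, so they count twice.
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text[:offset])


def utf16_offset_to_str(text, utf16_offset):
    """Return the string index that corresponds to a UTF-16 code unit offset.

    An offset that falls in the middle of a surrogate pair is rounded up
    to the index of the following character.
    """
    units = 0
    for index, ch in enumerate(text):
        if units >= utf16_offset:
            return index
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


sample = "a\U0001F44Db"
print(str_offset_to_utf16(sample, 2))  # "a" plus the emoji occupy 3 code units
print(utf16_offset_to_str(sample, 3))  # code unit 3 is "b", at string index 2
```

Walking the string is linear in its length, so a real implementation would probably want to cache or avoid repeated conversions on large texts.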

On Windows, C wide characters (c_wchar) are 2 bytes in size. In Python 2 on Windows, each code unit in a unicode string is also two bytes wide. In Python 3, however, the width used to store a character varies from string to string.
[This](https://www.b-list.org/weblog/2017/sep/05/how-python-does-unicode/) is a lovely article on how Python does Unicode:

> In Python 3.3 and later, the internal storage of Unicode is now dynamic and chosen on a per-string basis. Here’s how it works:
> 1. Python parses source code on the assumption that it’s UTF-8.
> 2. When it needs to create string objects, Python determines the highest code point in the string, and looks at the size of the encoding needed to store that code point as-is.
> 3. Python then chooses that encoding — which will be one of latin-1, UCS-2, or UCS-4 — to store the string.

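The selection rule quoted above can be approximated in a few lines. This is only a sketch of the rule as described, using a hypothetical `storage_width` helper; it does not inspect CPython's actual internal representation.

```python
def storage_width(text):
    """Return the bytes per character the quoted rule would pick for text."""
    # Find the highest code point; an empty string falls into the narrowest case.
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return 1  # fits in latin-1
    if highest < 0x10000:
        return 2  # fits in UCS-2
    return 4      # needs UCS-4


for sample_text in ("abc", "\u20ac", "\U0001F44D"):
    print(repr(sample_text), storage_width(sample_text))
```

Whatever width is chosen internally, indexing and `len()` always work on code points, which is why the result differs from a 2-byte `c_wchar` view of the same text.
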
### Discussion
@Jsteh said in https://github.com/nvaccess/nvda/pull/8953#issuecomment-441406191: I think we're going to need to have a way to fetch text as UTF-16 bytes arrays, do the work with those and then convert to strings only when returning text for presentation.

This makes sense. However, when we convert a wchar array to a Python bytes array, we'll have to handle string termination ourselves.
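
A minimal sketch of what handling termination could look like, assuming the buffer contents have already been copied into a `bytes` object as UTF-16-LE. The `trim_utf16_at_null` helper is hypothetical. It checks only 2-byte-aligned positions, because a plain `bytes.find(b"\x00\x00")` can match across two adjacent code units (for example `"a"` followed by U+0100).

```python
def trim_utf16_at_null(raw):
    """Return UTF-16-LE bytes up to, but not including, the first null code unit."""
    # Step two bytes at a time so only whole code units are compared.
    for pos in range(0, len(raw) - 1, 2):
        if raw[pos:pos + 2] == b"\x00\x00":
            return raw[:pos]
    # No terminator found: drop a trailing odd byte, if any.
    return raw[:len(raw) - (len(raw) % 2)]


# Simulated buffer: text, a null terminator, then leftover bytes.
original = "a\u0100\U0001F44D"
buffer_bytes = original.encode("utf-16-le") + b"\x00\x00" + b"junk"

trimmed = trim_utf16_at_null(buffer_bytes)
# Offsets are worked out on `trimmed`; decoding happens only for presentation.
print(trimmed.decode("utf-16-le") == original)  # prints "True"
```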
