Python 3: update string literals to have either u, r, or b prefix #9637

@feerrenrut

Problem:

In Python 3, unprefixed string literals are unicode strings rather than byte strings. Some parts of NVDA use plain strings for binary data, which will cause errors and strange behaviour under Python 3.
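As a minimal illustration of the kind of failure to expect (not taken from NVDA code; all names here are made up for the example), the following shows how text and bytes stop mixing under Python 3:

```python
# Python 3 semantics: unprefixed literals are text (str), b"" literals are bytes.
text = "NVDA"
data = b"NVDA"


def can_concatenate(left, right):
    """Return True if left + right works, False if it raises TypeError."""
    try:
        left + right
    except TypeError:
        return False
    return True


same_type = can_concatenate(data, b"\x00")  # bytes + bytes is fine
mixed = can_concatenate(data, text)         # bytes + str raises TypeError
equal = (data == text)                      # compares unequal, no error raised
first_item = data[0]                        # indexing bytes yields an int, not a 1-char string
round_trip = text.encode("ascii") == data   # an explicit encode bridges the two types
```

The silent cases (comparison and indexing) are likely the source of the "strange behaviour", since they do not raise at the point of the mistake.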

To gain confidence in a Python 3 release of NVDA we need to consider these strings and how they are used, and decide whether each literal should be prefixed with 'u', 'r', or 'b'.
There are more than 1000 unprefixed string literals in the NVDA codebase. To count them I ran the regex [^u\n"]"[^"\n] over the *.py files in the repo/source directory; PyCharm can also exclude matches in comments.
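A possible alternative to the regex count, sketched here with the standard library tokenize module: it tallies string literals by prefix and skips comments automatically. This is only a sketch; it assumes the source can be tokenized by the Python 3 interpreter running it, and the function names are hypothetical. Note that docstrings are counted as unprefixed literals.

```python
import io
import tokenize


def literal_prefix(token_text):
    """Return the lowercased prefix of a string literal token ('' if none)."""
    for index, char in enumerate(token_text):
        if char in "'\"":
            return token_text[:index].lower()
    return ""


def count_string_prefixes(source):
    """Count string literals in source, keyed by prefix ('' means unprefixed)."""
    counts = {}
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for token in tokens:
        # Comments are COMMENT tokens, so quotes inside them are never counted.
        if token.type == tokenize.STRING:
            prefix = literal_prefix(token.string)
            counts[prefix] = counts.get(prefix, 0) + 1
    return counts


sample = 'a = "x"\nb = u"y"\nc = r"z"\nd = b"w"\n# "in a comment"\n'
sample_counts = count_string_prefixes(sample)
```

Running this per file over the source directory would give a breakdown by prefix that could be compared against the regex-based numbers.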

Suggested approach:

  • Existing cases specifying 'u' are OK; they were already intended to be used as unicode strings.
  • Existing cases specifying 'r' are higher risk, since they may be used for binary data. We should look at these first.
    • Using the regex r".+" there seem to be just under 5000 of these strings.
  • Cases with no prefix: the vast majority will be fine as unicode strings, but some are certainly used as bytes / binary data. These will be the hardest to find.

Looking at each string individually will take weeks, so I suggest we look for low-risk areas that we can exclude:

  • Translated strings (_("blah"), pgettext("blah")) can be ignored, whether they have a 'u', 'r', or no prefix. We can be quite confident they will not be bytes.
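One way this exclusion might be automated is with the standard library ast module: collect the positions of string literals passed directly to the translation functions, so they can be filtered out of any review list. This is a rough sketch, assuming the file parses under Python 3.8 or later and that the translation functions are called by the bare names _ and pgettext; the helper name is hypothetical.

```python
import ast

# Assumption: translation functions are called by these bare names.
TRANSLATION_FUNCS = {"_", "pgettext"}


def translated_literal_positions(source):
    """Return {(line, column)} of string literals passed to translation calls."""
    positions = set()
    for node in ast.walk(ast.parse(source)):
        is_translation_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in TRANSLATION_FUNCS
        )
        if not is_translation_call:
            continue
        for arg in node.args:
            # Only direct string-literal arguments are recorded.
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                positions.add((arg.lineno, arg.col_offset))
    return positions


sample = 'x = _("hello")\ny = "raw"\nz = pgettext("ctx", "msg")\n'
excluded = translated_literal_positions(sample)
```

The (line, column) pairs match the start positions reported by tokenize, so the two approaches could be combined: count or list literals with tokenize, then drop those whose position appears in this set.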

Open questions:

  • How are we going to keep track of what has been looked at / excluded?
    • Is it feasible to use regex to automate adding prefixes to string literals in areas that we are confident in?
  • How do we identify general cases we can exclude?
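On the automation question, a token-based rewrite may be safer than a regex, because it cannot touch quotes inside comments or inside other strings. The sketch below adds a 'u' prefix to every unprefixed literal in a piece of source. It is an illustration only, with a hypothetical function name, and would only be appropriate for code already judged to hold text rather than binary data. It assumes the source tokenizes under the Python 3 interpreter running it.

```python
import io
import tokenize


def add_u_prefix(source):
    """Return source with 'u' inserted before every unprefixed string literal."""
    lines = source.splitlines(keepends=True)
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    # A literal is unprefixed when its token text starts directly with a quote.
    starts = [
        token.start
        for token in tokens
        if token.type == tokenize.STRING and token.string[0] in "'\""
    ]
    # Edit right to left so earlier column offsets on the same line stay valid.
    for row, col in sorted(starts, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + "u" + line[col:]
    return "".join(lines)


before = 'a = "x" + b"y"\n# "c"\n'
after = add_u_prefix(before)
```

Because the rewrite is mechanical, the resulting diff could also serve as a record of which files have been reviewed.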
