Weird behaviour of non-ASCII Python identifiers

I have learnt from PEP 3131 that non-ASCII identifiers are supported in Python, though using them is not considered best practice.

However, I get this strange behaviour, where my 𝜏 identifier (U+1D70F) seems to be automatically converted to τ (U+03C4).

class Base(object):
    def __init__(self):
        self.𝜏 = 5 # defined with U+1D70F

a = Base()
print(a.𝜏)     # 5             # (U+1D70F)
print(a.τ)     # 5 as well     # (U+03C4) ? another way to access it?
d = a.__dict__ # {'τ':  5}     # (U+03C4) ? seems converted
print(d['τ'])  # 5             # (U+03C4) ? consistent with the conversion
print(d['𝜏'])  # KeyError: '𝜏' # (U+1D70F) ?! unexpected!

Is that expected behaviour? Why does this silent conversion occur? Does it have anything to do with NFKC normalization? I thought normalization was only for canonically ordering Unicode character sequences.

Solution:

Per the documentation on identifiers:

All identifiers are converted into the normal form NFKC while parsing;
comparison of identifiers is based on NFKC.

You can see that U+03C4 is the appropriate result using unicodedata:

>>> import unicodedata
>>> unicodedata.normalize('NFKC', '𝜏')
'τ'

However, this conversion doesn’t apply to string literals, like the one you’re using as a dictionary key, so the lookup searches for the unconverted character in a dictionary that only contains the converted one.

self.𝜏 = 5  # implicitly converted to "self.τ = 5"
a.𝜏  # implicitly converted to "a.τ"
d['𝜏']  # not converted
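
To make the difference concrete, here is a minimal sketch (the names TAU_MATH, TAU_GREEK and namespace are mine, and exec is used only so the identifier goes through the parser). It also touches on the point in the question about canonical ordering: the canonical form NFC leaves U+1D70F alone, whereas the compatibility form NFKC maps it to U+03C4.

```python
import unicodedata

TAU_MATH = '\U0001d70f'   # MATHEMATICAL ITALIC SMALL TAU
TAU_GREEK = '\u03c4'      # GREEK SMALL LETTER TAU

# Run a tiny source string that uses the mathematical tau as an identifier.
# The parser normalizes the identifier before the name is stored.
namespace = {}
exec(TAU_MATH + ' = 5', namespace)

stored_under_greek = TAU_GREEK in namespace   # name was stored as U+03C4
stored_under_math = TAU_MATH in namespace     # the raw U+1D70F key is absent

# NFC (canonical) does not touch the character; NFKC (compatibility) does.
nfc_unchanged = unicodedata.normalize('NFC', TAU_MATH) == TAU_MATH
nfkc_is_greek = unicodedata.normalize('NFKC', TAU_MATH) == TAU_GREEK

# Normalizing the string key by hand makes the dictionary lookup succeed.
value = namespace[unicodedata.normalize('NFKC', TAU_MATH)]
```

So a string key has to be normalized explicitly if it is meant to match a name that went through the parser.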

You can see similar problems with e.g. string literals used with getattr:

>>> getattr(a, '𝜏')
Traceback (most recent call last):
  File "python", line 1, in <module>
AttributeError: 'Base' object has no attribute '𝜏'
>>> getattr(a, unicodedata.normalize('NFKC', '𝜏'))
5
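
If you do this often, the normalization can be wrapped in a small helper. This is only a sketch; getattr_nfkc is a hypothetical name of mine, not something the standard library provides.

```python
import unicodedata

def getattr_nfkc(obj, name, *default):
    """Like getattr, but NFKC-normalizes the attribute name first.

    Hypothetical helper: it mirrors what the parser does to identifiers.
    """
    return getattr(obj, unicodedata.normalize('NFKC', name), *default)

class Base(object):
    def __init__(self):
        # Same effect as writing self.<tau> = 5 in source code:
        # the attribute ends up stored under U+03C4.
        setattr(self, '\u03c4', 5)

a = Base()
result = getattr_nfkc(a, '\U0001d70f')        # U+1D70F is normalized to U+03C4
fallback = getattr_nfkc(a, 'missing', None)   # optional default, as with getattr
```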

Concatenate string literals to generate variable name

Question

In Python, what is the shortest/easiest way to create a function which allows me to access a variable by concatenating two string literals to generate the variable’s name?


Background

In C, I can do something like so:

#define CONCAT(x,y) x ## _ ## y

Then, later in my code, I could do something like:

int i = CONCAT(PRODUCT,BANANA);

Assuming a macro exists with the name PRODUCT_BANANA, its value is now assigned to i. I can accomplish something similar in shell scripts via indirection.


Question – Redux

How can I accomplish this same functionality in Python? I’m doing this because I have a Python class with thousands of variables for dozens of different products, i.e.

class A(object):
    BANANA_ADDRESS0 = 0xABCD
    PINEAPPLE_ADDRESS0 = 0x1234
    BANANA_ADDRESS1 = 0x4567
    PINEAPPLE_ADDRESS1 = 0x1000

I’d like to be able to have a function that can be, for example, executed via someFunc("BANANA", "ADDRESS0"), resolve the value as A.BANANA_ADDRESS0, and return the associated value (0xABCD, in this case).


Extra

Assuming the above is possible, is it possible to have the function always interpret the supplied function arguments as string literals, so function calls don’t need the arguments wrapped in single/double quotes? i.e. so it can be called via someFunc(BANANA, ADDRESS0), rather than someFunc("BANANA", "ADDRESS0")?

Solution:

The first part is easy:

class A(object):
    BANANA_ADDRESS0 = 0xABCD
    PINEAPPLE_ADDRESS0 = 0x1234
    BANANA_ADDRESS1 = 0x4567
    PINEAPPLE_ADDRESS1 = 0x1000

    @classmethod
    def some_func(cls, name_a, name_b):
        name = '{}_{}'.format(name_a, name_b)
        return getattr(cls, name)

value = A.some_func('BANANA', 'ADDRESS1')
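
If a caller passes a combination that does not exist, getattr raises AttributeError. Here is a minimal sketch, assuming you would sometimes rather supply a fallback; the default parameter and the _MISSING sentinel are illustrative additions of mine, not part of the code above.

```python
class A(object):
    BANANA_ADDRESS0 = 0xABCD
    PINEAPPLE_ADDRESS0 = 0x1234

# Sentinel so that None can still be passed as a legitimate default.
_MISSING = object()

def some_func(name_a, name_b, default=_MISSING):
    """Resolve A.<name_a>_<name_b>, optionally falling back to a default."""
    name = '{}_{}'.format(name_a, name_b)
    if default is _MISSING:
        return getattr(A, name)        # raises AttributeError if absent
    return getattr(A, name, default)

found = some_func('BANANA', 'ADDRESS0')
missing = some_func('MANGO', 'ADDRESS0', default=None)
```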

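The second part is not possible in general: a bare name such as BANANA is evaluated as a variable before the function is ever called, so it raises NameError unless that name is defined. If you have a limited set of names, you could define a constant for each one: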

BANANA = 'BANANA'
PINEAPPLE = 'PINEAPPLE'

and so on for every name you want to pass without quotes.
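
Put together, a sketch of the constants approach could look like this. It assumes the set of products and fields is small enough to list by hand; the ADDRESS0 and ADDRESS1 constants are added here for illustration.

```python
class A(object):
    BANANA_ADDRESS0 = 0xABCD
    PINEAPPLE_ADDRESS0 = 0x1234
    BANANA_ADDRESS1 = 0x4567
    PINEAPPLE_ADDRESS1 = 0x1000

    @classmethod
    def some_func(cls, name_a, name_b):
        return getattr(cls, '{}_{}'.format(name_a, name_b))

# One constant per name part, each holding its own spelling as a string.
BANANA, PINEAPPLE = 'BANANA', 'PINEAPPLE'
ADDRESS0, ADDRESS1 = 'ADDRESS0', 'ADDRESS1'

# The call now reads without quotes, but the arguments are still ordinary
# variables that happen to contain strings.
unquoted = A.some_func(BANANA, ADDRESS0)
quoted = A.some_func('PINEAPPLE', 'ADDRESS1')
```

The trade-off is that every new product or field needs its own constant, which is why this only scales to a limited set of names.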