ENH: Set Py_TPFLAGS_SEQUENCE for ndarray #30519

@scoder

Proposed new feature or change:

The type flags Py_TPFLAGS_SEQUENCE and Py_TPFLAGS_MAPPING were added in Python 3.10 to disambiguate sequence and mapping types for pattern matching. In Cython 3.3, we would like to use them to avoid passing through the costly mapping protocol when we have a C integer for indexing into a sequence, but the sequence also implements the mapping lookup for extended slicing, like NumPy's ndarray. As of today, we have to create a Python int object for the C index value just to pass it through the mapping lookup and have the other side unpack it into a C index again.

Going straight for the sequence protocol makes indexing into (untyped) Python array.array and memoryview objects ~60% faster in my micro-benchmarks. Obviously, unpacking the data into a typed Cython memoryview still makes the access much faster than that, but a fast, untyped lookup can still be beneficial where the unpacking itself is too costly in comparison to the lookup, or where the type of the input object is not statically known at compile time.

I'd like to ask for NumPy's ndarray type to set the Py_TPFLAGS_SEQUENCE type flag as well in order to benefit from optimisations in this direction.

Note that this flag is inherited, so subtypes of ndarray will then also be marked as sequence, unless they state otherwise. There might be cases where user code does not expect this.

References:
https://docs.python.org/3/c-api/typeobj.html#c.Py_TPFLAGS_SEQUENCE

Related old tickets:
cython/cython#1807
pandas-dev/pandas#55915
pandas-dev/pandas#55179 (comment)
