Speed up sequence subscripting using Py_TPFLAGS_SEQUENCE by scoder · Pull Request #7432 · cython/cython

scoder · 2025-12-26T07:14:05Z

Python semantics dictate that we first try the mapping protocol and then the sequence protocol for subscripting. When the index is a C integer, we can fully optimise for list/tuple, but all other sequences suffer from having to build a Python int object for the index to pass it through the mapping lookup if they implement one (e.g. to support extended slicing, like NumPy arrays).
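To illustrate with standard-library types (an example chosen here, not taken from the PR): `array.array` and `memoryview` accept both plain integer indices and extended slices. Slice objects can only be handled by the mapping protocol, which is why these types implement `mp_subscript` next to `sq_item`. At the Python level, `obj[i]` goes through `PyObject_GetItem()`, which tries the mapping slot first, so even integer indices take the mapping route there.

```python
from array import array

# Both types accept plain integer indices and extended slices (with a step).
# Slice objects are handled by the mapping protocol (mp_subscript), so these
# types implement it alongside the sequence protocol (sq_item).
nums = array("i", [10, 20, 30, 40, 50])
view = memoryview(b"abcdef")

print(nums[2])              # integer index
print(nums[::2].tolist())   # extended slice with step 2
print(view[1])              # integer index into a bytes view yields an int
print(view[::2].tobytes())  # extended slice with step 2
```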

Python 3.10 added type markers (for pattern matching) for explicitly declaring a type as a sequence or a mapping, called Py_TPFLAGS_SEQUENCE and Py_TPFLAGS_MAPPING, which can be checked quite quickly. If a type is marked as a sequence but still implements mapping lookups for slicing, and it supports sequence subscripting, we can avoid creating a Python int for the mapping protocol and go straight through the sequence index lookup.
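A quick way to see which types carry these markers is to read the `__flags__` attribute of type objects from Python. This sketch assumes Python 3.10+ and hard-codes the bit values from CPython's `Include/object.h` (`Py_TPFLAGS_SEQUENCE = 1 << 5`, `Py_TPFLAGS_MAPPING = 1 << 6`); the helper name `collection_kind` is made up for illustration. The optimisation in this PR tests the same bit in C on the object's type.

```python
import array
import sys

# Bit values copied from CPython's Include/object.h (Python 3.10+).
Py_TPFLAGS_SEQUENCE = 1 << 5
Py_TPFLAGS_MAPPING = 1 << 6


def collection_kind(tp: type) -> str:
    """Report which pattern-matching marker a type carries."""
    if sys.version_info < (3, 10):
        return "unknown"  # the markers were only added in Python 3.10
    if tp.__flags__ & Py_TPFLAGS_SEQUENCE:
        return "sequence"
    if tp.__flags__ & Py_TPFLAGS_MAPPING:
        return "mapping"
    return "neither"


for tp in (list, tuple, array.array, memoryview, dict, str):
    print(tp.__name__, collection_kind(tp))
```

`str` is deliberately left unmarked, since pattern matching does not treat strings as sequences. The same helper can be pointed at `numpy.ndarray` to check whether an installed NumPy version already sets the flag.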

With this change, indexing into Python's array.array and memoryview types is ~60% faster in a micro-benchmark.
Using a C integer as a dict key became slightly slower, but this is resolved by adding a separate up-front special case.
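The dispatch order described above can be summarised as a small decision model. This is a hypothetical sketch: `TypeInfo`, `choose_path` and the exact order of the checks are illustrative assumptions, not Cython's actual utility code in `ModuleSetupCode.c`, and the dict special case is assumed to be a simple up-front type test.

```python
from dataclasses import dataclass

# Bit value from CPython's Include/object.h (3.10+).
PY_TPFLAGS_SEQUENCE = 1 << 5


@dataclass(frozen=True)
class TypeInfo:
    """Hypothetical summary of the slots a type fills in."""
    name: str
    flags: int = 0
    has_mp_subscript: bool = False  # mapping protocol: takes a Python object key
    has_sq_item: bool = False       # sequence protocol: takes a C Py_ssize_t index


def choose_path(tp: TypeInfo) -> str:
    """Which lookup a C-integer index takes in this (hypothetical) model."""
    if tp.name == "dict":
        # Assumed up-front special case: dict keys skip the new flag test.
        return "mp_subscript (index boxed into a Python int)"
    if tp.name in ("list", "tuple"):
        # Handled by dedicated list/tuple code paths.
        return "direct list/tuple access"
    if tp.flags & PY_TPFLAGS_SEQUENCE and tp.has_sq_item:
        # New: a declared sequence is trusted to answer the same via sq_item.
        return "sq_item (no boxing)"
    if tp.has_mp_subscript:
        # Python semantics: the mapping protocol is tried first.
        return "mp_subscript (index boxed into a Python int)"
    if tp.has_sq_item:
        return "sq_item (no boxing)"
    return "not subscriptable"
```

For example, `choose_path(TypeInfo("memoryview", PY_TPFLAGS_SEQUENCE, True, True))` returns `"sq_item (no boxing)"`, while an unmarked type that fills both slots still pays for boxing the index.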

The benchmark is based on #7431.

See
https://docs.python.org/3/c-api/typeobj.html#c.Py_TPFLAGS_SEQUENCE
#1807
pandas-dev/pandas#55915
pandas-dev/pandas#55179 (comment)


da-woods · 2025-12-26T11:43:35Z

I think this is reasonable, although it's arguably a slight abuse of the flag (which is only documented as being for pattern matching).

I think a type that gives a different answer if you go through tp_as_sequence and tp_as_mapping is pretty broken. So from that point of view, it seems fine to treat this flag as a strong hint, and if anyone actually notices a difference in behaviour, it's most likely on them.
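For types defined in Python, a class-level `__getitem__` fills both the `mp_subscript` and the `sq_item` slot, and both end up calling that same method. A minimal sketch (class and helper names are made up; Python 3.10+ assumed) showing that a class opts into the sequence marker by subclassing `collections.abc.Sequence`, while an otherwise identical class without that base stays unmarked:

```python
import collections.abc
import sys

# Bit value from CPython's Include/object.h (3.10+).
Py_TPFLAGS_SEQUENCE = 1 << 5


class MarkedPair(collections.abc.Sequence):
    """Inherits the sequence marker from collections.abc.Sequence."""

    def __init__(self, first, second):
        self._items = (first, second)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        # One method serves both the mapping and the sequence slot.
        return self._items[index]


class UnmarkedPair:
    """Same behaviour, but no Sequence base class, hence no marker."""

    def __init__(self, first, second):
        self._items = (first, second)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]


def is_marked_sequence(tp: type) -> bool:
    """True if the type carries Py_TPFLAGS_SEQUENCE (always False before 3.10)."""
    return sys.version_info >= (3, 10) and bool(tp.__flags__ & Py_TPFLAGS_SEQUENCE)


print(is_marked_sequence(MarkedPair), is_marked_sequence(UnmarkedPair))
```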


scoder · 2026-01-22T09:05:23Z

NumPy will set the sequence flag in the future: numpy/numpy#30519
Given the expected speedups, I'll merge this as soon as CI is happy.

scoder added 4 commits December 24, 2025 10:07

Improve benchmarks scaling for sub-benchmarks by scaling to the fastest benchmark instead of the slowest. Scale back the timings of sub-benchmarks to the outer scale count to report comparable timings.

6cb62cd

Prefer sq_item() over mp_subscript() if Py_TPFLAGS_SEQUENCE is set.

4d1b340

Add a micro benchmark for subscripting.

6dddf38

Tweak to avoid a slight performance regression for dict.

c732b6e

scoder added this to the 3.3 milestone Dec 26, 2025

scoder added enhancement Optimization labels Dec 26, 2025

scoder added 7 commits December 26, 2025 17:59

Benchmarks: Allow debug output from benchmark modules via stderr.

281e361

Benchmarks: Reduce maximum variance of timings to a more reasonable value.

7f1a8a6

Add NumPy arrays to getitem benchmark collections.

76eeed1

Benchmarks: Terminate each sub-benchmark after at most 60 seconds.

7a8fe80

Remove unfair overhead from benchmark.

6674a0e

Extend benchmark to make actual use of @collection_type(), if supported, and also time the subscripting from Python.

035bf42

Typo.

a4a3b68

da-woods reviewed Jan 2, 2026


Two review threads on Cython/Utility/ModuleSetupCode.c (outdated, resolved)

scoder and others added 14 commits January 4, 2026 09:27

Do not set type flags in Limited API.

d9aba51

Co-authored-by: da-woods <dw-git@d-woods.co.uk>

Add a micro benchmark for subscripting.

4174584

Benchmarks: Allow debug output from benchmark modules via stderr.

90c6e7f

Benchmarks: Reduce maximum variance of timings to a more reasonable value.

b1b253c

Add NumPy arrays to getitem benchmark collections.

be8930a

Benchmarks: Terminate each sub-benchmark after at most 60 seconds.

02b4b6f

Extend benchmark.

92a26d2

Remove unfair overhead from benchmark.

268ab3c

Extend benchmark to make actual use of @collection_type(), if supported, and also time the subscripting from Python.

47ee772

Shorten benchmark names.

8e9a9c3

Remove 'is_list' flag in optimised subscript functions since that case should be covered separately by the dedicated list subscript functions.

8357a8c

Typo.

d3e90f7

Use existing functions for special list/tuple cases.

87db508

Reduce benchmark runtimes by skipping the initial external warmup run.

05e333f

scoder added 9 commits January 5, 2026 11:07

Speed up benchmark autoranging.

e2a7868

Exclude slow benchmark outlier runs (> +15%).

aaa275c

Speed up benchmark autoranging by avoiding the final step adaptation.

9ed4db6

Merge branch 'getitem_bench' into seq_getitem

41e3102

Log benchmark results right after finishing a single benchmark module run. This gives quicker insights long before the complete results are printed at the end of the run.

9b6808a

Prevent scale <= 0 as result of autoscaling.

d989342

Merge branch 'getitem_bench' into seq_getitem

7bd2e09

Stop autoscaling when the change goes against 0.

a38ea7b

Merge branch 'getitem_bench' into seq_getitem

e52e65e

da-woods mentioned this pull request Jan 9, 2026

Do not implement mapping slots if sequence type method has sequence index argument types #7435 (merged)

scoder added 3 commits January 22, 2026 09:26

Merge branch 'master' into seq_getitem

c3f2594

Merge branch 'master' into seq_getitem

4f6a2ab

Merge branch 'master' into seq_getitem

ce274c7

scoder enabled auto-merge (squash) January 22, 2026 09:08

scoder merged commit e39d094 into cython:master Jan 22, 2026
89 checks passed

scoder deleted the seq_getitem branch January 29, 2026 21:27

scoder mentioned this pull request Feb 13, 2026

[BUG] Cython performance regression with untyped NumPy array #5719 (closed)

scoder merged 37 commits into cython:master from scoder:seq_getitem

2 participants
