Fix Cython 3.0 regression with time_loc_dups by WillAyd · Pull Request #55915 · pandas-dev/pandas

WillAyd · 2023-11-10T15:52:27Z

From the discussion in cython/cython#1807 (comment), it looks like Cython prior to 3.0 would always use the sequence protocol when indexing with an integral value. However, Python's generic `PyObject_GetItem` tries the mapping protocol first when it is available, and Cython switched to match that logic in 3.0.

NumPy arrays implement both the sequence and the mapping protocol. Where we have untyped arrays that fall back to Python calls, we will see a performance regression, because integer indexing now routes through the mapping protocol.
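A toy model can make the ordering concrete. The sketch below is a simplified pure-Python simulation, not CPython or Cython source: `ToyType`, `pyobject_getitem`, and `cython2_int_getitem` are hypothetical names. It assumes a type object reduced to its two subscript slots, where `mp_subscript` (mapping protocol) receives the key as a Python object and `sq_item` (sequence protocol) receives a C `Py_ssize_t` in the real C API. It records which slot serves each lookup, so you can see that a type implementing both protocols is served by a different slot under each ordering.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class ToyType:
    """Hypothetical stand-in for a C type object, reduced to the two subscript slots."""
    # Mapping protocol slot: in C it receives the key as a Python object (a boxed int here).
    mp_subscript: Optional[Callable[[Any], Any]] = None
    # Sequence protocol slot: in C it receives a plain Py_ssize_t.
    sq_item: Optional[Callable[[int], Any]] = None
    # Records which slot served each lookup, so the dispatch order is observable.
    calls: List[str] = field(default_factory=list)


def pyobject_getitem(obj: ToyType, key: int) -> Any:
    """Models PyObject_GetItem: mapping slot first, then fall back to the sequence slot."""
    if obj.mp_subscript is not None:
        obj.calls.append("mapping")
        return obj.mp_subscript(key)
    if obj.sq_item is not None:
        obj.calls.append("sequence")
        return obj.sq_item(key)
    raise TypeError("object is not subscriptable")


def cython2_int_getitem(obj: ToyType, i: int) -> Any:
    """Models the pre-3.0 behavior described above: an integral index tries the sequence slot first."""
    if obj.sq_item is not None:
        obj.calls.append("sequence")
        return obj.sq_item(i)
    return pyobject_getitem(obj, i)


# An ndarray-like toy type that implements both protocols.
data = [10, 20, 30]
both = ToyType(mp_subscript=data.__getitem__, sq_item=data.__getitem__)

print(cython2_int_getitem(both, 1), both.calls)  # 20 ['sequence']
both.calls.clear()
print(pyobject_getitem(both, 1), both.calls)     # 20 ['mapping']
```

The value returned is the same either way. What changes is which slot runs and, in C, whether the index must first be boxed into a Python `int` before the lookup.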

The changes in this PR are not meant to be an exhaustive review of the codebase, but rather a quick POC to restore the time_loc_dups benchmark.

jbrockmendel · 2023-11-10T20:47:32Z

LGTM, bummer that it's necessary though.

mroeschke · 2023-11-10T22:49:08Z

Thanks @WillAyd

Python semantics dictate that we first try the mapping protocol and then the sequence protocol for subscripting. When the index is a C integer, we can optimise perfectly for list/tuple, but all other sequences suffer from having to build a Python `int` object for the index to pass it through the mapping lookup if they implement that (e.g. to support extended slicing, like NumPy arrays).

Python 3.10 added type markers (for pattern matching) for explicitly declaring a type as a sequence or mapping, called `Py_TPFLAGS_SEQUENCE` and `Py_TPFLAGS_MAPPING`, which can be checked quickly. If a type is marked as a sequence but still implements mapping lookups for slicing, and it supports sequence subscripting, we can avoid the Python `int` creation of the mapping protocol and go straight through the sequence index lookup.

With this change, indexing into Python's `array.array` and `memoryview` types is ~60% faster in a micro-benchmark. Using a C integer as a dict key got slightly slower, but this is resolved by adding a separate up-front special case.

Future NumPy versions are expected to set the sequence flag and should therefore benefit from this change as well. See numpy/numpy#30519.

Benchmark is based on #7431.

See https://docs.python.org/3/c-api/typeobj.html#c.Py_TPFLAGS_SEQUENCE

#1807
pandas-dev/pandas#55915
pandas-dev/pandas#55179 (comment)

WillAyd added 2 commits November 10, 2023 10:01

fix performance regression with time_loc_dups and Cython 3 (323c13c)

replace all index calls with PySequence_GetItem (e52a6c3)
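The second commit replaces the generic index calls with `PySequence_GetItem`, which takes a C `Py_ssize_t` index and consults only the sequence protocol. The sketch below calls both `PySequence_GetItem` and `PyObject_GetItem` through `ctypes.pythonapi` on `array.array`, a standard-library type that, like NumPy arrays, implements both protocols. It assumes CPython. The wrapper names `sequence_getitem` and `object_getitem` are hypothetical. Because ctypes call overhead dominates, the sketch demonstrates semantics only and says nothing about speed.

```python
import ctypes
from array import array

# CPython only: ctypes.pythonapi exposes the running interpreter's C API and
# re-raises any Python exception the called function sets.
_PySequence_GetItem = ctypes.pythonapi.PySequence_GetItem
_PySequence_GetItem.argtypes = [ctypes.py_object, ctypes.c_ssize_t]  # PyObject *, Py_ssize_t
_PySequence_GetItem.restype = ctypes.py_object  # ctypes takes ownership of the new reference

_PyObject_GetItem = ctypes.pythonapi.PyObject_GetItem
_PyObject_GetItem.argtypes = [ctypes.py_object, ctypes.py_object]  # PyObject *, PyObject *
_PyObject_GetItem.restype = ctypes.py_object


def sequence_getitem(obj, i: int):
    """Sequence protocol only: the index is passed as a C Py_ssize_t."""
    return _PySequence_GetItem(obj, i)


def object_getitem(obj, key):
    """Generic subscripting: mapping protocol first, then sequence protocol."""
    return _PyObject_GetItem(obj, key)


arr = array("i", [10, 20, 30])           # implements both protocols
print(sequence_getitem(arr, 1))          # 20
print(sequence_getitem(arr, -1))         # 30: negative index adjusted using sq_length
print(object_getitem(arr, slice(0, 2)))  # array('i', [10, 20]): slices need the mapping slot

try:
    sequence_getitem({0: "a"}, 0)        # dict has no sq_item slot
except TypeError as exc:
    print("TypeError:", exc)
```

`PySequence_GetItem` normalizes a negative index itself before calling `sq_item`, so callers indexing with a C integer keep Python's negative-index semantics without building an `int` object. The trade-off is that it rejects pure mappings, so it only suits call sites already known to hold sequences.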

mroeschke added this to the 2.2 milestone Nov 10, 2023

mroeschke added the Performance (memory or execution speed performance) and Internals (related to non-user accessible pandas implementation) labels Nov 10, 2023

mroeschke approved these changes Nov 10, 2023


mroeschke merged commit d650212 into pandas-dev:main Nov 10, 2023

WillAyd deleted the fix-dup-perf branch November 11, 2023 00:07

rhshadrach mentioned this pull request Nov 12, 2023

DEP: Use Cython 3.0 #55179

Merged


This was referenced Dec 26, 2025

ENH: Set Py_TPFLAGS_SEQUENCE for ndarray numpy/numpy#30519

Closed

Speed up sequence subscripting using Py_TPFLAGS_SEQUENCE cython/cython#7432

Merged

scoder mentioned this pull request Jan 21, 2026

Do not implement mapping slots if sequence type method has sequence index argument types cython/cython#7435

Merged
