Fix Cython 3.0 regression with time_loc_dups#55915
Merged
mroeschke merged 2 commits intopandas-dev:mainfrom Nov 10, 2023
Merged
Fix Cython 3.0 regression with time_loc_dups#55915mroeschke merged 2 commits intopandas-dev:mainfrom
mroeschke merged 2 commits intopandas-dev:mainfrom
Conversation
Member
|
LGTM, bummer that its necessary though. |
mroeschke
approved these changes
Nov 10, 2023
Member
|
Thanks @WillAyd |
This was referenced Dec 26, 2025
scoder
added a commit
to cython/cython
that referenced
this pull request
Jan 22, 2026
Python semantics dictate that we first try the mapping protocol and then the sequence protocol for subscripting. When the index is a C integer, we can optimise perfectly for list/tuple, but all other sequences suffer from having to build a Python `int` object for the index to pass it through the mapping lookup if they implement that (e.g. to support extended slicing, like NumPy arrays). Python 3.10 added type markers (for pattern matching) for explicitly declaring a type as sequence or mapping, called `Py_TPFLAGS_SEQUENCE` and `Py_TPFLAGS_MAPPING`, which can now be checked for quite quickly. If a type is marked as sequence but still implements mapping lookups for slicing, and it supports sequence subscripting, we can avoid the Python `int` creation of the mapping protocol and go straight through the sequence index lookup. With this change, indexing into Python's `array.array` and `memoryview` types is ~60% faster in a micro-benchmark. Using a C integer as dict key got slightly slower but is resolved by adding a separate up-front special case. Future NumPy versions are expected to set the sequence flag and should therefore benefit from this change as well. See numpy/numpy#30519 Benchmark is based on #7431 See https://docs.python.org/3/c-api/typeobj.html#c.Py_TPFLAGS_SEQUENCE #1807 pandas-dev/pandas#55915 pandas-dev/pandas#55179 (comment)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
re #55179
From discussion in cython/cython#1807 (comment) it looks like Cython prior to 3.0 would always use the sequence protocol for indexing with an integral value. However, Python prefers the object protocol first if available, and Cython switched to match that logic with 3.0
NumPy arrays implement both the sequence and the mapping protocol. In cases where we have untyped arrays that fall back to Python calls we will see a performance regression since this will now route through the mapping space
The changes in this PR are not meant to be an exhaustive review of the codebase, rather just a quick POC to reset the time_loc_dups benchmark