python: use PyUnicode_FromStringAndSize() #14895

Merged: Mytherin merged 1 commit into duckdb:main from methane:python-use-capi on Nov 19, 2024

Conversation

@methane (Contributor) commented Nov 19, 2024

DuckDB introduced an optimization for its UTF-8 decoder.
It is up to 40% faster in the short non-ASCII case,
but 4x slower in the long ASCII case.

Python has optimized code for decoding ASCII, so decoding UTF-8 that contains a long ASCII part is faster with Python's decoder than with UTF8Proc::UTF8ToCodepoint.
I am also optimizing the handling of the short non-ASCII case in CPython.

ref: python/cpython#126025 (comment)
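
To make the two cases above concrete, here is a minimal sketch that decodes UTF-8 one code point at a time and compares the result against Python's built-in decoder. The function `naive_utf8_decode` and both sample payloads are hypothetical and written only for illustration. This is not DuckDB's `UTF8Proc::UTF8ToCodepoint`, and the pure-Python timings are not comparable to the numbers quoted in this PR. They only show one way to measure the two cases.

```python
import timeit


def naive_utf8_decode(data: bytes) -> str:
    """Decode UTF-8 one code point at a time (illustrative; assumes valid input)."""
    out = []
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:        # 1-byte sequence (ASCII)
            cp, size = lead, 1
        elif lead < 0xE0:      # 2-byte sequence
            cp, size = lead & 0x1F, 2
        elif lead < 0xF0:      # 3-byte sequence
            cp, size = lead & 0x0F, 3
        else:                  # 4-byte sequence
            cp, size = lead & 0x07, 4
        # Fold in the 6 payload bits of each continuation byte.
        for k in range(1, size):
            cp = (cp << 6) | (data[i + k] & 0x3F)
        out.append(chr(cp))
        i += size
    return "".join(out)


# Hypothetical payloads for the two cases discussed in the PR description.
long_ascii = b"a" * 10_000
short_non_ascii = "héllo wörld".encode("utf-8")

for label, payload in [("long ASCII", long_ascii), ("short non-ASCII", short_non_ascii)]:
    # Both decoders must agree before any timing is meaningful.
    assert naive_utf8_decode(payload) == payload.decode("utf-8")
    t_naive = timeit.timeit(lambda: naive_utf8_decode(payload), number=20)
    t_builtin = timeit.timeit(lambda: payload.decode("utf-8"), number=20)
    print(f"{label}: per-code-point {t_naive:.6f}s, built-in {t_builtin:.6f}s")
```

The per-code-point loop does the same amount of work for every byte of ASCII input, which is the kind of cost a dedicated ASCII fast path avoids.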

Background

  • Third-party code that uses PEP 393 based APIs depends heavily on current CPython internals, which makes it difficult to evolve those internals (e.g. to use UTF-8 as the internal representation of Unicode).
  • Using PEP 393 APIs slows down Python implementations other than CPython that use UTF-8 string representations, e.g. PyPy.
  • PyUnicode_FromStringAndSize is part of the Stable ABI. Moving from non-Stable ABI to Stable ABI makes it possible to build Python modules that work with several Python versions.
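
As a rough illustration of the call this PR switches to, the sketch below invokes `PyUnicode_FromStringAndSize` from Python through `ctypes.pythonapi`. It assumes a CPython interpreter, and the wrapper name `decode_utf8` is hypothetical. The actual change in this PR is in DuckDB's C++ code, which is not shown here.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running CPython interpreter.
from_string_and_size = ctypes.pythonapi.PyUnicode_FromStringAndSize
# C signature: PyObject *PyUnicode_FromStringAndSize(const char *str, Py_ssize_t size)
from_string_and_size.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t]
from_string_and_size.restype = ctypes.py_object


def decode_utf8(buf: bytes) -> str:
    """Build a str from a UTF-8 buffer and an explicit byte length (hypothetical helper)."""
    return from_string_and_size(buf, len(buf))


print(decode_utf8("héllo".encode("utf-8")))
```

Because the length is passed explicitly, the buffer does not need to be NUL-terminated, which suits string data stored as pointer and size.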

@Mytherin (Collaborator)

Thanks! Playing around with this locally, it seems to perform about as well as the current implementation, in which case I definitely agree we should switch back to the stable API.

@Mytherin Mytherin merged commit fbbfc4a into duckdb:main Nov 19, 2024
@methane methane deleted the python-use-capi branch November 20, 2024 03:44
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Dec 24, 2024
Top-N: Improve performance with large heaps, and correctly call Reduce (duckdb/duckdb#14900)
python: use PyUnicode_FromStringAndSize() (duckdb/duckdb#14895)
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Dec 24, 2024
Top-N: Improve performance with large heaps, and correctly call Reduce (duckdb/duckdb#14900)
python: use PyUnicode_FromStringAndSize() (duckdb/duckdb#14895)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>