perf: use cached Latin-1 strings in CPyStr_GetItem by tobymao · Pull Request #1 · VaggelisD/sqlglot-mypy

tobymao · 2026-03-18T06:36:03Z

For characters < 256, use PyUnicode_FromOrdinal() which returns CPython's cached single-char Latin-1 string objects instead of allocating a new PyUnicode object on every str[i] access. This avoids allocation+deallocation overhead in character-scanning hot loops.
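The sharing behaviour this relies on can be observed from Python without compiling anything. The following is a minimal sketch, assuming CPython (`ctypes.pythonapi` is CPython-specific). The helper name `same_object_twice` is illustrative and not part of the PR. It calls `PyUnicode_FromOrdinal()` twice per code point and checks whether both calls hand back the same object.

```python
import ctypes

# PyUnicode_FromOrdinal is part of the public CPython C API;
# ctypes.pythonapi exposes it while holding the GIL.
from_ordinal = ctypes.pythonapi.PyUnicode_FromOrdinal
from_ordinal.argtypes = [ctypes.c_int]
from_ordinal.restype = ctypes.py_object


def same_object_twice(code_point: int) -> bool:
    """Return True if two calls yield the identical str object."""
    first = from_ordinal(code_point)
    second = from_ordinal(code_point)
    return first is second


# Every Latin-1 code point (0..255) should come from the shared cache.
latin1_shared = all(same_object_twice(cp) for cp in range(256))

if __name__ == "__main__":
    print("Latin-1 single-char strings shared:", latin1_shared)
    # Code points >= 256 are not covered by that cache; identity here is
    # an implementation detail, so it is printed rather than asserted.
    print("U+4E2D shared:", same_object_twice(0x4E2D))
```

Object identity for strings is an implementation detail of CPython, so this check is a diagnostic aid and not something application code should depend on.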

Benchmark (mypyc-compiled tokenizer, tpch query x5000):
Before: 0.337 ms/iter
After: 0.223 ms/iter (-34%)

Isolated micro-benchmark (scan 15K chars, 1000 iters):
Before: 0.60s
After: 0.26s (-57%)
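The exact benchmark script is not included in the PR. A rough sketch of what a character-scanning micro-benchmark of this shape could look like is shown below. The input text, the `scan` function, and the `measure` helper are all assumptions made for illustration. To reproduce the effect, the module would need to be compiled with mypyc, since plain CPython does not go through `CPyStr_GetItem`.

```python
import timeit

# Hypothetical input: a short SQL-like fragment repeated to roughly 15K chars.
TEXT = "select a, b from t where a > 1 " * 500


def scan(text: str) -> int:
    """Index every character one at a time, as a tokenizer hot loop would."""
    spaces = 0
    for i in range(len(text)):
        ch = text[i]  # the str[i] access that the PR speeds up under mypyc
        if ch == " ":
            spaces += 1
    return spaces


def measure(iterations: int = 1000) -> float:
    """Return total seconds spent running scan(TEXT) `iterations` times."""
    return timeit.timeit(lambda: scan(TEXT), number=iterations)


if __name__ == "__main__":
    print(f"{len(TEXT)} chars, {measure():.2f}s for 1000 iterations")
```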

Characters >= 256 (BMP, supplementary) keep the original PyUnicode_New allocation path unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…sqlglot)

This is the minimal set of fixes needed for `separate=True` to build and run correctly against sqlglot, a ~100-module project with cross-group class inheritance, generator helper classes, non-ext subclasses with fast methods, and mutually-dependent compiled modules. Each of the fixes below is a real bug that was never hit by mypy itself (mypy's setup.py uses multi_file on Windows only, never separate=True) or by the toy fixtures in mypyc's TestRunSeparate.

1. Non-extension classes never have vtables -- short-circuit is_method_final to True for them so codegen doesn't try to index into a vtable that compute_vtable skipped.

2. emit_method_call: under separate=True, a method's FuncIR body may live in another group while only its FuncDecl is visible here. Use method_decl(name) instead of get_method(name).decl -- the decl is enough to emit a direct C call. Split native_function_type to accept a decl too.

3. Cross-group native/Python-wrapper calls weren't routing through the exports-table indirection at a dozen sites in emitwrapper / emitfunc / emitclass. Added Emitter.native_function_call(decl) and Emitter.wrapper_function_call(decl) helpers and migrated all offending sites. Also made CPyPy_* wrapper declarations needs_export=True so those symbols reach the exports table.

4. Defer cross-group imports to shim load time. The shared lib's exec_ function used to PyImport_ImportModule sibling groups at PyInit time, which re-enters the enclosing package's __init__.py mid-flight and blows up on partial-init attribute walks. Split exec_ into a self-contained capsule-setup phase (runs in PyInit) and a deferred ensure_deps_<short>() (runs from the shim just before per-module init). Shim uses PyImport_ImportModuleLevel with a non-empty fromlist so the lookup returns the leaf directly via sys.modules, and fetches capsules via PyObject_GetAttrString instead of PyCapsule_Import (which itself performs the same dotted attribute walk).

5. Fix broken fallback in lib-rt CPyImport_ImportFrom: the code tried PyObject_GetItem(module, fullname) where it intended PyImport_GetModule (comment says as much). Modules don't implement __getitem__, so the fallback always raised TypeError. Also Py_XDECREF the potentially-NULL package_path in the error path.

6. Incremental-mode plumbing for separate=True: compile_modules_to_ir now syncs freshly built ClassIR/FuncIR into deser_ctx so later cache-loaded SCCs can resolve cross-SCC references. load_type_map tolerates mypy's synthetic TypeInfo entries (e.g. "<subclass of X and Y>") that have no corresponding mypyc ClassIR.

Also adds three regression tests targeted to fail on TestRunSeparate without the fixes above:

- testSeparateCrossGroupEnumMethod exercises fix #1.
- testSeparateCrossGroupGenerator exercises fix #2.
- testSeparateCrossGroupInheritedInit exercises fix #3.
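Two of the import behaviours mentioned in the fixes above (a non-empty fromlist returning the leaf module, and modules not supporting item lookup) have direct Python-level analogues. The sketch below illustrates them with the standard-library module `json.decoder`. It is not the mypyc C code: it uses the built-in `__import__`, which is the Python counterpart of `PyImport_ImportModuleLevel`, and `sys.modules`, which is what `PyImport_GetModule` consults.

```python
import operator
import sys

# With an empty fromlist, __import__ returns the top-level package.
top = __import__("json.decoder")

# With a non-empty fromlist, it returns the leaf module itself,
# so no dotted attribute walk from the package is needed.
leaf = __import__("json.decoder", fromlist=["JSONDecoder"])

# Modules do not implement __getitem__, so an item lookup on a module
# (the Python-level analogue of PyObject_GetItem) raises TypeError.
try:
    operator.getitem(top, "json.decoder")
    subscript_fails = False
except TypeError:
    subscript_fails = True

# The working lookup goes through sys.modules.
from_sys_modules = sys.modules.get("json.decoder")

if __name__ == "__main__":
    print(top.__name__, leaf.__name__, subscript_fails)
```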

VaggelisD merged commit 31ba829 into VaggelisD:release-1.19 Mar 18, 2026

VaggelisD merged 1 commit into VaggelisD:release-1.19 from tobymao:toby/str-getitem-cache
