perf: use cached Latin-1 strings in CPyStr_GetItem #1
Merged
For characters < 256, use PyUnicode_FromOrdinal(), which returns CPython's cached single-character Latin-1 string objects instead of allocating a new PyUnicode object on every str[i] access. This avoids allocation and deallocation overhead in character-scanning hot loops.

Benchmark (mypyc-compiled tokenizer, tpch query x5000):
Before: 0.337 ms/iter
After: 0.223 ms/iter (-34%)

Isolated micro-benchmark (scan 15K chars, 1000 iters):
Before: 0.60s
After: 0.26s (-57%)

Characters >= 256 (the rest of the BMP and supplementary-plane characters) keep the original PyUnicode_New allocation path unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
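The fast path can be illustrated without the CPython headers. The following is a toy, self-contained C model, not the patch itself: CharStr, char_at, char_decref, latin1_cache, and g_allocs are hypothetical names, and the reference count is a plain field rather than Py_INCREF/Py_DECREF. It shows the shape of the change: ordinals below 256 are served from a preallocated table with only a reference-count bump, while larger ordinals still allocate a fresh object on every access.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy model of an immutable, refcounted one-character string.
   Hypothetical type: NOT CPython's PyUnicodeObject layout. */
typedef struct {
    long refcnt;
    uint32_t ch; /* the single code point this string holds */
} CharStr;

/* Counts heap allocations so the two paths can be compared. */
static long g_allocs = 0;

/* Preallocated singletons for code points 0..255, playing the role of
   CPython's cached Latin-1 single-character strings. */
static CharStr latin1_cache[256];
static int latin1_ready = 0;

static void init_latin1_cache(void) {
    for (int i = 0; i < 256; i++) {
        latin1_cache[i].refcnt = 1; /* the cache itself owns one reference */
        latin1_cache[i].ch = (uint32_t)i;
    }
    latin1_ready = 1;
}

/* Analogue of the patched str[i] path (hypothetical name). */
static CharStr *char_at(const uint32_t *data, size_t len, size_t i) {
    if (i >= len) return NULL; /* real code raises IndexError instead */
    if (!latin1_ready) init_latin1_cache();
    uint32_t ch = data[i];
    if (ch < 256) {
        /* Fast path: return a new reference to the shared cached object,
           mirroring what PyUnicode_FromOrdinal does for Latin-1 ordinals. */
        CharStr *s = &latin1_cache[ch];
        s->refcnt++;
        return s;
    }
    /* Slow path (>= 256): allocate a fresh object, as before the patch. */
    CharStr *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    g_allocs++;
    s->refcnt = 1;
    s->ch = ch;
    return s;
}

/* Drop a reference. Cached entries never reach zero because the
   cache keeps its own reference, so they are never freed. */
static void char_decref(CharStr *s) {
    assert(s->refcnt > 0);
    if (--s->refcnt == 0) free(s);
}
```

Scanning a string whose characters are mostly Latin-1 then performs almost no malloc/free pairs; only the non-Latin-1 positions still allocate. Sharing one object per ordinal is safe only because strings are immutable.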
VaggelisD added a commit that referenced this pull request on Apr 23, 2026
…sqlglot)

This is the minimal set of fixes needed for `separate=True` to build and run correctly against sqlglot, a ~100-module project with cross-group class inheritance, generator helper classes, non-ext subclasses with fast methods, and mutually dependent compiled modules. Each fix below addresses a real bug that was never hit by mypy itself (mypy's setup.py uses multi_file on Windows only, never separate=True) or by the toy fixtures in mypyc's TestRunSeparate.

1. Non-extension classes never have vtables, so short-circuit is_method_final to True for them; otherwise codegen tries to index into a vtable that compute_vtable skipped.
2. emit_method_call: under separate=True, a method's FuncIR body may live in another group while only its FuncDecl is visible here. Use method_decl(name) instead of get_method(name).decl; the decl is enough to emit a direct C call. Split native_function_type to accept a decl too.
3. Cross-group native/Python-wrapper calls weren't routed through the exports-table indirection at a dozen sites in emitwrapper / emitfunc / emitclass. Added Emitter.native_function_call(decl) and Emitter.wrapper_function_call(decl) helpers and migrated all offending sites. Also made CPyPy_* wrapper declarations needs_export=True so those symbols reach the exports table.
4. Defer cross-group imports to shim load time. The shared lib's exec_ function used to PyImport_ImportModule sibling groups at PyInit time, which re-enters the enclosing package's __init__.py mid-flight and blows up on partial-init attribute walks. Split exec_ into a self-contained capsule-setup phase (runs in PyInit) and a deferred ensure_deps_<short>() (runs from the shim just before per-module init). The shim uses PyImport_ImportModuleLevel with a non-empty fromlist so the lookup returns the leaf directly via sys.modules, and fetches capsules via PyObject_GetAttrString instead of PyCapsule_Import (which itself performs the same dotted attribute walk).
5. Fix the broken fallback in lib-rt CPyImport_ImportFrom: the code called PyObject_GetItem(module, fullname) where it intended PyImport_GetModule (as its comment says). Modules don't implement __getitem__, so the fallback always raised TypeError. Also Py_XDECREF the potentially NULL package_path in the error path.
6. Incremental-mode plumbing for separate=True: compile_modules_to_ir now syncs freshly built ClassIR/FuncIR into deser_ctx so later cache-loaded SCCs can resolve cross-SCC references. load_type_map tolerates mypy's synthetic TypeInfo entries (e.g. "<subclass of X and Y>") that have no corresponding mypyc ClassIR.

Also adds three regression tests targeted to fail on TestRunSeparate without the fixes above:

- testSeparateCrossGroupEnumMethod exercises fix #1.
- testSeparateCrossGroupGenerator exercises fix #2.
- testSeparateCrossGroupInheritedInit exercises fix #3.
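Fix #3 relies on a general C pattern: when each group is built as its own shared library, a call site in one group should not reference another group's function symbol directly; it calls through a table of function pointers that the defining group publishes. The single-file sketch below is a toy model of that indirection only. ExportTableA, exports_a, link_group_a, group_a_exports, and group_b_caller are hypothetical names, the load-time capsule lookup is replaced by a plain assignment, and the tables mypyc actually generates may be structured differently.

```c
#include <assert.h>
#include <stddef.h>

/* ---- "Group A": defines the native function body ---- */
static long helper_double(long x) {
    return 2 * x;
}

/* Table of function pointers that group A publishes to other groups.
   Hypothetical layout standing in for a generated exports table. */
typedef struct {
    long (*helper_double)(long);
} ExportTableA;

static const ExportTableA exports_a = {
    .helper_double = helper_double,
};

/* ---- "Group B": reaches group A's code only via the table ---- */
static const ExportTableA *group_a_exports = NULL;

/* In a real build this pointer would come from a capsule fetched at
   load time; here it is a plain assignment. */
static void link_group_a(const ExportTableA *table) {
    group_a_exports = table;
}

/* A cross-group call site: go through the table instead of emitting
   a direct call to the helper_double symbol. */
static long group_b_caller(long x) {
    assert(group_a_exports != NULL && "group A not linked yet");
    return group_a_exports->helper_double(x) + 1;
}
```

In these terms, helpers such as Emitter.native_function_call(decl) presumably centralize the choice between a direct call and a table-indirect call, so individual emit sites no longer have to make that decision themselves.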