
perf: use cached Latin-1 strings in CPyStr_GetItem #1

Merged: VaggelisD merged 1 commit into VaggelisD:release-1.19 from tobymao:toby/str-getitem-cache on Mar 18, 2026

tobymao (Collaborator) commented on Mar 18, 2026

For characters with code points < 256, use PyUnicode_FromOrdinal(), which returns CPython's cached single-character Latin-1 string objects, instead of allocating a new PyUnicode object on every str[i] access. This avoids allocation and deallocation overhead in character-scanning hot loops.
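In plain CPython, `str.__getitem__` already returns one-character results below U+0100 from this same cache, so the effect can be observed from Python by checking object identity. The sketch below assumes CPython. The cache is an implementation detail, not a language guarantee, and other interpreters may return distinct objects. The helper name `same_object_each_time` is illustrative only.

```python
import sys


def same_object_each_time(s: str, i: int) -> bool:
    """Index s[i] twice, keeping both results alive, and compare identity."""
    a = s[i]
    b = s[i]
    return a is b


# Assumption: CPython. Its cache of one-character strings covers only
# code points below 256 (Latin-1); this is an implementation detail.
latin1 = "x = 'caf\u00e9'"          # every code point is < 256
wide = "\u0100\u4e2d\U0001F600"    # U+0100, a CJK ideograph, an emoji

print(sys.implementation.name)
print(all(same_object_each_time(latin1, i) for i in range(len(latin1))))
print(any(same_object_each_time(wide, i) for i in range(len(wide))))
```

On CPython, the second line prints `True` and the third prints `False`. Each character taken from `wide` is a freshly allocated object, while each character taken from `latin1` is a shared singleton.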

Benchmark (mypyc-compiled tokenizer, tpch query x5000):
Before: 0.337 ms/iter
After: 0.223 ms/iter (-34%)

Isolated micro-benchmark (scan 15K chars, 1000 iters):
Before: 0.60s
After: 0.26s (-57%)
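The micro-benchmark script itself is not part of this PR. The sketch below is a hypothetical reconstruction of its shape: a tokenizer-style loop that reads a string of about 15K characters one `str[i]` at a time, repeated 1000 times. The function names and the SQL-like sample text are assumptions. Under the plain interpreter, this only times CPython's own indexing. Compiling the module with mypyc, with `text` typed as `str` and `i` as `int`, is what would be expected to route `text[i]` through CPyStr_GetItem.

```python
import time


def count_word_chars(text: str) -> int:
    """Tokenizer-style hot loop: one text[i] access per character."""
    count = 0
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]  # assumed: under mypyc this indexing calls CPyStr_GetItem
        if ch.isalnum() or ch == "_":
            count += 1
        i += 1
    return count


def make_sample(n_chars: int = 15_000) -> str:
    """Build SQL-shaped input of exactly n_chars characters (hypothetical data)."""
    line = "SELECT l_orderkey, SUM(l_extendedprice) FROM lineitem GROUP BY 1;\n"
    reps = n_chars // len(line) + 1
    return (line * reps)[:n_chars]


def bench(text: str, iters: int = 1000) -> float:
    """Return total wall-clock seconds for `iters` full scans of `text`."""
    start = time.perf_counter()
    for _ in range(iters):
        count_word_chars(text)
    return time.perf_counter() - start
```

For example, `bench(make_sample())` runs the 15K-character, 1000-iteration configuration. Comparing its result between builds that link the old and new CPyStr_GetItem gives a before/after measurement of the kind quoted above.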

Characters with code points >= 256 (the rest of the BMP, plus the supplementary planes) keep the original PyUnicode_New allocation path unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
VaggelisD merged commit 31ba829 into VaggelisD:release-1.19 on Mar 18, 2026
VaggelisD added a commit that referenced this pull request Apr 23, 2026
…sqlglot)

This is the minimal set of fixes needed for `separate=True` to build and run
correctly against sqlglot, a ~100-module project with cross-group class
inheritance, generator helper classes, non-ext subclasses with fast methods,
and mutually-dependent compiled modules. Each of the fixes below is a real
bug that was never hit by mypy itself (mypy's setup.py uses multi_file on
Windows only, never separate=True) or by the toy fixtures in mypyc's
TestRunSeparate.

1. Non-extension classes never have vtables, so short-circuit
   is_method_final to True for them; otherwise codegen tries to index into
   a vtable that compute_vtable skipped.

2. emit_method_call: under separate=True, a method's FuncIR body may live in
   another group while only its FuncDecl is visible here. Use method_decl(name)
   instead of get_method(name).decl -- the decl is enough to emit a direct C
   call. Split native_function_type to accept a decl too.

3. Cross-group native/Python-wrapper calls weren't routing through the
   exports-table indirection at a dozen sites in emitwrapper / emitfunc /
   emitclass. Added Emitter.native_function_call(decl) and
   Emitter.wrapper_function_call(decl) helpers and migrated all offending
   sites. Also made CPyPy_* wrapper declarations needs_export=True so those
   symbols reach the exports table.

4. Defer cross-group imports to shim load time. The shared lib's exec_
   function used to call PyImport_ImportModule on sibling groups at PyInit
   time, which re-enters the enclosing package's __init__.py mid-flight and
   fails on attribute walks through partially initialized modules. Split
   exec_ into a self-contained
   capsule-setup phase (runs in PyInit) and a deferred ensure_deps_<short>()
   (runs from the shim just before per-module init). Shim uses
   PyImport_ImportModuleLevel with a non-empty fromlist so the lookup
   returns the leaf directly via sys.modules, and fetches capsules via
   PyObject_GetAttrString instead of PyCapsule_Import (which itself performs
   the same dotted attribute walk).

5. Fix broken fallback in lib-rt CPyImport_ImportFrom: the code tried
   PyObject_GetItem(module, fullname) where it intended PyImport_GetModule
   (comment says as much). Modules don't implement __getitem__, so the
   fallback always raised TypeError. Also Py_XDECREF the potentially-NULL
   package_path in the error path.

6. Incremental-mode plumbing for separate=True: compile_modules_to_ir now
   syncs freshly built ClassIR/FuncIR into deser_ctx so later cache-loaded
   SCCs can resolve cross-SCC references. load_type_map tolerates mypy's
   synthetic TypeInfo entries (e.g. "<subclass of X and Y>") that have no
   corresponding mypyc ClassIR.
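The fromlist behavior that fix #4 relies on can be seen from Python. The C API documentation describes `PyImport_ImportModuleLevel` in terms of the built-in `__import__()`. With an empty fromlist, a dotted import returns the top-level package. With a non-empty fromlist, it returns the leaf module. The sketch uses the stdlib `xml.etree.ElementTree` purely as a convenient dotted module name; it is not what mypyc's shim imports.

```python
import sys

# Empty fromlist: the *top-level* package is returned, so reaching the
# leaf would need a dotted attribute walk (xml -> etree -> ElementTree).
top = __import__("xml.etree.ElementTree", fromlist=[])

# Non-empty fromlist: the leaf module itself is returned.
leaf = __import__("xml.etree.ElementTree", fromlist=["ElementTree"])

print(top.__name__)   # xml
print(leaf.__name__)  # xml.etree.ElementTree

# The leaf is the object registered in sys.modules, so obtaining it does
# not require a getattr walk through the parent packages.
print(leaf is sys.modules["xml.etree.ElementTree"])  # True
```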

Also adds three regression tests targeted to fail on TestRunSeparate
without the fixes above:

- testSeparateCrossGroupEnumMethod exercises fix #1.
- testSeparateCrossGroupGenerator exercises fix #2.
- testSeparateCrossGroupInheritedInit exercises fix #3.
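The CPyImport_ImportFrom bug in fix #5 can be reproduced in pure Python. A module object does not support subscription, so `module[fullname]` always raises TypeError. The intended fallback is a `sys.modules` lookup, which is what `PyImport_GetModule` consults. The function name `import_from` and the `demo_pkg` modules below are hypothetical. CPython's own `from ... import` handling uses a similar `sys.modules` fallback for partially initialized packages.

```python
import sys
import types


def import_from(module: types.ModuleType, name: str) -> object:
    """Sketch of `from module import name` with a sys.modules fallback.

    Try the attribute first. If the submodule is registered in sys.modules
    but not yet bound on its parent (parent still initializing), fall back
    to sys.modules, the same table PyImport_GetModule reads.
    """
    try:
        return getattr(module, name)
    except AttributeError:
        fullname = f"{module.__name__}.{name}"
        found = sys.modules.get(fullname)
        if found is None:
            raise ImportError(
                f"cannot import name {name!r} from {module.__name__!r}"
            ) from None
        return found


# Hypothetical package mid-initialization: the child is in sys.modules,
# but the parent's __init__ has not yet bound it as an attribute.
parent = types.ModuleType("demo_pkg")
child = types.ModuleType("demo_pkg.child")
sys.modules["demo_pkg"] = parent
sys.modules["demo_pkg.child"] = child

# The buggy fallback: modules don't implement __getitem__.
try:
    parent["demo_pkg.child"]
except TypeError as exc:
    print("TypeError:", exc)

print(import_from(parent, "child") is child)  # True
```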