feat: add native C compilation mode to mypyc #2
Closed
tobymao wants to merge 1 commit into
639564c to e50d920
fce452e to ca4d1e6
Adds a new rawc mode to mypyc that compiles Python class methods to pure C: no Python C API in hot paths, no external dependencies.

Architecture:
- mypyc compiles scaffolding (class types, module init, dunder methods)
- rawc compiles function bodies via IR-driven C emitter
- Vtable forwarding: mypyc stubs forward to rawc_py_* bridge functions
- Bridge auto-discovers classes, methods, and module globals from IR

Key features:
- Arena allocator with mark/reset (replaces refcounting in hot paths)
- Hash table dicts/sets with cached hashes for O(1) lookups
- UCS-4 string representation for direct character access
- setjmp/longjmp exception handling
- Tagged int elimination: plain int64_t everywhere

All 1055 sqlglot tests pass. All 1634 mypyc tests pass. Zero external dependencies (no Boehm GC, no homebrew).
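To illustrate how an arena with mark/reset can pair with setjmp/longjmp error handling, here is a minimal sketch in C. It is not the PR's runtime code: every name in it (`Arena`, `arena_mark`, `arena_reset`, `raise_error`, `sum_with_scratch`) and the fixed 4096-byte capacity are assumptions made for the example. The idea is that a function records a mark on entry, allocates scratch memory freely, and rolls the arena back to the mark on both the normal path and the error path, so no per-object refcounting is needed.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the PR's runtime API. */
#define ARENA_CAPACITY 4096 /* multiple of 16, so rounded sizes always fit */

typedef struct {
    _Alignas(max_align_t) unsigned char buf[ARENA_CAPACITY];
    size_t used; /* bytes handed out so far, always a multiple of 16 */
} Arena;

typedef size_t ArenaMark;

/* Innermost active handler, playing the role of a "try" block. */
static jmp_buf *current_handler = NULL;

/* Unwind to the innermost handler with a non-zero error code. */
_Noreturn static void raise_error(int code) {
    assert(current_handler != NULL && code != 0);
    longjmp(*current_handler, code);
}

/* Bump-allocate n bytes; raises error 1 when the arena is exhausted. */
static void *arena_alloc(Arena *a, size_t n) {
    if (n > ARENA_CAPACITY - a->used) {
        raise_error(1);
    }
    size_t aligned = (n + 15u) & ~(size_t)15u; /* round up to 16 bytes */
    void *p = a->buf + a->used;
    a->used += aligned;
    return p;
}

static ArenaMark arena_mark(const Arena *a) { return a->used; }

/* Free everything allocated since the mark in one step. */
static void arena_reset(Arena *a, ArenaMark m) {
    assert(m <= a->used);
    a->used = m;
}

/* Sums 0..count-1 through a scratch array; returns -1 if allocation fails. */
static int64_t sum_with_scratch(Arena *a, size_t count) {
    ArenaMark mark = arena_mark(a);
    jmp_buf handler;
    jmp_buf *outer = current_handler;
    volatile int64_t result = -1; /* volatile: read after a possible longjmp */

    current_handler = &handler;
    if (setjmp(handler) == 0) {
        int64_t *xs = arena_alloc(a, count * sizeof *xs);
        int64_t total = 0;
        for (size_t i = 0; i < count; i++) {
            xs[i] = (int64_t)i;
            total += xs[i];
        }
        result = total;
    }
    /* Reached on both the normal path and the longjmp path. */
    current_handler = outer;
    arena_reset(a, mark);
    return result;
}
```

Calling `sum_with_scratch(&arena, 10)` on a zero-initialized `Arena` yields 45 and leaves `used` at 0. A request too large for the arena, such as `count = 1000`, takes the longjmp path, returns -1, and still rolls the arena back.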
VaggelisD added a commit that referenced this pull request Apr 23, 2026
…sqlglot)

This is the minimal set of fixes needed for `separate=True` to build and run correctly against sqlglot, a ~100-module project with cross-group class inheritance, generator helper classes, non-ext subclasses with fast methods, and mutually-dependent compiled modules.

Each of the fixes below is a real bug that was never hit by mypy itself (mypy's setup.py uses multi_file on Windows only, never separate=True) or by the toy fixtures in mypyc's TestRunSeparate.

1. Non-extension classes never have vtables -- short-circuit is_method_final to True for them so codegen doesn't try to index into a vtable that compute_vtable skipped.
2. emit_method_call: under separate=True, a method's FuncIR body may live in another group while only its FuncDecl is visible here. Use method_decl(name) instead of get_method(name).decl -- the decl is enough to emit a direct C call. Split native_function_type to accept a decl too.
3. Cross-group native/Python-wrapper calls weren't routing through the exports-table indirection at a dozen sites in emitwrapper / emitfunc / emitclass. Added Emitter.native_function_call(decl) and Emitter.wrapper_function_call(decl) helpers and migrated all offending sites. Also made CPyPy_* wrapper declarations needs_export=True so those symbols reach the exports table.
4. Defer cross-group imports to shim load time. The shared lib's exec_ function used to PyImport_ImportModule sibling groups at PyInit time, which re-enters the enclosing package's __init__.py mid-flight and blows up on partial-init attribute walks. Split exec_ into a self-contained capsule-setup phase (runs in PyInit) and a deferred ensure_deps_<short>() (runs from the shim just before per-module init). Shim uses PyImport_ImportModuleLevel with a non-empty fromlist so the lookup returns the leaf directly via sys.modules, and fetches capsules via PyObject_GetAttrString instead of PyCapsule_Import (which itself performs the same dotted attribute walk).
5. Fix broken fallback in lib-rt CPyImport_ImportFrom: the code tried PyObject_GetItem(module, fullname) where it intended PyImport_GetModule (comment says as much). Modules don't implement __getitem__, so the fallback always raised TypeError. Also Py_XDECREF the potentially-NULL package_path in the error path.
6. Incremental-mode plumbing for separate=True: compile_modules_to_ir now syncs freshly built ClassIR/FuncIR into deser_ctx so later cache-loaded SCCs can resolve cross-SCC references. load_type_map tolerates mypy's synthetic TypeInfo entries (e.g. "<subclass of X and Y>") that have no corresponding mypyc ClassIR.

Also adds three regression tests targeted to fail on TestRunSeparate without the fixes above:
- testSeparateCrossGroupEnumMethod exercises fix #1.
- testSeparateCrossGroupGenerator exercises fix #2.
- testSeparateCrossGroupInheritedInit exercises fix #3.
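The exports-table indirection and deferred dependency setup described in fixes 3 and 4 can be pictured with a small, heavily simplified sketch in C. It is an assumption-laden illustration, not mypyc's generated code: both "groups" live in one translation unit, the table is reached through a plain pointer rather than a capsule fetched from another shared library, and all names (`GroupAExports`, `ensure_deps_b`, `group_b_compute`) are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- Group A: owns the function bodies and publishes a table ---- */

static int64_t group_a_double(int64_t x) { return 2 * x; }
static int64_t group_a_increment(int64_t x) { return x + 1; }

/* Hypothetical exports table: one function pointer per exported symbol. */
typedef struct {
    int64_t (*double_it)(int64_t);
    int64_t (*increment)(int64_t);
} GroupAExports;

static const GroupAExports group_a_exports = {
    group_a_double,
    group_a_increment,
};

/* ---- Group B: sees only declarations, calls through the table ---- */

/* Filled in lazily, not at library initialization time. */
static const GroupAExports *imported_a = NULL;

/* Deferred dependency setup. In a real build this step would look up
 * the sibling group's table at load time; here a direct pointer
 * assignment stands in for that lookup. Returns 0 on success. */
static int ensure_deps_b(void) {
    if (imported_a == NULL) {
        imported_a = &group_a_exports;
    }
    return imported_a != NULL ? 0 : -1;
}

/* A cross-group call never names group A's symbols directly; it always
 * goes through the imported table. */
static int64_t group_b_compute(int64_t x) {
    assert(imported_a != NULL); /* ensure_deps_b() must have run first */
    return imported_a->increment(imported_a->double_it(x));
}
```

After `ensure_deps_b()` succeeds, `group_b_compute(20)` doubles and then increments its argument through the table, giving 41. The point of the indirection is that group B links against no symbol from group A, so the two can be built and loaded as separate shared libraries.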
Summary
Adds a new `native` mode to mypyc that compiles Python modules to pure C code: no Python C API dependency, no external libraries.
Benchmarks (sqlglot tokenizer, TPC-H Q1-Q5)
Files (2,269 lines total)
- mypyc/lib-rt/native/native_rt.h
- mypyc/lib-rt/native/native_compat.h
- mypyc/codegen/emit_native.py
- mypyc/native_build.py
- mypyc/irbuild/native_mapper.py
Test plan
attrmodule)