FFI in Python: From ctypes to Rust, and the Post-GIL Future

Every Python program runs inside a boundary. On one side sits the interpreter — dynamic, reflective, garbage-collected. On the other side sits the operating system, the CPU, and libraries compiled to native machine code. The Foreign Function Interface is the gate between these two worlds. Understanding how that gate works, what it costs to pass through, and which tools build the best bridges is what separates a Python developer who writes fast code from one who rewrites everything in another language.

This article explores how Python’s FFI ecosystem evolved from a single C-calling module in 2001 to a rich constellation of tools that span C, C++, Rust, and even alternative Python runtimes. It also examines how Python 3.13 and 3.14’s free-threading support — the biggest runtime change since Python 3.0 — reshapes when and why you reach for FFI at all.

What FFI actually means#

A Foreign Function Interface is a mechanism for calling code written in one language from code written in another. The word foreign refers to the callee: a C function invoked from Python is foreign to Python.

The concept traces back to the late 1970s and Common Lisp, but it became mainstream with C’s dominance in systems programming. C established the de facto ABI (Application Binary Interface) that nearly every FFI system targets today. When you call a “native library” from Python, Java, Ruby, or Go, you are almost always calling through the C ABI — even if the library was written in C++, Rust, or Fortran.

This is a crucial distinction. FFI is not about calling C. It is about calling through the C calling convention. Rust’s extern "C" functions, C++’s extern "C" blocks, and Fortran’s ISO_C_BINDING module all produce symbols compatible with the C ABI. The language behind the symbol is irrelevant to the caller.

Think of the C ABI as a lingua franca. Not every country speaks it natively, but every diplomat knows it.

Why Python needs FFI#

Python is an interpreted, dynamically typed language with significant per-instruction overhead compared to compiled languages. A tight loop in Python runs roughly 10–100× slower than the equivalent loop in C or Rust, depending on the workload.

Three forces push Python developers toward FFI:

  1. Raw throughput. A recursive Fibonacci function in pure Python runs 27× slower than the same function compiled to C and called via ctypes (see the benchmarks below). For numerical computation, image processing, or cryptography, this gap is the difference between practical and unusable.

  2. Ecosystem access. Critical libraries ship as native code: OpenSSL, SQLite, BLAS/LAPACK, libgit2, system APIs across Linux, macOS, and Windows. FFI lets Python call these without waiting for a pure-Python reimplementation.

  3. Memory-safe, high-performance extensions. Rust and modern C++ produce code that is both fast and safe. PyO3 and nanobind exist because developers want the speed of compiled code with the safety guarantees that C never provided.

The result is that FFI sits at the foundation of Python’s most popular libraries. NumPy calls into BLAS and LAPACK. The cryptography package wraps OpenSSL via both cffi and Rust (through PyO3). Polars, the dataframe library, is written entirely in Rust and exposed to Python through PyO3. Pydantic v2 rewrote its core validation engine in Rust for the same reason. Tiktoken (OpenAI’s tokenizer) and orjson (a fast JSON library) are both Rust extensions built with PyO3 and maturin.

FFI is not a niche optimization trick. It is the substrate on which modern Python performance stands.

The classic duo: ctypes and cffi#

ctypes#

Python added ctypes to the standard library in version 2.5 (2006). It loads compiled shared libraries (.so on Linux, .dylib on macOS, .dll on Windows) and calls their exported functions at runtime, with no compilation step on the Python side.

import ctypes

lib = ctypes.cdll.LoadLibrary("./mylib.so")
lib.add.argtypes = (ctypes.c_int, ctypes.c_int)
lib.add.restype = ctypes.c_int

result = lib.add(2, 3)  # returns 5

ctypes is always available — no pip install, no build toolchain, no C compiler. That makes it the right choice for quick prototyping and for calling stable system libraries whose ABI will not change.

The trade-offs are real, though. Every call crosses the Python-to-C boundary with full argument marshalling, and ctypes provides no compile-time safety: pass the wrong type, and you get a segfault, not an error message. As of Python 3.14, ctypes gained support for free-threaded builds and complex C types (c_float_complex, c_double_complex), but its fundamental architecture — runtime function lookup and dynamic type conversion — remains unchanged.
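Because ctypes can also load symbols from the process itself, the mechanics are easy to try without building a shared library first. A minimal runnable sketch, assuming a POSIX system where ctypes.CDLL(None) exposes the C runtime already linked into the process:

```python
import ctypes

# Load the symbols already linked into the current process;
# on POSIX this includes the C standard library.
libc = ctypes.CDLL(None)

# Declare the C prototype of abs() so arguments and the return
# value are marshalled correctly.
libc.abs.argtypes = (ctypes.c_int,)
libc.abs.restype = ctypes.c_int

print(libc.abs(-42))  # → 42
```

Omitting the argtypes/restype declarations often still appears to work for small integers, which is exactly how silent type mismatches creep in: nothing checks your declarations against the real C signature.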

cffi#

cffi appeared in 2013 as a third-party library (now at v2.0.0). It takes a different approach: you feed it C declarations as strings, and it generates the binding code for you.

import cffi

ffi = cffi.FFI()
ffi.cdef("int add(int a, int b);")

lib = ffi.dlopen("./mylib.so")
result = lib.add(2, 3)  # returns 5

cffi also offers an out-of-line mode that compiles a small C extension at build time, eliminating the per-call overhead of dynamic lookup. This makes it faster than ctypes for repeated calls. The PyPy interpreter uses cffi as its recommended FFI mechanism because its JIT compiler can optimize cffi calls far more aggressively than ctypes calls.
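The in-line ABI mode shown above also works against symbols already loaded into the process: on POSIX, ffi.dlopen(None) opens the C standard library. A runnable sketch, assuming cffi is installed:

```python
import cffi

ffi = cffi.FFI()

# Declare the C prototype; cffi parses real C declaration syntax.
ffi.cdef("int abs(int x);")

# dlopen(None) binds against the current process
# (the C runtime on POSIX systems).
libc = ffi.dlopen(None)

print(libc.abs(-42))  # → 42
```

Note that the declaration is C source, not Python objects: the same cdef string can later be reused with set_source() to generate an out-of-line compiled extension.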

The key difference between the two: ctypes is a runtime binding tool; cffi is a binding generator that can produce compiled extensions. When you need to call a C library from CPython with no third-party dependencies, reach for ctypes. When you need cross-interpreter compatibility (CPython and PyPy), or when call frequency matters, reach for cffi.

Beyond C: the modern binding toolkit#

The original FFI question — “how do I call a C function from Python?” — now has a richer framing: “how do I write a fast extension for Python, in any compiled language, with good ergonomics and safety?” Several tools answer this question, each with a different philosophy.

Cython#

Cython is a superset of Python that compiles to C. You write code that looks almost like Python, add type annotations, and Cython generates C code that links against the CPython API. This makes it less of an FFI tool and more of a compiler for Python-shaped code — but the result is the same: native speed, callable from Python.

Cython 3.2 (released November 2025) added first-class support for Python 3.13’s free-threaded mode via the freethreading_compatible directive. It also gained pymutex support and improvements to the Limited C API, which produces extensions that work across multiple CPython versions without recompilation.

Cython shines when you have existing Python code that needs to go faster without a full rewrite. Add type annotations to the hot path, compile, and the tight loops run at C speed. The cost is a more complex build pipeline and a dialect that diverges from standard Python in subtle ways.

PyO3 and maturin#

PyO3 (v0.24, mid-2025) provides Rust bindings for Python. You write Rust code, annotate functions with #[pyfunction], and compile with maturin, a build tool that produces a Python wheel directly from a Rust crate.

use pyo3::prelude::*;

#[pyfunction]
fn add(a: i64, b: i64) -> i64 {
    a + b
}

#[pymodule]
fn mylib(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(add, m)?)?;
    Ok(())
}

Build and install in one command:

maturin develop

The appeal is Rust’s combination of C-level performance and compile-time memory safety. No segfaults from dangling pointers, no buffer overflows, no data races — the Rust compiler rejects them before the code ships. PyO3 supports CPython 3.7+, PyPy 7.3+, and GraalPy 24.0+.

The ecosystem speaks for itself. Polars (dataframes), pydantic-core (data validation), tiktoken (tokenization), orjson (JSON), the cryptography package (TLS and X.509), and granian (ASGI server) all use PyO3. When a Python library’s release notes mention a “Rust rewrite” of the hot path, PyO3 and maturin are almost always the tools involved.

nanobind#

nanobind is a C++17 binding library created by Wenzel Jakob, the author of pybind11. It targets the same problem — exposing C++ code to Python — but with dramatically lower overhead: up to 4× faster compilation, 5× smaller binaries, and 10× lower runtime overhead compared to pybind11 according to nanobind’s own benchmarks. It also outperforms Cython on binary size (3–12× smaller) and compilation time (1.6–4× faster).

nanobind is the right choice when you have C++ code that needs Python bindings and you want the smallest possible extension module. It requires C++17, supports Python 3.9+, and already includes support for free-threaded Python.

HPy#

HPy takes a different approach entirely. Rather than wrapping the existing CPython C API, it defines a new, universal ABI for Python extensions. An extension compiled against HPy’s universal ABI runs unmodified on CPython, PyPy, and GraalPy — no recompilation needed.

As of v0.9 (late 2023), HPy is still pre-stable but actively developed, with ports of NumPy, Matplotlib, and kiwisolver. On CPython, HPy extensions run at the same speed as traditional C API extensions. On PyPy and GraalPy, they run dramatically faster because HPy avoids the emulation overhead of CPython’s C API.

HPy is a long-term bet. If Python’s extension ecosystem eventually unifies on a stable, runtime-agnostic ABI, HPy or something like it will be the foundation. For new projects targeting multiple Python implementations, it deserves serious evaluation.

Choosing a tool#

The right FFI tool depends on what you are calling and what you are willing to maintain.

| Scenario | Recommended tool | Reason |
| --- | --- | --- |
| Call a system .so/.dll at runtime | ctypes | Zero dependencies, ships with Python |
| Wrap a C library with a stable ABI | cffi (out-of-line mode) | Cross-interpreter support, compiled speed |
| Speed up Python code in place | Cython | Minimal rewrite, familiar syntax |
| Write a new extension in Rust | PyO3 + maturin | Memory safety, single maturin develop workflow |
| Bind existing C++ code | nanobind | Smallest binaries, lowest overhead |
| Target CPython + PyPy + GraalPy | HPy | Universal ABI, no per-runtime recompilation |

These tools are not mutually exclusive. The cryptography package, for example, uses cffi for OpenSSL bindings and Rust (via PyO3/maturin) for its newer cryptographic primitives.

Performance: how much does FFI buy you?#

To make the performance gap concrete, consider recursive Fibonacci — a deliberately CPU-bound, call-heavy workload that isolates interpretation overhead from I/O.

Pure Python:

def fibonacci(n: int):
    if n < 2:
        return 1
    return fibonacci(n - 2) + fibonacci(n - 1)

for _ in range(1_000_000):
    fibonacci(12)

$ /usr/bin/time nice python fibonacci.py
     29.66 real        29.52 user         0.06 sys

C via ctypes:

int fibonacci(int n) {
    if (n < 2) return 1;
    return fibonacci(n - 2) + fibonacci(n - 1);
}

import ctypes

C = ctypes.cdll.LoadLibrary("./fibonacci.so")
C.fibonacci.argtypes = (ctypes.c_int,)
C.fibonacci.restype = ctypes.c_int

for _ in range(1_000_000):
    C.fibonacci(12)

$ /usr/bin/time nice python fibonacci_ffi.py
      1.09 real         1.01 user         0.01 sys

The C version runs in 1.09 seconds — roughly 27× faster. The entire cost of each fibonacci(12) call tree (465 calls in total) shifts from interpreted Python bytecode to compiled machine code. The Python side pays only for the single boundary crossing per top-level call.

This ratio is representative for compute-bound work. For I/O-bound code — network calls, file system access, database queries — the gap narrows dramatically because the bottleneck is waiting, not computing. FFI makes computation faster; it does nothing for time spent waiting.
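The boundary crossing itself is not free, which is why call granularity matters: one FFI call that does a lot of work beats many FFI calls that each do a little. A sketch measuring the fixed per-call cost with the standard library's timeit, again using C's abs() from the process's C runtime (POSIX assumed):

```python
import ctypes
import timeit

libc = ctypes.CDLL(None)
libc.abs.argtypes = (ctypes.c_int,)
libc.abs.restype = ctypes.c_int

n = 100_000
builtin = timeit.timeit("abs(-5)", number=n)
foreign = timeit.timeit("libc.abs(-5)", globals={"libc": libc}, number=n)

# The foreign call is typically slower per call: ctypes pays for
# argument marshalling on every crossing. FFI wins only when the
# work done per call outweighs this fixed cost.
print(f"builtin abs: {builtin / n * 1e9:.0f} ns/call")
print(f"ctypes  abs: {foreign / n * 1e9:.0f} ns/call")
```

For a trivial function like abs(), the marshalling overhead usually exceeds the work itself; the fibonacci benchmark wins only because each crossing buys 465 calls of compiled computation.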

The free-threading revolution#

For two decades, every discussion of Python performance has mentioned the GIL (Global Interpreter Lock): a mutex that prevents multiple threads from executing Python bytecode simultaneously. The GIL made CPython’s memory management safe but capped true CPU parallelism to a single core for pure Python code.

FFI was one of the classic escape routes. C extensions could release the GIL during long computations, letting other Python threads run in parallel. NumPy does this. So does every serious database driver and network library.
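ctypes itself demonstrates the effect: functions called through CDLL release the GIL for the duration of the foreign call, so blocking C calls overlap across threads even on a GIL-enabled build. A runnable sketch, assuming a POSIX system with usleep available in the C runtime:

```python
import ctypes
import threading
import time

libc = ctypes.CDLL(None)
libc.usleep.argtypes = (ctypes.c_uint,)
libc.usleep.restype = ctypes.c_int

def block_in_c():
    libc.usleep(200_000)  # 0.2 s spent inside C; ctypes releases the GIL

start = time.perf_counter()
threads = [threading.Thread(target=block_in_c) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Four 0.2 s foreign calls overlap instead of serializing:
# wall time is close to 0.2 s, not 0.8 s.
print(f"elapsed: {elapsed:.2f} s")
```

The same mechanism is what lets NumPy run a matrix multiply on one thread while other Python threads keep executing.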

Python 3.13 (October 2024) changed the equation. PEP 703 introduced an experimental free-threaded build (python3.13t) that removes the GIL entirely, allowing true multi-core parallelism for pure Python code. Python 3.14 (2025) elevated free-threading from experimental to officially supported (PEP 779), with single-threaded overhead reduced to roughly 5–10%.

This matters for FFI in two ways:

  1. FFI is no longer the only path to parallelism. If your workload is embarrassingly parallel and written in pure Python, you can now run it across multiple cores without dropping to C or Rust. The motivation “I need FFI because of the GIL” weakens.

  2. Extensions must adapt. Free-threaded Python removes the assumption that the GIL serializes access to Python objects. Extensions that relied on the GIL for thread safety need explicit locking. Cython 3.2 added freethreading_compatible and pymutex support for this reason. PyO3, nanobind, and pybind11 are also adding free-threading support. ctypes itself gained free-threading support in Python 3.14.

Free-threading does not eliminate the need for FFI. A 27× speed ratio does not shrink to 1× because you removed a lock. But it does change the cost–benefit analysis. Before 3.13, developers sometimes wrote C extensions purely to get parallelism, not speed. That reason is fading.

Multiple interpreters: the other concurrency model#

Python 3.14 also introduced concurrent.interpreters (PEP 734), which exposes sub-interpreters — multiple isolated Python runtimes within a single process — to Python code for the first time. Each interpreter has its own GIL (in GIL-enabled builds) or runs independently (in free-threaded builds), enabling CSP-style concurrency without shared mutable state.

For FFI authors, this means extensions may need to support isolation: separate state per interpreter, no global mutable variables. The work to isolate an extension module overlaps significantly with the work to support free-threading, so the ecosystem is converging on both simultaneously.

When not to reach for FFI#

FFI is an optimization — and premature optimization carries well-known costs. Here are the situations where FFI adds more complexity than value:

  • A pure-Python library already exists. If someone has written a well-tested Python package that solves your problem, wrapping a C library yourself adds a maintenance burden with no user-visible benefit.

  • The bottleneck is I/O, not compute. Waiting for a network response or a disk read does not become faster because you rewrote the wait in C. Use asyncio or threading instead.

  • Your team does not read C, C++, or Rust. FFI bugs — mismatched types, dangling pointers, double frees — produce segfaults, not Python tracebacks. Debugging them requires native-code skills. If nobody on the team has those skills, the trade-off is unlikely to pay off.

  • Portability matters more than speed. A compiled extension must be built for each target platform and Python version. Pure Python runs everywhere. Wheels and maturin reduce this pain, but they do not eliminate it.

  • Security exposure is high. FFI calls bypass Python’s memory safety guarantees. Calling untrusted or poorly audited native code opens the door to buffer overflows, use-after-free bugs, and other memory corruption exploits. Rust extensions via PyO3 mitigate this — but ctypes and raw C do not.

Where the ecosystem is heading#

Several trends are shaping the future of FFI in Python:

  • Rust adoption is accelerating. The number of crates on crates.io depending on PyO3 grows every year. Rust gives Python the performance of C with the safety of a managed language, and maturin makes the build-and-publish workflow close to frictionless.

  • Free-threading will trim the fat. Some extensions that exist only for GIL-release parallelism will be replaced by pure Python code once free-threaded builds mature. Extensions that exist for raw speed will remain.

  • The Limited C API is gaining ground. CPython’s Limited API (also called the Stable ABI) lets extensions compile once and run across multiple CPython versions. Cython, nanobind, and HPy all invest in Limited API support, reducing the per-version wheel explosion that has plagued PyPI.

  • Python is getting faster on its own. CPython 3.14 ships a tail-call interpreter that yields 3–5% faster execution on standard benchmarks. The experimental JIT compiler (PEP 744), included in official macOS and Windows binaries, shows 10–20% improvements on some workloads. These gains chip away at the speed gap that motivates FFI — slowly, but measurably.

None of these trends make FFI obsolete. They raise the bar for when FFI is worth the added complexity, and they improve the tools available when it is.
