Summary
Python 3.13+ ships an optional free-threaded build (configured with --disable-gil) in which the GIL is disabled. In free-threaded Python, plain threads already run in parallel, so subinterpreters are no longer needed for parallelism, only for isolation. bocpy should offer an optimized execution path for this runtime that avoids subinterpreter and XIData overhead while preserving BOC's deadlock-freedom guarantees.
This work is gated on the free-threaded ecosystem stabilizing. The free-threaded build is still an optional, non-default build as of Python 3.15 (PEP 779 moved it from experimental to officially supported in 3.14), and the subinterpreter/XIData APIs continue to evolve. We should not invest in a second backend until those APIs are stable and the free-threaded build is no longer opt-in.
Motivation
On free-threaded Python, bocpy's current architecture pays significant overhead for no parallelism benefit:
- XIData serialization/deserialization on every cown transfer (pickle round-trip for complex types)
- Subinterpreter lifecycle management (create, run_string, destroy per worker)
- Transpiler/AST export step to make closures importable across interpreters
- Module re-import in each worker interpreter
All of this machinery exists to get parallelism despite the GIL: each subinterpreter has its own GIL, so objects must be copied between workers rather than shared. Without a GIL, workers can be plain threads operating directly on shared Python objects, and the cown/2PL protocol itself provides the necessary thread safety.
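To make the transfer cost concrete, here is a minimal sketch that contrasts a pickle round-trip (the fallback the list above mentions for complex types) with handing over a direct reference. It uses only the standard `pickle` module and is an illustration, not a measurement of bocpy or of XIData itself; `payload` is a made-up example value.

```python
import pickle

# Hypothetical payload standing in for a cown's contents.
payload = {"rows": [[i, str(i)] for i in range(1000)]}

# Subinterpreter-style transfer: serialize in the sender, rebuild in the receiver.
copied = pickle.loads(pickle.dumps(payload))

# Thread-style transfer: hand over the reference itself, no copy is made.
shared = payload

print(copied == payload, copied is payload)  # True False
print(shared is payload)                     # True
```

The copy is equal but distinct, so every transfer pays for a full traversal of the object graph; the shared reference costs nothing beyond the synchronization that the cown protocol already performs.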
BOC's value proposition — deadlock and data-race freedom by construction — is arguably more valuable on free-threaded Python, where programmers face genuine shared-memory concurrency hazards that the GIL previously masked.
Design
At runtime, detect the threading model and select the appropriate backend:
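    import sys

    # sys._is_gil_enabled() exists on Python 3.13+; older versions always have a GIL.
    # The backend names below are placeholders.
    if hasattr(sys, '_is_gil_enabled') and not sys._is_gil_enabled():
        backend = "threads"          # Free-threaded: use direct threading
    else:
        backend = "subinterpreters"  # GIL build: use subinterpreters (current path)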
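A free-threaded build can still run with the GIL re-enabled (for example via `PYTHON_GIL=1`, or after importing an extension that is not marked as free-threading safe), so it may be worth checking both the build and the runtime state. The sketch below is one way to do that; `select_backend` and the returned strings are hypothetical names, not part of bocpy.

```python
import sys
import sysconfig


def select_backend() -> str:
    """Return "threads" when the GIL is off at runtime, else "subinterpreters"."""
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    built_free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in 3.13; treat its absence as "GIL on".
    gil_check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = True if gil_check is None else gil_check()
    if built_free_threaded and not gil_enabled:
        return "threads"
    return "subinterpreters"


print(select_backend())
```

Because the GIL can be switched back on after startup when an incompatible extension is imported, a real implementation would probably want to run this check once, at the point where the worker pool is created, and not at import time.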
Shared components (unchanged in either mode):
- _core.c MPSC message queue (already lock-free C11 atomics)
- _core.c 2PL scheduler and BOCBehavior/BOCCown request machinery
- behaviors.py Behaviors scheduler thread
- Cown[T] public API
Free-threaded mode changes:
- Workers become plain threading.Thread objects running in the main interpreter
- Cowns store PyObject* directly instead of going through XIData; acquire/release uses PyMutex or equivalent
- Behaviors execute closures directly: no transpiler, no AST export, no module re-import
- _core.c internal state protected with Py_BEGIN_CRITICAL_SECTION where dicts/lists are accessed concurrently
- The BOCRecycleQueue (XIData GC) becomes unnecessary
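The thread-based path can be illustrated in pure Python. The sketch below is a simplified stand-in, not bocpy's scheduler: it replaces the C-level 2PL request machinery with ordered lock acquisition, which gives the same deadlock-freedom property for this toy case. `ToyCown`, `run_behavior`, and `transfer` are hypothetical names, and `threading.Lock` stands in for PyMutex.

```python
import threading


class ToyCown:
    """Hypothetical stand-in for Cown[T]: holds a direct object reference."""

    _next_id = 0
    _id_lock = threading.Lock()

    def __init__(self, value):
        with ToyCown._id_lock:
            self.ident = ToyCown._next_id  # position in the global lock order
            ToyCown._next_id += 1
        self.value = value                 # stored directly, no serialization
        self._lock = threading.Lock()


def run_behavior(cowns, fn):
    """Acquire every cown in a global order, run fn, then release them all."""
    ordered = sorted(set(cowns), key=lambda c: c.ident)
    for c in ordered:                      # growing phase
        c._lock.acquire()
    try:
        return fn(*cowns)                  # closure runs directly on shared objects
    finally:
        for c in reversed(ordered):        # shrinking phase
            c._lock.release()


def transfer(src, dst):
    src.value -= 1
    dst.value += 1


a, b = ToyCown(100), ToyCown(100)


def worker(first, second, n=1000):
    for _ in range(n):
        run_behavior([first, second], transfer)


# The two workers request the same cowns in opposite orders.
threads = [
    threading.Thread(target=worker, args=(a, b)),
    threading.Thread(target=worker, args=(b, a)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(a.value, b.value)  # 100 100
```

Requesting the cowns in opposite orders would deadlock with naive nested locking; sorting by `ident` before acquiring removes that possibility, and no transpiler or re-import step is needed because `transfer` is called in the same interpreter.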