Support free-threaded Python with a direct-threading backend #5

@matajoh

Summary

Python 3.13+ ships an optional free-threaded build (configured with --disable-gil) in which the GIL is disabled. In free-threaded Python, plain threads already run in parallel; subinterpreters are no longer needed for parallelism, only for isolation. bocpy should offer an optimized execution path for this runtime that avoids subinterpreter and XIData overhead while preserving BOC's deadlock-freedom guarantees.

This work is gated on the free-threaded ecosystem stabilizing. The free-threaded build was experimental in Python 3.13 and, although officially supported from 3.14 (PEP 779), it is still an optional, non-default build, and the subinterpreter/XIData APIs continue to evolve. We should not invest in a second backend until the APIs are stable and the free-threaded build is no longer opt-in.

Motivation

On free-threaded Python, bocpy's current architecture pays significant overhead for no parallelism benefit:

  • XIData serialization/deserialization on every cown transfer (pickle round-trip for complex types)
  • Subinterpreter lifecycle management (create, run_string, destroy per worker)
  • Transpiler/AST export step to make closures importable across interpreters
  • Module re-import in each worker interpreter

All of this machinery exists to get parallelism despite the GIL, by giving each worker its own interpreter and with it its own GIL. Without a GIL, workers can be plain threads operating directly on shared Python objects, and the cown/2PL protocol itself provides the necessary thread safety.
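To make the transfer cost concrete, here is a minimal sketch that contrasts the two hand-off styles. It assumes a plain pickle round trip as a stand-in for the XIData path for complex types (the real XIData mechanism is not shown), and the `payload` value is purely illustrative.

```python
import pickle

# Illustrative cown contents; any picklable "complex type" would do.
payload = {"rows": [[1, 2], [3, 4]]}

# Subinterpreter-style hand-off, approximated by a pickle round trip:
# the receiver gets an equal but distinct copy, paid for on every transfer.
copied = pickle.loads(pickle.dumps(payload))

# Direct-threading hand-off: the worker receives the very same object.
shared = payload

print(copied == payload, copied is payload, shared is payload)
# prints: True False True
```

The copy is what isolates interpreters from each other; with direct threading that isolation has to come from the cown protocol instead.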

BOC's value proposition — deadlock and data-race freedom by construction — is arguably more valuable on free-threaded Python, where programmers face genuine shared-memory concurrency hazards that the GIL previously masked.

Design

At runtime, detect the threading model and select the appropriate backend:

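import sys

# sys._is_gil_enabled() exists on Python 3.13+; older versions always have a GIL.
if hasattr(sys, '_is_gil_enabled') and not sys._is_gil_enabled():
    # Free-threaded: use direct threading
    backend = 'threads'
else:
    # GIL build: use subinterpreters (current path)
    backend = 'subinterpreters'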
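One detail worth keeping in mind is that a free-threaded build can still run with the GIL enabled (for example when started with PYTHON_GIL=1), so the build flag and the runtime state are separate questions. The sketch below records both; `RuntimeInfo` and `probe_runtime` are hypothetical names, not part of bocpy.

```python
import sys
import sysconfig
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeInfo:
    """Hypothetical record of what the interpreter reports about the GIL."""

    built_free_threaded: bool  # compile-time: was this a --disable-gil build?
    gil_enabled: bool          # run-time: is the GIL active right now?

    @property
    def backend(self) -> str:
        # Only the runtime state decides the backend: a free-threaded build
        # with the GIL re-enabled gains nothing from direct threading.
        return "subinterpreters" if self.gil_enabled else "threads"


def probe_runtime() -> RuntimeInfo:
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    built = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() is missing before 3.13, where a GIL always exists.
    gil = getattr(sys, "_is_gil_enabled", lambda: True)()
    return RuntimeInfo(built_free_threaded=built, gil_enabled=gil)


info = probe_runtime()
print(info, info.backend)
```

Keeping the build flag around is mainly useful for diagnostics, such as warning when a free-threaded build has fallen back to the GIL.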

Shared components (unchanged in either mode):

  • _core.c MPSC message queue (already lock-free, built on C11 atomics)
  • _core.c 2PL scheduler and BOCBehavior/BOCCown request machinery
  • behaviors.py Behaviors scheduler thread
  • Cown[T] public API

Free-threaded mode changes:

  • Workers become plain threading.Threads running in the main interpreter
  • Cowns store PyObject* directly instead of going through XIData — acquire/release uses PyMutex or equivalent
  • Behaviors execute closures directly — no transpiler, no AST export, no module re-import
  • _core.c internal state protected with Py_BEGIN_CRITICAL_SECTION where dicts/lists are accessed concurrently
  • The BOCRecycleQueue (XIData GC) becomes unnecessary
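As a rough pure-Python illustration of the free-threaded changes listed above, the sketch below runs behaviors on plain threads against directly shared objects. It is not the `_core.c` 2PL scheduler: it substitutes simple ordered lock acquisition as a stand-in for deadlock avoidance, uses `threading.Lock` in place of PyMutex, and `SketchCown`, `run_behavior` and `transfer` are hypothetical names.

```python
import threading
from typing import Any, Callable, Sequence


class SketchCown:
    """Hypothetical stand-in for Cown[T]: holds the object directly, no XIData."""

    _next_id = 0
    _id_lock = threading.Lock()

    def __init__(self, value: Any) -> None:
        with SketchCown._id_lock:
            self.order = SketchCown._next_id  # position in the global lock order
            SketchCown._next_id += 1
        self.value = value
        self.lock = threading.Lock()  # plays the role of "PyMutex or equivalent"


def run_behavior(cowns: Sequence[SketchCown], body: Callable[..., None]) -> None:
    """Acquire all cowns in one global order, run the closure, then release."""
    ordered = sorted(set(cowns), key=lambda c: c.order)
    for c in ordered:
        c.lock.acquire()
    try:
        body(*cowns)  # the closure runs directly on the shared objects
    finally:
        for c in reversed(ordered):
            c.lock.release()


def transfer(src: SketchCown, dst: SketchCown) -> None:
    src.value -= 1
    dst.value += 1


a, b = SketchCown(100), SketchCown(0)
# Workers are plain threads; two of the eight name the cowns in reverse order,
# which would risk deadlock if locks were taken in argument order.
workers = [
    threading.Thread(target=run_behavior, args=((a, b) if i % 4 else (b, a), transfer))
    for i in range(8)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(a.value, b.value)
```

No transpiler, pickling or module re-import appears anywhere in this path; the closure and its arguments are ordinary objects in the main interpreter.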
