Spin-then-park receive, free-threaded Python compatibility. #6

Merged
matajoh merged 1 commit into microsoft:main from matajoh:free-threading on Apr 1, 2026

@matajoh matajoh commented Mar 30, 2026

Spin-then-park receive; free-threaded Python compatibility.

Improvements

  • Added CownCapsule.disown() — abandons a cown's value without
    serializing it and resets ownership to NO_OWNER. Used during worker
    cleanup to safely discard orphan cowns before the owning interpreter
    is destroyed, preventing dangling Python object references.
  • Rewrote receive to use a two-phase spin-then-park strategy for
    single-tag untimed receives. Phase 1 spins for BOC_SPIN_COUNT
    iterations; Phase 2 parks the thread on a per-queue condvar, eliminating
    busy-wait CPU burn. Timed receives and multi-tag receives use
    spin-then-backoff with exponential sleep (1 µs → 1 ms cap).
  • Added platform-abstracted condvar primitives (BOCParkMutex /
    BOCParkCond) with implementations for Windows (SRWLOCK /
    CONDITION_VARIABLE), macOS (pthreads), and Linux (C11 threads).
  • Each BOCQueue now carries a waiters counter, park_mutex, and
    park_cond. Producers signal parked receivers after enqueue;
    drain and set_tags broadcast to wake all parked threads.
  • Replaced the fixed thrd_sleep in send with a thread yield
    (sched_yield on POSIX, SwitchToThread on Windows), reducing
    send-side latency.
  • Refactored the monolithic _core_receive into receive_single_tag
    and receive_multi_tag, each with its own backoff/parking logic.
  • Moved the BOC_QUEUE_DISABLED check earlier in get_queue_for_tag
    so callers skip disabled queues instead of returning NULL after
    tag resolution.
  • Added Windows-compatible atomic_load_explicit /
    atomic_fetch_add_explicit / atomic_fetch_sub_explicit macros
    using InterlockedExchangeAdd64.
  • Declared Py_mod_gil = Py_MOD_GIL_NOT_USED in both _core and
    _math C extensions so that importing bocpy on a free-threaded
    Python build (3.13t+) does not re-enable the GIL.
  • Replaced PyDict_GetItem (borrowed reference) with
    PyDict_GetItemRef (strong reference) in BOCRecycleQueue_recycle
    on Python 3.13+, improving forward-compatibility with free-threaded
    builds.
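
The receive strategy described above can be illustrated with a small Python model. This is only a sketch of the idea, not the C implementation in this PR: `ParkingQueue`, `SPIN_COUNT`, and `receive_timed` are hypothetical names, a single `threading.Condition` stands in for `park_mutex` plus `park_cond`, the backoff is assumed to double on each step, and returning `None` on timeout is an assumption.

```python
import threading
import time
from collections import deque

SPIN_COUNT = 100  # hypothetical stand-in for BOC_SPIN_COUNT


class ParkingQueue:
    """Toy single-tag queue: receivers spin briefly, then park or back off."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()  # models park_mutex + park_cond
        self._waiters = 0                   # number of currently parked receivers

    def send(self, item):
        with self._cond:
            self._items.append(item)
            # Only pay for a wake-up when a receiver is actually parked.
            if self._waiters:
                self._cond.notify()

    def _try_pop(self):
        try:
            return True, self._items.popleft()
        except IndexError:
            return False, None

    def receive(self):
        # Phase 1: bounded spin without taking the lock.
        for _ in range(SPIN_COUNT):
            ok, item = self._try_pop()
            if ok:
                return item
        # Phase 2: park. The queue is re-checked while holding the lock, so a
        # send() that lands between the last spin and wait() is not missed.
        with self._cond:
            self._waiters += 1
            try:
                while True:
                    ok, item = self._try_pop()
                    if ok:
                        return item
                    self._cond.wait()
            finally:
                self._waiters -= 1

    def receive_timed(self, timeout):
        # Spin first, then sleep with exponential backoff (1 us up to 1 ms).
        deadline = time.monotonic() + timeout
        delay = 1e-6
        spins = 0
        while True:
            ok, item = self._try_pop()
            if ok:
                return item
            if spins < SPIN_COUNT:
                spins += 1
                continue
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None  # assumed timeout result for this sketch
            time.sleep(min(delay, remaining))
            delay = min(delay * 2, 1e-3)  # doubling factor is an assumption
```

Incrementing the waiter count and re-checking the queue under the same lock that `send` holds while enqueueing is what rules out a lost wake-up in this model; the tests listed below target the same class of race in the real code.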

Bug Fixes

  • Fixed a deadlock when the same cown is passed multiple times to @when
    (e.g. @when(c, c)). Duplicate requests for the same cown caused the
    MCS-queue-based two-phase locking to spin-wait on itself. Requests are
    now deduplicated by target cown in Behavior.__init__, with
    compensating resolve_one calls to maintain the behavior count
    invariant.
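
A minimal sketch of the deduplication idea follows. It is not the code from this PR: the `Cown` and `Behavior` classes here are simplified stand-ins, and it assumes the behavior count starts at one per argument (duplicates included), which is why each dropped duplicate needs one compensating `resolve_one` call.

```python
class Cown:
    """Simplified stand-in for a cown that wraps a value."""

    def __init__(self, value):
        self.value = value


class Behavior:
    """Toy behavior that deduplicates requests by target cown identity."""

    def __init__(self, cowns):
        # Assumption: the count is initialised from the raw argument list.
        self.count = len(cowns)
        self.requests = []
        seen = set()
        duplicates = 0
        for cown in cowns:
            if id(cown) in seen:
                duplicates += 1  # same cown requested again: no second request
                continue
            seen.add(id(cown))
            self.requests.append(cown)
        # Compensate so the count matches the number of real requests.
        for _ in range(duplicates):
            self.resolve_one()

    def resolve_one(self):
        """Record one resolved request; return True once all are resolved."""
        self.count -= 1
        return self.count == 0
```

With `c = Cown(0)` and `d = Cown(1)`, `Behavior([c, d, c])` ends up with two requests (for `c` and `d`, in first-seen order) and a count of two, so the behavior never waits on a second acquisition of `c`.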

Tests

  • TestLostWakeStress: single-producer random delays, bursty producer,
    and repeated single-message wake to detect lost-wake races.
  • TestMultiTagBackoff: multi-tag receive correctness — second-tag hit,
    delayed arrival, per-tag FIFO ordering, timeout, and interleaved
    producers.
  • TestTimeoutAccuracy: lower-bound / upper-bound wall-clock checks and
    zero-timeout immediacy.
  • Added tests for duplicate cowns in @when: same cown twice, thrice,
    non-adjacent duplicates, duplicates within a group, and mutation
    aliasing semantics.

CI

  • Added a free-threaded CI job that tests against Python 3.13t and
    3.14t on Linux, with explicit assertions that the GIL remains disabled
    after import.
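
A check of this kind could look roughly like the sketch below. It is an illustration rather than the CI script from this PR: the helper name `gil_reenabled_by_import` is hypothetical, and it relies on `sys._is_gil_enabled()`, which is available on Python 3.13 and later (older interpreters are treated as always having the GIL enabled).

```python
import importlib
import sys


def gil_is_enabled():
    # sys._is_gil_enabled() exists on Python 3.13+. On older versions the
    # GIL is always present, so fall back to True.
    probe = getattr(sys, "_is_gil_enabled", None)
    return True if probe is None else probe()


def gil_reenabled_by_import(module_name):
    """Return True if importing the module switched the GIL back on."""
    before = gil_is_enabled()
    importlib.import_module(module_name)
    after = gil_is_enabled()
    return (not before) and after
```

On a free-threaded build, a CI step might then run something like `assert not gil_reenabled_by_import("bocpy")`. On a regular build the helper always returns `False`, because the GIL is already enabled before the import.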

@matajoh matajoh requested a review from mjp41 April 1, 2026 00:26
@matajoh matajoh changed the title from "Free-threaded Python compatibility." to "Spin-then-park, free-threaded Python compatibility." Apr 1, 2026
@matajoh matajoh changed the title from "Spin-then-park, free-threaded Python compatibility." to "Spin-then-park receive, free-threaded Python compatibility." Apr 1, 2026

Signed-off-by: Matthew A Johnson <matjoh@microsoft.com>
@matajoh matajoh merged commit 7e52702 into microsoft:main Apr 1, 2026
26 checks passed
@matajoh matajoh deleted the free-threading branch April 1, 2026 21:13