Releases: microsoft/bocpy

v0.5.0

05 May 10:28
d9116c2

Highlights

This release delivers a Verona-RT-style work-stealing scheduler, a global noticeboard (shared key-value store), removal of the central scheduler thread in favour of direct dispatch, and a major C source refactor into per-subsystem translation units with a portable atomics layer.


New Features

  • Work-stealing scheduler — the single behavior queue is replaced with a distributed scheduler. Each worker owns an MPMC behavior queue, pops locally first, and steals from peers when idle. Idle workers park on per-worker condition variables and are signalled directly by producer/victim.
  • Per-worker fairness tokens — a token node advances through each worker's queue so long-running behaviors cannot monopolise dispatch slots; also drives cooperative shutdown.
  • Noticeboard — a shared key-value store (up to 64 keys) readable/writable without acquiring cowns. Writes are non-blocking; reads return a cached per-behavior snapshot. Includes notice_write, notice_read, notice_update, notice_delete, notice_sync, noticeboard_version, and the REMOVED sentinel.
  • Distributed scheduler — two-phase locking, request linking, and dispatch run directly on the caller's thread in C; cown release runs on the executing worker. MCS-style intrusive linked list per cown for zero-bounce handoff.
  • Cown.exception property — indicates whether the held value is from an unhandled exception.
  • compat.h / compat.c portability layer — uniform BOCMutex, BOCCond, boc_atomic_*_explicit, monotonic-time, and sleep primitives across MSVC, pthreads, and C11 <threads.h>.
  • xidata.h cross-interpreter shim — centralised _PyXIData_* / _PyCrossInterpreterData_* version ladders for CPython 3.12–3.15 (including free-threaded builds).
  • fanout_benchmark example — fan-out/fan-in benchmark exercising scheduler throughput under heavy producer load.
  • Prime factor example (examples/prime_factor.py) — parallel factorisation via Pollard's rho with noticeboard-coordinated early termination.
  • Benchmark harness (examples/benchmark.py) — micro-benchmarks for scheduling throughput, message-queue latency, and noticeboard contention.
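
The pop-locally-then-steal policy described for the work-stealing scheduler can be sketched in a few lines of Python. This is a single-threaded illustration only: the `Worker` class and the `next_task` helper are hypothetical names, not part of bocpy's API, and the real scheduler is implemented in C with MPMC queues, parking, and fairness tokens that this sketch leaves out.

```python
from collections import deque


class Worker:
    """Hypothetical worker that owns its own behavior queue."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def push(self, task):
        self.queue.append(task)

    def pop(self):
        # Dequeue the oldest task, or None when the queue is empty.
        return self.queue.popleft() if self.queue else None


def next_task(worker, workers):
    """Return the next task for `worker`: local queue first, then peers."""
    task = worker.pop()
    if task is not None:
        return task
    # Local queue is empty: try to steal one task from each peer in turn.
    # (Assumption: a thief dequeues from a victim like any other consumer.)
    for victim in workers:
        if victim is worker:
            continue
        task = victim.pop()
        if task is not None:
            return task
    # Nothing anywhere; a real worker would park here until signalled.
    return None
```

In this sketch a worker with an empty queue drains work from a busy peer before giving up, which is the property the scheduler relies on to keep idle workers useful.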

Bug Fixes

  • Transpiler aliased imports — visit_Import / visit_ImportFrom now track alias names (import X as Y), preventing spurious "name not found" errors and duplicate whencall injection.
  • Global variable capture — @when closure capture falls back to frame.f_globals when a name is not in any local scope, fixing NameError for module-level variables.
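
To illustrate what tracking alias names in `visit_Import` / `visit_ImportFrom` involves, here is a minimal sketch built on the standard-library `ast` module. The `ImportTracker` class and its `bound` mapping are hypothetical; the transpiler's actual visitor is not shown in these notes and may record different information.

```python
import ast


class ImportTracker(ast.NodeVisitor):
    """Records the local name that each import statement binds."""

    def __init__(self):
        # Maps local name -> what it refers to.
        self.bound = {}

    def visit_Import(self, node):
        for alias in node.names:
            if alias.asname:
                # `import a.b as c` binds `c` to the module a.b.
                self.bound[alias.asname] = alias.name
            else:
                # `import a.b` binds only the top-level package `a`.
                top = alias.name.split(".")[0]
                self.bound[top] = top

    def visit_ImportFrom(self, node):
        module = node.module or ""
        for alias in node.names:
            # `from m import x as y` binds `y`; without `as`, it binds `x`.
            local = alias.asname or alias.name
            self.bound[local] = f"{module}.{alias.name}" if module else alias.name
```

A visitor that ignored `alias.asname` would look for `numpy` after `import numpy as np` and miss the name `np` that the code actually uses, which is the kind of spurious lookup failure the fix addresses.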

Improvements

  • In-memory transpiled-module loading — workers exec the transpiled source from a string literal instead of writing to disk, eliminating filesystem round-trips and leftover .py files.
  • Nested @when capture — the transpiler recurses into nested @when-decorated functions when computing outer captures, so child behaviors can close over the outer frame.
  • C extension split — _core.c reduced from ~5,000 to ~3,500 lines by extracting sched.{c,h}, noticeboard.{c,h}, terminator.{c,h}, tags.{c,h}, cown.h, compat.{c,h}, and xidata.h.
  • Direct dispatch on cown release — behavior_release_all hands resolved successors directly to workers via boc_sched_dispatch, removing one queue hop per handoff.
  • Cooperative worker shutdown — boc_sched_worker_request_stop_all / boc_sched_unpause_all provide a clean stop/drain protocol.
  • Matrix docstrings — all Matrix C methods now carry built-in docstrings.
  • Examples package relocated — moved to top-level examples/ directory (still importable as bocpy.examples).
  • Filtered PyPI README — setup.py strips <!-- pypi-skip-start --> regions before publishing.
  • Documentation refresh — expanded coverage of noticeboard, distributed scheduler, and new APIs.
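
Loading a module from a source string instead of a file, as the in-memory transpiled-module item describes, can be done with the standard library alone. The sketch below shows one common way to do it; `load_module_from_source` is a hypothetical helper and not bocpy's actual loader, which may differ in how it names and registers modules.

```python
import sys
import types


def load_module_from_source(name, source):
    """Create a module object and execute `source` in its namespace."""
    module = types.ModuleType(name)
    # Pseudo-filename used only in tracebacks; nothing is written to disk.
    module.__file__ = f"<transpiled {name}>"
    code = compile(source, module.__file__, "exec")
    # Register before executing so the module is importable by name.
    sys.modules[name] = module
    try:
        exec(code, module.__dict__)
    except BaseException:
        # Do not leave a half-initialised module behind.
        sys.modules.pop(name, None)
        raise
    return module
```

Because the code object is compiled directly from the string, no temporary .py file exists at any point, so there is nothing to clean up afterwards.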

Internal Test Modules (opt-in via BOCPY_BUILD_INTERNAL_TESTS=1)

  • _internal_test_atomics — correctness tests for compat.h typed-atomics.
  • _internal_test_bq — torture tests for the MPMC behavior queue.
  • _internal_test_wsq — tests for work-stealing primitives (fast pop, slow pop, steal, park/unpark).

Test Suite

  • test_noticeboard.py — snapshot semantics, notice_update atomicity, REMOVED, notice_sync, version monotonicity.
  • test_scheduler_integration.py, test_scheduler_stats.py, test_scheduler_steal.py — end-to-end and per-primitive scheduler tests.
  • test_compat_atomics.py — portable atomics smoke tests.
  • test_stop_retry_composition.py — stop()/start()/wait() retry composition.
  • test_scheduling_stress.py — expanded with fan-out, work-stealing, and shutdown stress scenarios.
  • test_transpiler.py — AST extraction, capture rewriting, aliased imports, module export.

Full changelog: v0.3.1...v0.5.0

v0.3.1

07 Apr 12:50
5eaf8fc

CownCapsule serialization support for nested cowns.

Bug Fixes

  • Removed the ownership check in _cown_shared that prevented a
    CownCapsule from being serialized to XIData when it was the value
    of another Cown. The check was unnecessary — _cown_shared only
    stores a pointer and ownership is enforced at acquire time.

Improvements

  • Added CownCapsule.__reduce__ with COWN_INCREF pinning so that a
    CownCapsule embedded in a container (dict, list, etc.) can survive
    the pickle round-trip used by object_to_xidata. A module-level
    reconstructor (_cown_capsule_from_pointer) inherits the pin without
    a redundant COWN_INCREF, and validates the process ID on unpickle to
    guard against cross-process misuse.

v0.3.0

01 Apr 22:42
7e52702

Improvements

  • Added CownCapsule.disown() — abandons a cown's value without
    serializing it and resets ownership to NO_OWNER. Used during worker
    cleanup to safely discard orphan cowns before the owning interpreter
    is destroyed, preventing dangling Python object references.
  • Rewrote receive to use a two-phase spin-then-park strategy for
    single-tag untimed receives. Phase 1 spins for BOC_SPIN_COUNT
    iterations; Phase 2 parks the thread on a per-queue condvar, eliminating
    busy-wait CPU burn. Timed receives and multi-tag receives use
    spin-then-backoff with exponential sleep (1 µs → 1 ms cap).
  • Added platform-abstracted condvar primitives (BOCParkMutex /
    BOCParkCond) with implementations for Windows (SRWLOCK /
    CONDITION_VARIABLE), macOS (pthreads), and Linux (C11 threads).
  • Each BOCQueue now carries a waiters counter, park_mutex, and
    park_cond. Producers signal parked receivers after enqueue;
    drain and set_tags broadcast to wake all parked threads.
  • Replaced the fixed thrd_sleep in send with sched_yield /
    SwitchToThread, reducing send-side latency.
  • Refactored the monolithic _core_receive into receive_single_tag
    and receive_multi_tag, each with its own backoff/parking logic.
  • Moved the BOC_QUEUE_DISABLED check earlier in get_queue_for_tag
    so callers skip disabled queues instead of returning NULL after
    tag resolution.
  • Added Windows-compatible atomic_load_explicit /
    atomic_fetch_add_explicit / atomic_fetch_sub_explicit macros
    using InterlockedExchangeAdd64.
  • Declared Py_mod_gil = Py_MOD_GIL_NOT_USED in both _core and
    _math C extensions so that importing bocpy on a free-threaded
    Python build (3.13t+) does not re-enable the GIL.
  • Replaced PyDict_GetItem (borrowed reference) with
    PyDict_GetItemRef (strong reference) in BOCRecycleQueue_recycle
    on Python 3.13+, improving forward-compatibility with free-threaded
    builds.
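
The two-phase spin-then-park receive, together with the per-queue waiters counter and condition variable, can be sketched with Python's `threading` module. This is an illustration under stated assumptions: `ParkingQueue` and `SPIN_COUNT` are hypothetical names standing in for `BOCQueue` and `BOC_SPIN_COUNT`, and the spin phase here takes a lock on each attempt, whereas the C implementation is presumably lock-free while spinning.

```python
import threading
from collections import deque

SPIN_COUNT = 1000  # hypothetical stand-in for BOC_SPIN_COUNT


class ParkingQueue:
    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()
        self._waiters = 0  # number of receivers currently parked

    def send(self, item):
        with self._cond:
            self._items.append(item)
            # Signal only when someone is actually parked.
            if self._waiters:
                self._cond.notify()

    def _try_pop(self):
        with self._cond:
            if self._items:
                return True, self._items.popleft()
            return False, None

    def receive(self):
        # Phase 1: spin briefly in case a message is about to arrive.
        for _ in range(SPIN_COUNT):
            ok, item = self._try_pop()
            if ok:
                return item
        # Phase 2: park on the condition variable instead of burning CPU.
        with self._cond:
            self._waiters += 1
            try:
                # Emptiness is re-checked under the lock, so a send that
                # lands between the spin and the wait cannot be lost.
                while not self._items:
                    self._cond.wait()
                return self._items.popleft()
            finally:
                self._waiters -= 1
```

The key design point is that the waiters counter is updated and the queue is re-checked under the same lock the producer holds when enqueuing, which is what rules out a lost wake-up.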

Bug Fixes

  • Fixed a deadlock when the same cown is passed multiple times to @when
    (e.g. @when(c, c)). Duplicate requests for the same cown caused the
    MCS-queue-based two-phase locking to spin-wait on itself. Requests are
    now deduplicated by target cown in Behavior.__init__, with
    compensating resolve_one calls to maintain the behavior count
    invariant.
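
The deduplication step can be sketched as follows. This is a minimal illustration, not the code in `Behavior.__init__`: `dedupe_requests` is a hypothetical helper, and it assumes that duplicates are detected by object identity and that one compensating `resolve_one` call is needed per dropped duplicate.

```python
def dedupe_requests(cowns):
    """Return (unique cowns in first-seen order, number of duplicates dropped)."""
    seen = set()
    unique = []
    duplicates = 0
    for cown in cowns:
        key = id(cown)  # identity, not equality: @when(c, c) is the same cown
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
            unique.append(cown)
    return unique, duplicates
```

If the behavior's pending count was initialised from the original argument list, the caller would then resolve `duplicates` requests immediately, so the count still reaches zero once every unique cown has been acquired.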

Tests

  • TestLostWakeStress: single-producer random delays, bursty producer,
    and repeated single-message wake to detect lost-wake races.
  • TestMultiTagBackoff: multi-tag receive correctness — second-tag hit,
    delayed arrival, per-tag FIFO ordering, timeout, and interleaved
    producers.
  • TestTimeoutAccuracy: lower-bound / upper-bound wall-clock checks and
    zero-timeout immediacy.
  • Added tests for duplicate cowns in @when: same cown twice, thrice,
    non-adjacent duplicates, duplicates within a group, and mutation
    aliasing semantics.

CI

  • Added a free-threaded CI job that tests against Python 3.13t and
    3.14t on Linux, with explicit assertions that the GIL remains disabled
    after import.

Full Changelog: v0.2.2...v0.3.0

v0.2.2

18 Mar 13:12
81ede4b

Improvements

  • Added an ASAN/UBSAN CI job that builds CPython 3.14.2 from source with AddressSanitizer and UndefinedBehaviorSanitizer, then runs the full test suite against instrumented builds of bocpy.
  • Updated GitHub Actions to latest versions (actions/checkout@v6, actions/setup-python@v5).

Bug Fixes

  • Fixed a false positive warning message for deallocation of xidata on the main
    interpreter after module shutdown.
  • Changed the clear logic when recycling

v0.2.0

06 Mar 01:23
cc32479

Bugfix release including some minor improvements.

Improvements

  • Examples are now included in the package, with script entrypoints for each.
  • The drain low-level API function is now exposed at the package level.
  • wait() will now acquire frame-local Cown objects before shutting down the workers.

Dev Tools

  • Added an internal cown and behavior reference tracking utility.

Bug Fixes

  • Fixed a reference counting bug with cown lists.
  • Fixed an issue where the boids example did not run on Windows due to a font
    setting.

v0.1.0 - Initial Release

02 Mar 02:29
d5d5eb2

Signed-off-by: Matthew A Johnson <matjoh@microsoft.com>