Summary
A deterministic SIGSEGV (or a "double free or corruption" abort on glibc) occurs during Python garbage collection while a warp.capture_while region is active on the CPU device. The same workload runs cleanly on 1.13.0.dev20260421 and crashes on 1.13.0.dev20260422, so it's a one-day regression.
The crash profile points at a missed reference-retention path in the new CPU graph-capture / APIC replay work added in #1349.
Reproducer
Any workload that drives mujoco-warp on CPU inside a graph capture reproduces. Concrete example (from newton-physics/newton):
git clone https://github.com/newton-physics/newton.git
cd newton
# pin the bad nightly
uv lock --upgrade-package warp-lang
# ensure warp-lang==1.13.0.dev20260422 is resolved
CUDA_VISIBLE_DEVICES= uv run --extra dev -m newton.tests \
-k test_selection.example_selection_cartpole_cpu
Expected: test passes.
Actual: SIGSEGV (-11) or SIGABRT (-6) partway through the simulation loop. Happens across Ubuntu x86_64, Ubuntu arm64, macOS (arm64), Windows, and on a CUDA box when CUDA is hidden.
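For anyone scripting the reproducer, the negative numbers above are the POSIX convention that Python's subprocess module uses for a child killed by a signal. A minimal sketch of decoding them follows; the helper name describe_exit is hypothetical and not part of Newton or Warp.

```python
import signal


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description.

    On POSIX, subprocess reports a child killed by signal N as -N.
    """
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number unknown on this platform
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with code {returncode}"


segv = describe_exit(-11)
abrt = describe_exit(-6)
clean = describe_exit(0)
```

Passing the returncode attribute of a completed subprocess.run call to this helper distinguishes the two failure modes from an ordinary test failure.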
Stack traces
Two observed crash sites, both reported while the interpreter was "Garbage-collecting", which is characteristic of heap corruption rather than a single logic bug (GC just happens to be where the trap fires):
Site A — GC during array construction
Fatal Python error: Segmentation fault
Current thread 0x... (most recent call first):
Garbage-collecting
File ".../warp/_src/types.py", line 2304 in __init__
File ".../warp/_src/types.py", line 3878 in __ctype__
File ".../warp/_src/context.py", line 7610 in pack_arg
File ".../warp/_src/context.py", line 8292 in pack_args
File ".../warp/_src/context.py", line 8297 in launch
File ".../mujoco_warp/_src/solver.py", line 3240 in _solver_iteration
File ".../warp/_src/context.py", line 9743 in capture_while
File ".../mujoco_warp/_src/solver.py", line 3335 in _solve
...
Site B — GC during kernel compilation
Fatal Python error: Segmentation fault
Current thread 0x... (most recent call first):
Garbage-collecting
File ".../ast.py", line 52 in parse
File ".../warp/_src/codegen.py", line 1038 in __init__ # Adjoint.__init__
File ".../warp/_src/context.py", line 781 in __init__ # Function.__init__
File ".../warp/_src/context.py", line 1362 in wrapper
File ".../mujoco_warp/_src/solver.py", line 1809 in update_constraint_efc
File ".../warp/_src/context.py", line 9743 in capture_while
...
Both traces have warp.capture_while live on the CPU path above the crashing frame.
Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, _warp_fastcall, _cbor2
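If the heap-corruption reading is right, making the collector run more often should move the trap earlier or change where it fires. The following is a minimal sketch of such a stress helper using only the standard gc module; gc_stress is a hypothetical name, and wrapping the failing test body in it is an untested suggestion rather than something verified against this crash.

```python
import contextlib
import gc


@contextlib.contextmanager
def gc_stress():
    """Run the block with the most aggressive GC thresholds, then restore them."""
    saved = gc.get_threshold()
    events = []

    def on_gc(phase, info):
        # Record the generation of every collection that starts
        if phase == "start":
            events.append(info["generation"])

    gc.callbacks.append(on_gc)
    gc.set_threshold(1, 1, 1)
    try:
        yield events
    finally:
        gc.set_threshold(*saved)
        gc.callbacks.remove(on_gc)


before = gc.get_threshold()
with gc_stress() as events:
    # Stand-in workload: allocate containers so the collector has work to do
    junk = [[i] for i in range(1000)]
    gc.collect()
collections_seen = len(events)
after = gc.get_threshold()
```

The complementary experiment is calling gc.disable() around the same region: if the crash disappears or moves to interpreter shutdown, that is also consistent with a lifetime bug rather than a fault in the collector itself.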
Bisect
| Nightly | Result |
| --- | --- |
| 1.13.0.dev20260421 | ✅ passes full Newton suite |
| 1.13.0.dev20260422 | ❌ segfault in test_selection.example_selection_cartpole_cpu |
Nothing between those nightlies touches the CPU execution path except the work that landed for #1349 ("Add graph capture for APIC serialization and CPU replay"). The commit message explicitly calls out reference retention for CPU capture (base array in _regions, ModuleExec on the graph) and mentions a follow-up to "Retain ModuleExec on CPU APIC graphs to prevent use-after-unload". Our symptom looks like a retention site that was missed for the per-launch array views / packed kernel args that mujoco-warp creates.
Environment
- OS: reproduced on Ubuntu 22.04 x86_64, Ubuntu 24.04 arm64, Windows, macOS (arm64), and a CUDA runner with CUDA_VISIBLE_DEVICES=""
- Python: 3.12
- warp-lang: 1.13.0.dev20260422 (installed from https://pypi.nvidia.com/)
- mujoco-warp: 3.7.0.1
- Device: CPU (the test forces --device cpu)
Workaround
Pin to warp-lang==1.13.0.dev20260421 until a fix lands.
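A test harness can also check the installed build against the two nightlies named in this report before running CPU capture tests. This is a minimal sketch; the classify helper and the KNOWN table are hypothetical, and any version not listed is treated as untested rather than assumed good or bad.

```python
from importlib import metadata

# Only the two nightlies bisected in this report are classified
KNOWN = {
    "1.13.0.dev20260421": "good",
    "1.13.0.dev20260422": "bad",
}


def classify(version):
    """Return 'good', 'bad', or 'untested' for a warp-lang version string."""
    return KNOWN.get(version.strip(), "untested")


try:
    installed = metadata.version("warp-lang")
except metadata.PackageNotFoundError:
    installed = None

status = classify(installed) if installed is not None else "not installed"
```

A CI job could skip or fail fast when status is "bad" instead of waiting for the segfault.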
Suspected cause
Heap/refcount bug in the new CPU APIC capture/replay machinery: something a kernel launch needs to stay alive for the duration of capture_while (likely an array view, apic_array_t, a per-launch param record, or a function-pointer-carrying struct) is being reaped by Python GC before the recorded operation runs, and later use of the dangling pointer segfaults.
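The suspected failure mode can be illustrated in pure Python without touching Warp. In this sketch every class is a hypothetical stand-in: Buffer plays the role of a per-launch array view, LeakyCapture records only an address-like identity (as a native record holding a raw pointer would), and RetainingCapture additionally keeps a strong reference. It shows the lifetime difference only and says nothing about how Warp actually stores launch arguments.

```python
import gc
import weakref


class Buffer:
    """Stand-in for a temporary array view created for one launch."""

    def __init__(self, n):
        self.data = bytearray(n)


class LeakyCapture:
    """Records only an identity for each argument; keeps nothing alive."""

    def __init__(self):
        self.recorded = []

    def record(self, buf):
        self.recorded.append(id(buf))  # analogous to storing a raw pointer


class RetainingCapture(LeakyCapture):
    """Also holds a strong reference until the capture itself is dropped."""

    def __init__(self):
        super().__init__()
        self._keepalive = []

    def record(self, buf):
        super().record(buf)
        self._keepalive.append(buf)


def run(capture):
    probes = []
    for _ in range(3):
        buf = Buffer(16)
        probes.append(weakref.ref(buf))
        capture.record(buf)
    del buf
    gc.collect()
    # True where the recorded object still exists at "replay" time
    return [p() is not None for p in probes]


leaky_alive = run(LeakyCapture())
retained_alive = run(RetainingCapture())
```

With the leaky variant every recorded object is gone by the time a replay would dereference it; in native code that is a dangling pointer, which matches the delayed, GC-adjacent crashes in the traces.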