Refactor CLI tools and generate benchmark documentation by erikfrey · Pull Request #1301 · google-deepmind/mujoco_warp

erikfrey · 2026-04-20T18:35:00Z

This PR completes the refactor of CLI tools (testspeed.py, record.py, viewer.py) to use a centralized cli.py module, and adds automated documentation generation for benchmarks.

Key Changes:

CLI Refactor

Centralized init_structs and unroll logic in _src/cli.py.
Updated testspeed.py and viewer.py to use shared flags and functions from cli.py.
Added record.py tool for generating videos of benchmarks, using Pillow for saving .webp and .gif files.
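
As a rough illustration of the centralization described above, the sketch below shows one way several entry points can share flag definitions from a single module. It uses the standard library's argparse, which is an assumption and may not be the flag library the project uses, and every name in it (add_shared_flags, build_parser, --nworld, --nstep, --output) is hypothetical rather than taken from cli.py.

```python
import argparse


def add_shared_flags(parser: argparse.ArgumentParser) -> None:
    """Registers flags that every tool understands (hypothetical names)."""
    parser.add_argument("mjcf", help="path to a model file")
    parser.add_argument("--nworld", type=int, default=1, help="number of parallel worlds")
    parser.add_argument("--nstep", type=int, default=100, help="number of steps to unroll")


def build_parser(tool: str) -> argparse.ArgumentParser:
    """Builds a parser for one tool: shared flags plus tool-specific ones."""
    parser = argparse.ArgumentParser(prog=tool)
    add_shared_flags(parser)
    if tool == "record":
        # Only the recording tool needs an output path.
        parser.add_argument("--output", default="rollout.webp", help="animation file to write")
    return parser


# Both tools parse the shared flags identically; only "record" accepts --output.
speed_args = build_parser("testspeed").parse_args(["humanoid.xml", "--nworld", "8"])
record_args = build_parser("record").parse_args(["humanoid.xml", "--output", "humanoid.gif"])
print(speed_args.nworld, speed_args.nstep, record_args.output)
```

Keeping the shared definitions in one function means a default changed there applies to every tool at once, which is the main benefit of this kind of refactor.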

Benchmark Documentation

Created README.md in each benchmark directory with model info and descriptions.
Generated .webp animations for each benchmark using mjwarp-record.
Improved model info tables in READMEs to show DoFs, Integrator, and accurate Matrix Format (Sparse/Dense).
Updated benchmarks/README.md to reflect the new Python-based framework.

Dependency Updates

Added pillow to dev dependencies.

Cleans up testspeed so that it produces only human-readable output. Moves benchmark running to benchmarks/run.py, with an example in humanoid.

thowell · 2026-04-24T21:31:46Z

nice refactor! lgtm

Semantic port of mar-yan24's mark/autodifferentiation2 (131 commits behind) onto the rebased autodiff branch.

Preserves the design intact:

- solver_implicit_adjoint: tape-backward hook solving H v = adj_qacc with the retained Newton Hessian, writing adj_qacc_smooth = M v
- GPU tile-Cholesky adjoint paths (small-nv single tile, blocked solve-only with stored factor, blocked full factorize+solve)
- Data.solver_h / solver_hfactor / solver_Jaref retained state, allocated in make_data/put_data, aliased by the solver context

Adaptations to current main:

- create_blocked_cholesky_func no longer exists upstream (replaced by fused create_blocked_cholesky_factorize_solve_func in 083e5ef); _adjoint_cholesky_full_blocked uses the fused func
- dropped his update_constraint_gauss_cost kernel (belonged to the pre-google-deepmind#1301 solver context which carried ctx.cost; current main computes gauss cost inside the linesearch tiles)
- dropped his support.py next_act removal (current main's derivative.py imports it)
- solver retained fields placed at the end of Data (types_test enforces warp-only fields after MuJoCo fields, with matching docstring order)
- test_solver_retained_state used an undefined SPARSE_CONSTRAINT_JACOBIAN global; replaced with m.is_sparse

Verification: full suite 1078 passed / 23 skipped, including his 4 GradSolverAdjointTest cases. Cross-validated independently: the GPU adjoint mapped to force space (M^-1 M H^-1 g) matches central finite differences through the full GPU solver to 8e-5 max relative error on a persistent-contact scene.

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Co-authored-by: Mark Yang <markyang2005@gmail.com>
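
The adjoint step named in the commit message above, solving H v = adj_qacc and then writing adj_qacc_smooth = M v, can be illustrated on a tiny dense system. This is a minimal sketch in pure Python, assuming H is symmetric positive definite so that a Cholesky factor exists. The helper names and the 2x2 numbers are made up for illustration and are not the GPU tile kernels from the commit.

```python
import math


def cholesky(h):
    """Returns lower-triangular L with L @ L^T == h, for SPD h."""
    n = len(h)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = h[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def cholesky_solve(low, b):
    """Solves (L L^T) v = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    v = [0.0] * n
    for i in reversed(range(n)):
        v[i] = (y[i] - sum(low[k][i] * v[k] for k in range(i + 1, n))) / low[i][i]
    return v


def matvec(m, x):
    """Dense matrix-vector product."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


# Illustrative values only: H stands in for the retained Hessian, M for the mass matrix.
H = [[4.0, 2.0], [2.0, 3.0]]
M = [[2.0, 0.0], [0.0, 1.0]]
adj_qacc = [2.0, 5.0]

factor = cholesky(H)                   # can be stored and reused by later solves
v = cholesky_solve(factor, adj_qacc)   # H v = adj_qacc
adj_qacc_smooth = matvec(M, v)         # adj_qacc_smooth = M v
```

Separating factorization from the solve mirrors the distinction the commit draws between a solve-only path with a stored factor and a full factorize+solve path.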

…signment

Semantic re-implementation of mar-yan24's count->scan->emit pipeline (google-deepmind#1300) on top of the post-a23500c constraint kernels, since the original branch predates the efc_contact rewrite and 95 other upstream commits and cannot be rebased mechanically.

When opt.deterministic=True, the racy wp.atomic_add(nefc/efc_nnz) slot allocation in every constraint family is replaced with: a count kernel that writes per-thread row/nnz counts, a per-world exclusive scan that converts counts to offsets and bumps the totals, and emit kernels that read base+offset instead of atomic results. Constraint rows therefore land at identical positions on every run of the same input.

Differences from the original google-deepmind#1300 branch:

- Contact families use a single _efc_contact_count kernel mirroring the unified _efc_contact_init (upstream a23500c replaced the per-cone _contact_pyramidal/_contact_elliptic kernels this no longer touches).
- _equality_flex_count honors eq_active (upstream 80bbba7).
- All new kernel factories are wrapped in @cache_kernel (upstream f0e2d81) so repeated step() calls do not recreate kernels.
- Host-side overflow validation is skipped while a CUDA graph capture is active, making opt.deterministic capture-safe (the original branch documented capture as unsupported).
- Ports the expanded determinism regression tests, minus the benchmark coverage that targeted the pre-google-deepmind#1301 benchmark API.

Verified: full suite 1101 passed / 23 skipped; constraint rows (nefc, efc.type/id, sparse J structure and values) bitwise stable across same-process trials with CUDA graph capture; ~9% overhead at 1 world, ~24% at 256 worlds on RTX PRO 6000 Blackwell.

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Co-authored-by: Mark Yang <markyang2005@gmail.com>
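
The count->scan->emit idea described in the commit message above can be sketched on the host in a few lines. This is a simplified, single-world illustration in plain Python, not the Warp kernels from the commit: the function names are hypothetical, and a shuffled loop order stands in for nondeterministic GPU thread scheduling.

```python
import random


def exclusive_scan(counts):
    """Returns (offsets, total), where offsets[i] is the sum of counts[:i]."""
    offsets = []
    total = 0
    for c in counts:
        offsets.append(total)
        total += c
    return offsets, total


def emit(counts, offsets, total, order):
    """Writes each thread's rows at its precomputed offset, in any thread order."""
    rows = [None] * total
    for tid in order:
        for k in range(counts[tid]):
            rows[offsets[tid] + k] = (tid, k)  # slot depends only on tid, not on timing
    return rows


# Phase 1 (count): rows each thread wants to write.
counts = [2, 0, 1, 3]
# Phase 2 (scan): convert counts to starting offsets and a total.
offsets, total = exclusive_scan(counts)

# Phase 3 (emit): two different "schedules" produce identical layouts.
order_a = list(range(len(counts)))
order_b = list(range(len(counts)))
random.Random(0).shuffle(order_b)
rows_a = emit(counts, offsets, total, order_a)
rows_b = emit(counts, offsets, total, order_b)
```

With an atomic counter, the slot a thread receives depends on which thread increments first. With scanned offsets, the slot is a pure function of the thread index and the counts, which is why row positions repeat from run to run.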

Semantic port of mar-yan24's mark/autodifferentiation3 (123 commits behind) onto the 2/3 port.

Preserves the design intact:

- collision_smooth.py: smooth analytic distance functions overwrite discrete contact geometry (dist/pos/frame) during taped forwards; differentiable constraint assembly (smooth_contact_to_efc) recomputes efc.J/efc.pos on an AD-visible path. Plane-sphere, sphere-sphere, sphere-capsule, capsule-capsule, plane-capsule supported; other geom pairs pass through with zero gradient.
- adjoint.py: efc-level adjoint kernels (_efc_J_grad_kernel, _efc_pos_grad_kernel) connecting the phase-2 KKT adjoint to contact geometry; smooth/surrogate friction adjoint variants.
- forward.py: tape-aware Phase 3 hooks after collision and make_constraint; _record_solver_adjoint/_record_fwd_accel_adjoint/_record_euler_damp_adjoint tape callbacks; _isolate_intermediates_for_ad per-substep array isolation; freejoint zerograd fix (is_free flag instead of continue) was already present upstream.
- grad.py: COLLISION_GRAD_FIELDS registry + dotted-path _resolve_field.

Adaptations to current main:

- _record_euler_damp_adjoint captures damp_deriv (polynomial damping, computed from the substep's qvel at record time) instead of the old constant dof_damping, matching the forward euler solve; sparse kernel call updated to the M_rownnz/M_rowadr signature
- _qfrc_smooth factory kernel re-enabled for backward (pure elementwise sum; main had enable_backward=False which severed the qfrc_smooth -> qfrc_actuator/ctrl gradient chain)
- jac_dof calls updated for the body_isdofancestor parameter
- _solve_LD_sparse: kept main's fused fast path for non-grad solves, adopted the nograd-copy + manual-adjoint design for grad solves
- warmstart copy uses d.qacc (solver solution) not integrator-local qacc, restoring upstream/sleep semantics; sleep.wake retained in forward()
- step-level xpos-loss tests refresh kinematics inside the tape (step() does not recompute xpos post-integration; FD agrees it would otherwise be identically zero) and the nonzero-gradient guard is scaled by the FD norm (single-step dL/dctrl ~ dt^2 ~ 2e-7 < the old 1e-6 threshold)
- d.qM -> d.M renames; dropped stale pre-google-deepmind#1301 hunks (gauss_cost kernel, old next_act relocation, old collision_flex/passive/types drift)

Verification: full suite 1093 passed / 23 skipped, including all 29 grad tests (solver adjoint, euler damp stress, smooth contact, friction surrogate, freejoint kinematics).

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Co-authored-by: Mark Yang <markyang2005@gmail.com>
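
The point about scaling the nonzero-gradient guard by the finite-difference norm can be made concrete with a toy loss. The sketch below is an assumption-laden illustration, not the project's test code: the timestep DT, the linear toy loss, and the 0.1 relative factor are all hypothetical, chosen only so that the gradient magnitude is on the order of dt^2 as the commit message describes.

```python
import math


def central_fd(f, x, eps=1e-3):
    """Central finite-difference gradient of a scalar function f at list x."""
    grad = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2.0 * eps))
    return grad


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


DT = 5e-4  # hypothetical timestep, so DT**2 is 2.5e-7


def loss(ctrl):
    # Toy stand-in: one step moves a position by roughly DT**2 per unit of control.
    return sum(DT * DT * c for c in ctrl)


ctrl = [0.3, -0.7]
analytic = [DT * DT, DT * DT]  # hand-derived gradient of the toy loss
fd = central_fd(loss, ctrl)

# An absolute threshold rejects a gradient that is small but correct.
old_guard_passes = norm(analytic) > 1e-6
# A threshold relative to the FD norm adapts to the problem's natural scale.
new_guard_passes = norm(analytic) > 0.1 * norm(fd)
rel_err = norm([a - b for a, b in zip(analytic, fd)]) / norm(fd)
```

Here the analytic gradient agrees with finite differences, yet its norm is below 1e-6, so only the relative guard accepts it.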

Initial stab at moving benchmark logic to python.

60c530f

Cleans up testspeed to be only human readable output. Moves benchmark running to benchmarks/run.py with example in humanoid.

erikfrey requested a review from thowell April 20, 2026 18:35

thowell reviewed Apr 20, 2026

Comment thread benchmarks/humanoid/__init__.py

thowell reviewed Apr 20, 2026

Comment thread benchmarks/humanoid/__init__.py Outdated

erikfrey added 2 commits April 20, 2026 17:48

Support for asset fetching from git repos.

4348c31

Merge remote-tracking branch 'origin' into benchmark_python

8eecf48

thowell approved these changes Apr 22, 2026

erikfrey added 7 commits April 23, 2026 16:45

Refactor CLI tools: centralize shared code in cli.py and add record b…

40787db

…inary

fix ruff

36a7d1f

More changes and refactors.

b742643

Merge remote-tracking branch 'origin/main' into benchmark_python

5ed44b1

Remove mediapy, introduces too many deps.

c5adb22

Update uv.lock to HEAD

893ba51

Better rollout video for aloha_cloth

b30bc15

erikfrey changed the title ~~Initial stab at moving benchmark logic to python.~~ Refactor CLI tools and generate benchmark documentation Apr 24, 2026

erikfrey marked this pull request as ready for review April 24, 2026 20:27