
Refactor CLI tools and generate benchmark documentation #1301

Merged: thowell merged 13 commits into google-deepmind:main from erikfrey:benchmark_python on Apr 28, 2026


erikfrey (Collaborator) commented Apr 20, 2026

This PR completes the refactor of CLI tools (testspeed.py, record.py, viewer.py) to use a centralized cli.py module, and adds automated documentation generation for benchmarks.

Key Changes:

CLI Refactor

  • Centralized init_structs and unroll logic in _src/cli.py.
  • Updated testspeed.py and viewer.py to use shared flags and functions from cli.py.
  • Added record.py tool for generating videos of benchmarks, using Pillow for saving .webp and .gif files.
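The shared-module pattern behind this refactor can be sketched as follows. This is a hypothetical stand-in, not the contents of _src/cli.py: it uses the standard library's argparse, and the helper names (add_common_args, unroll) and flags (mjcf, --nworld, --nstep, --output) are illustrative assumptions. Each tool builds its parser from the same helper and then layers on its own flags, so shared options are defined once.

```python
import argparse


def add_common_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
  """Registers flags shared by every tool (hypothetical names)."""
  parser.add_argument("mjcf", help="path to the MJCF model to load")
  parser.add_argument("--nworld", type=int, default=1, help="number of parallel worlds")
  parser.add_argument("--nstep", type=int, default=1000, help="number of steps to unroll")
  return parser


def unroll(step, state, nstep):
  """Advances `state` by calling `step` nstep times; shared by all tools."""
  for _ in range(nstep):
    state = step(state)
  return state


def testspeed_parser() -> argparse.ArgumentParser:
  # Only the shared flags; a benchmark tool needs nothing extra in this sketch.
  return add_common_args(argparse.ArgumentParser(prog="testspeed"))


def record_parser() -> argparse.ArgumentParser:
  parser = add_common_args(argparse.ArgumentParser(prog="record"))
  # Tool-specific flag layered on top of the shared ones.
  parser.add_argument("--output", default="out.webp", help=".webp or .gif path")
  return parser
```

Keeping the per-tool parsers thin means a change to a shared flag (its default or help text) propagates to testspeed, viewer, and record at once.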

Benchmark Documentation

  • Created README.md in each benchmark directory with model info and descriptions.
  • Generated .webp animations for each benchmark using mjwarp-record.
  • Improved model info tables in READMEs to show DoFs, Integrator, and accurate Matrix Format (Sparse/Dense).
  • Updated benchmarks/README.md to reflect the new Python-based framework.
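A minimal sketch of how one model-info table could be rendered, assuming the model is summarized by its DoF count (nv), an integrator name, and a jacobian setting of "dense", "sparse", or "auto". Resolving "auto" is what makes the Matrix Format column accurate. The default threshold of 60 follows MuJoCo's documented behavior for jacobian="auto", but whether MJWarp uses the same cutoff is an assumption, so it is a parameter here. The function names and the two-column layout are hypothetical.

```python
def matrix_format(jacobian: str, nv: int, auto_threshold: int = 60) -> str:
  """Resolves a jacobian setting to the matrix format actually used."""
  if jacobian == "dense":
    return "Dense"
  if jacobian == "sparse":
    return "Sparse"
  if jacobian == "auto":
    # Assumed rule: sparse for large models, dense for small ones.
    return "Sparse" if nv >= auto_threshold else "Dense"
  raise ValueError(f"unknown jacobian setting: {jacobian!r}")


def model_info_table(name: str, nv: int, integrator: str, jacobian: str) -> str:
  """Renders a hypothetical two-column Markdown table of model properties."""
  rows = [
      ("Model", name),
      ("DoFs", str(nv)),
      ("Integrator", integrator),
      ("Matrix Format", matrix_format(jacobian, nv)),
  ]
  lines = ["| Property | Value |", "| --- | --- |"]
  lines += [f"| {key} | {value} |" for key, value in rows]
  return "\n".join(lines)
```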

Dependency Updates

  • Added pillow to dev dependencies.
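For context on the Pillow dependency, here is a minimal sketch of writing an animated .webp or .gif with Pillow's Image.save using save_all, append_images, duration, and loop. This is not record.py's actual code; the helper names and the 30 fps default are assumptions. Pillow picks the format from the file extension, and WebP output requires a Pillow build with WebP support.

```python
def frame_duration_ms(fps: float) -> int:
  """Per-frame display time in milliseconds, as Pillow's `duration` option expects."""
  if fps <= 0:
    raise ValueError("fps must be positive")
  return round(1000 / fps)


def save_animation(frames, path: str, fps: float = 30.0) -> None:
  """Writes HxWx3 uint8 arrays (or PIL images) to an animated .webp or .gif."""
  from PIL import Image  # imported lazily so frame_duration_ms works without Pillow

  images = [f if isinstance(f, Image.Image) else Image.fromarray(f) for f in frames]
  images[0].save(
      path,
      save_all=True,  # write every frame, not just the first
      append_images=images[1:],
      duration=frame_duration_ms(fps),
      loop=0,  # loop forever
  )
```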

Cleans up testspeed so that it emits only human-readable output, and moves
benchmark running to benchmarks/run.py, with an example in benchmarks/humanoid.
erikfrey requested a review from thowell on Apr 20, 2026 at 18:35
Review comments on benchmarks/humanoid/__init__.py (one thread since outdated)
erikfrey changed the title from "Initial stab at moving benchmark logic to python." to "Refactor CLI tools and generate benchmark documentation" on Apr 24, 2026
erikfrey marked this pull request as ready for review on Apr 24, 2026 at 20:27
Review comments on benchmarks/aloha_pot/README.md, benchmarks/aloha_sdf/assets/spot.obj, benchmarks/aloha_sdf/assets/spot.png, benchmarks/aloha_sdf/README.md, benchmarks/run.py, mujoco_warp/_src/cli.py, and mujoco_warp/record.py (several threads since outdated)
thowell (Collaborator) commented Apr 24, 2026

nice refactor! lgtm

thowell merged commit fc8b0d6 into google-deepmind:main on Apr 28, 2026
10 checks passed
This was referenced Apr 29, 2026
johnnynunez added a commit to johnnynunez/mujoco_warp that referenced this pull request Jun 11, 2026
Semantic port of mar-yan24's mark/autodifferentiation2 (131 commits
behind) onto the rebased autodiff branch. Preserves the design intact:

- solver_implicit_adjoint: tape-backward hook solving H v = adj_qacc
  with the retained Newton Hessian, writing adj_qacc_smooth = M v
- GPU tile-Cholesky adjoint paths (small-nv single tile, blocked
  solve-only with stored factor, blocked full factorize+solve)
- Data.solver_h / solver_hfactor / solver_Jaref retained state,
  allocated in make_data/put_data, aliased by the solver context
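The identity behind the tape-backward hook: if the solver's qacc satisfies M (qacc - qacc_smooth) + grad c(qacc) = 0 for a constraint cost c, differentiating gives H dqacc = M dqacc_smooth with H = M + hess c (the Newton Hessian), so the adjoint is adj_qacc_smooth = M H^-1 adj_qacc when M and H are symmetric. Below is a minimal pure-Python sketch on a toy quadratic problem (so H is exact and constant), assuming small dense matrices. It reuses a stored Cholesky factor for the backward solve, loosely analogous to the "solve-only with stored factor" path, but it is not the GPU tile kernel, and all function names are hypothetical.

```python
import math


def cholesky(A):
  """Returns lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
  n = len(A)
  L = [[0.0] * n for _ in range(n)]
  for i in range(n):
    for j in range(i + 1):
      s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
      L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
  return L


def cho_solve(L, b):
  """Solves L L^T x = b by forward substitution, then backward substitution."""
  n = len(L)
  y = [0.0] * n
  for i in range(n):
    y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
  x = [0.0] * n
  for i in reversed(range(n)):
    x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
  return x


def matvec(A, v):
  return [sum(a * b for a, b in zip(row, v)) for row in A]


def toy_solve(M, C, c, qacc_smooth):
  """Exact minimizer of 0.5 (x - a)^T M (x - a) + 0.5 x^T C x - c^T x, a = qacc_smooth.

  Stands in for the Newton solver: H = M + C, and its factor is retained for backward.
  """
  H = [[m + k for m, k in zip(rm, rc)] for rm, rc in zip(M, C)]
  H_factor = cholesky(H)
  rhs = [u + w for u, w in zip(matvec(M, qacc_smooth), c)]
  return cho_solve(H_factor, rhs), H_factor


def implicit_adjoint(M, H_factor, adj_qacc):
  """adj_qacc_smooth = M v, where H v = adj_qacc is solved with the stored factor."""
  v = cho_solve(H_factor, adj_qacc)
  return matvec(M, v)
```

The test below checks the adjoint against central finite differences of the loss g . qacc, which is the same style of cross-check the commit message describes.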

Adaptations to current main:
- create_blocked_cholesky_func no longer exists upstream (replaced by
  fused create_blocked_cholesky_factorize_solve_func in 083e5ef);
  _adjoint_cholesky_full_blocked uses the fused func
- dropped his update_constraint_gauss_cost kernel (belonged to the
  pre-google-deepmind#1301 solver context which carried ctx.cost; current main computes
  gauss cost inside the linesearch tiles)
- dropped his support.py next_act removal (current main's derivative.py
  imports it)
- solver retained fields placed at the end of Data (types_test enforces
  warp-only fields after MuJoCo fields, with matching docstring order)
- test_solver_retained_state used an undefined SPARSE_CONSTRAINT_JACOBIAN
  global; replaced with m.is_sparse
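The field-ordering rule that types_test enforces can be sketched as a small check, assuming Data is a dataclass and that the set of MuJoCo-mirrored field names is known. The function name and the toy classes are hypothetical; the real test also checks docstring order, which this sketch does not cover.

```python
import dataclasses


def check_field_order(cls, mujoco_fields):
  """Returns the first warp-only field that precedes a MuJoCo field, or None if ordered."""
  first_warp_only = None
  for field in dataclasses.fields(cls):
    if field.name in mujoco_fields:
      if first_warp_only is not None:
        return first_warp_only  # a warp-only field appears before this MuJoCo field
    elif first_warp_only is None:
      first_warp_only = field.name
  return None
```

Under this rule, retained solver state such as solver_h has to be declared after every MuJoCo field, which is why the commit places it at the end of Data.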

Verification: full suite 1078 passed / 23 skipped, including his 4
GradSolverAdjointTest cases. Cross-validated independently: the GPU
adjoint mapped to force space (M^-1 M H^-1 g) matches central finite
differences through the full GPU solver to 8e-5 max relative error on
a persistent-contact scene.

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>

Co-authored-by: Mark Yang <markyang2005@gmail.com>
johnnynunez added a commit to johnnynunez/mujoco_warp that referenced this pull request Jun 11, 2026
Semantic port of mar-yan24's mark/autodifferentiation3 (123 commits
behind) onto the 2/3 port. Preserves the design intact:

- collision_smooth.py: smooth analytic distance functions overwrite
  discrete contact geometry (dist/pos/frame) during taped forwards;
  differentiable constraint assembly (smooth_contact_to_efc) recomputes
  efc.J/efc.pos on an AD-visible path. Plane-sphere, sphere-sphere,
  sphere-capsule, capsule-capsule, plane-capsule supported; other geom
  pairs pass through with zero gradient.
- adjoint.py: efc-level adjoint kernels (_efc_J_grad_kernel,
  _efc_pos_grad_kernel) connecting the phase-2 KKT adjoint to contact
  geometry; smooth/surrogate friction adjoint variants.
- forward.py: tape-aware Phase 3 hooks after collision and
  make_constraint; _record_solver_adjoint/_record_fwd_accel_adjoint/
  _record_euler_damp_adjoint tape callbacks; _isolate_intermediates_for_ad
  per-substep array isolation; freejoint zerograd fix (is_free flag
  instead of continue) was already present upstream.
- grad.py: COLLISION_GRAD_FIELDS registry + dotted-path _resolve_field.
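For the simplest supported pairs, the signed distance has a closed form that is differentiable wherever it is defined (for sphere-sphere, whenever the centers do not coincide), which is what allows gradients to reach geometry through the tape. A pure-Python sketch of plane-sphere and sphere-sphere with hand-derived gradients follows. The function names and the normal convention (pointing from the first geom to the second) are assumptions; collision_smooth.py's actual conventions are not shown in the commit message.

```python
import math


def _sub(a, b):
  return [x - y for x, y in zip(a, b)]


def _dot(a, b):
  return sum(x * y for x, y in zip(a, b))


def sphere_sphere(p1, r1, p2, r2):
  """Signed distance between two spheres, unit normal (1 -> 2), and d(dist)/d(p1)."""
  d = _sub(p2, p1)
  length = math.sqrt(_dot(d, d))  # gradient is undefined when the centers coincide
  normal = [x / length for x in d]
  dist = length - r1 - r2
  grad_p1 = [-x for x in normal]  # by symmetry, d(dist)/d(p2) is +normal
  return dist, normal, grad_p1


def plane_sphere(plane_normal, plane_point, center, radius):
  """Signed distance from a sphere to a plane with unit normal; gradient w.r.t. center."""
  dist = _dot(plane_normal, _sub(center, plane_point)) - radius
  return dist, list(plane_normal)
```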

Adaptations to current main:
- _record_euler_damp_adjoint captures damp_deriv (polynomial damping,
  computed from the substep's qvel at record time) instead of the old
  constant dof_damping, matching the forward euler solve; sparse kernel
  call updated to the M_rownnz/M_rowadr signature
- _qfrc_smooth factory kernel re-enabled for backward (pure elementwise
  sum; main had enable_backward=False which severed the qfrc_smooth ->
  qfrc_actuator/ctrl gradient chain)
- jac_dof calls updated for the body_isdofancestor parameter
- _solve_LD_sparse: kept main's fused fast path for non-grad solves,
  adopted the nograd-copy + manual-adjoint design for grad solves
- warmstart copy uses d.qacc (solver solution) not integrator-local qacc,
  restoring upstream/sleep semantics; sleep.wake retained in forward()
- step-level xpos-loss tests refresh kinematics inside the tape (step()
  does not recompute xpos post-integration; FD agrees it would otherwise
  be identically zero) and the nonzero-gradient guard is scaled by the
  FD norm (single-step dL/dctrl ~ dt^2 ~ 2e-7 < the old 1e-6 threshold)
- d.qM -> d.M renames; dropped stale pre-google-deepmind#1301 hunks (gauss_cost kernel,
  old next_act relocation, old collision_flex/passive/types drift)

Verification: full suite 1093 passed / 23 skipped, including all 29
grad tests (solver adjoint, euler damp stress, smooth contact, friction
surrogate, freejoint kinematics).

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
johnnynunez added a commit to johnnynunez/mujoco_warp that referenced this pull request Jun 11, 2026
…signment

Semantic re-implementation of mar-yan24's count->scan->emit pipeline
(google-deepmind#1300) on top of the post-a23500c constraint
kernels, since the original branch predates the efc_contact rewrite and
95 other upstream commits and cannot be rebased mechanically.

When opt.deterministic=True, the racy wp.atomic_add(nefc/efc_nnz) slot
allocation in every constraint family is replaced with: a count kernel
that writes per-thread row/nnz counts, a per-world exclusive scan that
converts counts to offsets and bumps the totals, and emit kernels that
read base+offset instead of atomic results. Constraint rows therefore
land at identical positions on every run of the same input.
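The count, scan, and emit passes can be sketched for a single world in pure Python. Here "threads" are list entries and the schedule argument simulates the order in which GPU threads happen to run. The function names are hypothetical and this is not the Warp kernel code. With offsets from an exclusive scan, each row's slot depends only on the counts, so any schedule produces the same layout. An atomic-counter allocator instead hands out slots in arrival order.

```python
def exclusive_scan(counts):
  """Exclusive prefix sum: offsets[i] = counts[0] + ... + counts[i-1]; also returns the total."""
  offsets, total = [], 0
  for c in counts:
    offsets.append(total)
    total += c
  return offsets, total


def emit_rows(per_thread_rows, schedule=None):
  """Deterministic count -> scan -> emit for one world."""
  counts = [len(rows) for rows in per_thread_rows]  # count pass
  offsets, nefc = exclusive_scan(counts)  # scan pass: offsets and the new total
  efc = [None] * nefc
  order = range(len(per_thread_rows)) if schedule is None else schedule
  for t in order:  # emit pass, in whatever order threads run
    for k, row in enumerate(per_thread_rows[t]):
      efc[offsets[t] + k] = row  # base + offset, no atomics
  return efc


def emit_rows_atomic(per_thread_rows, schedule):
  """Atomic-style allocation for contrast: a row's slot is its arrival order."""
  efc = []
  for t in schedule:
    efc.extend(per_thread_rows[t])
  return efc
```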

Differences from the original google-deepmind#1300 branch:
- Contact families use a single _efc_contact_count kernel mirroring the
  unified _efc_contact_init (upstream a23500c replaced the per-cone
  _contact_pyramidal/_contact_elliptic kernels this no longer touches).
- _equality_flex_count honors eq_active (upstream 80bbba7).
- All new kernel factories are wrapped in @cache_kernel (upstream
  f0e2d81) so repeated step() calls do not recreate kernels.
- Host-side overflow validation is skipped while a CUDA graph capture is
  active, making opt.deterministic capture-safe (the original branch
  documented capture as unsupported).
- Ports the expanded determinism regression tests, minus the benchmark
  coverage that targeted the pre-google-deepmind#1301 benchmark API.

Verified: full suite 1101 passed / 23 skipped; constraint rows (nefc,
efc.type/id, sparse J structure and values) bitwise stable across
same-process trials with CUDA graph capture; ~9% overhead at 1 world,
~24% at 256 worlds on RTX PRO 6000 Blackwell.

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>

Co-authored-by: Mark Yang <markyang2005@gmail.com>
johnnynunez added a commit to johnnynunez/mujoco_warp that referenced this pull request Jun 11, 2026
(Same commit message as the mark/autodifferentiation3 port above, with an added Co-authored-by: Mark Yang <markyang2005@gmail.com> trailer.)