Refactor CLI tools and generate benchmark documentation #1301
Merged
Conversation
Cleans up testspeed so that it produces only human-readable output. Moves benchmark running to benchmarks/run.py, with an example in humanoid.
thowell
reviewed
Apr 20, 2026
thowell
approved these changes
Apr 22, 2026
thowell
reviewed
Apr 24, 2026
Collaborator
nice refactor! lgtm
thowell
approved these changes
Apr 24, 2026
This was referenced Apr 29, 2026
Closed
johnnynunez
added a commit
to johnnynunez/mujoco_warp
that referenced
this pull request
Jun 11, 2026
Semantic port of mar-yan24's mark/autodifferentiation2 (131 commits behind) onto the rebased autodiff branch.

Preserves the design intact:
- solver_implicit_adjoint: tape-backward hook solving H v = adj_qacc with the retained Newton Hessian, writing adj_qacc_smooth = M v
- GPU tile-Cholesky adjoint paths (small-nv single tile, blocked solve-only with stored factor, blocked full factorize+solve)
- Data.solver_h / solver_hfactor / solver_Jaref retained state, allocated in make_data/put_data, aliased by the solver context

Adaptations to current main:
- create_blocked_cholesky_func no longer exists upstream (replaced by fused create_blocked_cholesky_factorize_solve_func in 083e5ef); _adjoint_cholesky_full_blocked uses the fused func
- dropped his update_constraint_gauss_cost kernel (belonged to the pre-google-deepmind#1301 solver context which carried ctx.cost; current main computes gauss cost inside the linesearch tiles)
- dropped his support.py next_act removal (current main's derivative.py imports it)
- solver retained fields placed at the end of Data (types_test enforces warp-only fields after MuJoCo fields, with matching docstring order)
- test_solver_retained_state used an undefined SPARSE_CONSTRAINT_JACOBIAN global; replaced with m.is_sparse

Verification: full suite 1078 passed / 23 skipped, including his 4 GradSolverAdjointTest cases. Cross-validated independently: the GPU adjoint mapped to force space (M^-1 M H^-1 g) matches central finite differences through the full GPU solver to 8e-5 max relative error on a persistent-contact scene.

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Co-authored-by: Mark Yang <markyang2005@gmail.com>
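The adjoint step named in this commit message (solve H v = adj_qacc, then write adj_qacc_smooth = M v) can be illustrated on a toy problem. The sketch below is not the GPU tile-Cholesky code from the commit. It assumes a purely quadratic cost 1/2 (a - a_s)^T M (a - a_s) + 1/2 (J a)^T D (J a), so that H = M + J^T D J is constant, and every function name and matrix value in it is hypothetical. It checks the adjoint against central finite differences, in the spirit of the cross-validation the commit describes.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def cholesky_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def hessian(M, J, D):
    """H = M + J^T D J for a diagonal D given as a list."""
    n = len(M)
    H = [row[:] for row in M]
    for row, d in zip(J, D):
        for i in range(n):
            for j in range(n):
                H[i][j] += d * row[i] * row[j]
    return H

# Toy data (assumed values, 2 dofs and 1 constraint row).
M = [[2.0, 0.5], [0.5, 1.0]]
J = [[1.0, -1.0]]
D = [3.0]
L_H = cholesky(hessian(M, J, D))  # factor retained from the "forward" solve

def forward(qacc_smooth):
    # Minimizer of the quadratic cost satisfies H qacc = M qacc_smooth.
    return cholesky_solve(L_H, matvec(M, qacc_smooth))

def adjoint(adj_qacc):
    # Reuse the stored factor: H v = adj_qacc, then adj_qacc_smooth = M v.
    v = cholesky_solve(L_H, adj_qacc)
    return matvec(M, v)

# Compare with central finite differences of loss = adj_qacc . qacc.
qacc_smooth = [0.3, -0.7]
adj_qacc = [1.0, -2.0]
adj_qacc_smooth = adjoint(adj_qacc)

def loss(qs):
    return sum(w * a for w, a in zip(adj_qacc, forward(qs)))

eps = 1e-6
fd = []
for i in range(len(qacc_smooth)):
    plus, minus = list(qacc_smooth), list(qacc_smooth)
    plus[i] += eps
    minus[i] -= eps
    fd.append((loss(plus) - loss(minus)) / (2 * eps))
```

Because both M and H are symmetric, the transpose of the forward map H^-1 M is M H^-1, which is why a single solve with the retained factor followed by a multiply with M is enough.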
johnnynunez
added a commit
to johnnynunez/mujoco_warp
that referenced
this pull request
Jun 11, 2026
Semantic port of mar-yan24's mark/autodifferentiation3 (123 commits behind) onto the 2/3 port.

Preserves the design intact:
- collision_smooth.py: smooth analytic distance functions overwrite discrete contact geometry (dist/pos/frame) during taped forwards; differentiable constraint assembly (smooth_contact_to_efc) recomputes efc.J/efc.pos on an AD-visible path. Plane-sphere, sphere-sphere, sphere-capsule, capsule-capsule, plane-capsule supported; other geom pairs pass through with zero gradient.
- adjoint.py: efc-level adjoint kernels (_efc_J_grad_kernel, _efc_pos_grad_kernel) connecting the phase-2 KKT adjoint to contact geometry; smooth/surrogate friction adjoint variants.
- forward.py: tape-aware Phase 3 hooks after collision and make_constraint; _record_solver_adjoint/_record_fwd_accel_adjoint/_record_euler_damp_adjoint tape callbacks; _isolate_intermediates_for_ad per-substep array isolation; freejoint zerograd fix (is_free flag instead of continue) was already present upstream.
- grad.py: COLLISION_GRAD_FIELDS registry + dotted-path _resolve_field.

Adaptations to current main:
- _record_euler_damp_adjoint captures damp_deriv (polynomial damping, computed from the substep's qvel at record time) instead of the old constant dof_damping, matching the forward euler solve; sparse kernel call updated to the M_rownnz/M_rowadr signature
- _qfrc_smooth factory kernel re-enabled for backward (pure elementwise sum; main had enable_backward=False which severed the qfrc_smooth -> qfrc_actuator/ctrl gradient chain)
- jac_dof calls updated for the body_isdofancestor parameter
- _solve_LD_sparse: kept main's fused fast path for non-grad solves, adopted the nograd-copy + manual-adjoint design for grad solves
- warmstart copy uses d.qacc (solver solution) not integrator-local qacc, restoring upstream/sleep semantics; sleep.wake retained in forward()
- step-level xpos-loss tests refresh kinematics inside the tape (step() does not recompute xpos post-integration; FD agrees it would otherwise be identically zero) and the nonzero-gradient guard is scaled by the FD norm (single-step dL/dctrl ~ dt^2 ~ 2e-7 < the old 1e-6 threshold)
- d.qM -> d.M renames; dropped stale pre-google-deepmind#1301 hunks (gauss_cost kernel, old next_act relocation, old collision_flex/passive/types drift)

Verification: full suite 1093 passed / 23 skipped, including all 29 grad tests (solver adjoint, euler damp stress, smooth contact, friction surrogate, freejoint kinematics).

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
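The commit message notes that a fixed 1e-6 nonzero-gradient threshold is too strict when the true single-step gradient is on the order of dt^2 ~ 2e-7, and that the guard is instead scaled by the finite-difference (FD) norm. A minimal sketch of that idea follows. The helper names, the toy loss, and the 0.1 fraction are all assumptions for illustration, not the code used in the test suite.

```python
import math

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def central_fd(f, x, eps=1e-4):
    """Central finite-difference gradient of a scalar function f at x."""
    grad = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad

def nonzero_guard_absolute(analytic, threshold=1e-6):
    """Old-style guard: fails for any correct gradient smaller than threshold."""
    return norm(analytic) > threshold

def nonzero_guard_scaled(analytic, fd, frac=0.1):
    """Scaled guard: the analytic gradient must carry a fraction of the FD norm."""
    return norm(analytic) >= frac * norm(fd)

# Toy single-step loss whose gradient with respect to ctrl scales like dt^2.
dt = 5e-4

def loss(ctrl):
    return dt * dt * ctrl[0]

ctrl = [0.3, -0.1]
analytic = [dt * dt, 0.0]      # exact gradient of the toy loss
fd = central_fd(loss, ctrl)

absolute_ok = nonzero_guard_absolute(analytic)         # rejects a correct gradient
scaled_ok = nonzero_guard_scaled(analytic, fd)         # accepts it
scaled_zero = nonzero_guard_scaled([0.0, 0.0], fd)     # still catches a dead gradient
```

The scaled guard keeps its purpose (detecting a severed gradient chain) while no longer depending on the absolute magnitude of the loss.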
johnnynunez
added a commit
to johnnynunez/mujoco_warp
that referenced
this pull request
Jun 11, 2026
…signment

Semantic re-implementation of mar-yan24's count->scan->emit pipeline (google-deepmind#1300) on top of the post-a23500c constraint kernels, since the original branch predates the efc_contact rewrite and 95 other upstream commits and cannot be rebased mechanically.

When opt.deterministic=True, the racy wp.atomic_add(nefc/efc_nnz) slot allocation in every constraint family is replaced with: a count kernel that writes per-thread row/nnz counts, a per-world exclusive scan that converts counts to offsets and bumps the totals, and emit kernels that read base+offset instead of atomic results. Constraint rows therefore land at identical positions on every run of the same input.

Differences from the original google-deepmind#1300 branch:
- Contact families use a single _efc_contact_count kernel mirroring the unified _efc_contact_init (upstream a23500c replaced the per-cone _contact_pyramidal/_contact_elliptic kernels this no longer touches).
- _equality_flex_count honors eq_active (upstream 80bbba7).
- All new kernel factories are wrapped in @cache_kernel (upstream f0e2d81) so repeated step() calls do not recreate kernels.
- Host-side overflow validation is skipped while a CUDA graph capture is active, making opt.deterministic capture-safe (the original branch documented capture as unsupported).
- Ports the expanded determinism regression tests, minus the benchmark coverage that targeted the pre-google-deepmind#1301 benchmark API.

Verified: full suite 1101 passed / 23 skipped; constraint rows (nefc, efc.type/id, sparse J structure and values) bitwise stable across same-process trials with CUDA graph capture; ~9% overhead at 1 world, ~24% at 256 worlds on RTX PRO 6000 Blackwell.

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Co-authored-by: Mark Yang <markyang2005@gmail.com>
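The count -> scan -> emit pattern described in this commit message can be sketched in plain Python. This is a CPU illustration only: the "threads" are simulated by iterating in an arbitrary order, the function names are hypothetical, and none of it is the Warp kernel code from the commit.

```python
def count(items):
    """Count pass: each simulated thread records how many rows it will emit."""
    return [len(rows) for rows in items]

def exclusive_scan(counts):
    """Convert per-thread counts into start offsets and return the total."""
    offsets, total = [], 0
    for c in counts:
        offsets.append(total)
        total += c
    return offsets, total

def emit(items, offsets, total, order):
    """Emit pass: each thread writes at base + offset, whatever the order."""
    out = [None] * total
    for tid in order:
        for k, row in enumerate(items[tid]):
            out[offsets[tid] + k] = row
    return out

def emit_atomic(items, order):
    """Baseline: slots handed out first come, first served (atomic-add style)."""
    out = []
    for tid in order:
        out.extend(items[tid])
    return out

# Four simulated threads producing 2, 0, 1 and 3 constraint rows.
items = [["a0", "a1"], [], ["c0"], ["d0", "d1", "d2"]]
order_1 = [0, 1, 2, 3]
order_2 = [3, 1, 0, 2]   # a different thread schedule

offsets, total = exclusive_scan(count(items))
scan_1 = emit(items, offsets, total, order_1)
scan_2 = emit(items, offsets, total, order_2)
atomic_1 = emit_atomic(items, order_1)
atomic_2 = emit_atomic(items, order_2)
```

With the scan-based layout both schedules produce the same row order, while the atomic-style baseline depends on which thread reaches the counter first. The extra count and scan passes are the source of the overhead the commit reports.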
johnnynunez
added a commit
to johnnynunez/mujoco_warp
that referenced
this pull request
Jun 11, 2026
Semantic port of mar-yan24's mark/autodifferentiation3 (123 commits behind) onto the 2/3 port.

Preserves the design intact:
- collision_smooth.py: smooth analytic distance functions overwrite discrete contact geometry (dist/pos/frame) during taped forwards; differentiable constraint assembly (smooth_contact_to_efc) recomputes efc.J/efc.pos on an AD-visible path. Plane-sphere, sphere-sphere, sphere-capsule, capsule-capsule, plane-capsule supported; other geom pairs pass through with zero gradient.
- adjoint.py: efc-level adjoint kernels (_efc_J_grad_kernel, _efc_pos_grad_kernel) connecting the phase-2 KKT adjoint to contact geometry; smooth/surrogate friction adjoint variants.
- forward.py: tape-aware Phase 3 hooks after collision and make_constraint; _record_solver_adjoint/_record_fwd_accel_adjoint/_record_euler_damp_adjoint tape callbacks; _isolate_intermediates_for_ad per-substep array isolation; freejoint zerograd fix (is_free flag instead of continue) was already present upstream.
- grad.py: COLLISION_GRAD_FIELDS registry + dotted-path _resolve_field.

Adaptations to current main:
- _record_euler_damp_adjoint captures damp_deriv (polynomial damping, computed from the substep's qvel at record time) instead of the old constant dof_damping, matching the forward euler solve; sparse kernel call updated to the M_rownnz/M_rowadr signature
- _qfrc_smooth factory kernel re-enabled for backward (pure elementwise sum; main had enable_backward=False which severed the qfrc_smooth -> qfrc_actuator/ctrl gradient chain)
- jac_dof calls updated for the body_isdofancestor parameter
- _solve_LD_sparse: kept main's fused fast path for non-grad solves, adopted the nograd-copy + manual-adjoint design for grad solves
- warmstart copy uses d.qacc (solver solution) not integrator-local qacc, restoring upstream/sleep semantics; sleep.wake retained in forward()
- step-level xpos-loss tests refresh kinematics inside the tape (step() does not recompute xpos post-integration; FD agrees it would otherwise be identically zero) and the nonzero-gradient guard is scaled by the FD norm (single-step dL/dctrl ~ dt^2 ~ 2e-7 < the old 1e-6 threshold)
- d.qM -> d.M renames; dropped stale pre-google-deepmind#1301 hunks (gauss_cost kernel, old next_act relocation, old collision_flex/passive/types drift)

Verification: full suite 1093 passed / 23 skipped, including all 29 grad tests (solver adjoint, euler damp stress, smooth contact, friction surrogate, freejoint kinematics).

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Co-authored-by: Mark Yang <markyang2005@gmail.com>
This PR completes the refactor of CLI tools (testspeed.py, record.py, viewer.py) to use a centralized cli.py module, and adds automated documentation generation for benchmarks.

Key Changes:

CLI Refactor
- Centralized init_structs and unroll logic in _src/cli.py.
- Updated testspeed.py and viewer.py to use shared flags and functions from cli.py.
- Added record.py tool for generating videos of benchmarks, using Pillow for saving .webp and .gif files.

Benchmark Documentation
- Added README.md in each benchmark directory with model info and descriptions.
- Generated .webp animations for each benchmark using mjwarp-record.
- Updated benchmarks/README.md to reflect the new Python-based framework.

Dependency Updates
- Added pillow to dev dependencies.
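The idea of several tools sharing flags from one module can be sketched as follows. This uses the standard library's argparse purely for illustration; the actual cli.py may use a different flag library, and every flag name, default, and function name below is hypothetical.

```python
import argparse

def shared_parser():
    """Flags common to every tool (hypothetical names), defined in one place."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("model_path", help="path to the model file")
    parser.add_argument("--nworld", type=int, default=1, help="number of worlds")
    parser.add_argument("--nstep", type=int, default=1000, help="steps to run")
    return parser

def testspeed_parser():
    """A tool that needs only the shared flags."""
    return argparse.ArgumentParser(prog="testspeed", parents=[shared_parser()])

def record_parser():
    """A tool that adds its own flag on top of the shared ones."""
    parser = argparse.ArgumentParser(prog="record", parents=[shared_parser()])
    parser.add_argument("--output", default="out.webp", help="output animation file")
    return parser

# Both tools accept the same shared flags without redefining them.
record_args = record_parser().parse_args(
    ["humanoid.xml", "--nworld", "4", "--output", "humanoid.gif"]
)
speed_args = testspeed_parser().parse_args(["humanoid.xml"])
```

Keeping the shared definitions in one function means a change to a common flag is picked up by every tool that lists it as a parent.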