implicit integrator by thowell · Pull Request #1339 · google-deepmind/mujoco_warp

thowell · 2026-05-07T19:34:04Z

add implicit integrator

Humanoid Benchmark Report

Date: 2026-05-08
GPU: NVIDIA RTX 6000 Ada Generation (48 GB)
Scene: benchmarks/humanoid/humanoid.xml (nq=28, nv=27, nu=21, nbody=17, ngeom=20)
Config: --nworld=8192 --nconmax=24 --njmax=64, 1000 steps at dt=0.005
Solver: NEWTON (iterations=100, ls_iterations=50), PYRAMIDAL cone, dense Jacobian

Metric	implicitfast (prev `466793a`)	implicitfast (curr `17fba98`)	implicit (curr `17fba98`)	rk4 (curr `17fba98`)
Steps/sec	4,088,907	4,093,005	1,628,712	1,157,913
Realtime factor	20,445×	20,465×	8,144×	5,790×
Time/step (ns)	244.56	244.32	613.98	863.62
Sim time (s)	2.00	2.00	5.03	7.07
JIT time (s)	0.35	0.36	0.39	0.52
Converged	8192/8192	8192/8192	8192/8192	8192/8192
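The throughput rows in the table above are consistent with one another. A minimal sketch of how they relate, assuming (the report does not say so explicitly) that Steps/sec counts world-steps summed over all 8192 worlds and that "Sim time (s)" is the wall-clock duration of the run. The helper name `derived_metrics` is hypothetical.

```python
NWORLD = 8192   # --nworld
NSTEP = 1000    # steps per world
DT = 0.005      # simulation timestep in seconds

# Steps/sec as reported in the table above.
steps_per_sec = {
    "implicitfast (prev)": 4_088_907,
    "implicitfast (curr)": 4_093_005,
    "implicit (curr)": 1_628_712,
    "rk4 (curr)": 1_157_913,
}

def derived_metrics(sps):
    """Derive the other table rows from a steps/sec figure."""
    return {
        "realtime_factor": sps * DT,          # simulated seconds per wall-clock second
        "time_per_step_ns": 1e9 / sps,        # nanoseconds per world-step
        "wall_time_s": NWORLD * NSTEP / sps,  # assumed meaning of "Sim time (s)"
    }

for name, sps in steps_per_sec.items():
    m = derived_metrics(sps)
    print(f"{name}: {m['realtime_factor']:,.0f}x realtime, "
          f"{m['time_per_step_ns']:.2f} ns/step, {m['wall_time_s']:.2f} s")
```

Under these assumptions the derived values round to the Realtime factor, Time/step, and Sim time rows for all four columns.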

Integrator Comparison (current commit)

Phase	implicitfast	implicit	rk4
`step` (total)	242.58 ns	612.03 ns	861.40 ns
`forward`	222.88 ns	220.71 ns	202.63 ns
`implicit`	19.18 ns	390.80 ns	—
`deriv_smooth_vel`	11.35 ns	251.57 ns	—
`factor_solve_lu`	—	130.92 ns	—
`rungekutta4`	—	—	658.23 ns
`forward` ×3 (substeps)	—	—	~649.60 ns
`solve`	122.69 ns	122.17 ns	103.96 ns

Relative to implicitfast:

- implicit is 2.51× slower (613.98 ns vs 244.32 ns): overhead from the full RNE derivative plus LU factorization
- rk4 is 3.53× slower (863.62 ns vs 244.32 ns): overhead from 3 additional forward evaluations
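The slowdown factors are plain ratios of the Time/step figures. A small sketch for recomputing them from the table values; the function name `slowdown` is hypothetical.

```python
# Time/step (ns) from the benchmark table, current commit.
time_per_step_ns = {"implicitfast": 244.32, "implicit": 613.98, "rk4": 863.62}

def slowdown(integrator, baseline="implicitfast"):
    """Ratio of per-step time relative to the baseline integrator."""
    return time_per_step_ns[integrator] / time_per_step_ns[baseline]

for name in ("implicit", "rk4"):
    print(f"{name}: {slowdown(name):.2f}x the per-step time of implicitfast")
```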

Important

implicit overhead: the full deriv_smooth_vel (251.57 ns, 22.2× the cost of the 11.35 ns fast variant) plus factor_solve_lu (130.92 ns for LU factorization and solve).

rk4 overhead: Three additional forward substep evaluations (~217 ns each) inside rungekutta4, totaling ~650 ns of extra compute.

The forward pipeline (collision, solve, kinematics, etc.) is identical across integrators — the differences are purely in the integration phase.
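The attribution above can be checked against the per-phase timings. The sketch below uses only numbers copied from the comparison table and the rk4 trace; the variable names are illustrative.

```python
# Per-phase timings in ns, copied from the comparison table and rk4 trace.
step = {"implicitfast": 242.58, "implicit": 612.03, "rk4": 861.40}
implicit_phase = {"implicitfast": 19.18, "implicit": 390.80}
deriv_smooth_vel = {"implicitfast": 11.35, "implicit": 251.57}
rk4_substep_forward = [215.43, 214.54, 219.63]

# Cost of the full derivative relative to the fast variant.
deriv_ratio = deriv_smooth_vel["implicit"] / deriv_smooth_vel["implicitfast"]

# Extra time in the `implicit` phase vs. extra time in the whole step.
extra_phase = implicit_phase["implicit"] - implicit_phase["implicitfast"]
extra_step = step["implicit"] - step["implicitfast"]

# rk4: total and mean cost of the three substep forward passes.
substep_total = sum(rk4_substep_forward)
substep_mean = substep_total / len(rk4_substep_forward)

print(f"deriv_smooth_vel ratio: {deriv_ratio:.1f}x")
print(f"implicit: +{extra_phase:.2f} ns in phase, +{extra_step:.2f} ns in step")
print(f"rk4 substeps: {substep_total:.2f} ns total, {substep_mean:.1f} ns each")
```

The extra time in the `implicit` phase and the extra time in the whole step agree to within a few nanoseconds, which is what one would expect if the integration phase accounts for essentially all of the difference.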

Regression Check: implicitfast (previous vs current commit)

Phase	prev (`466793a`)	curr (`17fba98`)	Δ
`step` (total)	242.75 ns	242.58 ns	−0.1%
`forward`	222.62 ns	222.88 ns	+0.1%
`implicit`	19.60 ns	19.18 ns	−2.1%
`deriv_smooth_vel`	11.77 ns	11.35 ns	−3.6%
`solve`	122.95 ns	122.69 ns	−0.2%
`fwd_position`	65.83 ns	66.23 ns	+0.6%
`fwd_velocity`	15.61 ns	15.71 ns	+0.6%
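The Δ column is the relative change from the previous to the current commit. A sketch of that calculation with a simple regression filter follows; the 5% threshold and the function names are assumptions for illustration, not something the PR defines.

```python
# (prev, curr) timings in ns for implicitfast, from the regression table.
phases = {
    "step": (242.75, 242.58),
    "forward": (222.62, 222.88),
    "implicit": (19.60, 19.18),
    "deriv_smooth_vel": (11.77, 11.35),
    "solve": (122.95, 122.69),
    "fwd_position": (65.83, 66.23),
    "fwd_velocity": (15.61, 15.71),
}

def percent_change(prev, curr):
    """Relative change in percent; positive means the phase got slower."""
    return 100.0 * (curr - prev) / prev

def regressions(table, threshold_pct=5.0):
    """Phases that slowed by more than threshold_pct (hypothetical threshold)."""
    return [name for name, (prev, curr) in table.items()
            if percent_change(prev, curr) > threshold_pct]

for name, (prev, curr) in phases.items():
    print(f"{name}: {percent_change(prev, curr):+.1f}%")
print("regressions:", regressions(phases))
```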

Note

The implicitfast integrator shows no regression vs the previous commit (466793a). The implicit phase is slightly faster (−2.1%) due to fused M-structure derivative computation replacing the previous D-structure → M-structure mapping path.

Detailed Event Traces

implicitfast — previous commit (`466793a`)

step: 242.75
  implicit: 19.60
    deriv_smooth_vel: 11.77
  forward: 222.62
    sensor_vel: 0.17
    fwd_actuation: 1.79
    sensor_pos: 0.17
    sensor_acc: 3.16
    fwd_position: 65.83
      tendon_armature: 0.18
      flex: 0.17
      make_constraint: 19.74
      crb: 13.53
      camlight: 1.78
      com_pos: 5.66
      tendon: 0.17
      transmission: 1.60
      kinematics: 10.88
      collision: 10.21
        convex_narrowphase: 0.17
        primitive_narrowphase: 5.35
        nxn_broadphase: 3.79
    solve: 122.95
      mul_m: 6.13
    fwd_acceleration: 11.04
      xfrc_accumulate: 1.63
    fwd_velocity: 15.61
      tendon_bias: 0.17
      passive: 1.38
      rne: 6.71
      com_vel: 5.82

implicitfast — current commit (`17fba98`)

step: 242.58
  implicit: 19.18
    deriv_smooth_vel: 11.35
  forward: 222.88
    fwd_velocity: 15.71
      com_vel: 5.83
      rne: 6.81
      tendon_bias: 0.17
      passive: 1.37
    fwd_acceleration: 11.08
      xfrc_accumulate: 1.67
    fwd_actuation: 1.78
    sensor_pos: 0.17
    fwd_position: 66.23
      make_constraint: 20.05
      flex: 0.17
      com_pos: 5.71
      tendon: 0.17
      camlight: 1.76
      kinematics: 10.91
      transmission: 1.61
      collision: 10.23
        nxn_broadphase: 3.78
        convex_narrowphase: 0.17
        primitive_narrowphase: 5.38
      tendon_armature: 0.17
      crb: 13.54
    sensor_vel: 0.17
    sensor_acc: 3.17
    solve: 122.69
      mul_m: 5.94

implicit — current commit (`17fba98`)

step: 612.03
  implicit: 390.80
    factor_solve_lu: 130.92
    deriv_smooth_vel: 251.57
  forward: 220.71
    sensor_vel: 0.17
    fwd_velocity: 15.71
      tendon_bias: 0.17
      com_vel: 5.83
      rne: 6.82
      passive: 1.37
    sensor_acc: 3.16
    sensor_pos: 0.18
    solve: 122.17
      mul_m: 5.89
    fwd_actuation: 1.78
    fwd_acceleration: 11.07
      xfrc_accumulate: 1.67
    fwd_position: 64.59
      crb: 13.18
      com_pos: 5.74
      collision: 10.23
        convex_narrowphase: 0.17
        primitive_narrowphase: 5.39
        nxn_broadphase: 3.78
      tendon: 0.17
      tendon_armature: 0.17
      kinematics: 9.71
      transmission: 1.64
      make_constraint: 19.89
      flex: 0.17
      camlight: 1.78

rk4 — current commit (`17fba98`)

step: 861.40
  rungekutta4: 658.23
    forward: [ 215.43, 214.54, 219.63 ]
      fwd_acceleration: [ 11.07, 11.07, 11.10 ]
        xfrc_accumulate: [ 1.67, 1.66, 1.67 ]
      sensor_acc: [ 3.18, 3.18, 3.17 ]
      fwd_actuation: [ 1.77, 1.78, 1.78 ]
      fwd_position: [ 64.82, 64.87, 64.88 ]
        collision: [ 10.24, 10.21, 10.21 ]
          nxn_broadphase: [ 3.78, 3.77, 3.77 ]
          primitive_narrowphase: [ 5.38, 5.38, 5.38 ]
          convex_narrowphase: [ 0.17, 0.17, 0.17 ]
        make_constraint: [ 20.15, 20.18, 20.20 ]
        crb: [ 13.20, 13.20, 13.20 ]
        kinematics: [ 9.63, 9.66, 9.65 ]
        com_pos: [ 5.76, 5.77, 5.76 ]
        camlight: [ 1.78, 1.78, 1.78 ]
        transmission: [ 1.63, 1.64, 1.64 ]
        tendon_armature: [ 0.17, 0.18, 0.17 ]
        tendon: [ 0.17, 0.17, 0.18 ]
        flex: [ 0.17, 0.17, 0.17 ]
      sensor_pos: [ 0.17, 0.18, 0.17 ]
      solve: [ 116.66, 115.71, 120.76 ]
        mul_m: [ 5.88, 5.97, 5.99 ]
      sensor_vel: [ 0.17, 0.18, 0.17 ]
      fwd_velocity: [ 15.72, 15.74, 15.73 ]
        rne: [ 6.82, 6.82, 6.82 ]
        com_vel: [ 5.83, 5.83, 5.83 ]
        passive: [ 1.38, 1.37, 1.37 ]
        tendon_bias: [ 0.17, 0.17, 0.17 ]
  forward: 202.63
    fwd_actuation: 1.78
    solve: 103.96
      mul_m: 5.87
    fwd_position: 64.68
      transmission: 1.61
      tendon: 0.17
      kinematics: 9.68
      camlight: 1.78
      tendon_armature: 0.17
      crb: 13.20
      com_pos: 5.76
      flex: 0.17
      collision: 10.20
        primitive_narrowphase: 5.37
        convex_narrowphase: 0.17
        nxn_broadphase: 3.76
      make_constraint: 20.03
    sensor_pos: 0.17
    fwd_velocity: 15.75
      rne: 6.83
      passive: 1.37
      com_vel: 5.84
      tendon_bias: 0.17
    sensor_vel: 0.17
    sensor_acc: 3.16
    fwd_acceleration: 11.08
      xfrc_accumulate: 1.67
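The traces above share one layout: a `name: value` pair per line, nested by two-space indentation, with bracketed triples for the rk4 substeps. A minimal parser sketch for that layout is shown below. Summing the bracketed values into one number is a choice made here for convenience, and `parse_trace` and `unattributed` are hypothetical helpers, not part of mujoco_warp.

```python
def parse_trace(text):
    """Parse an indented `name: value` trace into nested dicts.

    Each node is {"name", "ns", "children"}; bracketed lists such as
    `[ 215.43, 214.54, 219.63 ]` are summed into a single value.
    """
    root = {"name": "root", "ns": 0.0, "children": []}
    stack = [(-1, root)]  # (indent, node) pairs for the current ancestry
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        name, _, value = line.strip().partition(":")
        numbers = [float(v) for v in value.strip(" []").split(",")]
        node = {"name": name, "ns": sum(numbers), "children": []}
        # Pop back to the nearest ancestor with smaller indentation.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((indent, node))
    return root["children"]

def unattributed(node):
    """Time in a node that is not covered by its traced children."""
    return node["ns"] - sum(child["ns"] for child in node["children"])

# Excerpt of the implicit trace (children of `forward` omitted).
sample = """\
step: 612.03
  implicit: 390.80
    factor_solve_lu: 130.92
    deriv_smooth_vel: 251.57
  forward: 220.71
"""
(step_node,) = parse_trace(sample)
implicit_node = step_node["children"][0]
print(f"implicit: {implicit_node['ns']:.2f} ns, "
      f"{unattributed(implicit_node):.2f} ns outside traced children")
```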

…1359)

* Refresh README, add AGENTS.md, add --view flag to benchmarks/run.py

  Rewrites the README to lead with a quickstart example, reorganizes sections for clarity, replaces the compatibility table with a concise exceptions list, and adds an animated examples gallery showcasing benchmark scenes.

  Adds AGENTS.md with development workflow conventions and a .agent/rules pointer for AI coding assistants.

  Adds a --view flag to benchmarks/run.py that launches mjwarp-viewer on a benchmark scene with nworld=1, so the examples in the README work out of the box.

* Address review comments

  - Use high-throughput instead of fast in README intro (thowell)
  - Simplify --event_trace flag usage, drop =True (thowell)
  - Remove Jacobian format exception, sparse Jacobians now supported (thowell)
  - Simplify Flex to experimental status (thowell)
  - Move differentiability out of exceptions list, link to #500 (thowell)
  - Link implicit integrator exception to PR #1339 (thowell)
  - Fix e.stderr logging in _view_benchmark, use e.returncode instead (thowell)
  - Warn and show only first match when --view matches multiple benchmarks (thowell)

* Add PR review etiquette and amend guidelines to AGENTS.md

* Clean up run.py view logic and fix AGENTS.md formatting

  Restructure --view to avoid loop+break pattern, add empty benchmarks check with helpful error message. Unwrap hard-wrapped lines in AGENTS.md and soften PR body guidelines.

thowell force-pushed the implicit branch 8 times, most recently from 062bc66 to 5e8e661 Compare May 7, 2026 21:57

thowell marked this pull request as ready for review May 7, 2026 23:48

thowell force-pushed the implicit branch 3 times, most recently from 6381deb to 08eab76 Compare May 8, 2026 23:00

thowell linked an issue May 15, 2026 that may be closed by this pull request

implicit integrator #891

Open

erikfrey mentioned this pull request May 15, 2026

Refresh README, add AGENTS.md, add --view flag to benchmarks/run.py #1359

Merged

yuvaltassa approved these changes May 21, 2026

View reviewed changes

thowell force-pushed the implicit branch 5 times, most recently from fd3c379 to a0daa0e Compare May 22, 2026 14:39

implicit integrator

a80b897

thowell force-pushed the implicit branch from a0daa0e to a80b897 Compare May 22, 2026 15:34

thowell merged commit ff5dfeb into google-deepmind:main May 22, 2026
14 checks passed

thowell merged 1 commit into google-deepmind:main from thowell:implicit
