implicit integrator#1339

Merged: thowell merged 1 commit into google-deepmind:main from thowell:implicit on May 22, 2026

Conversation

thowell (Collaborator) commented May 7, 2026

add implicit integrator


Humanoid Benchmark Report

Date: 2026-05-08
GPU: NVIDIA RTX 6000 Ada Generation (48 GB)
Scene: benchmarks/humanoid/humanoid.xml (nq=28, nv=27, nu=21, nbody=17, ngeom=20)
Config: --nworld=8192 --nconmax=24 --njmax=64, 1000 steps at dt=0.005
Solver: NEWTON (iterations=100, ls_iterations=50), PYRAMIDAL cone, dense Jacobian

| Metric | implicitfast (prev 466793a) | implicitfast (curr 17fba98) | implicit (curr 17fba98) | rk4 (curr 17fba98) |
| --- | --- | --- | --- | --- |
| Steps/sec | 4,088,907 | 4,093,005 | 1,628,712 | 1,157,913 |
| Realtime factor | 20,445× | 20,465× | 8,144× | 5,790× |
| Time/step (ns) | 244.56 | 244.32 | 613.98 | 863.62 |
| Sim time (s) | 2.00 | 2.00 | 5.03 | 7.07 |
| JIT time (s) | 0.35 | 0.36 | 0.39 | 0.52 |
| Converged | 8192/8192 | 8192/8192 | 8192/8192 | 8192/8192 |
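
The summary rows can be cross-checked from the time per step alone. The sketch below assumes that "Time/step" is wall time divided by nworld × steps (an inference from the numbers, not something the report states), and the helper name `summarize` is illustrative only.

```python
# Re-derive the summary metrics from the measured time per step.
# Assumption: time/step = wall time / (nworld * nstep), inferred from the report.
NWORLD = 8192   # --nworld
NSTEP = 1000    # steps simulated per world
DT = 0.005      # simulation timestep in seconds


def summarize(ns_per_step: float) -> dict:
    """Return steps/sec, realtime factor, and total wall time for one run."""
    steps_per_sec = 1e9 / ns_per_step
    return {
        "steps_per_sec": steps_per_sec,
        # Simulated seconds advanced per wall-clock second, across all worlds.
        "realtime_factor": steps_per_sec * DT,
        # Wall time for NSTEP steps in each of NWORLD worlds.
        "sim_time_s": ns_per_step * 1e-9 * NWORLD * NSTEP,
    }


for name, ns in [("implicitfast", 244.32), ("implicit", 613.98), ("rk4", 863.62)]:
    s = summarize(ns)
    print(f"{name:12s} {s['steps_per_sec']:12,.0f} steps/s "
          f"{s['realtime_factor']:8,.0f}x realtime {s['sim_time_s']:.2f} s")
```

Small differences from the table (a few steps/sec) are expected because the reported time per step is rounded to two decimals.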

Integrator Comparison (current commit)

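| Phase | implicitfast | implicit | rk4 |
| --- | --- | --- | --- |
| step (total) | 242.58 ns | 612.03 ns | 861.40 ns |
| forward | 222.88 ns | 220.71 ns | 202.63 ns |
| implicit | 19.18 ns | 390.80 ns | n/a |
| ↳ deriv_smooth_vel | 11.35 ns | 251.57 ns | n/a |
| ↳ factor_solve_lu | n/a | 130.92 ns | n/a |
| rungekutta4 | n/a | n/a | 658.23 ns |
| ↳ forward ×3 (substeps) | n/a | n/a | ~649.60 ns |
| solve | 122.69 ns | 122.17 ns | 103.96 ns |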

Relative to implicitfast:

  • implicit is 2.51× slower (613.98 ns vs 244.32 ns): overhead from the full RNE derivative plus LU factorization
  • rk4 is 3.53× slower (863.62 ns vs 244.32 ns): overhead from 3 additional forward evaluations
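
The three extra forward evaluations match the shape of a classical fourth-order Runge-Kutta step, which needs four derivative evaluations per step; reading the trace, the regular forward pass appears to supply the first one (an inference, not something the PR states). A minimal sketch on a scalar ODE, where `rk4_step` and `decay` are illustrative names and not MuJoCo Warp functions:

```python
import math


def rk4_step(f, y, h):
    """One classical RK4 step for dy/dt = f(y); calls f four times."""
    k1 = f(y)
    k2 = f(y + 0.5 * h * k1)
    k3 = f(y + 0.5 * h * k2)
    k4 = f(y + h * k3)
    return y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


calls = [0]  # mutable counter standing in for the number of forward passes


def decay(y):
    """Toy dynamics dy/dt = -y, counting every evaluation."""
    calls[0] += 1
    return -y


y1 = rk4_step(decay, 1.0, 0.005)
print(f"evaluations per step: {calls[0]}")
print(f"error vs exact solution: {abs(y1 - math.exp(-0.005)):.1e}")
```

In the benchmark each evaluation is a full forward pass (roughly 217 ns), which is why rk4 costs about as much as three additional steps of the forward pipeline.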

Important

implicit overhead: Full deriv_smooth_vel (22.2× more expensive than the fast variant) + factor_solve_lu (131 ns LU factorization/solve).

rk4 overhead: Three additional forward substep evaluations (~217 ns each) inside rungekutta4, totaling ~650 ns of extra compute.

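The forward pipeline (collision, solve, kinematics, etc.) is the same across integrators and costs nearly the same (220.71 ns for implicit, 222.88 ns for implicitfast, 202.63 ns for the final forward pass of rk4, where solve is about 19 ns faster). The differences are almost entirely in the integration phase.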
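
To illustrate where the LU cost comes from, here is a toy NumPy sketch of an implicit-in-velocity update. It assumes the general form described in the MuJoCo documentation, where the integrator solves (M − h·D)·a = f with D the derivative of smooth forces with respect to velocity; it is not the code in this PR. The function names and the 2×2 system are hypothetical. With the full derivative, D is generally non-symmetric, so a general LU solve is needed; a symmetrized D allows a Cholesky factorization instead.

```python
import numpy as np


def implicit_velocity_step(M, D, f, v, h):
    """Solve (M - h*D) a = f with a general (LU-based) solver, then update v."""
    A = M - h * D                # non-symmetric when D is the full derivative
    a = np.linalg.solve(A, f)    # LAPACK LU factorization plus triangular solves
    return v + h * a


def implicitfast_velocity_step(M, D, f, v, h):
    """Same update with D symmetrized so that Cholesky can be used."""
    D_sym = 0.5 * (D + D.T)
    A = M - h * D_sym            # SPD if M is SPD and D_sym is negative semidefinite
    L = np.linalg.cholesky(A)    # A = L @ L.T
    a = np.linalg.solve(L.T, np.linalg.solve(L, f))
    return v + h * a


# Hypothetical 2-DoF system: damping on the diagonal, skew velocity coupling.
M = np.diag([2.0, 1.0])
D = np.array([[-1.0, 0.5],
              [-0.5, -1.0]])
v = np.array([1.0, 0.0])
f = D @ v                        # linear velocity-dependent force
h = 0.005

v_full = implicit_velocity_step(M, D, f, v, h)
v_fast = implicitfast_velocity_step(M, D, f, v, h)
print("full derivative :", v_full)
print("symmetrized     :", v_fast)
```

For this linear toy force, the full update coincides with backward Euler, while the symmetrized variant drops the skew coupling and so gives a slightly different velocity.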


Regression Check: implicitfast (previous vs current commit)

| Phase | prev (466793a) | curr (17fba98) | Δ |
| --- | --- | --- | --- |
| step (total) | 242.75 ns | 242.58 ns | −0.1% |
| forward | 222.62 ns | 222.88 ns | +0.1% |
| implicit | 19.60 ns | 19.18 ns | −2.1% |
| ↳ deriv_smooth_vel | 11.77 ns | 11.35 ns | −3.6% |
| solve | 122.95 ns | 122.69 ns | −0.2% |
| fwd_position | 65.83 ns | 66.23 ns | +0.6% |
| fwd_velocity | 15.61 ns | 15.71 ns | +0.6% |
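
The Δ column is the relative change of the current timing against the previous one. A small sketch of that arithmetic follows; the 2% slowdown threshold is a hypothetical choice for illustration and is not a criterion used in the PR.

```python
# Per-phase timings in ns: (previous commit 466793a, current commit 17fba98).
phases = {
    "step (total)": (242.75, 242.58),
    "forward": (222.62, 222.88),
    "implicit": (19.60, 19.18),
    "deriv_smooth_vel": (11.77, 11.35),
    "solve": (122.95, 122.69),
    "fwd_position": (65.83, 66.23),
    "fwd_velocity": (15.61, 15.71),
}

THRESHOLD_PCT = 2.0  # hypothetical: flag phases that slow down by more than this


def pct_change(prev_ns: float, curr_ns: float) -> float:
    """Relative change in percent; positive means the current commit is slower."""
    return 100.0 * (curr_ns - prev_ns) / prev_ns


regressions = []
for name, (prev, curr) in phases.items():
    delta = pct_change(prev, curr)
    print(f"{name:18s} {prev:8.2f} -> {curr:8.2f} ns  {delta:+.1f}%")
    if delta > THRESHOLD_PCT:
        regressions.append(name)

print("regressions:", regressions or "none")
```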

Note

The implicitfast integrator shows no regression vs the previous commit (466793a). The implicit phase is slightly faster (−2.1%) due to fused M-structure derivative computation replacing the previous D-structure → M-structure mapping path.


Detailed Event Traces

implicitfast — previous commit (466793a)

step: 242.75
  implicit: 19.60
    deriv_smooth_vel: 11.77
  forward: 222.62
    sensor_vel: 0.17
    fwd_actuation: 1.79
    sensor_pos: 0.17
    sensor_acc: 3.16
    fwd_position: 65.83
      tendon_armature: 0.18
      flex: 0.17
      make_constraint: 19.74
      crb: 13.53
      camlight: 1.78
      com_pos: 5.66
      tendon: 0.17
      transmission: 1.60
      kinematics: 10.88
      collision: 10.21
        convex_narrowphase: 0.17
        primitive_narrowphase: 5.35
        nxn_broadphase: 3.79
    solve: 122.95
      mul_m: 6.13
    fwd_acceleration: 11.04
      xfrc_accumulate: 1.63
    fwd_velocity: 15.61
      tendon_bias: 0.17
      passive: 1.38
      rne: 6.71
      com_vel: 5.82

implicitfast — current commit (17fba98)

step: 242.58
  implicit: 19.18
    deriv_smooth_vel: 11.35
  forward: 222.88
    fwd_velocity: 15.71
      com_vel: 5.83
      rne: 6.81
      tendon_bias: 0.17
      passive: 1.37
    fwd_acceleration: 11.08
      xfrc_accumulate: 1.67
    fwd_actuation: 1.78
    sensor_pos: 0.17
    fwd_position: 66.23
      make_constraint: 20.05
      flex: 0.17
      com_pos: 5.71
      tendon: 0.17
      camlight: 1.76
      kinematics: 10.91
      transmission: 1.61
      collision: 10.23
        nxn_broadphase: 3.78
        convex_narrowphase: 0.17
        primitive_narrowphase: 5.38
      tendon_armature: 0.17
      crb: 13.54
    sensor_vel: 0.17
    sensor_acc: 3.17
    solve: 122.69
      mul_m: 5.94

implicit — current commit (17fba98)

step: 612.03
  implicit: 390.80
    factor_solve_lu: 130.92
    deriv_smooth_vel: 251.57
  forward: 220.71
    sensor_vel: 0.17
    fwd_velocity: 15.71
      tendon_bias: 0.17
      com_vel: 5.83
      rne: 6.82
      passive: 1.37
    sensor_acc: 3.16
    sensor_pos: 0.18
    solve: 122.17
      mul_m: 5.89
    fwd_actuation: 1.78
    fwd_acceleration: 11.07
      xfrc_accumulate: 1.67
    fwd_position: 64.59
      crb: 13.18
      com_pos: 5.74
      collision: 10.23
        convex_narrowphase: 0.17
        primitive_narrowphase: 5.39
        nxn_broadphase: 3.78
      tendon: 0.17
      tendon_armature: 0.17
      kinematics: 9.71
      transmission: 1.64
      make_constraint: 19.89
      flex: 0.17
      camlight: 1.78

rk4 — current commit (17fba98)

step: 861.40
  rungekutta4: 658.23
    forward: [ 215.43, 214.54, 219.63 ]
      fwd_acceleration: [ 11.07, 11.07, 11.10 ]
        xfrc_accumulate: [ 1.67, 1.66, 1.67 ]
      sensor_acc: [ 3.18, 3.18, 3.17 ]
      fwd_actuation: [ 1.77, 1.78, 1.78 ]
      fwd_position: [ 64.82, 64.87, 64.88 ]
        collision: [ 10.24, 10.21, 10.21 ]
          nxn_broadphase: [ 3.78, 3.77, 3.77 ]
          primitive_narrowphase: [ 5.38, 5.38, 5.38 ]
          convex_narrowphase: [ 0.17, 0.17, 0.17 ]
        make_constraint: [ 20.15, 20.18, 20.20 ]
        crb: [ 13.20, 13.20, 13.20 ]
        kinematics: [ 9.63, 9.66, 9.65 ]
        com_pos: [ 5.76, 5.77, 5.76 ]
        camlight: [ 1.78, 1.78, 1.78 ]
        transmission: [ 1.63, 1.64, 1.64 ]
        tendon_armature: [ 0.17, 0.18, 0.17 ]
        tendon: [ 0.17, 0.17, 0.18 ]
        flex: [ 0.17, 0.17, 0.17 ]
      sensor_pos: [ 0.17, 0.18, 0.17 ]
      solve: [ 116.66, 115.71, 120.76 ]
        mul_m: [ 5.88, 5.97, 5.99 ]
      sensor_vel: [ 0.17, 0.18, 0.17 ]
      fwd_velocity: [ 15.72, 15.74, 15.73 ]
        rne: [ 6.82, 6.82, 6.82 ]
        com_vel: [ 5.83, 5.83, 5.83 ]
        passive: [ 1.38, 1.37, 1.37 ]
        tendon_bias: [ 0.17, 0.17, 0.17 ]
  forward: 202.63
    fwd_actuation: 1.78
    solve: 103.96
      mul_m: 5.87
    fwd_position: 64.68
      transmission: 1.61
      tendon: 0.17
      kinematics: 9.68
      camlight: 1.78
      tendon_armature: 0.17
      crb: 13.20
      com_pos: 5.76
      flex: 0.17
      collision: 10.20
        primitive_narrowphase: 5.37
        convex_narrowphase: 0.17
        nxn_broadphase: 3.76
      make_constraint: 20.03
    sensor_pos: 0.17
    fwd_velocity: 15.75
      rne: 6.83
      passive: 1.37
      com_vel: 5.84
      tendon_bias: 0.17
    sensor_vel: 0.17
    sensor_acc: 3.16
    fwd_acceleration: 11.08
      xfrc_accumulate: 1.67

@thowell thowell force-pushed the implicit branch 8 times, most recently from 062bc66 to 5e8e661 Compare May 7, 2026 21:57
@thowell thowell marked this pull request as ready for review May 7, 2026 23:48
@thowell thowell force-pushed the implicit branch 3 times, most recently from 6381deb to 08eab76 Compare May 8, 2026 23:00
@thowell thowell linked an issue May 15, 2026 that may be closed by this pull request
erikfrey added a commit to erikfrey/mujoco_warp that referenced this pull request May 15, 2026
- Use high-throughput instead of fast in README intro (thowell)
- Simplify --event_trace flag usage, drop =True (thowell)
- Remove Jacobian format exception, sparse Jacobians now supported (thowell)
- Simplify Flex to experimental status (thowell)
- Move differentiability out of exceptions list, link to google-deepmind#500 (thowell)
- Link implicit integrator exception to PR google-deepmind#1339 (thowell)
- Fix e.stderr logging in _view_benchmark, use e.returncode instead (thowell)
- Warn and show only first match when --view matches multiple benchmarks (thowell)
erikfrey added a commit that referenced this pull request May 18, 2026
…1359)

* Refresh README, add AGENTS.md, add --view flag to benchmarks/run.py

Rewrites the README to lead with a quickstart example, reorganizes sections for clarity, replaces the compatibility table with a concise exceptions list, and adds an animated examples gallery showcasing benchmark scenes.

Adds AGENTS.md with development workflow conventions and a .agent/rules pointer for AI coding assistants.

Adds a --view flag to benchmarks/run.py that launches mjwarp-viewer on a benchmark scene with nworld=1, so the examples in the README work out of the box.

* Address review comments

- Use high-throughput instead of fast in README intro (thowell)
- Simplify --event_trace flag usage, drop =True (thowell)
- Remove Jacobian format exception, sparse Jacobians now supported (thowell)
- Simplify Flex to experimental status (thowell)
- Move differentiability out of exceptions list, link to #500 (thowell)
- Link implicit integrator exception to PR #1339 (thowell)
- Fix e.stderr logging in _view_benchmark, use e.returncode instead (thowell)
- Warn and show only first match when --view matches multiple benchmarks (thowell)

* Add PR review etiquette and amend guidelines to AGENTS.md

* Clean up run.py view logic and fix AGENTS.md formatting

Restructure --view to avoid loop+break pattern, add empty benchmarks check with helpful error message. Unwrap hard-wrapped lines in AGENTS.md and soften PR body guidelines.
@thowell thowell force-pushed the implicit branch 5 times, most recently from fd3c379 to a0daa0e Compare May 22, 2026 14:39
@thowell thowell merged commit ff5dfeb into google-deepmind:main May 22, 2026
14 checks passed

Development

Successfully merging this pull request may close these issues.

implicit integrator
