Optimization: qderiv_actuator_passive_actuation by Kenny-Vilella · Pull Request #1243 · google-deepmind/mujoco_warp

Kenny-Vilella · 2026-03-18T07:26:52Z

Optimized the qderiv_actuator_passive_actuation kernel for the sparse case.
It previously took around 10% of the time in the g1 environment in Newton; it is now almost negligible.

Some benchmark numbers from Newton using Menagerie assets:

Robot	Unit	Worlds	DoFs	Bodies	This branch (ms)	main (ms)	delta (%)
go2	x2	8192	36	26	79.1	81.5	-3%
g1	x1	8192	49	44	103.1	112.6	-8%
t1	x3	4096	87	72	57.5	67.3	-15%
g1	x2	8192	98	88	180.0	217.3	-17%
apollo	x3	8192	114	108	103.2	139.7	-26%
humanoid	x5	8192	135	65	152.4	161.8	-6%
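
For anyone who wants to re-derive the delta column, here is a minimal sketch. It assumes delta is the relative change of this branch versus main, rounded to the nearest percent; the helper name `delta_percent` is only illustrative and is not part of the benchmark tooling.

```python
# (robot, unit, this_branch_ms, main_ms), copied from the table above.
rows = [
    ("go2", "x2", 79.1, 81.5),
    ("g1", "x1", 103.1, 112.6),
    ("t1", "x3", 57.5, 67.3),
    ("g1", "x2", 180.0, 217.3),
    ("apollo", "x3", 103.2, 139.7),
    ("humanoid", "x5", 152.4, 161.8),
]


def delta_percent(branch_ms, main_ms):
    """Relative change of the branch versus main, in percent (assumed definition)."""
    return (branch_ms - main_ms) / main_ms * 100.0


# Round to whole percent, as in the table.
deltas = [round(delta_percent(b, m)) for _, _, b, m in rows]

for (robot, unit, _, _), d in zip(rows, deltas):
    print(f"{robot} {unit}: {d:+d}%")
```

Under that assumption the rounded values match the delta column for all six rows.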

For Mjwarp benchmarks, I do not see any measurable perf difference.

Kenny-Vilella

I am not yet totally sure about the code for complex coupled actuators.
Added a unit test, but I wonder if there are more complex use cases that should also be tested.

thowell · 2026-03-19T12:59:19Z

+  rowadr = moment_rowadr_in[worldid, actid]
+
+  for i in range(rownnz):
+    dofi = moment_colind_in[worldid, rowadr + i]


is it more efficient to create a variable like rowadri = rowadr + i and reuse below?

thowell · 2026-03-19T13:04:48Z

+        <flag gravity="disable"/>
+      </option>
+      <worldbody>
+{body_xml}{close_xml}      </worldbody>


can we update this xml string to include the actual strings that are generated above?

Done.
It's not pretty though^^

@erikfrey what do we think about this? Do we prefer having the entire xml specified in the string or generating parts of the xml string with code?

hahaha, okay @Kenny-Vilella yeah that's not pretty :-)

here's my order of preference stack ranked from worst to best:

1. giant xml with deep indent embedded in derivative_test.py

2. programmatic string manipulation to create model in derivative_test.py

3. xml with deep indent saved in mujoco_warp/test_data

4. programmatic model manipulation using mjspec in derivative_test.py

5. find a smaller XML that tests the same thing (e.g. skip the 35 dofs, just manually set jacobian="sparse" in xml)
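
To make the last option concrete, here is a rough sketch of what a smaller test model might look like. The two-hinge chain, the names `j0`, `j1`, `t0`, and the gain values are all hypothetical and are not the model used in this PR; only `jacobian="sparse"` and the disabled gravity flag come from the discussion above. The snippet only checks that the string is well-formed XML and has not been validated against MuJoCo here.

```python
import xml.etree.ElementTree as ET

# Hypothetical small model: two hinges coupled by a fixed tendon, one actuator.
# jacobian="sparse" forces the sparse path without needing many dofs.
SMALL_XML = """
<mujoco>
  <option jacobian="sparse">
    <flag gravity="disable"/>
  </option>
  <worldbody>
    <body pos="0.1 0 0">
      <joint name="j0" type="hinge" axis="0 1 0"/>
      <geom type="capsule" size="0.01" fromto="0 0 0 0.1 0 0"/>
      <body pos="0.1 0 0">
        <joint name="j1" type="hinge" axis="0 1 0"/>
        <geom type="capsule" size="0.01" fromto="0 0 0 0.1 0 0"/>
      </body>
    </body>
  </worldbody>
  <tendon>
    <fixed name="t0">
      <joint joint="j0" coef="1"/>
      <joint joint="j1" coef="1"/>
    </fixed>
  </tendon>
  <actuator>
    <position tendon="t0" kp="1" kv="1"/>
  </actuator>
</mujoco>
""".strip()

# Parse with the standard library to confirm the string is well-formed.
root = ET.fromstring(SMALL_XML)
hinges = root.findall(".//joint[@type='hinge']")
print(root.find("option").get("jacobian"), len(hinges))
```

In a test the string would then be handed to the usual model loader (for example `mujoco.MjModel.from_xml_string`), which is where any schema problems in this sketch would show up.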

thowell · 2026-03-19T13:06:56Z

@Kenny-Vilella mujoco actuators are independent. nice speedups with this pr!

thowell · 2026-03-20T09:41:03Z

+
+  for i in range(rownnz):
+    rowadri = rowadr + i
+    dofi = moment_colind_in[worldid, rowadri]


should we consider moving this moment_colind_in read after the continue?

thowell · 2026-03-20T09:41:47Z

+
+    for j in range(i + 1):
+      rowadrj = rowadr + j
+      dofj = moment_colind_in[worldid, rowadrj]


should we consider moving this moment_colind_in read after the continue?
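
A plain-Python sketch of the loop shape being discussed in these two comments, for reference. This is not the Warp kernel: the skip condition (a zero moment entry) and the function name `lower_triangle_pairs` are assumptions made for illustration, and the arrays are flattened to a single world. It shows the row offset computed once per iteration and the column-index read deferred until after the `continue`.

```python
def lower_triangle_pairs(rowadr, rownnz, colind, moment):
    """Return (dofi, dofj, moment_i * moment_j) for one actuator's sparse row."""
    out = []
    for i in range(rownnz):
        rowadri = rowadr + i  # computed once, reused for both arrays
        moment_i = moment[rowadri]
        if moment_i == 0.0:  # hypothetical skip condition
            continue
        dofi = colind[rowadri]  # index read happens only if the entry is used

        for j in range(i + 1):
            rowadrj = rowadr + j
            moment_j = moment[rowadrj]
            if moment_j == 0.0:  # same hypothetical skip condition
                continue
            dofj = colind[rowadrj]
            out.append((dofi, dofj, moment_i * moment_j))
    return out


# One actuator row touching dofs 0, 2 and 5, with a zero entry in the middle.
pairs = lower_triangle_pairs(0, 3, [0, 2, 5], [1.0, 0.0, 2.0])
print(pairs)
```

Whether deferring the read actually helps on the GPU depends on how often the skip branch is taken, so it would need to be measured.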

thowell · 2026-03-20T09:43:18Z

+        <flag gravity="disable"/>
+      </option>
+      <worldbody>
+    <body pos="0.1 0 0">


i think this should be indented in the worldbody scope

I think it should be fine now?

thowell · 2026-03-20T09:49:13Z

left a few comments, otherwise lgtm. thanks!

@thowell

…okup table (#1334)

* Fix _qderiv_actuator_passive_actuation_sparse row indexing into qM_fullm

The kernel introduced in #1243 searches qM_fullm for the (row, col) elemid using m.M_rownnz / m.M_rowadr. Those arrays describe MuJoCo's compact mass matrix sparsity, where joints whose internal block is treated as diagonal-only by mj_factorM (e.g. free joints) contribute one entry per internal dof. qM_fullm_i / qM_fullm_j, however, are built by walking dof_parentid and include the full chained internal block, so the two layouts diverge whenever a joint with diagonal-only compact storage precedes any actuated dof in qvel order. In that case the kernel reads the wrong slice of qMj, the inner `qMj[row_startk] == col` check never fires, and the actuator's contribution to qDeriv is silently dropped. Downstream factor_solve_i then sees (M - dt*qDeriv) without the actuator damping for the affected dof, and the implicit step diverges.

The bug only surfaces with this specific topology (free joint at qvel start + actuated dof after it), which is why the existing serial-chain unit test (PR #1243's test_smooth_vel_sparse_tendon_coupled, no free joint) does not catch it.

Repro: a single free body followed by an actuated hinge; with the buggy indexing, mjwarp's deriv_smooth_vel diagonal entry for the hinge is qM only (0.001), while MuJoCo's reference is qM + dt*(kp+kv) (0.003).

Fix: build chain-aware row offsets qM_fullm_rownnz / qM_fullm_rowadr alongside qM_fullm_i / qM_fullm_j in io.py and pass them to the kernel instead of the compact M_rownnz / M_rowadr.

Adds test_smooth_vel_sparse_free_joint_precedes_actuator covering the minimum reproducer.

* Address review: drop unused body name and is_sparse assert

Per @thowell's PR feedback (#1334), remove the redundant body name attribute and the is_sparse assertion in the new regression test.

* Switch sparse qDeriv lookup to qM_fullm_elemid (Kenny's approach)

Replace the chain-aware (qM_fullm_rownnz, qM_fullm_rowadr) row offsets plus linear search through qMj with a dense (nv x nv) qM_fullm_elemid lookup table built in io.py. The kernel does an O(1) reverse lookup instead of walking the row to find the matching column.

For typical robot models (humanoid, three_humanoids) kernel time is unchanged within trial noise. For deep chains with high-rownnz actuators (e.g. tendons spanning a long serial chain) the kernel asymptotically drops from O(depth^3) to O(depth^2) per actuator thread; a 100-link chain with a tendon actuator runs ~8x faster end-to-end on a 5090.

Memory cost: nv^2 * 4 bytes (40 KiB at nv=100, 1 MiB at nv=500).
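
A minimal plain-Python sketch of the lookup-table idea, as an illustration only. It assumes the (i, j) element list is built by walking each dof's ancestor chain through `dof_parentid`, and that missing pairs are marked with -1; the function names, the element ordering, and the sentinel are hypothetical and may not match what io.py actually does.

```python
def build_fullm_pairs(dof_parentid):
    """List (row, col) pairs by walking each dof's ancestor chain (assumed ordering)."""
    qm_i, qm_j = [], []
    for i in range(len(dof_parentid)):
        j = i
        while j >= 0:  # -1 marks the root of a chain
            qm_i.append(i)
            qm_j.append(j)
            j = dof_parentid[j]
    return qm_i, qm_j


def build_elemid(nv, qm_i, qm_j):
    """Dense nv x nv table mapping (row, col) to an element id, -1 if absent."""
    elemid = [[-1] * nv for _ in range(nv)]
    for e, (i, j) in enumerate(zip(qm_i, qm_j)):
        elemid[i][j] = e
    return elemid


def lookup_table_bytes(nv, bytes_per_entry=4):
    """Memory for the dense table: nv^2 * 4 bytes with 32-bit entries."""
    return nv * nv * bytes_per_entry


# Three-dof serial chain (dofs 0-2) plus an unrelated single dof (dof 3).
dof_parentid = [-1, 0, 1, -1]
qm_i, qm_j = build_fullm_pairs(dof_parentid)
elemid = build_elemid(len(dof_parentid), qm_i, qm_j)

# O(1) reverse lookup instead of scanning the row for the matching column.
print(elemid[2][0], elemid[3][0])
print(lookup_table_bytes(100), lookup_table_bytes(500))
```

The byte counts work out to 40,000 and 1,000,000, which is roughly the 40 KiB and 1 MiB quoted in the commit message.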

Kenny-Vilella added 2 commits March 18, 2026 13:48

Optimize qderiv_actuator_passive_actuation

4ad497b

Add a UT for sparse and coupled actuator

37b8d96

Kenny-Vilella commented Mar 18, 2026

Autoformatter

176f5a5

thowell reviewed Mar 19, 2026

Kenny-Vilella added 2 commits March 20, 2026 11:39

Address PR review comment

d162c36

Merge remote-tracking branch 'upstream/main' into dev/kvilella/optimi…

941c7c2

…ze_qderiv_actuator_passive_actuation

thowell reviewed Mar 20, 2026

Kenny-Vilella and others added 2 commits March 20, 2026 17:49

Address PR review comment (2)

88911f3

Merge branch 'main' into dev/kvilella/optimize_qderiv_actuator_passiv…

b8f4123

…e_actuation

thowell approved these changes Mar 21, 2026

fix merge

1109574

thowell merged commit 80e146c into google-deepmind:main Mar 21, 2026
10 checks passed

Kenny-Vilella deleted the dev/kvilella/optimize_qderiv_actuator_passive_actuation branch March 23, 2026 01:09

This was referenced May 6, 2026

_qderiv_actuator_passive_actuation_sparse indexes qMj with wrong row offsets when free joint is in qvel #1333

Closed

Fix _qderiv_actuator_passive_actuation_sparse: use qM_fullm_elemid lookup table #1334

Merged
