Optimization: qderiv_actuator_passive_actuation#1243
Conversation
Kenny-Vilella
left a comment
There was a problem hiding this comment.
I am not yet totally sure about the code for complex coupled actuators.
Added a unit tests, but wonder if threre is more complex use cases that should also be tested.
| rowadr = moment_rowadr_in[worldid, actid] | ||
|
|
||
| for i in range(rownnz): | ||
| dofi = moment_colind_in[worldid, rowadr + i] |
There was a problem hiding this comment.
is it more efficient to create a variable like rowadri = rowadr + i and reuse below?
| <flag gravity="disable"/> | ||
| </option> | ||
| <worldbody> | ||
| {body_xml}{close_xml} </worldbody> |
There was a problem hiding this comment.
can we update this xml string to include the actual strings that are generated above?
There was a problem hiding this comment.
Done.
It's not pretty though^^
There was a problem hiding this comment.
@erikfrey what do we think about the this, do we prefer having the entire xml specified in the string or generating parts of the xml string with code?
There was a problem hiding this comment.
hahaha, okay @Kenny-Vilella yeah that's not pretty :-)
here's my order of preference stack ranked from worst to best:
- giant xml with deep indent embedded in
derivative_test.py - programmatic string manipulation to create model in
derivative_test.py - xml with deep indent saved
mujoco_warp/test_data - programmatic model manipulation using mjspec in
derivative_test.py - find a smaller XML that tests the same thing (e.g. skip the 35 dofs, just manually set
jacobian="sparse"in xml)
|
@Kenny-Vilella mujoco actuators are independent. nice speedups with this pr! |
…ze_qderiv_actuator_passive_actuation
|
|
||
| for i in range(rownnz): | ||
| rowadri = rowadr + i | ||
| dofi = moment_colind_in[worldid, rowadri] |
There was a problem hiding this comment.
should we consider moving this moment_colind_in read after the continue?
|
|
||
| for j in range(i + 1): | ||
| rowadrj = rowadr + j | ||
| dofj = moment_colind_in[worldid, rowadrj] |
There was a problem hiding this comment.
should we consider moving this moment_colind_in read after the continue?
| <flag gravity="disable"/> | ||
| </option> | ||
| <worldbody> | ||
| <body pos="0.1 0 0"> |
There was a problem hiding this comment.
i think this should be indented in the worldbody scope
There was a problem hiding this comment.
I think, it should be fine now?
|
left a few comments, otherwise lgtm. thanks! |
…okup table (#1334) * Fix _qderiv_actuator_passive_actuation_sparse row indexing into qM_fullm The kernel introduced in #1243 searches qM_fullm for the (row, col) elemid using m.M_rownnz / m.M_rowadr. Those arrays describe MuJoCo's compact mass matrix sparsity, where joints whose internal block is treated as diagonal-only by mj_factorM (e.g. free joints) contribute one entry per internal dof. qM_fullm_i / qM_fullm_j, however, are built by walking dof_parentid and include the full chained internal block, so the two layouts diverge whenever a joint with diagonal-only compact storage precedes any actuated dof in qvel order. In that case the kernel reads the wrong slice of qMj, the inner `qMj[row_startk] == col` check never fires, and the actuator's contribution to qDeriv is silently dropped. Downstream factor_solve_i then sees (M - dt*qDeriv) without the actuator damping for the affected dof, and the implicit step diverges. The bug only surfaces with this specific topology (free joint at qvel start + actuated dof after it), which is why the existing serial-chain unit test (PR #1243's test_smooth_vel_sparse_tendon_coupled, no free joint) does not catch it. Repro: a single free body followed by an actuated hinge; with the buggy indexing, mjwarp's deriv_smooth_vel diagonal entry for the hinge is qM only (0.001), while MuJoCo's reference is qM + dt*(kp+kv) (0.003). Fix: build chain-aware row offsets qM_fullm_rownnz / qM_fullm_rowadr alongside qM_fullm_i / qM_fullm_j in io.py and pass them to the kernel instead of the compact M_rownnz / M_rowadr. Adds test_smooth_vel_sparse_free_joint_precedes_actuator covering the minimum reproducer. * Address review: drop unused body name and is_sparse assert Per @thowell's PR feedback (#1334), remove the redundant body name attribute and the is_sparse assertion in the new regression test. * Switch sparse qDeriv lookup to qM_fullm_elemid (Kenny's approach) Replace the chain-aware (qM_fullm_rownnz, qM_fullm_rowadr) row offsets plus linear search through qMj with a dense (nv x nv) qM_fullm_elemid lookup table built in io.py. The kernel does an O(1) reverse lookup instead of walking the row to find the matching column. For typical robot models (humanoid, three_humanoids) kernel time is unchanged within trial noise. For deep chains with high-rownnz actuators (e.g. tendons spanning a long serial chain) the kernel asymptotically drops from O(depth^3) to O(depth^2) per actuator thread; a 100-link chain with a tendon actuator runs ~8x faster end-to-end on a 5090. Memory cost: nv^2 * 4 bytes (40 KiB at nv=100, 1 MiB at nv=500).
Optimized the qderiv_actuator_passive_actuation kernel in the sparse case.
It was taking around 10% of the time in the g1 environment in newton, it is now almost negligible.
Some benchmark numbers from Newton using menagerie asset:
For Mjwarp benchmarks, I do not see any measurable perf difference.