Reduce the overhead introduced by converting Newton contacts to MjWarp#2393
Conversation
…g instead of linear scaling (w.r.t. number of worlds)
…in scenarios where contacts are only update once every N substeps
📝 WalkthroughWalkthroughReplaces a global shape-pair upper bound with per-world counts, consolidates geometry preparation into AABB computation, and adds a contact-generation driven fast path for MuJoCo contact conversion that preserves contact mappings and avoids full recomputation when contacts are unchanged. Changes
Sequence Diagram(s)sequenceDiagram
participant Contacts
participant ConversionKernel as Conversion Kernel
participant Solver
participant MJWarp as MJWarp Data
Contacts->>Contacts: clear() -> increment contact_generation
Contacts->>ConversionKernel: call convert with contact_generation
ConversionKernel->>ConversionKernel: compare contact_generation vs last_contact_generation
alt Generation Changed (Full path)
ConversionKernel->>ConversionKernel: full per-contact validation & compute
ConversionKernel->>ConversionKernel: build tid → cid mapping
ConversionKernel->>MJWarp: write all contact fields (counts, pos, dist, efc addresses)
else Generation Unchanged (Fast path)
ConversionKernel->>ConversionKernel: use tid→cid mapping
ConversionKernel->>MJWarp: restore nacon from last_nacon_count
ConversionKernel->>MJWarp: update distance and position fields only
end
Solver->>ConversionKernel: snapshot nacon & generation (_snapshot_nacon_count)
ConversionKernel->>Solver: store last_nacon_count and last_contact_generation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
newton/examples/robot/example_robot_anymal_d.py (1)
75-78: Only buildCollisionPipelineon the non-MuJoCo-contact path.When
use_mujoco_contacts=True, this pipeline is never used, but it still allocates the NXN broad-phase/contact buffers during startup. With the new 4096-world default, that is a pretty expensive no-op for the benchmarked path.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@newton/examples/robot/example_robot_anymal_d.py` around lines 75 - 78, The code always constructs self.collision_pipeline = newton.CollisionPipeline(self.model, broad_phase="nxn") which needlessly allocates NXN buffers even when use_mujoco_contacts=True; change the initialization to only build and assign self.collision_pipeline when use_mujoco_contacts is False (i.e., guard the creation with an if not use_mujoco_contacts or similar flag), leaving self.collision_pipeline unset or None on the MuJoCo-contact path to avoid the expensive NXN allocation during startup.newton/_src/solvers/kamino/_src/geometry/unified.py (1)
468-484: Share this pair-count calculation with the main collision helper.This is now a second copy of the per-world/global-shape sizing algorithm that already exists in
newton/_src/sim/collide.py. If that helper changes again, Kamino can silently drift. I'd strongly prefer extracting a shared utility or reusing the same implementation here.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@newton/_src/solvers/kamino/_src/geometry/unified.py` around lines 468 - 484, The per-world/global pair-count logic duplicated here (the block that sets self._max_shape_pairs using self._model.geoms.wid and global_count) should be replaced by a shared helper: move the sizing algorithm into a single utility (e.g., compute_max_shape_pairs) in the existing collision helper module (newton/_src/sim/collide.py) or import and call the already-existing function there, then call that helper from unified.py to set self._max_shape_pairs instead of recomputing; reference the current use of self._model.geoms.wid, _num_geoms and the attribute self._max_shape_pairs when wiring the helper so behavior is identical.newton/tests/test_collision_pipeline.py (1)
800-912: Add one mixed-shape_flagscase here.These scaling tests cover world/global layouts well, but
_compute_per_world_shape_pairs_max()also drops shapes withoutShapeFlags.COLLIDE_SHAPES. A tiny case with a few disabled shapes would keep that part of the bound from regressing untested.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@newton/tests/test_collision_pipeline.py` around lines 800 - 912, Add a new unit test in TestShapePairsMaxScaling that verifies `_compute_per_world_shape_pairs_max` correctly ignores shapes without `ShapeFlags.COLLIDE_SHAPES`: use the existing helper `_make_model` (or replicate its behavior) to build a small model with multiple worlds where some shapes have `shape_flags_value=0` (or the non-collide value) and others have `ShapeFlags.COLLIDE_SHAPES`, then call `_compute_per_world_shape_pairs_max(model)` and assert the result equals the expected per-world pair count computed only from the enabled shapes (e.g., reduce shapes-per-world by the disabled count when computing expected pairs). Reference: TestShapePairsMaxScaling._make_model, ShapeFlags, and _compute_per_world_shape_pairs_max.newton/_src/solvers/mujoco/solver_mujoco.py (1)
2947-2949: Use concrete Warp array annotations for the new cache fields.These buffers are all 1-D
int32arrays, sowp.array[wp.int32] | Noneis a better fit than barewp.array | None.As per coding guidelines, "Annotate Warp arrays with bracket syntax (
wp.array[wp.vec3],wp.array2d[float],wp.array[Any]), not the parenthesized form. Usewp.array[X]for 1-D arrays, notwp.array1d[X]."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@newton/_src/solvers/mujoco/solver_mujoco.py` around lines 2947 - 2949, The three cache fields _contact_tid_to_cid, _last_contact_generation, and _last_nacon_count are currently typed as the generic wp.array | None; change their annotations to concrete 1-D int32 Warp arrays by using the bracket syntax wp.array[wp.int32] | None for each field (e.g., update the annotations on the class attributes _contact_tid_to_cid, _last_contact_generation, and _last_nacon_count to wp.array[wp.int32] | None) to follow the project guideline of using wp.array[X] for 1-D arrays.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@newton/_src/sim/contacts.py`:
- Around line 227-228: The current bump in the collision regeneration path
(wp.launch(_increment_contact_generation, ...)) makes contact_generation change
on every clear(), breaking the “unchanged contacts” fast-path; remove that
unconditional launch from where contacts are regenerated and instead increment
contact_generation only when the contact set actually changes — e.g., call
_increment_contact_generation (or otherwise bump contact_generation) inside the
Contacts mutation points such as add_contact/remove_contact/finalize_contacts
(or any method that truly alters the contact list), and ensure clear() does not
itself increment the generation so identical regenerated contacts keep the same
generation value.
In `@newton/_src/solvers/mujoco/solver_mujoco.py`:
- Around line 3078-3092: The solver fast-path snapshot can be reused across
different Contacts instances or after notify_model_changed(), so
invalidate/reset the snapshot whenever its inputs change: detect when the active
Contacts object changes or when notify_model_changed() updates shape-derived
contact constants and then reset the stored snapshot state (e.g. set
self._last_contact_generation to an impossible/sentinel value or bump a stored
snapshot version/id) before launching _snapshot_nacon_count; update the code
paths that swap Contacts (and notify_model_changed) to clear/reset the
solver-global snapshot state so the fast path cannot reuse stale tid_to_cid /
cached contact fields.
- Around line 3008-3013: The lazy-allocation of _contact_tid_to_cid assumes
launch_dim never grows; update the allocation logic in step() to also detect
when the existing _contact_tid_to_cid length is smaller than the current
launch_dim and reallocate it (and the associated _last_contact_generation and
_last_nacon_count buffers if needed) to the new launch_dim/device using
wp.full/wp.zeros so the kernel never indexes past the end of the buffer;
reference the variables _contact_tid_to_cid, _last_contact_generation,
_last_nacon_count and the parameter launch_dim when adding the
size-check-and-reallocate branch.
---
Nitpick comments:
In `@newton/_src/solvers/kamino/_src/geometry/unified.py`:
- Around line 468-484: The per-world/global pair-count logic duplicated here
(the block that sets self._max_shape_pairs using self._model.geoms.wid and
global_count) should be replaced by a shared helper: move the sizing algorithm
into a single utility (e.g., compute_max_shape_pairs) in the existing collision
helper module (newton/_src/sim/collide.py) or import and call the
already-existing function there, then call that helper from unified.py to set
self._max_shape_pairs instead of recomputing; reference the current use of
self._model.geoms.wid, _num_geoms and the attribute self._max_shape_pairs when
wiring the helper so behavior is identical.
In `@newton/_src/solvers/mujoco/solver_mujoco.py`:
- Around line 2947-2949: The three cache fields _contact_tid_to_cid,
_last_contact_generation, and _last_nacon_count are currently typed as the
generic wp.array | None; change their annotations to concrete 1-D int32 Warp
arrays by using the bracket syntax wp.array[wp.int32] | None for each field
(e.g., update the annotations on the class attributes _contact_tid_to_cid,
_last_contact_generation, and _last_nacon_count to wp.array[wp.int32] | None) to
follow the project guideline of using wp.array[X] for 1-D arrays.
In `@newton/examples/robot/example_robot_anymal_d.py`:
- Around line 75-78: The code always constructs self.collision_pipeline =
newton.CollisionPipeline(self.model, broad_phase="nxn") which needlessly
allocates NXN buffers even when use_mujoco_contacts=True; change the
initialization to only build and assign self.collision_pipeline when
use_mujoco_contacts is False (i.e., guard the creation with an if not
use_mujoco_contacts or similar flag), leaving self.collision_pipeline unset or
None on the MuJoCo-contact path to avoid the expensive NXN allocation during
startup.
In `@newton/tests/test_collision_pipeline.py`:
- Around line 800-912: Add a new unit test in TestShapePairsMaxScaling that
verifies `_compute_per_world_shape_pairs_max` correctly ignores shapes without
`ShapeFlags.COLLIDE_SHAPES`: use the existing helper `_make_model` (or replicate
its behavior) to build a small model with multiple worlds where some shapes have
`shape_flags_value=0` (or the non-collide value) and others have
`ShapeFlags.COLLIDE_SHAPES`, then call
`_compute_per_world_shape_pairs_max(model)` and assert the result equals the
expected per-world pair count computed only from the enabled shapes (e.g.,
reduce shapes-per-world by the disabled count when computing expected pairs).
Reference: TestShapePairsMaxScaling._make_model, ShapeFlags, and
_compute_per_world_shape_pairs_max.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: b0cac0f3-41e7-4ea2-9ff2-33d9e96b95bb
📒 Files selected for processing (8)
CHANGELOG.mdnewton/_src/sim/collide.pynewton/_src/sim/contacts.pynewton/_src/solvers/kamino/_src/geometry/unified.pynewton/_src/solvers/mujoco/kernels.pynewton/_src/solvers/mujoco/solver_mujoco.pynewton/examples/robot/example_robot_anymal_d.pynewton/tests/test_collision_pipeline.py
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
newton/_src/sim/contacts.py (1)
104-108: Redundantdevice=parameter insideScopedDeviceblock.The
contact_generationallocation specifiesdevice=device, but this is redundant since it's already inside awith wp.ScopedDevice(device):block (line 97). Other allocations in this block (e.g., lines 100, 111–142) don't specify the device parameter.♻️ Suggested fix
- self.contact_generation = wp.zeros(1, dtype=wp.int32, device=device) + self.contact_generation = wp.zeros(1, dtype=wp.int32)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@newton/_src/sim/contacts.py` around lines 104 - 108, Inside the wp.ScopedDevice(device) block remove the redundant device=device argument from the wp.zeros call that creates self.contact_generation; locate the allocation of self.contact_generation (the wp.zeros(1, dtype=wp.int32, device=device) call) and change it to match the other allocations in the block by calling wp.zeros with only the shape and dtype (e.g., wp.zeros(1, dtype=wp.int32)), leaving ScopedDevice to handle the device selection.newton/_src/solvers/mujoco/kernels.py (1)
463-471: Minor type inconsistency in array annotations.The
naconparameter useswp.array[int]while other parameters usewp.array[wp.int32]. This is functional since Pythonintmaps to a Warp integer type, but inconsistent with the other parameters in the same kernel.♻️ Suggested fix for consistency
`@wp.kernel` def _snapshot_nacon_count( - nacon: wp.array[int], + nacon: wp.array[wp.int32], last_nacon_count: wp.array[wp.int32], contact_generation: wp.array[wp.int32], last_contact_generation: wp.array[wp.int32], ):🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@newton/_src/solvers/mujoco/kernels.py` around lines 463 - 471, The kernel _snapshot_nacon_count uses inconsistent typing for the nacon parameter (wp.array[int]) compared to the other arrays; update the annotation for nacon to wp.array[wp.int32] so all array parameters (nacon, last_nacon_count, contact_generation, last_contact_generation) consistently use wp.int32, keeping the kernel signature in _snapshot_nacon_count uniform with the rest of the codebase.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@newton/_src/solvers/mujoco/solver_mujoco.py`:
- Around line 3019-3021: The cache invalidation currently resets
self._last_contact_generation (and where applicable self._last_nacon_count) to 0
which can collide with a fresh Contacts buffer at generation 0; change the
sentinel to an impossible value (e.g. -1) so the fast-path is always bypassed on
first use. Locate references to self._last_contact_generation and
self._last_nacon_count in solver_mujoco.py (the reset at the block with "if
self._last_contact_generation is not None:" and the other reset around lines
3042-3047) and set the generation sentinel to -1 instead of zero, ensuring any
logic that compares against contacts.contact_generation treats -1 as
invalidated.
---
Nitpick comments:
In `@newton/_src/sim/contacts.py`:
- Around line 104-108: Inside the wp.ScopedDevice(device) block remove the
redundant device=device argument from the wp.zeros call that creates
self.contact_generation; locate the allocation of self.contact_generation (the
wp.zeros(1, dtype=wp.int32, device=device) call) and change it to match the
other allocations in the block by calling wp.zeros with only the shape and dtype
(e.g., wp.zeros(1, dtype=wp.int32)), leaving ScopedDevice to handle the device
selection.
In `@newton/_src/solvers/mujoco/kernels.py`:
- Around line 463-471: The kernel _snapshot_nacon_count uses inconsistent typing
for the nacon parameter (wp.array[int]) compared to the other arrays; update the
annotation for nacon to wp.array[wp.int32] so all array parameters (nacon,
last_nacon_count, contact_generation, last_contact_generation) consistently use
wp.int32, keeping the kernel signature in _snapshot_nacon_count uniform with the
rest of the codebase.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 1304379b-a8ea-41a6-ac64-e7aa8f4bc9ee
📒 Files selected for processing (3)
newton/_src/sim/contacts.pynewton/_src/solvers/mujoco/kernels.pynewton/_src/solvers/mujoco/solver_mujoco.py
adenzler-nvidia
left a comment
There was a problem hiding this comment.
Review Summary
Well-engineered performance PR — the three optimizations are independently sound and the profiling data is compelling. The generation-counter design correctly avoids host-device sync and is CUDA graph capture safe.
I found one correctness issue (fast-path invalidation gap for BODY_PROPERTIES), one robustness concern (zero sentinel), and several smaller suggestions. See inline comments.
| # | Severity | Issue |
|---|---|---|
| 1 | Important | Missing _invalidate_contact_fast_path() for BODY_PROPERTIES |
| 2 | Important | Sentinel value of 0 can false-match a fresh Contacts |
| 3 | Suggestion | clear() docstring says "1 kernel launch" but is now 2 |
| 4 | Suggestion | _increment_contact_generation missing enable_backward=False |
| 5 | Suggestion | _snapshot_nacon_count missing enable_backward=False |
| 6 | Suggestion | Removed comments from full path reduce readability |
| 7 | Suggestion | Document fast-path assumption about invariant contact normals |
| 8 | Nit | Type annotations on cache fields and nacon parameter |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
newton/_src/sim/contacts.py (1)
9-20: LGTM with one minor documentation nit.The code is correct, but the docstring at line 10 saying "the increment kernel skips it" is slightly imprecise. The kernel never produces
-1because it wraps from2147483647to0—it doesn't actively "skip" the sentinel value; it simply never reaches it through normal operation.Consider:
GENERATION_SENTINEL = -1 -"""Value reserved as an impossible generation; the increment kernel skips it.""" +"""Value reserved as an invalid generation; the increment kernel wraps from MAX_INT to 0, never producing this value."""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@newton/_src/sim/contacts.py` around lines 9 - 20, The docstring for GENERATION_SENTINEL is slightly misleading: update the comment to accurately state that _increment_contact_generation never produces -1 (GENERATION_SENTINEL) because it wraps from 2147483647 to 0 rather than actively skipping that value; mention the sentinel is an impossible generation value reserved for special use and that the kernel's logic (in _increment_contact_generation) wraps the int32 range so -1 is never generated during normal increments.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@newton/_src/solvers/mujoco/solver_mujoco.py`:
- Around line 3029-3032: The launch_dim calculation wrongly caps threads with
self.mjw_data.naconmax (which limits compacted cid values) and thus can drop
valid candidate tids; change the grid sizing to launch over candidate contacts
(use contacts.rigid_contact_max or the appropriate candidate count variable) and
let the kernel itself filter by cid >= self.mjw_data.naconmax, i.e., remove the
min(..., self.mjw_data.naconmax) clamp when computing launch_dim (also apply the
same change to the other occurrence around the 3055–3057 block).
---
Nitpick comments:
In `@newton/_src/sim/contacts.py`:
- Around line 9-20: The docstring for GENERATION_SENTINEL is slightly
misleading: update the comment to accurately state that
_increment_contact_generation never produces -1 (GENERATION_SENTINEL) because it
wraps from 2147483647 to 0 rather than actively skipping that value; mention the
sentinel is an impossible generation value reserved for special use and that the
kernel's logic (in _increment_contact_generation) wraps the int32 range so -1 is
never generated during normal increments.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: acd47c94-eb66-4993-bfe9-0bb4d551f264
📒 Files selected for processing (3)
newton/_src/sim/contacts.pynewton/_src/solvers/mujoco/kernels.pynewton/_src/solvers/mujoco/solver_mujoco.py
adenzler-nvidia
left a comment
There was a problem hiding this comment.
All review comments have been addressed. LGTM.
The pre-release audit found user-facing commits that landed since v1.1.0 without an [Unreleased] entry. Backfill five of them here. Kamino-related commits (newton-physics#2280, newton-physics#2517, newton-physics#2554, newton-physics#2575) are intentionally omitted: Kamino is still alpha and not currently tracked in CHANGELOG. - Changed: contact-conversion overhead reduction in SolverMuJoCo ([newton-physicsGH-2393](newton-physics#2393)) - Fixed: target_voxel_size ignored on the texture-SDF path ([newton-physicsGH-2464](newton-physics#2464)) - Fixed: Newton-to-mujoco-warp material-combination mismatch ([newton-physicsGH-2439](newton-physics#2439)) - Fixed: eq_objtype mismatch on joint equality / mimic constraints in SolverMuJoCo ([newton-physicsGH-2459](newton-physics#2459)) - Fixed: _tiled_sum_kernel launch-dim handling under warp-lang 1.13 ([newton-physicsGH-2546](newton-physics#2546))
Description
Optimize the collision detection and contact conversion pipeline. Three independent improvements that together reduce collision-related GPU time by ~6x on the
example_robot_anymal_dbenchmark (4096 worlds):Fast-path contact conversion:
convert_newton_contacts_to_mjwarp_kernelnow detects when the contact set hasn't changed since the last collision detection (via a device-side generation counter onContacts). Subsequent substeps skip all constant-field computation (frame, friction, solref, solimp, condim, geom, worldid) and only recompute body-q-dependent fields (dist, pos) and reset efc_address. A separate_snapshot_nacon_countkernel runs after the main kernel to safely persist the compacted contact count and generation for the next substep, avoiding cross-block races.Fuse AABB + geom-data kernels:
compute_shape_aabbsandprepare_geom_data_kernelboth computed the same world transform (body_q * shape_transform) per shape. Fused into a single kernel that computes the transform once and writes both AABB bounds and narrow-phase geometry data. Saves one kernel launch and one redundant pass over transform arrays per collision step.Cap conversion kernel launch dimension: The kernel previously launched with
dim=rigid_contact_maxthreads, which can be orders of magnitude larger than the MJWarp contact capacity (naconmax). Now launches withmin(rigid_contact_max, naconmax), eliminating millions of wasted threads that could never produce valid output.Nsight Systems results on
example_robot_anymal_d(4096 worlds, elliptic cone):convert_newton_contacts_to_mjwarp_kernel: median 1.1 ms → 30.7 µs (36x), total 611 ms → 40 mscompute_shape_aabbs(fused): total 20.7 ms → 9.1 ms (2.3x)Checklist
CHANGELOG.mdhas been updated (if user-facing change)Test plan
All 185 mujoco solver tests and 79 collision pipeline tests pass. The
example_robot_anymal_dtest (which uses CUDA graph capture with 4096 worlds) passes on both CPU and CUDA.Summary by CodeRabbit
New Features
Performance Improvements