SolverMuJoCo: Benchmarks for solver initialization by adenzler-nvidia · Pull Request #733 · newton-physics/newton

adenzler-nvidia · 2025-09-04T11:56:01Z

completes #591 for alpha P0 envs. Once allegro hand is up, we can include here as well.

We should figure out whether we really need all the robots here or whether it's ok to have a subset.

Newton Migration Guide

Please ensure the migration guide for warp.sim users is up-to-date with the changes made in this PR.

The migration guide in docs/migration.rst is up-to date

Before your PR is "Ready for review"

Necessary tests have been added and new examples are tested (see newton/tests/test_examples.py)
Documentation is up-to-date
Code passes formatting and linting checks with pre-commit run -a

Summary by CodeRabbit

New Features
- GPU-aware cached warm-up benchmarking with kernel preloading and per-robot/per-env parameterization; reports separate model-construction and solver-initialization medians (ms).
Refactor
- Unified multi-phase benchmarking flow with standardized parameter handling, first-run caching, and consistent invocation across benchmark stages.
Chores
- Runtime GPU availability gating, public exposure of solver/constraint configuration, and new public timing/parameter attributes for benchmarks.

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai · 2025-09-04T11:56:08Z

📝 Walkthrough

Walkthrough

Adds benchmark caching and parameterized invocation: run_benchmark can call setup_cache once to return cached_data; params may be converted to dicts when param_names provided; benchmark calls route through a wrapper that injects cached_data; model benchmarks preload SolverMuJoCo kernels per (robot, num_envs) and record median ModelBuilder and SolverMuJoCo timings.

Changes

Cohort / File(s)	Summary
Benchmark caching & model timing `asv/benchmarks/setup/bench_model.py`	Added `setup_cache` (runtime CUDA guard) that enumerates `(robot, num_envs)` combos, finalizes models, instantiates `SolverMuJoCo` to preload kernels (using `nconmax`), synchronizes device, records median timings (model builder and solver) in a `timings` dict (ms), returns cached timings; added `track_time_initialize_model` and `track_time_initialize_solverMuJoJo`/`track_time_initialize_solverMuJoCo`; adjusted `peakmem_initialize_model_cpu` signature; added public `params`/`param_names` to `KpiInitializeModel` and `FastInitializeModel`; introduced global `nconmax`; exposed `SolverMuJoCo`; replaced decorator gating with runtime CUDA check.
Benchmark runner / param & caching helpers `newton/_src/utils/benchmark.py`	Added `_convert_params_to_dict` and `_call_with_params` to normalize parameter passing and inject `cached_data`; updated `run_benchmark` to treat multi-list params as Cartesian product when appropriate, call `setup_cache` once for the first combination to obtain `cached_data`, and invoke `setup`/`time_`/`track_`/`teardown` via the wrapper supporting positional or keyword invocation and cached_data injection.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant ASV as ASV Runner
  participant RUN as run_benchmark
  participant BEN as Benchmark (Kpi/Fast)
  participant CUDA as CUDA
  participant MB as ModelBuilder
  participant SMJ as SolverMuJoCo
  Note over RUN,BEN: param combination & cached setup

  ASV->>RUN: start benchmark(params)
  RUN->>BEN: compute param combos
  RUN->>BEN: call setup_cache()  -- first combo
  BEN->>CUDA: check availability
  alt CUDA unavailable
    BEN-->>RUN: raise NotImplementedError
  else CUDA available
    loop for each (robot,num_envs)
      BEN->>MB: build + finalize model
      BEN->>SMJ: instantiate solver (preload kernels)
      BEN->>CUDA: synchronize device
      BEN-->>BEN: record median timings -> timings[(robot,num_envs)]
      BEN->>BEN: cleanup
    end
    BEN-->>RUN: return cached timings
  end

  RUN->>BEN: _call_with_params(track_time_initialize_model, param_dict, params, cached_data)
  BEN-->>RUN: return model timing (ms)
  RUN->>BEN: _call_with_params(track_time_initialize_solverMuJoCo, ...)
  BEN-->>RUN: return solver timing (ms)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Headless Viewer and Benchmark Fixes #682 — Modifies benchmark runner logic; directly related to the new _convert/_call helpers and setup_cache invocation flow.
Fixes for pre-trained anymal-c example/benchmark #687 — Adjusts SolverMuJoCo construction/usage; likely impacts solver instantiation and kernel preloading in these benchmarks.
Switch MuJoCo examples and Benchmarks to Parallel Linesearch #738 — Changes SolverMuJoCo constructor/API (e.g., new flags); relevant to solver instantiation in benchmarks.

Suggested reviewers

eric-heiden
shi-eric

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title clearly states that this PR adds benchmarks for solver initialization using the SolverMuJoCo class, which is a central part of the changeset; it is concise, focused, and free of unrelated noise, even though it does not mention the secondary model initialization integration explicitly.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (4)

asv/benchmarks/setup/bench_solver_mujoco.py (4)
45-49: Comment does not match code behavior.
The comment says “Load a small model,” but you instantiate the solver with self._model (which may be very large). Either update the comment or actually use a small model for warm-up.

Apply one of:

Update comment (preferred, since kernel specialization often depends on shape):
-        # Load a small model to cache the kernels
+        # Warm up: instantiate once to JIT-compile kernels for this model shape
Or construct a tiny warm-up model (if acceptable).

16-16: Remove unused import.
gc is not used in this module.
-import gc
27-35: Optional: sanity-check large param sizes.
num_envs of 4096/8192 can OOM typical 16–24GB GPUs during solver construction. If CI runs these benches, consider adding an environment toggle or reducing defaults.

52-52: Optional: align method names with PEP 8 / other benches.
Consider time_initialize_solver_mujoco for consistency with time_initialize_model.

Also applies to: 79-79

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0e8ee91 and f7bbae4.

📒 Files selected for processing (2)

asv/benchmarks/setup/bench_model.py (1 hunks)
asv/benchmarks/setup/bench_solver_mujoco.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

asv/benchmarks/setup/bench_solver_mujoco.py (2)

asv/benchmarks/setup/bench_model.py (2)

setup (36-37)

setup_cache (57-61)

newton/examples/example_mujoco.py (1)

create_model_builder (296-325)

🪛 Ruff (0.12.2)

asv/benchmarks/setup/bench_solver_mujoco.py

53-53: Local variable solver is assigned to but never used

Remove assignment to unused variable solver

(F841)

80-80: Local variable solver is assigned to but never used

Remove assignment to unused variable solver

(F841)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Run GPU Benchmarks (Pull Request)
GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
GitHub Check: run-newton-tests / newton-unittests (windows-latest)

🔇 Additional comments (1)

asv/benchmarks/setup/bench_model.py (1)

81-81: No-op whitespace change; OK to keep.
No functional impact. Safe to merge as-is.

coderabbitai · 2025-09-04T12:14:52Z

+    def setup(self, robot, num_envs):
+        wp.init()
+
+        builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
+
+        # finalize model
+        self._model = builder.finalize()
+
+        # Load a small model to cache the kernels
+        solver = SolverMuJoCo(self._model)
+        del solver
+
+        wp.synchronize_device()
+


🛠️ Refactor suggestion

⚠️ Potential issue

Guard setup on CPU-only runners and finalize on CUDA explicitly.
Without a guard, setup will still instantiate the solver on CPU-only environments even though the timing method is skipped, likely breaking CI runners without GPUs. Also, make the model device explicit to avoid accidental CPU models skewing solver init times.

Apply:

def setup(self, robot, num_envs): wp.init() - - builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) + if wp.get_cuda_device_count() == 0: + # CPU-only: time method is skipped; avoid creating GPU-dependent resources. + return + + builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) # finalize model - self._model = builder.finalize() + self._model = builder.finalize(device="cuda:0") - # Load a small model to cache the kernels - solver = SolverMuJoCo(self._model) - del solver + # Warm up: instantiate once to JIT-compile kernels for this model shape + solver = SolverMuJoCo(self._model) + del solver wp.synchronize_device()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def setup(self, robot, num_envs):

wp.init()

builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)

# finalize model

self._model = builder.finalize()

# Load a small model to cache the kernels

solver = SolverMuJoCo(self._model)

del solver

wp.synchronize_device()

def setup(self, robot, num_envs):

wp.init()

if wp.get_cuda_device_count() == 0:

# CPU-only: time method is skipped; avoid creating GPU-dependent resources.

return

builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)

# finalize model

self._model = builder.finalize(device="cuda:0")

# Warm up: instantiate once to JIT-compile kernels for this model shape

solver = SolverMuJoCo(self._model)

del solver

wp.synchronize_device()

🤖 Prompt for AI Agents

In asv/benchmarks/setup/bench_solver_mujoco.py around lines 37-50, guard the setup so it does nothing on CPU-only runners and ensure the model is finalized on CUDA before creating the solver: check for a CUDA device at the start of setup (e.g., if wp.get_device_count("cuda") == 0: return), then call builder.finalize specifying the CUDA device (e.g., finalize(device="cuda:0") or finalize(device=wp.Device("cuda", 0))) before instantiating SolverMuJoCo, so the solver creation only runs when CUDA is available and the model is explicitly on the GPU.

coderabbitai · 2025-09-04T12:14:53Z

+    @skip_benchmark_if(wp.get_cuda_device_count() == 0)
+    def time_initialize_solverMuJoCo(self, robot, num_envs):
+        solver = SolverMuJoCo(self._model)
+        wp.synchronize_device()
+


🛠️ Refactor suggestion

Fix Ruff F841 and ensure deterministic lifetime in the timed section.
Variable solver is unused; switch to _ and explicitly delete after sync.

Apply:

- solver = SolverMuJoCo(self._model) - wp.synchronize_device() + _ = SolverMuJoCo(self._model) + wp.synchronize_device() + del _

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

@skip_benchmark_if(wp.get_cuda_device_count() == 0)

def time_initialize_solverMuJoCo(self, robot, num_envs):

solver = SolverMuJoCo(self._model)

wp.synchronize_device()

@skip_benchmark_if(wp.get_cuda_device_count() == 0)

def time_initialize_solverMuJoCo(self, robot, num_envs):

_ = SolverMuJoCo(self._model)

wp.synchronize_device()

del _

🧰 Tools

🪛 Ruff (0.12.2)

53-53: Local variable solver is assigned to but never used

Remove assignment to unused variable solver

(F841)

🤖 Prompt for AI Agents

In asv/benchmarks/setup/bench_solver_mujoco.py around lines 51 to 55, the timed section assigns SolverMuJoCo to an unused variable (causing Ruff F841) and doesn't guarantee deterministic cleanup; replace the unused name with an underscore (e.g., _ = SolverMuJoCo(self._model)), call wp.synchronize_device(), then explicitly delete the instance with del _ immediately after sync to ensure deterministic lifetime and satisfy the linter.

coderabbitai · 2025-09-04T12:14:53Z

+    def setup_cache(self, robot, num_envs):
+        wp.init()
+        builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
+        # finalize model
+        self._model = builder.finalize()
+
+        # Load a small model to cache the kernels
+        solver = SolverMuJoCo(self._model)
+        del solver
+
+        wp.synchronize_device()
+


🛠️ Refactor suggestion

Mirror the CPU guard and explicit CUDA device in setup_cache.
Avoid GPU work on CPU-only runners and make device explicit for consistency with KPI class.

Apply:

def setup_cache(self, robot, num_envs): - wp.init() - builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) + wp.init() + if wp.get_cuda_device_count() == 0: + return + builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) # finalize model - self._model = builder.finalize() + self._model = builder.finalize(device="cuda:0") - # Load a small model to cache the kernels - solver = SolverMuJoCo(self._model) - del solver + # Warm up: instantiate once to JIT-compile kernels for this model shape + solver = SolverMuJoCo(self._model) + del solver wp.synchronize_device()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def setup_cache(self, robot, num_envs):

wp.init()

builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)

# finalize model

self._model = builder.finalize()

# Load a small model to cache the kernels

solver = SolverMuJoCo(self._model)

del solver

wp.synchronize_device()

def setup_cache(self, robot, num_envs):

wp.init()

if wp.get_cuda_device_count() == 0:

return

builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)

# finalize model

self._model = builder.finalize(device="cuda:0")

# Warm up: instantiate once to JIT-compile kernels for this model shape

solver = SolverMuJoCo(self._model)

del solver

wp.synchronize_device()

🤖 Prompt for AI Agents

In asv/benchmarks/setup/bench_solver_mujoco.py around lines 66-77, setup_cache currently always initializes GPU work; add the same CPU guard and explicit CUDA device selection used in the KPI class: first check if the runtime is CPU-only and return early (e.g. if wp.get_device().is_cpu or wp.context.is_cpu then return), otherwise set the CUDA device explicitly (e.g. select "cuda:0" via the same wp API your KPI class uses) before calling wp.init(), build/finalize the model and create/delete SolverMuJoCo, then call wp.synchronize_device(); this avoids GPU calls on CPU-only runners and makes the device explicit and consistent.

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

…/solver-creation-benchmarks

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

asv.conf.json (1)
22-22: SHA verified; packaging metadata present in pyproject.toml
Optional:

Add --no-deps to your pip install line to lock transitive dependencies:
-    "python -m pip install git+https://github.com/google-deepmind/mujoco_warp@64bb938f87b6aaf840753ef9ddc198db655020c0",
+    "python -m pip install --no-deps git+https://github.com/google-deepmind/mujoco_warp@64bb938f87b6aaf840753ef9ddc198db655020c0",

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f7bbae4 and b8d835a.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (2)

asv.conf.json (1 hunks)
asv/benchmarks/setup/bench_solver_mujoco.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

asv/benchmarks/setup/bench_solver_mujoco.py

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (5)

asv/benchmarks/setup/bench_solver_mujoco.py (5)

52-56: Fix Ruff F841 in timed section and ensure deterministic teardown.

 @skip_benchmark_if(wp.get_cuda_device_count() == 0)
 def time_initialize_solverMuJoCo(self, robot, num_envs):
-    _solver = SolverMuJoCo(self._model)
-    wp.synchronize_device()
+    _ = SolverMuJoCo(self._model)
+    wp.synchronize_device()
+    del _

73-78: Warm-up: same deterministic lifetime pattern as above.

-# Load a small model to cache the kernels
-solver = SolverMuJoCo(self._model)
-del solver
-
-wp.synchronize_device()
+# Warm up: instantiate once to JIT-compile kernels for this model shape
+_ = SolverMuJoCo(self._model)
+wp.synchronize_device()
+del _

79-83: Fix Ruff F841 in fast timed section and delete after sync.

 @skip_benchmark_if(wp.get_cuda_device_count() == 0)
 def time_initialize_solverMuJoCo(self, robot, num_envs):
-    _solver = SolverMuJoCo(self._model)
-    wp.synchronize_device()
+    _ = SolverMuJoCo(self._model)
+    wp.synchronize_device()
+    del _

38-45: Guard setup on CPU-only and finalize model on CUDA explicitly.

Without a guard, setup still creates GPU-dependent resources on CPU-only runners; also make the model device explicit to avoid accidental CPU placement skewing results.

 def setup(self, robot, num_envs):
-    wp.init()
-
-    builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
+    wp.init()
+    if wp.get_cuda_device_count() == 0:
+        # CPU-only: time method is skipped; avoid creating GPU-dependent resources.
+        return
+    builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
 
     # finalize model
-    self._model = builder.finalize()
+    self._model = builder.finalize(device="cuda:0")

67-72: Mirror the CPU guard and explicit CUDA device in the fast benchmark setup.

 def setup(self, robot, num_envs):
-    wp.init()
-    builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
+    wp.init()
+    if wp.get_cuda_device_count() == 0:
+        return
+    builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
     # finalize model
-    self._model = builder.finalize()
+    self._model = builder.finalize(device="cuda:0")

🧹 Nitpick comments (3)

asv/benchmarks/setup/bench_solver_mujoco.py (3)
46-51: Warm-up: use throwaway name, sync, then delete for deterministic lifetime.

Also fix the stale comment (“small model”) to reflect actual behavior.
-# Load a small model to cache the kernels
-solver = SolverMuJoCo(self._model)
-del solver
-
-wp.synchronize_device()
+# Warm up: instantiate once to JIT-compile kernels for this model shape
+_ = SolverMuJoCo(self._model)
+wp.synchronize_device()
+del _
62-66: Consider setting a timeout for symmetry with KPI class.

Optional, but keeps benchmark configs consistent.
 rounds = 1
 number = 1
 repeat = 3
 min_run_count = 1
+timeout = 3600
90-93: Rename CLI keys for clarity (they benchmark SolverMuJoCo, not “Model”).

Purely cosmetic but reduces confusion when invoking via --bench.
 benchmark_list = {
-    "KpiInitializeModel": KpiInitializeSolverMuJoCo,
-    "FastInitializeModel": FastInitializeSolverMuJoCo,
+    "KpiInitializeSolverMuJoCo": KpiInitializeSolverMuJoCo,
+    "FastInitializeSolverMuJoCo": FastInitializeSolverMuJoCo,
 }

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b8d835a and bbe1a78.

📒 Files selected for processing (1)

asv/benchmarks/setup/bench_solver_mujoco.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

asv/benchmarks/setup/bench_solver_mujoco.py (3)

newton/_src/solvers/mujoco/solver_mujoco.py (1)

SolverMuJoCo (1094-2565)

asv/benchmarks/setup/bench_model.py (1)

setup (38-39)

newton/examples/example_mujoco.py (1)

create_model_builder (291-320)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
GitHub Check: Run GPU Benchmarks (Pull Request)
GitHub Check: run-newton-tests / newton-unittests (windows-latest)
GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)

🔇 Additional comments (1)

asv/benchmarks/setup/bench_solver_mujoco.py (1)

29-30: Validate feasibility of large env counts across robots.

8192 envs for humanoid/quadruped may OOM on common GPUs; confirm target hardware or downscale for CI. If needed, we can split heavy robots into a separate KPI suite.

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

eric-heiden

Thanks, LGTM! Just a small nit on a comment string.

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (5)

asv/benchmarks/setup/bench_solver_mujoco.py (5)

61-65: Free the solver deterministically after timing.

Helps keep GPU memory steady across repeats.

     def time_initialize_solverMuJoCo(self, robot, num_envs):
-        _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot])
-        wp.synchronize_device()
+        _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot])
+        wp.synchronize_device()
+        del _

88-91: Deterministic cleanup in fast timing too.

     def time_initialize_solverMuJoCo(self, robot, num_envs):
-        _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot])
-        wp.synchronize_device()
+        _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot])
+        wp.synchronize_device()
+        del _

68-69: “Fast” still feels large; confirm intent or downscale.

128–256 envs for humanoid/ant/quadruped may still be heavy. Either rename to “Medium” or drop to, e.g., [64, 128].

47-60: Guard setup on CPU-only runners and finalize on CUDA explicitly.

Without a guard, setup will still instantiate the solver on CPU-only CI and likely fail. Also, finalize the model on a CUDA device to avoid skewing timings with CPU models. Replace the warm-up variable and order to avoid Ruff F841 and ensure deterministic lifetime after a device sync.

     def setup(self, robot, num_envs):
         wp.init()
-
-        builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
+        if wp.get_cuda_device_count() == 0:
+            # CPU-only: timing is skipped; avoid creating GPU-dependent resources.
+            return
+
+        builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
 
         # finalize model
-        self._model = builder.finalize()
+        self._model = builder.finalize(device="cuda:0")
 
-        # Load the model to cache the kernels
-        solver = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot])
-        del solver
-
-        wp.synchronize_device()
+        # Warm up: instantiate once to JIT-compile kernels for this model shape
+        _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot])
+        wp.synchronize_device()
+        del _

76-87: Mirror the CPU guard and explicit CUDA device in the fast benchmark setup.

Same failure mode as above; also fix the warm-up variable and ordering.

     def setup(self, robot, num_envs):
         wp.init()
-        builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
+        if wp.get_cuda_device_count() == 0:
+            return
+        builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
         # finalize model
-        self._model = builder.finalize()
+        self._model = builder.finalize(device="cuda:0")
 
-        # Load the model to cache the kernels
-        solver = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot])
-        del solver
-
-        wp.synchronize_device()
+        # Warm up: instantiate once to JIT-compile kernels for this model shape
+        _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot])
+        wp.synchronize_device()
+        del _

🧹 Nitpick comments (4)

asv/benchmarks/setup/bench_solver_mujoco.py (4)
27-34: Revisit ncon_per_env for g1/h1 to avoid OOM at large env counts.

At 4096–8192 envs, 150 contacts/env yields extremely large allocations (nworld * ncon_per_env) and may OOM on 16–24GB GPUs. Consider scaling by env count, capping, or skipping large envs for these robots.

Would you like me to propose a helper (e.g., compute_ncon_per_env(robot, num_envs)) and wire it through setup/time for safer defaults?

19-21: Avoid global config side-effects at import time.

Setting wp.config globally on import can leak into other ASV modules in the same process. Move these into setup or guard with “if not already set”.
-wp.config.enable_backward = False
-wp.config.quiet = True
+wp.config.enable_backward = False  # consider moving into setup() to limit scope
+wp.config.quiet = True             # or gate with an env flag if needed
71-75: Add a timeout for the fast benchmark too.

Keeps runaway runs contained on constrained GPUs.
     rounds = 1
     number = 1
     repeat = 3
     min_run_count = 1
+    timeout = 1200
99-102: Rename CLI keys for clarity.

These benchmarks initialize the MuJoCo solver, not the model. Rename keys to match class names.
-        "KpiInitializeModel": KpiInitializeSolverMuJoCo,
-        "FastInitializeModel": FastInitializeSolverMuJoCo,
+        "KpiInitializeSolverMuJoCo": KpiInitializeSolverMuJoCo,
+        "FastInitializeSolverMuJoCo": FastInitializeSolverMuJoCo,

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between bbe1a78 and b8f73b4.

📒 Files selected for processing (1)

asv/benchmarks/setup/bench_solver_mujoco.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

asv/benchmarks/setup/bench_solver_mujoco.py (3)

newton/_src/solvers/mujoco/solver_mujoco.py (1)

SolverMuJoCo (1094-2565)

asv/benchmarks/setup/bench_model.py (1)

setup (38-39)

newton/examples/example_mujoco.py (1)

create_model_builder (291-320)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
GitHub Check: Run GPU Benchmarks (Pull Request)
GitHub Check: run-newton-tests / newton-unittests (windows-latest)
GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)

🔇 Additional comments (1)

asv/benchmarks/setup/bench_solver_mujoco.py (1)

47-54: builder.finalize() already supports an optional device parameter – its signature is finalize(self, device: Devicelike | None = None, requires_grad: bool = False), so no changes are needed.

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

adenzler-nvidia · 2025-09-05T11:52:22Z

@eric-heiden I refactored the entire thing now to minimize benchmark runtime, integrating the solver setup benchmarks with the model initialization benchmarks.

I had to jump through a few hoops to make it work but I think it's worthwhile to keep the benchmark runtime low.

also tagging @shi-eric for visibility, mostly to make sure I'm not making an obvious mistake here..

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

asv/benchmarks/setup/bench_model.py (2)

127-137: Same warm-up release issue as KPI.

Delete _model/builder and GC after warm-up to avoid cumulative memory use.

Apply:

             solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot])
             del solver
+            del _model
+            del builder
+            gc.collect()
 
             wp.synchronize_device()

141-159: Same per-repeat cleanup improvements as KPI.

Delete builder post-finalize and GC after releasing solver/model.

Apply:

                 model = builder.finalize()
                 wp.synchronize_device()
                 modelBuilder_end = time.perf_counter()
                 timings_modelBuilder.append(modelBuilder_end - modelBuilder_beg)
+                del builder
                 ...
                 del solver
                 del model
+                gc.collect()

🧹 Nitpick comments (4)

asv/benchmarks/setup/bench_model.py (4)
27-36: Validate and document ncon_per_env overrides.

Hard-coding 150 for g1/h1 may be too low/high depending on contact-rich scenes and large num_envs. Please confirm against worst-case contacts per env, and add a short comment with rationale or derive this once from a 1-env probe.

73-92: Tighten per-repeat cleanup to avoid transient pressure.

Delete builder after finalize and GC after releasing solver/model.

Apply:
                 model = builder.finalize()
                 wp.synchronize_device()
                 modelBuilder_end = time.perf_counter()
                 timings_modelBuilder.append(modelBuilder_end - modelBuilder_beg)
+                del builder
 
                 solver_beg = time.perf_counter()
                 solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot])
                 wp.synchronize_device()
                 solver_end = time.perf_counter()
                 timings_solver.append(solver_end - solver_beg)
 
                 del solver
                 del model
+                gc.collect()
49-95: Deduplicate the timing logic.

Both classes duplicate setup_cache/loop code. Factor into a small helper to reduce drift and ease maintenance.

Example (sketch):
def _compute_timings(robots, envs, repeats):
    ...  # shared body here
    return timings
Then call it from each setup_cache with the class-specific params.

Also applies to: 117-164

39-48: Guard against OOM for largest configs.

4096/8192 envs for humanoid/g1/h1 can exceed common GPU RAM. Consider a try/except around finalize/solver to skip combos gracefully (record None and short-circuit track_*), or downscale per-robot env counts.

Also applies to: 108-116

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b8f73b4 and e23cb57.

📒 Files selected for processing (1)

asv/benchmarks/setup/bench_model.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

asv/benchmarks/setup/bench_model.py (2)

newton/examples/example_mujoco.py (2)

Example (185-320)

create_model_builder (291-320)

newton/_src/solvers/mujoco/solver_mujoco.py (1)

SolverMuJoCo (1094-2565)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Run GPU Benchmarks (Pull Request)
GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)

🔇 Additional comments (5)

asv/benchmarks/setup/bench_model.py (5)

49-58: Good use of setup_cache and cached timings.

ASV will pass the cache to track_* and peakmem_*; this aligns with asv’s API and reduces run time.

92-95: Median in ms is appropriate for stability.

Looks good.

97-106: track_time accessors look correct.

Key lookup, units, and signatures align with asv expectations.

117-126: Fast path setup mirrors KPI path correctly.

CUDA gate and cache structure are consistent.

175-185: peakmem hook signature change is correct.

Absorbing the cache arg with “_” keeps asv compatible; CPU-scoped finalize is fine.

… setup_cache Signed-off-by: Alain Denzler <adenzler@nvidia.com>

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

newton/_src/utils/benchmark.py (4)

20-30: Add type hints and improve docstring for better maintainability.

The new helper function would benefit from type hints and a more comprehensive docstring to clarify its purpose and expected input/output types.

-def _convert_params_to_dict(params, param_names):
-    """Convert params to keyword arguments using param_names."""
+from typing import Any, Dict, List, Optional, Tuple, Union
+
+def _convert_params_to_dict(
+    params: Union[Any, List, Tuple],
+    param_names: Optional[List[str]]
+) -> Optional[Dict[str, Any]]:
+    """Convert params to keyword arguments using param_names.
+    
+    Args:
+        params: Parameter values, can be a single value or tuple/list of values
+        param_names: List of parameter names to map to the values
+        
+    Returns:
+        Dictionary mapping parameter names to values, or None if no param_names
+    """

32-44: Add type hints and improve documentation for the parameter calling helper.

The function handles complex parameter passing logic that would benefit from explicit type annotations.

-def _call_with_params(method, param_dict, params, cached_data):
-    """Call a method with appropriate parameters (keyword or positional)."""
+def _call_with_params(
+    method: callable,
+    param_dict: Optional[Dict[str, Any]],
+    params: Tuple,
+    cached_data: Optional[Any]
+) -> Any:
+    """Call a method with appropriate parameters (keyword or positional).
+    
+    Args:
+        method: The method to call
+        param_dict: Keyword arguments if param_names was provided, None otherwise
+        params: Positional arguments to pass
+        cached_data: Cached data from setup_cache, passed as first positional arg
+        
+    Returns:
+        The result of the method call
+    """

74-75: Consider handling the enumerate index more explicitly.

While the variable i is used correctly, consider making its purpose clearer with a more descriptive name.

-    for i, params in enumerate(combinations):
+    for combination_index, params in enumerate(combinations):

Then update line 88:

-        if i == 0 and hasattr(benchmark_cls, "setup_cache"):
+        if combination_index == 0 and hasattr(benchmark_cls, "setup_cache"):

99-99: Consider improving parameter display for better readability.

When params is a tuple with complex objects, the print output might be hard to read. Consider formatting it more nicely.

-                print(f"\n[Benchmark] Running {benchmark_cls.__name__}.{attr} with parameters {params}")
+                param_str = ", ".join(str(p) for p in params) if isinstance(params, (list, tuple)) else str(params)
+                print(f"\n[Benchmark] Running {benchmark_cls.__name__}.{attr} with parameters ({param_str})")

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e23cb57 and 1c45bb3.

📒 Files selected for processing (1)

newton/_src/utils/benchmark.py (3 hunks)

🧰 Additional context used

🪛 Ruff (0.12.2)

newton/_src/utils/benchmark.py

26-26: Use X | Y in isinstance call instead of (X, Y)

Convert to X | Y

(UP038)

29-29: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

63-63: Use X | Y in isinstance call instead of (X, Y)

Convert to X | Y

(UP038)

84-84: Use X | Y in isinstance call instead of (X, Y)

Convert to X | Y

(UP038)

85-85: for loop variable params overwritten by assignment target

(PLW2901)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
GitHub Check: Run GPU Benchmarks (Pull Request)
GitHub Check: run-newton-tests / newton-unittests (windows-latest)
GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)

🔇 Additional comments (4)

newton/_src/utils/benchmark.py (4)
61-66: LGTM! Clever handling of parameter combinations.

The logic correctly distinguishes between multiple parameter lists (requiring cartesian product) and a single parameter list, which improves flexibility for benchmark configurations.

88-91: Good implementation of the caching mechanism.

The setup_cache is correctly called only once at the beginning, which aligns with the PR objective of minimizing benchmark runtime by avoiding redundant initialization.

29-29: Add strict=True to zip() call for Python 3.10+ compatibility.

Using strict=True ensures that the lengths of param_names and params match, catching configuration errors early.
-    return dict(zip(param_names, params))
+    return dict(zip(param_names, params, strict=True))
Note: If you need to maintain compatibility with Python < 3.10, you can add a length check instead:
if len(param_names) != len(params):
    raise ValueError(f"Length mismatch: {len(param_names)} param_names vs {len(params)} params")
return dict(zip(param_names, params))
⛔ Skipped due to learnings
Learnt from: dylanturpin
PR: newton-physics/newton#635
File: newton/examples/example_anymal_c_walk_on_sand.py:316-316
Timestamp: 2025-08-25T20:21:18.407Z
Learning: PEP 618 introduced the strict parameter to zip() in Python 3.10 with strict=False as the default. Adding explicit strict=False to zip() calls maintains existing behavior while satisfying linters that require explicit strict parameter specification in Python 3.10+ codebases.
Learnt from: dylanturpin
PR: newton-physics/newton#635
File: newton/_src/utils/gizmo.py:483-487
Timestamp: 2025-08-25T20:20:59.822Z
Learning: Adding `strict=False` explicitly to zip() calls in Python version upgrade PRs is appropriate for maintaining existing behavior while satisfying linters that expect the parameter to be explicit in Python 3.10+. This follows PEP 618 where strict=False is the default, so explicit usage doesn't change runtime behavior.
88-91: Cached_data compatibility verified
setup_cache returns a timings dict and all track_time_* methods accept it as their first parameter; no changes required.

coderabbitai · 2025-09-05T12:55:36Z

+        if not isinstance(params, (list, tuple)):
+            params = (params,)
+


🛠️ Refactor suggestion

Variable reassignment could cause confusion.

The params variable is reassigned here, which overwrites the loop variable. This could lead to confusion about the variable's state.

- # Ensure params is always a tuple for consistent handling - if not isinstance(params, (list, tuple)): - params = (params,) + # Ensure params is always a tuple for consistent handling + params_tuple = params if isinstance(params, (list, tuple)) else (params,)

Then update the subsequent uses of params to use params_tuple where the tuple form is needed:

- _call_with_params(instance.setup, param_dict, params, cached_data) + _call_with_params(instance.setup, param_dict, params_tuple, cached_data)

And similarly for lines 105, 111, and 117.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if not isinstance(params, (list, tuple)):

params = (params,)

# Ensure params is always a tuple for consistent handling

- if not isinstance(params, (list, tuple)):

params_tuple = params if isinstance(params, (list, tuple)) else (params,)

param_dict = dict(zip(_names, params))

_call_with_params(instance.setup, param_dict, params_tuple, cached_data)

🧰 Tools

🪛 Ruff (0.12.2)

84-84: Use X | Y in isinstance call instead of (X, Y)

Convert to X | Y

(UP038)

85-85: for loop variable params overwritten by assignment target

(PLW2901)

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

newton/_src/utils/benchmark.py (1)
82-85: Fix: params_tuple may be undefined when params is already a list/tuple

Define params_tuple in both branches to avoid UnboundLocalError on subsequent calls.
-        # Ensure params is always a tuple for consistent handling
-        if not isinstance(params, (list | tuple)):
-            params_tuple = (params,)
+        # Ensure params is always a tuple for consistent handling
+        params_tuple = tuple(params) if isinstance(params, (list | tuple)) else (params,)

🧹 Nitpick comments (3)

newton/_src/utils/benchmark.py (3)

20-30: Validate param_names/params length to prevent silent mismatches

Guard against dropped/extra parameters; avoid relying on zip(strict=False).

 def _convert_params_to_dict(params, param_names):
     """Convert params to keyword arguments using param_names."""
     if not param_names:
         return None
 
     # Handle single value case by wrapping in tuple
     if not isinstance(params, (list | tuple)):
         params = (params,)
 
-    return dict(zip(param_names, params, strict=False))
+    if len(param_names) != len(params):
+        raise ValueError(
+            f"param_names length ({len(param_names)}) != params length ({len(params)})"
+        )
+    return dict(zip(param_names, params))

87-90: Reduce noise: print setup_cache message only when printing results

Keeps output quiet for programmatic runs.

-            print(f"\n[Benchmark] Running {benchmark_cls.__name__}.setup_cache")
+            if print_results:
+                print(f"\n[Benchmark] Running {benchmark_cls.__name__}.setup_cache")

100-115: Gracefully skip NotImplemented benchmarks; avoid ZeroDivisionError

Catching NotImplementedError lets the runner continue; also skip averaging when there are no samples.

                 if attr.startswith("time_"):
                     # Run timing benchmarks multiple times and measure elapsed time.
                     for _ in range(number):
                         start = time.perf_counter()
-                        _call_with_params(method, param_dict, params_tuple, cached_data)
-                        t = time.perf_counter() - start
-                        samples.append(t)
+                        try:
+                            _call_with_params(method, param_dict, params_tuple, cached_data)
+                            t = time.perf_counter() - start
+                            samples.append(t)
+                        except NotImplementedError:
+                            samples.clear()
+                            break
                 elif attr.startswith("track_"):
                     # Run tracking benchmarks multiple times and record returned values.
                     for _ in range(number):
-                        val = _call_with_params(method, param_dict, params_tuple, cached_data)
-                        samples.append(val)
+                        try:
+                            val = _call_with_params(method, param_dict, params_tuple, cached_data)
+                            samples.append(val)
+                        except NotImplementedError:
+                            samples.clear()
+                            break
                 # Compute the average result.
-                avg = sum(samples) / len(samples)
-                results[(attr, params)] = avg
+                if not samples:
+                    if print_results:
+                        print(f"[Benchmark] Skipping {benchmark_cls.__name__}.{attr} (NotImplemented).")
+                    continue
+                avg = sum(samples) / len(samples)
+                results[(attr, params)] = avg

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1c45bb3 and 7e5c676.

📒 Files selected for processing (1)

newton/_src/utils/benchmark.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

newton/_src/utils/benchmark.py (2)

asv/benchmarks/compilation/bench_example_load.py (6)

setup (32-34)

setup (60-62)

setup (87-89)

setup (114-116)

setup (142-144)

setup (170-172)

asv/benchmarks/simulation/bench_cloth.py (2)

setup (32-34)

setup (47-49)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
GitHub Check: Run GPU Benchmarks (Pull Request)
GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
GitHub Check: run-newton-tests / newton-unittests (windows-latest)

🔇 Additional comments (1)

newton/_src/utils/benchmark.py (1)

32-44: Ignore name-based signature checks for cached_data
Benchmark methods uniformly expect the cached data as the first positional argument (often named timings), so probing for a parameter literally named cached_data or for varargs will break the existing benchmarks. Leave the unconditional positional prepend in place.

Likely an incorrect or invalid review comment.

coderabbitai · 2025-09-05T14:28:29Z

+        # If param_lists contains multiple lists, generate all combinations
+        # If it's a single list, just use it directly
+        if len(param_lists) > 1 and all(isinstance(item, (list | tuple)) for item in param_lists):
+            combinations = itertools.product(*param_lists)
+        else:
+            combinations = param_lists
    else:


🛠️ Refactor suggestion

Avoid Cartesian product for already-expanded parameter sets

Current check will incorrectly product a list-of-tuples like [('ant', 32), ('humanoid', 64)]. Gate the product by matching param_names length.

- # If param_lists contains multiple lists, generate all combinations - # If it's a single list, just use it directly - if len(param_lists) > 1 and all(isinstance(item, (list | tuple)) for item in param_lists): - combinations = itertools.product(*param_lists) - else: - combinations = param_lists + # If params are given as axes (one list per param_name), take the Cartesian product. + # Otherwise, treat as a sequence of already-expanded parameter sets. + param_names_for_axes = getattr(benchmark_cls, "param_names", None) + if ( + isinstance(param_lists, (list | tuple)) + and param_names_for_axes + and len(param_lists) == len(param_names_for_axes) + and all(isinstance(item, (list | tuple)) for item in param_lists) + ): + combinations = itertools.product(*param_lists) + else: + combinations = param_lists

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# If param_lists contains multiple lists, generate all combinations

# If it's a single list, just use it directly

if len(param_lists) > 1 and all(isinstance(item, (list | tuple)) for item in param_lists):

combinations = itertools.product(*param_lists)

else:

combinations = param_lists

else:

# If params are given as axes (one list per param_name), take the Cartesian product.

# Otherwise, treat as a sequence of already-expanded parameter sets.

param_names_for_axes = getattr(benchmark_cls, "param_names", None)

if (

isinstance(param_lists, (list | tuple))

and param_names_for_axes

and len(param_lists) == len(param_names_for_axes)

and all(isinstance(item, (list | tuple)) for item in param_lists)

):

combinations = itertools.product(*param_lists)

else:

combinations = param_lists

🤖 Prompt for AI Agents

In newton/_src/utils/benchmark.py around lines 61 to 67, the current logic uses itertools.product whenever param_lists has length >1 and all items are list/tuple, which incorrectly expands already-paired parameter tuples like [('ant', 32), ('humanoid', 64)]; change the condition to only use itertools.product when the number of param_lists matches the number of param_names (i.e., each param name has its own list of candidate values), otherwise treat param_lists as already-expanded combinations and use it directly. Ensure the check compares len(param_lists) == len(param_names) (and optionally that each element is a sequence) before calling itertools.product.

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (2)

asv/benchmarks/setup/bench_model.py (2)
64-71: Warm-up cleanup for KPI looks correct and resolves prior OOM risk.

del solver/_model/builder + gc.collect() + wp.synchronize_device() is in place. Good fix.

132-142: Free warm-up model/builder in FastInitializeModel to prevent memory growth across (robot, num_envs).

_model and builder persist after warm-up; this can accumulate. Mirror KPI cleanup and sync after deletes.
-            solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot])
-            del solver
-
-            wp.synchronize_device()
+            solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot])
+            del solver
+            del _model
+            del builder
+            gc.collect()
+            wp.synchronize_device()

🧹 Nitpick comments (5)

asv/benchmarks/setup/bench_model.py (5)
23-24: Avoid global Warp config side effects; set per-benchmark.

Move wp.config.enable_backward = False and wp.config.quiet = True into each setup_cache to prevent global impact when this module is imported.

Apply:
- wp.config.enable_backward = False
- wp.config.quiet = True
And add at the beginning of both setup methods:
@@
-    def setup_cache(self):
+    def setup_cache(self):
+        wp.config.enable_backward = False
+        wp.config.quiet = True
         wp.init()
146-165: Delete builder inside the repeat loop (Fast) to keep host memory flat.

Currently only solver and model are deleted.
                 del solver
                 del model
+                del builder
+                gc.collect()
29-36: Document/verify nconmax heuristics.

If 150 is empirical for g1/h1, add a brief comment; otherwise consider deriving from model or leaving None for consistency.

117-121: Align benchmark metadata across classes.

Optional: add timeout to FastInitializeModel for consistency with KPI.
     rounds = 1
     number = 1
     repeat = 3
     min_run_count = 1
+    timeout = 1200
180-191: Release builder in peakmem bench to avoid skewing measurements.

Delete builder and force GC after finalizing/del model.
         with wp.ScopedDevice("cpu"):
             builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)

             # finalize model
             model = builder.finalize()

         del model
+        del builder
+        gc.collect()

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7e5c676 and ae94ce6.

📒 Files selected for processing (1)

asv/benchmarks/setup/bench_model.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

asv/benchmarks/setup/bench_model.py (2)

newton/examples/example_mujoco.py (1)

create_model_builder (291-320)

newton/_src/solvers/mujoco/solver_mujoco.py (1)

SolverMuJoCo (1094-2565)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Run GPU Benchmarks (Pull Request)
GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
GitHub Check: run-newton-tests / newton-unittests (windows-latest)
GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)

🔇 Additional comments (1)

asv/benchmarks/setup/bench_model.py (1)

52-54: The current directory listing isn’t required for resolving whether ASV treats NotImplementedError as a skip. Instead, confirm via ASV documentation or test harness.
Next step: search ASV source or docs for exception handling semantics of NotImplementedError.

shi-eric · 2025-09-06T05:15:30Z

+            timings_solver = []
+
+            for _ in range(self.repeat):
+                modelBuilder_beg = time.perf_counter()


These variable names could be improved, e.g. use snake case model_builder and don't abbreviate beg since we're not being charged by the character

ack, you're totally right. Will improve this.

shi-eric · 2025-09-06T05:23:44Z

    timeout = 3600

-    def setup(self, robot, num_envs):
+    def setup_cache(self):


I don't quite understand the logic for using setup_cache. A track_ benchmark already runs once total so it seems less confusing to put specific quantity you want to measure in the track_ function itself instead of doing everything in setup_cache.

I would also recommend trying to make things work with as time_ benchmarks if possible since they will also report some measure of variability (in addition to collecting multiple samples) and report them in the summary tables

Thanks for the feedback. It feels really overengineered to me as well..

The main thing I wanted to accomplish is to avoid having to run the modelBuilder setup for each run of the SolverMuJoCo initialization because of the way it impacts the total time needed to run the benchmarks.

I would strongly prefer using time_* benchmarks, but for both time_ and track_ I cannot assume any ordering of the runs. I also cannot measure 2 things in both of these, so I need a way to combine it. setup_cache seemed to be the logical place because it only runs once, and I can forward the data to the track_ functions.

Please correct me if my assumptions are wrong here.

Thinking of it, maybe it's easiest to just leave the existing model benchmark as-is, and revert to the MuJoCo specific one that does the modelBuilder stuff in setup_cache to avoid running it for every repeat. I think we can pay that price.

separate question you might know the answer to - is the warp cache cleared for each repeat or does it persist?

I played some more with this. I tried to move the mujoco parts to it's own file again and cache the model. That doesn't really work because setup_cache isn't parametrizable, so I would end up implementing the same thing I did here with the combined solution.

The other option is obviously also to just forget about optimizing the runtime and revert back to the simple version that was already approved by Eric before. I'm fine with that as well.

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

asv/benchmarks/setup/bench_model.py (2)

132-141: Missing resource cleanup compared to KpiInitializeModel.

The FastInitializeModel class is missing the del _model and gc.collect() cleanup that exists in KpiInitializeModel after the kernel preloading step. This could lead to memory accumulation across iterations.

Apply this diff to add consistent resource cleanup:

             # Load the model to cache the kernels
             solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot])
             del solver
+            del _model
+            del builder
+            gc.collect()

             wp.synchronize_device()

76-100: Fix variable name inconsistencies.

Several variable naming issues need to be addressed:

modelBuilder_beg should use snake_case as model_builder_begin
The timing collection uses inconsistent variable names

Apply this diff to fix the variable naming:

             for _ in range(self.repeat):
-                modelBuilder_beg = time.perf_counter()
+                model_builder_begin = time.perf_counter()
                 builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)

                 # finalize model
                 model = builder.finalize()
                 wp.synchronize_device()
-                modelBuilder_end = time.perf_counter()
-                timings_modelBuilder.append(modelBuilder_end - modelBuilder_beg)
+                model_builder_end = time.perf_counter()
+                timings_modelBuilder.append(model_builder_end - model_builder_begin)

-                solver_beg = time.perf_counter()
+                solver_begin = time.perf_counter()
                 solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot])
                 wp.synchronize_device()
                 solver_end = time.perf_counter()
-                timings_solver.append(solver_end - solver_beg)
+                timings_solver.append(solver_end - solver_begin)

🧹 Nitpick comments (2)

asv/benchmarks/setup/bench_model.py (2)
29-36: Consider making the constraint configuration more explicit.

The nconmax dictionary mixes None values with specific integers, making the intent unclear. Consider documenting why some robots use None while others have specific values, or use a more explicit configuration structure.

Add a comment explaining the configuration:
+# Per-robot constraint limits for SolverMuJoCo initialization
+# None = use solver's automatic estimation, int = explicit limit
 nconmax = {
     "humanoid": None,
     "g1": 150,
     "h1": 150,
     "cartpole": None,
     "ant": None,
     "quadruped": None,
 }
180-180: Unused parameter should be named with underscore prefix.

The first parameter appears to be unused cached data. Consider making this more explicit by using a descriptive name or double underscore prefix.

Apply this diff for better clarity:
-    def peakmem_initialize_model_cpu(self, _, robot, num_envs):
+    def peakmem_initialize_model_cpu(self, _cached_data, robot, num_envs):

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae94ce6 and 92a9db2.

📒 Files selected for processing (1)

asv/benchmarks/setup/bench_model.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

asv/benchmarks/setup/bench_model.py (2)

newton/examples/example_mujoco.py (2)

Example (185-320)

create_model_builder (291-320)

newton/_src/solvers/mujoco/solver_mujoco.py (1)

SolverMuJoCo (1094-2565)

🪛 Ruff (0.12.2)

asv/benchmarks/setup/bench_model.py

156-156: Local variable solver_begin is assigned to but never used

Remove assignment to unused variable solver_begin

(F841)

160-160: Undefined name solver_beg

(F821)

165-165: Undefined name timings_modelBuilder

(F821)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Run GPU Benchmarks (Pull Request)
GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
GitHub Check: run-newton-tests / newton-unittests (windows-latest)
GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)

🔇 Additional comments (3)

asv/benchmarks/setup/bench_model.py (3)

16-28: LGTM! Appropriate imports added for the new functionality.

The new imports (itertools, time, numpy) and SolverMuJoCo are correctly imported and necessary for the benchmark caching implementation.

49-72: LGTM! Well-structured setup caching with proper resource management.

The setup_cache method correctly implements kernel preloading and resource cleanup to optimize benchmark runtime. The CUDA availability check prevents runtime errors in environments without GPU support.

102-110: LGTM! Clean implementation of cached timing retrieval.

The track method correctly retrieves cached solver timing data and properly specifies the unit as milliseconds.

adenzler-nvidia · 2025-09-10T15:05:13Z

@eric-heiden would you mind taking this over in the context of #548 ?

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

asv/benchmarks/setup/bench_model.py (1)

1-221: Fix undefined name and incorrect deletions in FastInitializeModel.setup_cache

At line 163, change timings_solver.append(solver_end - solver_beg) to use the correct variable solver_begin.

At lines 93, 140, 166, and 194, replace del model with del _model to match the variable initialized by builder.finalize().

♻️ Duplicate comments (2)

asv/benchmarks/setup/bench_model.py (2)
59-72: Warm-up and cleanup looks good.

Preloading kernels and freeing _model/builder between (robot, num_envs) avoids memory growth and noisy timings. Nice.

159-164: Undefined name solver_beg → use solver_begin.

Typo breaks timing; use the variable you define.
                 solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot])
                 wp.synchronize_device()
                 solver_end = time.perf_counter()
-                timings_solver.append(solver_end - solver_beg)
+                timings_solver.append(solver_end - solver_begin)

🧹 Nitpick comments (5)

asv/benchmarks/setup/bench_model.py (5)
185-196: Also delete builder in peakmem benchmark to avoid lingering allocations.

Minor hygiene; helps stabilizing RSS measurements.
     with wp.ScopedDevice("cpu"):
         builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
 
         # finalize model
         model = builder.finalize()
 
     del model
+    del builder
+    gc.collect()
65-65: Defensive access for nconmax.

Future robot additions won’t crash the benchmark if the map isn’t updated.
-            solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot])
+            solver = SolverMuJoCo(_model, ncon_per_env=nconmax.get(robot))
-                solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot])
+                solver = SolverMuJoCo(model, ncon_per_env=nconmax.get(robot))
Also applies to: 87-87, 138-138, 160-160

117-121: Consider adding timeout like KPI class for consistency.

Keeps runner behavior uniform across benchmarks.
     rounds = 1
     number = 1
     repeat = 3
     min_run_count = 1
+    timeout = 3600
23-25: Module-wide Warp config can leak to other benchmarks.

Optional: set/restore these in setup_cache to localize side effects.

59-101: DRY: factor shared benchmarking loop.

Both classes duplicate the same warm-up/measure/cleanup logic; a small helper would reduce maintenance.

If helpful, I can sketch a _collect_timings(robots, envs, repeat) helper and integrate it into both classes.

Also applies to: 132-173

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 92a9db2 and 12b12df.

📒 Files selected for processing (1)

asv/benchmarks/setup/bench_model.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

asv/benchmarks/setup/bench_model.py (2)

newton/examples/example_mujoco.py (1)

create_model_builder (300-329)

newton/_src/solvers/mujoco/solver_mujoco.py (1)

SolverMuJoCo (1088-2559)

🪛 Ruff (0.12.2)

asv/benchmarks/setup/bench_model.py

140-140: Undefined name model

(F821)

159-159: Local variable solver_begin is assigned to but never used

Remove assignment to unused variable solver_begin

(F841)

163-163: Undefined name solver_beg

(F821)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
GitHub Check: Run GPU Benchmarks (Pull Request)
GitHub Check: run-newton-tests / newton-unittests (windows-latest)
GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)

🔇 Additional comments (1)

asv/benchmarks/setup/bench_model.py (1)

52-54: Confirm skip semantics when CUDA is unavailable.

Raising NotImplementedError is fine if your run_benchmark treats it as SKIPPED; otherwise the run will error. Please confirm behavior.

Also applies to: 125-127

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (2)

asv/benchmarks/setup/bench_model.py (2)
59-72: Warm-up + explicit cleanup looks good; prevents memory growth across sweeps.

Preloading kernels, deleting solver/model/builder, gc.collect(), and synchronizing are correctly in place.

159-164: Fix NameError in timing: use solver_begin, not solver_beg.

This currently crashes and also triggers Ruff F821/F841. Use the correct variable.
                 solver_begin = time.perf_counter()
                 solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot])
                 wp.synchronize_device()
                 solver_end = time.perf_counter()
-                timings_solver.append(solver_end - solver_beg)
+                timings_solver.append(solver_end - solver_begin)

🧹 Nitpick comments (2)

asv/benchmarks/setup/bench_model.py (2)
185-196: Release builder and force GC in peakmem path (post-measurement cleanup).

Minor cleanup parity with other paths; avoids lingering CPU allocations after peakmem sampling.
     with wp.ScopedDevice("cpu"):
         builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)

         # finalize model
         model = builder.finalize()

     del model
+    del builder
+    gc.collect()
39-111: Optional: de-duplicate common benchmarking logic between classes.

Both classes share identical loops; extracting a small helper (e.g., _measure_timings(robots, envs, repeat)) would reduce maintenance and keep fixes (like the NameError above) in one place.

Also applies to: 113-184

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 12b12df and 2e96c89.

📒 Files selected for processing (1)

asv/benchmarks/setup/bench_model.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

asv/benchmarks/setup/bench_model.py (2)

newton/examples/example_mujoco.py (2)

Example (191-329)

create_model_builder (300-329)

newton/_src/solvers/mujoco/solver_mujoco.py (1)

SolverMuJoCo (1088-2559)

🪛 Ruff (0.12.2)

asv/benchmarks/setup/bench_model.py

159-159: Local variable solver_begin is assigned to but never used

Remove assignment to unused variable solver_begin

(F841)

163-163: Undefined name solver_beg

(F821)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
GitHub Check: Run GPU Benchmarks (Pull Request)
GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
GitHub Check: run-newton-tests / newton-unittests (windows-latest)

🔇 Additional comments (4)

asv/benchmarks/setup/bench_model.py (4)

76-85: Timing instrumentation is correct.

Includes builder creation and finalize; device syncs bracket timings appropriately. Same for solver init.

Also applies to: 86-91

97-101: Returning cached medians in ms via track_ methods: LGTM.

Interface aligns with run_benchmark’s cached-arg calling convention.

Also applies to: 102-111

132-145: FastInitializeModel warm-up/cleanup mirrors KPI path correctly.

Consistent kernel preload and cleanup sequence.

52-54: Confirm ASV skip on CUDA unavailable.
No except NotImplementedError is present in newton/_src/utils/benchmark.py (around the run_benchmark definition), so verify that ASV’s runner will catch the exception raised in setup_cache and skip the benchmark when CUDA isn’t available to prevent CPU-only CI failures.

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (5)

asv/benchmarks/setup/bench_model.py (5)

76-78: Synchronize before starting the timed region to isolate measurements.

Insert a device sync before each timed block to avoid overlap from prior work influencing the next measurement. This does not affect timing since it’s outside the measured window.

-            for _ in range(self.repeat):
-                model_builder_begin = time.perf_counter()
+            for _ in range(self.repeat):
+                wp.synchronize_device()
+                model_builder_begin = time.perf_counter()

-            for _ in range(self.repeat):
-                model_builder_begin = time.perf_counter()
+            for _ in range(self.repeat):
+                wp.synchronize_device()
+                model_builder_begin = time.perf_counter()

Also applies to: 149-151

29-36: Use a constant-style name and clarify None semantics for ncon_per_env.

Minor naming/readability nit: make it an ALL_CAPS module constant and keep usage consistent. Also add a short comment that None falls back to model.rigid_contact_max in SolverMuJoCo.

-nconmax = {
+NCON_PER_ENV = {
     "humanoid": None,
     "g1": 150,
     "h1": 150,
     "cartpole": None,
     "ant": None,
     "quadruped": None,
 }

-            solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot])
+            solver = SolverMuJoCo(_model, ncon_per_env=NCON_PER_ENV[robot])

-                solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot])
+                solver = SolverMuJoCo(model, ncon_per_env=NCON_PER_ENV[robot])

-            solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot])
+            solver = SolverMuJoCo(model, ncon_per_env=NCON_PER_ENV[robot])

-                solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot])
+                solver = SolverMuJoCo(model, ncon_per_env=NCON_PER_ENV[robot])

Also applies to: 65-66, 87-88, 138-141, 160-161

102-106: Prefer snake_case and consistent naming for track methods.

Rename for consistency with Python style and the class name: solver_mujoco.

-def track_time_initialize_solverMuJoCo(self, timings, robot, num_envs):
+def track_time_initialize_solver_mujoco(self, timings, robot, num_envs):
     return timings["solver"][(robot, num_envs)]

-track_time_initialize_solverMuJoCo.unit = "ms"
+track_time_initialize_solver_mujoco.unit = "ms"

-def track_time_initialize_solverMuJoCo(self, timings, robot, num_envs):
+def track_time_initialize_solver_mujoco(self, timings, robot, num_envs):
     return timings["solver"][(robot, num_envs)]

-track_time_initialize_solverMuJoCo.unit = "ms"
+track_time_initialize_solver_mujoco.unit = "ms"

Also applies to: 175-179

149-169: De-duplicate nearly identical benchmarking logic between classes.

The warm-up, timing, and cleanup loops are duplicated. Factor into a shared helper to reduce drift and maintenance.

Example helper (add at module scope):

def _measure_init_timings(robot_list, env_list, repeat):
    timings = {"model_builder": {}, "solver": {}}
    for robot, num_envs in itertools.product(robot_list, env_list):
        builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
        _model = builder.finalize()
        solver = SolverMuJoCo(_model, ncon_per_env=NCON_PER_ENV[robot])
        del solver, _model, builder
        gc.collect(); wp.synchronize_device()

        t_mb, t_sv = [], []
        for _ in range(repeat):
            wp.synchronize_device()
            t0 = time.perf_counter()
            builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
            model = builder.finalize(); wp.synchronize_device()
            t_mb.append(time.perf_counter() - t0)

            t1 = time.perf_counter()
            solver = SolverMuJoCo(model, ncon_per_env=NCON_PER_ENV[robot])
            wp.synchronize_device()
            t_sv.append(time.perf_counter() - t1)

            del solver, model, builder
            gc.collect(); wp.synchronize_device()

        timings["model_builder"][(robot, num_envs)] = np.median(t_mb) * 1000
        timings["solver"][(robot, num_envs)] = np.median(t_sv) * 1000
    return timings

Then in each setup_cache:

return _measure_init_timings(self.params[0], self.params[1], self.repeat)

Also applies to: 76-97, 132-173, 49-101

188-196: Free the builder in peakmem CPU path as well.

Small cleanup to avoid retaining host-side allocations in memory measurements.

 with wp.ScopedDevice("cpu"):
     builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)

     # finalize model
     model = builder.finalize()

 del model
+del builder

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e96c89 and 05f26c8.

📒 Files selected for processing (1)

asv/benchmarks/setup/bench_model.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

asv/benchmarks/setup/bench_model.py (2)

newton/examples/example_mujoco.py (2)

Example (191-329)

create_model_builder (300-329)

newton/_src/solvers/mujoco/solver_mujoco.py (1)

SolverMuJoCo (1088-2559)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
GitHub Check: Run GPU Benchmarks (Pull Request)
GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
GitHub Check: run-newton-tests / newton-unittests (windows-latest)

🔇 Additional comments (2)

asv/benchmarks/setup/bench_model.py (2)
59-72: Warm-up + cleanup looks solid.

Kernel preloading plus explicit deletion and sync before measurement is the right tradeoff for stable timings.

Also applies to: 132-145

29-36: Confirm ncon_per_env values for g1/h1 are safe at high env counts.

Hard-coding 150 may under- or over-allocate depending on contact richness. Please verify against ROBOT_CONFIGS and/or empirical rigid_contact_max.

You can locate any configured per-robot contact limits and cross-check:
#!/bin/bash
# Show where ncon/contact-related caps are defined for MuJoCo examples
rg -n -C2 -S '(ROBOT_CONFIGS|nconmax|ncon_per_env|rigid_contact_max)' newton
If uncertainty remains, consider using None (fallback to model.rigid_contact_max) or a conservative per-robot cap derived from Example builders.

coderabbitai · 2025-09-10T18:58:46Z

+                del solver
+                del model
+                del builder
+                gc.collect()
+
+            timings["model_builder"][(robot, num_envs)] = np.median(timings_model_builder) * 1000


🛠️ Refactor suggestion

Ensure GPU memory is actually reclaimed between repeats (add sync after gc).

Without a post-gc device sync, freeing large MJ/solver buffers can lag and accumulate across repeats, risking OOM at 4096/8192 envs. Add a sync after cleanup (outside the timed region).

del solver del model del builder gc.collect() + wp.synchronize_device()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

del solver

del model

del builder

gc.collect()

timings["model_builder"][(robot, num_envs)] = np.median(timings_model_builder) * 1000

del solver

del model

del builder

gc.collect()

wp.synchronize_device()

timings["model_builder"][(robot, num_envs)] = np.median(timings_model_builder) * 1000

🤖 Prompt for AI Agents

In asv/benchmarks/setup/bench_model.py around lines 92 to 97, GPU memory from MJ/solver buffers may not be reclaimed immediately after deleting solver/model/builder and calling gc.collect(), so add an explicit device synchronization call (e.g. torch.cuda.synchronize() if using PyTorch) immediately after gc.collect() and before recording timings to ensure freed GPU memory is actually released between repeats; place this sync outside the timed region so it does not affect benchmark measurements.

coderabbitai · 2025-09-10T18:58:47Z

+                del solver
+                del model
+                del builder
+                gc.collect()
+


🛠️ Refactor suggestion

Mirror the post-gc device sync in the fast benchmark too.

Same memory pressure risk as above; add a sync here as well.

del solver del model del builder gc.collect() + wp.synchronize_device()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

del solver

del model

del builder

gc.collect()

del solver

del model

del builder

gc.collect()

wp.synchronize_device()

🤖 Prompt for AI Agents

In asv/benchmarks/setup/bench_model.py around lines 165 to 169, after deleting solver/model/builder and calling gc.collect() in the fast benchmark you must mirror the post-gc device synchronization done elsewhere: add a device sync call (e.g., call the CUDA/device synchronize function used elsewhere in the file) immediately after gc.collect(). Ensure the sync is conditional on the device/backend being used (only call if CUDA/accelerator is available) and import or reference the same sync utility already used in the file to keep behavior consistent.

eric-heiden · 2025-09-18T17:04:14Z

Closing this PR, since #778 is adding solver initialization benchmarks (and merges this PR branch).

adenzler-nvidia added 2 commits September 4, 2025 13:46

add first versions of benchmarks

642619a

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

move to new file

f7bbae4

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

adenzler-nvidia added 2 commits September 4, 2025 13:59

Merge branch 'main' into dev/adenzler/solver-creation-benchmarks

a79c208

add new stuff after merge

1084d4c

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai Bot reviewed Sep 4, 2025

View reviewed changes

adenzler-nvidia added 4 commits September 4, 2025 15:07

bump mjwarp dependency

0e408a1

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

add asv as well

d4636ef

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

Merge branch 'dev/adenzler/another-package-upgrade' into dev/adenzler…

679ad61

…/solver-creation-benchmarks

formatting

b8d835a

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai Bot reviewed Sep 4, 2025

View reviewed changes

move to setup for correct parametrization

bbe1a78

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai Bot reviewed Sep 4, 2025

View reviewed changes

adenzler-nvidia added 5 commits September 4, 2025 17:15

add nconmax to avoid running out of memory

15b7b6c

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

fix parametrization

807d011

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

formatting

11222be

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

fix parameters

0861dce

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

Merge branch 'main' into dev/adenzler/solver-creation-benchmarks

b05cf6f

eric-heiden reviewed Sep 4, 2025

View reviewed changes

Comment thread asv/benchmarks/setup/bench_solver_mujoco.py Outdated

eric-heiden previously approved these changes Sep 4, 2025

View reviewed changes

fix comment

b8f73b4

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

adenzler-nvidia enabled auto-merge (squash) September 5, 2025 06:45

coderabbitai Bot reviewed Sep 5, 2025

View reviewed changes

adenzler-nvidia added 6 commits September 5, 2025 10:27

Merge branch 'main' into dev/adenzler/solver-creation-benchmarks

9141779

make it only run once

3f1a620

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

move to model benchmark

ed2ab1c

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

cleanup

be09aae

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

remove old benchmark file

57802c6

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

ruff

e23cb57

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

make sure we can use parameter lists with single params

6806052

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai Bot reviewed Sep 5, 2025

View reviewed changes

Comment thread asv/benchmarks/setup/bench_model.py

adenzler-nvidia added 2 commits September 5, 2025 14:25

fix benchmark utils to allow running single-list parametrizations and…

afe2666

… setup_cache Signed-off-by: Alain Denzler <adenzler@nvidia.com>

cleanup and make it simpler to follow

1c45bb3

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai Bot reviewed Sep 5, 2025

View reviewed changes

adenzler-nvidia added 3 commits September 5, 2025 16:02

Merge branch 'main' into dev/adenzler/solver-creation-benchmarks

cac20c0

ruff

7e5c676

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbit suggestions

ae94ce6

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai Bot reviewed Sep 5, 2025

View reviewed changes

Merge branch 'main' into dev/adenzler/solver-creation-benchmarks

d077061

eric-heiden requested a review from shi-eric September 5, 2025 19:10

shi-eric requested changes Sep 6, 2025

View reviewed changes

improve variable names

92a9db2

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai Bot reviewed Sep 8, 2025

View reviewed changes

Comment thread asv/benchmarks/setup/bench_model.py

adenzler-nvidia added 2 commits September 10, 2025 17:56

cleanup

9223e8d

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

Merge branch 'main' into dev/adenzler/solver-creation-benchmarks

12b12df

adenzler-nvidia dismissed eric-heiden’s stale review via 12b12df September 10, 2025 16:12

coderabbitai Bot reviewed Sep 10, 2025

View reviewed changes

Comment thread asv/benchmarks/setup/bench_model.py

formatting

2e96c89

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai Bot reviewed Sep 10, 2025

View reviewed changes

typo

05f26c8

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

coderabbitai Bot reviewed Sep 10, 2025

View reviewed changes

eric-heiden mentioned this pull request Sep 18, 2025

Improve SolverMuJoCo setup performance #778

Merged

4 tasks

eric-heiden closed this Sep 18, 2025

auto-merge was automatically disabled September 18, 2025 17:04
Pull request was closed

-        if not isinstance(params, (list, tuple)):
-            params = (params,)
+        # Ensure params is always a tuple for consistent handling
+-        if not isinstance(params, (list, tuple)):
+        params_tuple = params if isinstance(params, (list, tuple)) else (params,)
+        param_dict = dict(zip(_names, params))
+            _call_with_params(instance.setup, param_dict, params_tuple, cached_data)

Conversation

adenzler-nvidia commented Sep 4, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Newton Migration Guide

Before your PR is "Ready for review"

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Sep 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Pre-merge checks (2 passed, 1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Sep 4, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Sep 4, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Sep 4, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

eric-heiden left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

adenzler-nvidia commented Sep 5, 2025

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Sep 5, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Sep 5, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

shi-eric Sep 6, 2025

Choose a reason for hiding this comment

Uh oh!

adenzler-nvidia Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

shi-eric Sep 6, 2025

Choose a reason for hiding this comment

Uh oh!

adenzler-nvidia Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

adenzler-nvidia Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

adenzler-nvidia Sep 8, 2025

adenzler-nvidia commented Sep 4, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Sep 4, 2025 •

edited

Loading