Skip to content

SolverMuJoCo: Benchmarks for solver initialization#733

Closed
adenzler-nvidia wants to merge 33 commits into
newton-physics:mainfrom
adenzler-nvidia:dev/adenzler/solver-creation-benchmarks
Closed

SolverMuJoCo: Benchmarks for solver initialization#733
adenzler-nvidia wants to merge 33 commits into
newton-physics:mainfrom
adenzler-nvidia:dev/adenzler/solver-creation-benchmarks

Conversation

@adenzler-nvidia

@adenzler-nvidia adenzler-nvidia commented Sep 4, 2025

Copy link
Copy Markdown
Member

completes #591 for alpha P0 envs. Once allegro hand is up, we can include here as well.

We should figure out whether we really need all the robots here or whether it's ok to have a subset.

Newton Migration Guide

Please ensure the migration guide for warp.sim users is up-to-date with the changes made in this PR.

  • The migration guide in docs/migration.rst is up-to date

Before your PR is "Ready for review"

  • Necessary tests have been added and new examples are tested (see newton/tests/test_examples.py)
  • Documentation is up-to-date
  • Code passes formatting and linting checks with pre-commit run -a

Summary by CodeRabbit

  • New Features

    • GPU-aware cached warm-up benchmarking with kernel preloading and per-robot/per-env parameterization; reports separate model-construction and solver-initialization medians (ms).
  • Refactor

    • Unified multi-phase benchmarking flow with standardized parameter handling, first-run caching, and consistent invocation across benchmark stages.
  • Chores

    • Runtime GPU availability gating, public exposure of solver/constraint configuration, and new public timing/parameter attributes for benchmarks.

Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
@coderabbitai

coderabbitai Bot commented Sep 4, 2025

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Adds benchmark caching and parameterized invocation: run_benchmark can call setup_cache once to return cached_data; params may be converted to dicts when param_names provided; benchmark calls route through a wrapper that injects cached_data; model benchmarks preload SolverMuJoCo kernels per (robot, num_envs) and record median ModelBuilder and SolverMuJoCo timings.

Changes

Cohort / File(s) Summary
Benchmark caching & model timing
asv/benchmarks/setup/bench_model.py
Added setup_cache (runtime CUDA guard) that enumerates (robot, num_envs) combos, finalizes models, instantiates SolverMuJoCo to preload kernels (using nconmax), synchronizes device, records median timings (model builder and solver) in a timings dict (ms), returns cached timings; added track_time_initialize_model and track_time_initialize_solverMuJoJo/track_time_initialize_solverMuJoCo; adjusted peakmem_initialize_model_cpu signature; added public params/param_names to KpiInitializeModel and FastInitializeModel; introduced global nconmax; exposed SolverMuJoCo; replaced decorator gating with runtime CUDA check.
Benchmark runner / param & caching helpers
newton/_src/utils/benchmark.py
Added _convert_params_to_dict and _call_with_params to normalize parameter passing and inject cached_data; updated run_benchmark to treat multi-list params as Cartesian product when appropriate, call setup_cache once for the first combination to obtain cached_data, and invoke setup/time_*/track_*/teardown via the wrapper supporting positional or keyword invocation and cached_data injection.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant ASV as ASV Runner
  participant RUN as run_benchmark
  participant BEN as Benchmark (Kpi/Fast)
  participant CUDA as CUDA
  participant MB as ModelBuilder
  participant SMJ as SolverMuJoCo
  Note over RUN,BEN: param combination & cached setup

  ASV->>RUN: start benchmark(params)
  RUN->>BEN: compute param combos
  RUN->>BEN: call setup_cache()  -- first combo
  BEN->>CUDA: check availability
  alt CUDA unavailable
    BEN-->>RUN: raise NotImplementedError
  else CUDA available
    loop for each (robot,num_envs)
      BEN->>MB: build + finalize model
      BEN->>SMJ: instantiate solver (preload kernels)
      BEN->>CUDA: synchronize device
      BEN-->>BEN: record median timings -> timings[(robot,num_envs)]
      BEN->>BEN: cleanup
    end
    BEN-->>RUN: return cached timings
  end

  RUN->>BEN: _call_with_params(track_time_initialize_model, param_dict, params, cached_data)
  BEN-->>RUN: return model timing (ms)
  RUN->>BEN: _call_with_params(track_time_initialize_solverMuJoCo, ...)
  BEN-->>RUN: return solver timing (ms)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • eric-heiden
  • shi-eric

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title clearly states that this PR adds benchmarks for solver initialization using the SolverMuJoCo class, which is a central part of the changeset; it is concise, focused, and free of unrelated noise, even though it does not mention the secondary model initialization integration explicitly.
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (4)
asv/benchmarks/setup/bench_solver_mujoco.py (4)

45-49: Comment does not match code behavior.
The comment says “Load a small model,” but you instantiate the solver with self._model (which may be very large). Either update the comment or actually use a small model for warm-up.

Apply one of:

  • Update comment (preferred, since kernel specialization often depends on shape):
-        # Load a small model to cache the kernels
+        # Warm up: instantiate once to JIT-compile kernels for this model shape
  • Or construct a tiny warm-up model (if acceptable).

16-16: Remove unused import.
gc is not used in this module.

-import gc

27-35: Optional: sanity-check large param sizes.
num_envs of 4096/8192 can OOM typical 16–24GB GPUs during solver construction. If CI runs these benches, consider adding an environment toggle or reducing defaults.


52-52: Optional: align method names with PEP 8 / other benches.
Consider time_initialize_solver_mujoco for consistency with time_initialize_model.

Also applies to: 79-79

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0e8ee91 and f7bbae4.

📒 Files selected for processing (2)
  • asv/benchmarks/setup/bench_model.py (1 hunks)
  • asv/benchmarks/setup/bench_solver_mujoco.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_solver_mujoco.py (2)
asv/benchmarks/setup/bench_model.py (2)
  • setup (36-37)
  • setup_cache (57-61)
newton/examples/example_mujoco.py (1)
  • create_model_builder (296-325)
🪛 Ruff (0.12.2)
asv/benchmarks/setup/bench_solver_mujoco.py

53-53: Local variable solver is assigned to but never used

Remove assignment to unused variable solver

(F841)


80-80: Local variable solver is assigned to but never used

Remove assignment to unused variable solver

(F841)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run GPU Benchmarks (Pull Request)
  • GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
  • GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
  • GitHub Check: run-newton-tests / newton-unittests (windows-latest)
🔇 Additional comments (1)
asv/benchmarks/setup/bench_model.py (1)

81-81: No-op whitespace change; OK to keep.
No functional impact. Safe to merge as-is.

Comment on lines +37 to +50
def setup(self, robot, num_envs):
wp.init()

builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)

# finalize model
self._model = builder.finalize()

# Load a small model to cache the kernels
solver = SolverMuJoCo(self._model)
del solver

wp.synchronize_device()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Guard setup on CPU-only runners and finalize on CUDA explicitly.
Without a guard, setup will still instantiate the solver on CPU-only environments even though the timing method is skipped, likely breaking CI runners without GPUs. Also, make the model device explicit to avoid accidental CPU models skewing solver init times.

Apply:

     def setup(self, robot, num_envs):
         wp.init()
-
-        builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
+        if wp.get_cuda_device_count() == 0:
+            # CPU-only: time method is skipped; avoid creating GPU-dependent resources.
+            return
+
+        builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
 
         # finalize model
-        self._model = builder.finalize()
+        self._model = builder.finalize(device="cuda:0")
 
-        # Load a small model to cache the kernels
-        solver = SolverMuJoCo(self._model)
-        del solver
+        # Warm up: instantiate once to JIT-compile kernels for this model shape
+        solver = SolverMuJoCo(self._model)
+        del solver
 
         wp.synchronize_device()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def setup(self, robot, num_envs):
wp.init()
builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
# finalize model
self._model = builder.finalize()
# Load a small model to cache the kernels
solver = SolverMuJoCo(self._model)
del solver
wp.synchronize_device()
def setup(self, robot, num_envs):
wp.init()
if wp.get_cuda_device_count() == 0:
# CPU-only: time method is skipped; avoid creating GPU-dependent resources.
return
builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
# finalize model
self._model = builder.finalize(device="cuda:0")
# Warm up: instantiate once to JIT-compile kernels for this model shape
solver = SolverMuJoCo(self._model)
del solver
wp.synchronize_device()
🤖 Prompt for AI Agents
In asv/benchmarks/setup/bench_solver_mujoco.py around lines 37-50, guard the
setup so it does nothing on CPU-only runners and ensure the model is finalized
on CUDA before creating the solver: check for a CUDA device at the start of
setup (e.g., if wp.get_device_count("cuda") == 0: return), then call
builder.finalize specifying the CUDA device (e.g., finalize(device="cuda:0") or
finalize(device=wp.Device("cuda", 0))) before instantiating SolverMuJoCo, so the
solver creation only runs when CUDA is available and the model is explicitly on
the GPU.

Comment on lines +51 to +55
@skip_benchmark_if(wp.get_cuda_device_count() == 0)
def time_initialize_solverMuJoCo(self, robot, num_envs):
solver = SolverMuJoCo(self._model)
wp.synchronize_device()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Fix Ruff F841 and ensure deterministic lifetime in the timed section.
Variable solver is unused; switch to _ and explicitly delete after sync.

Apply:

-        solver = SolverMuJoCo(self._model)
-        wp.synchronize_device()
+        _ = SolverMuJoCo(self._model)
+        wp.synchronize_device()
+        del _
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@skip_benchmark_if(wp.get_cuda_device_count() == 0)
def time_initialize_solverMuJoCo(self, robot, num_envs):
solver = SolverMuJoCo(self._model)
wp.synchronize_device()
@skip_benchmark_if(wp.get_cuda_device_count() == 0)
def time_initialize_solverMuJoCo(self, robot, num_envs):
_ = SolverMuJoCo(self._model)
wp.synchronize_device()
del _
🧰 Tools
🪛 Ruff (0.12.2)

53-53: Local variable solver is assigned to but never used

Remove assignment to unused variable solver

(F841)

🤖 Prompt for AI Agents
In asv/benchmarks/setup/bench_solver_mujoco.py around lines 51 to 55, the timed
section assigns SolverMuJoCo to an unused variable (causing Ruff F841) and
doesn't guarantee deterministic cleanup; replace the unused name with an
underscore (e.g., _ = SolverMuJoCo(self._model)), call wp.synchronize_device(),
then explicitly delete the instance with del _ immediately after sync to ensure
deterministic lifetime and satisfy the linter.

Comment on lines +66 to +77
def setup_cache(self, robot, num_envs):
wp.init()
builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
# finalize model
self._model = builder.finalize()

# Load a small model to cache the kernels
solver = SolverMuJoCo(self._model)
del solver

wp.synchronize_device()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Mirror the CPU guard and explicit CUDA device in setup_cache.
Avoid GPU work on CPU-only runners and make device explicit for consistency with KPI class.

Apply:

     def setup_cache(self, robot, num_envs):
-        wp.init()
-        builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
+        wp.init()
+        if wp.get_cuda_device_count() == 0:
+            return
+        builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
         # finalize model
-        self._model = builder.finalize()
+        self._model = builder.finalize(device="cuda:0")
 
-        # Load a small model to cache the kernels
-        solver = SolverMuJoCo(self._model)
-        del solver
+        # Warm up: instantiate once to JIT-compile kernels for this model shape
+        solver = SolverMuJoCo(self._model)
+        del solver
 
         wp.synchronize_device()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def setup_cache(self, robot, num_envs):
wp.init()
builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
# finalize model
self._model = builder.finalize()
# Load a small model to cache the kernels
solver = SolverMuJoCo(self._model)
del solver
wp.synchronize_device()
def setup_cache(self, robot, num_envs):
wp.init()
if wp.get_cuda_device_count() == 0:
return
builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
# finalize model
self._model = builder.finalize(device="cuda:0")
# Warm up: instantiate once to JIT-compile kernels for this model shape
solver = SolverMuJoCo(self._model)
del solver
wp.synchronize_device()
🤖 Prompt for AI Agents
In asv/benchmarks/setup/bench_solver_mujoco.py around lines 66-77, setup_cache
currently always initializes GPU work; add the same CPU guard and explicit CUDA
device selection used in the KPI class: first check if the runtime is CPU-only
and return early (e.g. if wp.get_device().is_cpu or wp.context.is_cpu then
return), otherwise set the CUDA device explicitly (e.g. select "cuda:0" via the
same wp API your KPI class uses) before calling wp.init(), build/finalize the
model and create/delete SolverMuJoCo, then call wp.synchronize_device(); this
avoids GPU calls on CPU-only runners and makes the device explicit and
consistent.

Comment thread asv/benchmarks/setup/bench_solver_mujoco.py Outdated
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
asv.conf.json (1)

22-22: SHA verified; packaging metadata present in pyproject.toml
Optional:

  • Add --no-deps to your pip install line to lock transitive dependencies:
-    "python -m pip install git+https://github.com/google-deepmind/mujoco_warp@64bb938f87b6aaf840753ef9ddc198db655020c0",
+    "python -m pip install --no-deps git+https://github.com/google-deepmind/mujoco_warp@64bb938f87b6aaf840753ef9ddc198db655020c0",
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f7bbae4 and b8d835a.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • asv.conf.json (1 hunks)
  • asv/benchmarks/setup/bench_solver_mujoco.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • asv/benchmarks/setup/bench_solver_mujoco.py

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (5)
asv/benchmarks/setup/bench_solver_mujoco.py (5)

52-56: Fix Ruff F841 in timed section and ensure deterministic teardown.

 @skip_benchmark_if(wp.get_cuda_device_count() == 0)
 def time_initialize_solverMuJoCo(self, robot, num_envs):
-    _solver = SolverMuJoCo(self._model)
-    wp.synchronize_device()
+    _ = SolverMuJoCo(self._model)
+    wp.synchronize_device()
+    del _

73-78: Warm-up: same deterministic lifetime pattern as above.

-# Load a small model to cache the kernels
-solver = SolverMuJoCo(self._model)
-del solver
-
-wp.synchronize_device()
+# Warm up: instantiate once to JIT-compile kernels for this model shape
+_ = SolverMuJoCo(self._model)
+wp.synchronize_device()
+del _

79-83: Fix Ruff F841 in fast timed section and delete after sync.

 @skip_benchmark_if(wp.get_cuda_device_count() == 0)
 def time_initialize_solverMuJoCo(self, robot, num_envs):
-    _solver = SolverMuJoCo(self._model)
-    wp.synchronize_device()
+    _ = SolverMuJoCo(self._model)
+    wp.synchronize_device()
+    del _

38-45: Guard setup on CPU-only and finalize model on CUDA explicitly.

Without a guard, setup still creates GPU-dependent resources on CPU-only runners; also make the model device explicit to avoid accidental CPU placement skewing results.

 def setup(self, robot, num_envs):
-    wp.init()
-
-    builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
+    wp.init()
+    if wp.get_cuda_device_count() == 0:
+        # CPU-only: time method is skipped; avoid creating GPU-dependent resources.
+        return
+    builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
 
     # finalize model
-    self._model = builder.finalize()
+    self._model = builder.finalize(device="cuda:0")

67-72: Mirror the CPU guard and explicit CUDA device in the fast benchmark setup.

 def setup(self, robot, num_envs):
-    wp.init()
-    builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
+    wp.init()
+    if wp.get_cuda_device_count() == 0:
+        return
+    builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
     # finalize model
-    self._model = builder.finalize()
+    self._model = builder.finalize(device="cuda:0")
🧹 Nitpick comments (3)
asv/benchmarks/setup/bench_solver_mujoco.py (3)

46-51: Warm-up: use throwaway name, sync, then delete for deterministic lifetime.

Also fix the stale comment (“small model”) to reflect actual behavior.

-# Load a small model to cache the kernels
-solver = SolverMuJoCo(self._model)
-del solver
-
-wp.synchronize_device()
+# Warm up: instantiate once to JIT-compile kernels for this model shape
+_ = SolverMuJoCo(self._model)
+wp.synchronize_device()
+del _

62-66: Consider setting a timeout for symmetry with KPI class.

Optional, but keeps benchmark configs consistent.

 rounds = 1
 number = 1
 repeat = 3
 min_run_count = 1
+timeout = 3600

90-93: Rename CLI keys for clarity (they benchmark SolverMuJoCo, not “Model”).

Purely cosmetic but reduces confusion when invoking via --bench.

 benchmark_list = {
-    "KpiInitializeModel": KpiInitializeSolverMuJoCo,
-    "FastInitializeModel": FastInitializeSolverMuJoCo,
+    "KpiInitializeSolverMuJoCo": KpiInitializeSolverMuJoCo,
+    "FastInitializeSolverMuJoCo": FastInitializeSolverMuJoCo,
 }
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b8d835a and bbe1a78.

📒 Files selected for processing (1)
  • asv/benchmarks/setup/bench_solver_mujoco.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_solver_mujoco.py (3)
newton/_src/solvers/mujoco/solver_mujoco.py (1)
  • SolverMuJoCo (1094-2565)
asv/benchmarks/setup/bench_model.py (1)
  • setup (38-39)
newton/examples/example_mujoco.py (1)
  • create_model_builder (291-320)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
  • GitHub Check: Run GPU Benchmarks (Pull Request)
  • GitHub Check: run-newton-tests / newton-unittests (windows-latest)
  • GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
🔇 Additional comments (1)
asv/benchmarks/setup/bench_solver_mujoco.py (1)

29-30: Validate feasibility of large env counts across robots.

8192 envs for humanoid/quadruped may OOM on common GPUs; confirm target hardware or downscale for CI. If needed, we can split heavy robots into a separate KPI suite.

Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Comment thread asv/benchmarks/setup/bench_solver_mujoco.py Outdated
eric-heiden
eric-heiden previously approved these changes Sep 4, 2025

@eric-heiden eric-heiden left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM! Just a small nit on a comment string.

Signed-off-by: Alain Denzler <adenzler@nvidia.com>
@adenzler-nvidia adenzler-nvidia enabled auto-merge (squash) September 5, 2025 06:45

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (5)
asv/benchmarks/setup/bench_solver_mujoco.py (5)

61-65: Free the solver deterministically after timing.

Helps keep GPU memory steady across repeats.

     def time_initialize_solverMuJoCo(self, robot, num_envs):
-        _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot])
-        wp.synchronize_device()
+        _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot])
+        wp.synchronize_device()
+        del _

88-91: Deterministic cleanup in fast timing too.

     def time_initialize_solverMuJoCo(self, robot, num_envs):
-        _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot])
-        wp.synchronize_device()
+        _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot])
+        wp.synchronize_device()
+        del _

68-69: “Fast” still feels large; confirm intent or downscale.

128–256 envs for humanoid/ant/quadruped may still be heavy. Either rename to “Medium” or drop to, e.g., [64, 128].


47-60: Guard setup on CPU-only runners and finalize on CUDA explicitly.

Without a guard, setup will still instantiate the solver on CPU-only CI and likely fail. Also, finalize the model on a CUDA device to avoid skewing timings with CPU models. Replace the warm-up variable and order to avoid Ruff F841 and ensure deterministic lifetime after a device sync.

     def setup(self, robot, num_envs):
         wp.init()
-
-        builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
+        if wp.get_cuda_device_count() == 0:
+            # CPU-only: timing is skipped; avoid creating GPU-dependent resources.
+            return
+
+        builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
 
         # finalize model
-        self._model = builder.finalize()
+        self._model = builder.finalize(device="cuda:0")
 
-        # Load the model to cache the kernels
-        solver = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot])
-        del solver
-
-        wp.synchronize_device()
+        # Warm up: instantiate once to JIT-compile kernels for this model shape
+        _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot])
+        wp.synchronize_device()
+        del _

76-87: Mirror the CPU guard and explicit CUDA device in the fast benchmark setup.

Same failure mode as above; also fix the warm-up variable and ordering.

     def setup(self, robot, num_envs):
         wp.init()
-        builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
+        if wp.get_cuda_device_count() == 0:
+            return
+        builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
         # finalize model
-        self._model = builder.finalize()
+        self._model = builder.finalize(device="cuda:0")
 
-        # Load the model to cache the kernels
-        solver = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot])
-        del solver
-
-        wp.synchronize_device()
+        # Warm up: instantiate once to JIT-compile kernels for this model shape
+        _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot])
+        wp.synchronize_device()
+        del _
🧹 Nitpick comments (4)
asv/benchmarks/setup/bench_solver_mujoco.py (4)

27-34: Revisit ncon_per_env for g1/h1 to avoid OOM at large env counts.

At 4096–8192 envs, 150 contacts/env yields extremely large allocations (nworld * ncon_per_env) and may OOM on 16–24GB GPUs. Consider scaling by env count, capping, or skipping large envs for these robots.

Would you like me to propose a helper (e.g., compute_ncon_per_env(robot, num_envs)) and wire it through setup/time for safer defaults?


19-21: Avoid global config side-effects at import time.

Setting wp.config globally on import can leak into other ASV modules in the same process. Move these into setup or guard with “if not already set”.

-wp.config.enable_backward = False
-wp.config.quiet = True
+wp.config.enable_backward = False  # consider moving into setup() to limit scope
+wp.config.quiet = True             # or gate with an env flag if needed

71-75: Add a timeout for the fast benchmark too.

Keeps runaway runs contained on constrained GPUs.

     rounds = 1
     number = 1
     repeat = 3
     min_run_count = 1
+    timeout = 1200

99-102: Rename CLI keys for clarity.

These benchmarks initialize the MuJoCo solver, not the model. Rename keys to match class names.

-        "KpiInitializeModel": KpiInitializeSolverMuJoCo,
-        "FastInitializeModel": FastInitializeSolverMuJoCo,
+        "KpiInitializeSolverMuJoCo": KpiInitializeSolverMuJoCo,
+        "FastInitializeSolverMuJoCo": FastInitializeSolverMuJoCo,
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between bbe1a78 and b8f73b4.

📒 Files selected for processing (1)
  • asv/benchmarks/setup/bench_solver_mujoco.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_solver_mujoco.py (3)
newton/_src/solvers/mujoco/solver_mujoco.py (1)
  • SolverMuJoCo (1094-2565)
asv/benchmarks/setup/bench_model.py (1)
  • setup (38-39)
newton/examples/example_mujoco.py (1)
  • create_model_builder (291-320)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
  • GitHub Check: Run GPU Benchmarks (Pull Request)
  • GitHub Check: run-newton-tests / newton-unittests (windows-latest)
  • GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
🔇 Additional comments (1)
asv/benchmarks/setup/bench_solver_mujoco.py (1)

47-54: builder.finalize() already supports an optional device parameter – its signature is finalize(self, device: Devicelike | None = None, requires_grad: bool = False), so no changes are needed.

Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
@adenzler-nvidia

Copy link
Copy Markdown
Member Author

@eric-heiden I refactored the entire thing now to minimize benchmark runtime, integrating the solver setup benchmarks with the model initialization benchmarks.

I had to jump through a few hoops to make it work but I think it's worthwhile to keep the benchmark runtime low.

also tagging @shi-eric for visibility, mostly to make sure I'm not making an obvious mistake here..

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
asv/benchmarks/setup/bench_model.py (2)

127-137: Same warm-up release issue as KPI.

Delete _model/builder and GC after warm-up to avoid cumulative memory use.

Apply:

             solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot])
             del solver
+            del _model
+            del builder
+            gc.collect()
 
             wp.synchronize_device()

141-159: Same per-repeat cleanup improvements as KPI.

Delete builder post-finalize and GC after releasing solver/model.

Apply:

                 model = builder.finalize()
                 wp.synchronize_device()
                 modelBuilder_end = time.perf_counter()
                 timings_modelBuilder.append(modelBuilder_end - modelBuilder_beg)
+                del builder
                 ...
                 del solver
                 del model
+                gc.collect()
🧹 Nitpick comments (4)
asv/benchmarks/setup/bench_model.py (4)

27-36: Validate and document ncon_per_env overrides.

Hard-coding 150 for g1/h1 may be too low/high depending on contact-rich scenes and large num_envs. Please confirm against worst-case contacts per env, and add a short comment with rationale or derive this once from a 1-env probe.


73-92: Tighten per-repeat cleanup to avoid transient pressure.

Delete builder after finalize and GC after releasing solver/model.

Apply:

                 model = builder.finalize()
                 wp.synchronize_device()
                 modelBuilder_end = time.perf_counter()
                 timings_modelBuilder.append(modelBuilder_end - modelBuilder_beg)
+                del builder
 
                 solver_beg = time.perf_counter()
                 solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot])
                 wp.synchronize_device()
                 solver_end = time.perf_counter()
                 timings_solver.append(solver_end - solver_beg)
 
                 del solver
                 del model
+                gc.collect()

49-95: Deduplicate the timing logic.

Both classes duplicate setup_cache/loop code. Factor into a small helper to reduce drift and ease maintenance.

Example (sketch):

def _compute_timings(robots, envs, repeats):
    ...  # shared body here
    return timings

Then call it from each setup_cache with the class-specific params.

Also applies to: 117-164


39-48: Guard against OOM for largest configs.

4096/8192 envs for humanoid/g1/h1 can exceed common GPU RAM. Consider a try/except around finalize/solver to skip combos gracefully (record None and short-circuit track_*), or downscale per-robot env counts.

Also applies to: 108-116

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b8f73b4 and e23cb57.

📒 Files selected for processing (1)
  • asv/benchmarks/setup/bench_model.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_model.py (2)
newton/examples/example_mujoco.py (2)
  • Example (185-320)
  • create_model_builder (291-320)
newton/_src/solvers/mujoco/solver_mujoco.py (1)
  • SolverMuJoCo (1094-2565)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run GPU Benchmarks (Pull Request)
  • GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
🔇 Additional comments (5)
asv/benchmarks/setup/bench_model.py (5)

49-58: Good use of setup_cache and cached timings.

ASV will pass the cache to track_* and peakmem_*; this aligns with asv’s API and reduces run time.


92-95: Median in ms is appropriate for stability.

Looks good.


97-106: track_time accessors look correct.

Key lookup, units, and signatures align with asv expectations.


117-126: Fast path setup mirrors KPI path correctly.

CUDA gate and cache structure are consistent.


175-185: peakmem hook signature change is correct.

Absorbing the cache arg with “_” keeps asv compatible; CPU-scoped finalize is fine.

Comment thread asv/benchmarks/setup/bench_model.py
… setup_cache

Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
newton/_src/utils/benchmark.py (4)

20-30: Add type hints and improve docstring for better maintainability.

The new helper function would benefit from type hints and a more comprehensive docstring to clarify its purpose and expected input/output types.

-def _convert_params_to_dict(params, param_names):
-    """Convert params to keyword arguments using param_names."""
+from typing import Any, Dict, List, Optional, Tuple, Union
+
+def _convert_params_to_dict(
+    params: Union[Any, List, Tuple],
+    param_names: Optional[List[str]]
+) -> Optional[Dict[str, Any]]:
+    """Convert params to keyword arguments using param_names.
+    
+    Args:
+        params: Parameter values, can be a single value or tuple/list of values
+        param_names: List of parameter names to map to the values
+        
+    Returns:
+        Dictionary mapping parameter names to values, or None if no param_names
+    """

32-44: Add type hints and improve documentation for the parameter calling helper.

The function handles complex parameter passing logic that would benefit from explicit type annotations.

-def _call_with_params(method, param_dict, params, cached_data):
-    """Call a method with appropriate parameters (keyword or positional)."""
+def _call_with_params(
+    method: callable,
+    param_dict: Optional[Dict[str, Any]],
+    params: Tuple,
+    cached_data: Optional[Any]
+) -> Any:
+    """Call a method with appropriate parameters (keyword or positional).
+    
+    Args:
+        method: The method to call
+        param_dict: Keyword arguments if param_names was provided, None otherwise
+        params: Positional arguments to pass
+        cached_data: Cached data from setup_cache, passed as first positional arg
+        
+    Returns:
+        The result of the method call
+    """

74-75: Consider handling the enumerate index more explicitly.

While the variable i is used correctly, consider making its purpose clearer with a more descriptive name.

-    for i, params in enumerate(combinations):
+    for combination_index, params in enumerate(combinations):

Then update line 88:

-        if i == 0 and hasattr(benchmark_cls, "setup_cache"):
+        if combination_index == 0 and hasattr(benchmark_cls, "setup_cache"):

99-99: Consider improving parameter display for better readability.

When params is a tuple with complex objects, the print output might be hard to read. Consider formatting it more nicely.

-                print(f"\n[Benchmark] Running {benchmark_cls.__name__}.{attr} with parameters {params}")
+                param_str = ", ".join(str(p) for p in params) if isinstance(params, (list, tuple)) else str(params)
+                print(f"\n[Benchmark] Running {benchmark_cls.__name__}.{attr} with parameters ({param_str})")
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e23cb57 and 1c45bb3.

📒 Files selected for processing (1)
  • newton/_src/utils/benchmark.py (3 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
newton/_src/utils/benchmark.py

26-26: Use X | Y in isinstance call instead of (X, Y)

Convert to X | Y

(UP038)


29-29: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


63-63: Use X | Y in isinstance call instead of (X, Y)

Convert to X | Y

(UP038)


84-84: Use X | Y in isinstance call instead of (X, Y)

Convert to X | Y

(UP038)


85-85: for loop variable params overwritten by assignment target

(PLW2901)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
  • GitHub Check: Run GPU Benchmarks (Pull Request)
  • GitHub Check: run-newton-tests / newton-unittests (windows-latest)
  • GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
🔇 Additional comments (4)
newton/_src/utils/benchmark.py (4)

61-66: LGTM! Clever handling of parameter combinations.

The logic correctly distinguishes between multiple parameter lists (requiring cartesian product) and a single parameter list, which improves flexibility for benchmark configurations.


88-91: Good implementation of the caching mechanism.

The setup_cache is correctly called only once at the beginning, which aligns with the PR objective of minimizing benchmark runtime by avoiding redundant initialization.


29-29: Add strict=True to zip() call for Python 3.10+ compatibility.

Using strict=True ensures that the lengths of param_names and params match, catching configuration errors early.

-    return dict(zip(param_names, params))
+    return dict(zip(param_names, params, strict=True))

Note: If you need to maintain compatibility with Python < 3.10, you can add a length check instead:

if len(param_names) != len(params):
    raise ValueError(f"Length mismatch: {len(param_names)} param_names vs {len(params)} params")
return dict(zip(param_names, params))
⛔ Skipped due to learnings
Learnt from: dylanturpin
PR: newton-physics/newton#635
File: newton/examples/example_anymal_c_walk_on_sand.py:316-316
Timestamp: 2025-08-25T20:21:18.407Z
Learning: PEP 618 introduced the strict parameter to zip() in Python 3.10 with strict=False as the default. Adding explicit strict=False to zip() calls maintains existing behavior while satisfying linters that require explicit strict parameter specification in Python 3.10+ codebases.
Learnt from: dylanturpin
PR: newton-physics/newton#635
File: newton/_src/utils/gizmo.py:483-487
Timestamp: 2025-08-25T20:20:59.822Z
Learning: Adding `strict=False` explicitly to zip() calls in Python version upgrade PRs is appropriate for maintaining existing behavior while satisfying linters that expect the parameter to be explicit in Python 3.10+. This follows PEP 618 where strict=False is the default, so explicit usage doesn't change runtime behavior.

88-91: Cached_data compatibility verified
setup_cache returns a timings dict and all track_time_* methods accept it as their first parameter; no changes required.

Comment thread newton/_src/utils/benchmark.py Outdated
Comment on lines +84 to +86
if not isinstance(params, (list, tuple)):
params = (params,)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Variable reassignment could cause confusion.

The params variable is reassigned here, which overwrites the loop variable. This could lead to confusion about the variable's state.

-        # Ensure params is always a tuple for consistent handling
-        if not isinstance(params, (list, tuple)):
-            params = (params,)
+        # Ensure params is always a tuple for consistent handling
+        params_tuple = params if isinstance(params, (list, tuple)) else (params,)

Then update the subsequent uses of params to use params_tuple where the tuple form is needed:

-            _call_with_params(instance.setup, param_dict, params, cached_data)
+            _call_with_params(instance.setup, param_dict, params_tuple, cached_data)

And similarly for lines 105, 111, and 117.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if not isinstance(params, (list, tuple)):
params = (params,)
# Ensure params is always a tuple for consistent handling
- if not isinstance(params, (list, tuple)):
params_tuple = params if isinstance(params, (list, tuple)) else (params,)
param_dict = dict(zip(_names, params))
_call_with_params(instance.setup, param_dict, params_tuple, cached_data)
🧰 Tools
🪛 Ruff (0.12.2)

84-84: Use X | Y in isinstance call instead of (X, Y)

Convert to X | Y

(UP038)


85-85: for loop variable params overwritten by assignment target

(PLW2901)

Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
newton/_src/utils/benchmark.py (1)

82-85: Fix: params_tuple may be undefined when params is already a list/tuple

Define params_tuple in both branches to avoid UnboundLocalError on subsequent calls.

-        # Ensure params is always a tuple for consistent handling
-        if not isinstance(params, (list | tuple)):
-            params_tuple = (params,)
+        # Ensure params is always a tuple for consistent handling
+        params_tuple = tuple(params) if isinstance(params, (list | tuple)) else (params,)
🧹 Nitpick comments (3)
newton/_src/utils/benchmark.py (3)

20-30: Validate param_names/params length to prevent silent mismatches

Guard against dropped/extra parameters; avoid relying on zip(strict=False).

 def _convert_params_to_dict(params, param_names):
     """Convert params to keyword arguments using param_names."""
     if not param_names:
         return None
 
     # Handle single value case by wrapping in tuple
     if not isinstance(params, (list | tuple)):
         params = (params,)
 
-    return dict(zip(param_names, params, strict=False))
+    if len(param_names) != len(params):
+        raise ValueError(
+            f"param_names length ({len(param_names)}) != params length ({len(params)})"
+        )
+    return dict(zip(param_names, params))

87-90: Reduce noise: print setup_cache message only when printing results

Keeps output quiet for programmatic runs.

-            print(f"\n[Benchmark] Running {benchmark_cls.__name__}.setup_cache")
+            if print_results:
+                print(f"\n[Benchmark] Running {benchmark_cls.__name__}.setup_cache")

100-115: Gracefully skip NotImplemented benchmarks; avoid ZeroDivisionError

Catching NotImplementedError lets the runner continue; also skip averaging when there are no samples.

                 if attr.startswith("time_"):
                     # Run timing benchmarks multiple times and measure elapsed time.
                     for _ in range(number):
                         start = time.perf_counter()
-                        _call_with_params(method, param_dict, params_tuple, cached_data)
-                        t = time.perf_counter() - start
-                        samples.append(t)
+                        try:
+                            _call_with_params(method, param_dict, params_tuple, cached_data)
+                            t = time.perf_counter() - start
+                            samples.append(t)
+                        except NotImplementedError:
+                            samples.clear()
+                            break
                 elif attr.startswith("track_"):
                     # Run tracking benchmarks multiple times and record returned values.
                     for _ in range(number):
-                        val = _call_with_params(method, param_dict, params_tuple, cached_data)
-                        samples.append(val)
+                        try:
+                            val = _call_with_params(method, param_dict, params_tuple, cached_data)
+                            samples.append(val)
+                        except NotImplementedError:
+                            samples.clear()
+                            break
                 # Compute the average result.
-                avg = sum(samples) / len(samples)
-                results[(attr, params)] = avg
+                if not samples:
+                    if print_results:
+                        print(f"[Benchmark] Skipping {benchmark_cls.__name__}.{attr} (NotImplemented).")
+                    continue
+                avg = sum(samples) / len(samples)
+                results[(attr, params)] = avg
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1c45bb3 and 7e5c676.

📒 Files selected for processing (1)
  • newton/_src/utils/benchmark.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
newton/_src/utils/benchmark.py (2)
asv/benchmarks/compilation/bench_example_load.py (6)
  • setup (32-34)
  • setup (60-62)
  • setup (87-89)
  • setup (114-116)
  • setup (142-144)
  • setup (170-172)
asv/benchmarks/simulation/bench_cloth.py (2)
  • setup (32-34)
  • setup (47-49)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
  • GitHub Check: Run GPU Benchmarks (Pull Request)
  • GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
  • GitHub Check: run-newton-tests / newton-unittests (windows-latest)
🔇 Additional comments (1)
newton/_src/utils/benchmark.py (1)

32-44: Ignore name-based signature checks for cached_data
Benchmark methods uniformly expect the cached data as the first positional argument (often named timings), so probing for a parameter literally named cached_data or for varargs will break the existing benchmarks. Leave the unconditional positional prepend in place.

Likely an incorrect or invalid review comment.

Comment on lines +61 to 67
# If param_lists contains multiple lists, generate all combinations
# If it's a single list, just use it directly
if len(param_lists) > 1 and all(isinstance(item, (list | tuple)) for item in param_lists):
combinations = itertools.product(*param_lists)
else:
combinations = param_lists
else:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid Cartesian product for already-expanded parameter sets

Current check will incorrectly product a list-of-tuples like [('ant', 32), ('humanoid', 64)]. Gate the product by matching param_names length.

-        # If param_lists contains multiple lists, generate all combinations
-        # If it's a single list, just use it directly
-        if len(param_lists) > 1 and all(isinstance(item, (list | tuple)) for item in param_lists):
-            combinations = itertools.product(*param_lists)
-        else:
-            combinations = param_lists
+        # If params are given as axes (one list per param_name), take the Cartesian product.
+        # Otherwise, treat as a sequence of already-expanded parameter sets.
+        param_names_for_axes = getattr(benchmark_cls, "param_names", None)
+        if (
+            isinstance(param_lists, (list | tuple))
+            and param_names_for_axes
+            and len(param_lists) == len(param_names_for_axes)
+            and all(isinstance(item, (list | tuple)) for item in param_lists)
+        ):
+            combinations = itertools.product(*param_lists)
+        else:
+            combinations = param_lists
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# If param_lists contains multiple lists, generate all combinations
# If it's a single list, just use it directly
if len(param_lists) > 1 and all(isinstance(item, (list | tuple)) for item in param_lists):
combinations = itertools.product(*param_lists)
else:
combinations = param_lists
else:
# If params are given as axes (one list per param_name), take the Cartesian product.
# Otherwise, treat as a sequence of already-expanded parameter sets.
param_names_for_axes = getattr(benchmark_cls, "param_names", None)
if (
isinstance(param_lists, (list | tuple))
and param_names_for_axes
and len(param_lists) == len(param_names_for_axes)
and all(isinstance(item, (list | tuple)) for item in param_lists)
):
combinations = itertools.product(*param_lists)
else:
combinations = param_lists
🤖 Prompt for AI Agents
In newton/_src/utils/benchmark.py around lines 61 to 67, the current logic uses
itertools.product whenever param_lists has length >1 and all items are
list/tuple, which incorrectly expands already-paired parameter tuples like
[('ant', 32), ('humanoid', 64)]; change the condition to only use
itertools.product when the number of param_lists matches the number of
param_names (i.e., each param name has its own list of candidate values),
otherwise treat param_lists as already-expanded combinations and use it
directly. Ensure the check compares len(param_lists) == len(param_names) (and
optionally that each element is a sequence) before calling itertools.product.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
asv/benchmarks/setup/bench_model.py (2)

64-71: Warm-up cleanup for KPI looks correct and resolves prior OOM risk.

del solver/_model/builder + gc.collect() + wp.synchronize_device() is in place. Good fix.


132-142: Free warm-up model/builder in FastInitializeModel to prevent memory growth across (robot, num_envs).

_model and builder persist after warm-up; this can accumulate. Mirror KPI cleanup and sync after deletes.

-            solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot])
-            del solver
-
-            wp.synchronize_device()
+            solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot])
+            del solver
+            del _model
+            del builder
+            gc.collect()
+            wp.synchronize_device()
🧹 Nitpick comments (5)
asv/benchmarks/setup/bench_model.py (5)

23-24: Avoid global Warp config side effects; set per-benchmark.

Move wp.config.enable_backward = False and wp.config.quiet = True into each setup_cache to prevent global impact when this module is imported.

Apply:

- wp.config.enable_backward = False
- wp.config.quiet = True

And add at the beginning of both setup methods:

@@
-    def setup_cache(self):
+    def setup_cache(self):
+        wp.config.enable_backward = False
+        wp.config.quiet = True
         wp.init()

146-165: Delete builder inside the repeat loop (Fast) to keep host memory flat.

Currently only solver and model are deleted.

                 del solver
                 del model
+                del builder
+                gc.collect()

29-36: Document/verify nconmax heuristics.

If 150 is empirical for g1/h1, add a brief comment; otherwise consider deriving from model or leaving None for consistency.


117-121: Align benchmark metadata across classes.

Optional: add timeout to FastInitializeModel for consistency with KPI.

     rounds = 1
     number = 1
     repeat = 3
     min_run_count = 1
+    timeout = 1200

180-191: Release builder in peakmem bench to avoid skewing measurements.

Delete builder and force GC after finalizing/del model.

         with wp.ScopedDevice("cpu"):
             builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)

             # finalize model
             model = builder.finalize()

         del model
+        del builder
+        gc.collect()
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7e5c676 and ae94ce6.

📒 Files selected for processing (1)
  • asv/benchmarks/setup/bench_model.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_model.py (2)
newton/examples/example_mujoco.py (1)
  • create_model_builder (291-320)
newton/_src/solvers/mujoco/solver_mujoco.py (1)
  • SolverMuJoCo (1094-2565)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run GPU Benchmarks (Pull Request)
  • GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
  • GitHub Check: run-newton-tests / newton-unittests (windows-latest)
  • GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
🔇 Additional comments (1)
asv/benchmarks/setup/bench_model.py (1)

52-54: The current directory listing isn’t required for resolving whether ASV treats NotImplementedError as a skip. Instead, confirm via ASV documentation or test harness.
Next step: search ASV source or docs for exception handling semantics of NotImplementedError.

@eric-heiden eric-heiden requested a review from shi-eric September 5, 2025 19:10
Comment thread asv/benchmarks/setup/bench_model.py Outdated
timings_solver = []

for _ in range(self.repeat):
modelBuilder_beg = time.perf_counter()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These variable names could be improved, e.g. use snake case model_builder and don't abbreviate beg since we're not being charged by the character

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, you're totally right. Will improve this.

timeout = 3600

def setup(self, robot, num_envs):
def setup_cache(self):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand the logic for using setup_cache. A track_ benchmark already runs once total so it seems less confusing to put specific quantity you want to measure in the track_ function itself instead of doing everything in setup_cache.

I would also recommend trying to make things work with as time_ benchmarks if possible since they will also report some measure of variability (in addition to collecting multiple samples) and report them in the summary tables

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. It feels really overengineered to me as well..

The main thing I wanted to accomplish is to avoid having to run the modelBuilder setup for each run of the SolverMuJoCo initialization because of the way it impacts the total time needed to run the benchmarks.

I would strongly prefer using time_* benchmarks, but for both time_ and track_ I cannot assume any ordering of the runs. I also cannot measure 2 things in both of these, so I need a way to combine it. setup_cache seemed to be the logical place because it only runs once, and I can forward the data to the track_ functions.

Please correct me if my assumptions are wrong here.

Thinking of it, maybe it's easiest to just leave the existing model benchmark as-is, and revert to the MuJoCo specific one that does the modelBuilder stuff in setup_cache to avoid running it for every repeat. I think we can pay that price.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separate question you might know the answer to - is the warp cache cleared for each repeat or does it persist?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played some more with this. I tried to move the mujoco parts to it's own file again and cache the model. That doesn't really work because setup_cache isn't parametrizable, so I would end up implementing the same thing I did here with the combined solution.

The other option is obviously also to just forget about optimizing the runtime and revert back to the simple version that was already approved by Eric before. I'm fine with that as well.

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
asv/benchmarks/setup/bench_model.py (2)

132-141: Missing resource cleanup compared to KpiInitializeModel.

The FastInitializeModel class is missing the del _model and gc.collect() cleanup that exists in KpiInitializeModel after the kernel preloading step. This could lead to memory accumulation across iterations.

Apply this diff to add consistent resource cleanup:

             # Load the model to cache the kernels
             solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot])
             del solver
+            del _model
+            del builder
+            gc.collect()

             wp.synchronize_device()

76-100: Fix variable name inconsistencies.

Several variable naming issues need to be addressed:

  1. modelBuilder_beg should use snake_case as model_builder_begin
  2. The timing collection uses inconsistent variable names

Apply this diff to fix the variable naming:

             for _ in range(self.repeat):
-                modelBuilder_beg = time.perf_counter()
+                model_builder_begin = time.perf_counter()
                 builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)

                 # finalize model
                 model = builder.finalize()
                 wp.synchronize_device()
-                modelBuilder_end = time.perf_counter()
-                timings_modelBuilder.append(modelBuilder_end - modelBuilder_beg)
+                model_builder_end = time.perf_counter()
+                timings_modelBuilder.append(model_builder_end - model_builder_begin)

-                solver_beg = time.perf_counter()
+                solver_begin = time.perf_counter()
                 solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot])
                 wp.synchronize_device()
                 solver_end = time.perf_counter()
-                timings_solver.append(solver_end - solver_beg)
+                timings_solver.append(solver_end - solver_begin)
🧹 Nitpick comments (2)
asv/benchmarks/setup/bench_model.py (2)

29-36: Consider making the constraint configuration more explicit.

The nconmax dictionary mixes None values with specific integers, making the intent unclear. Consider documenting why some robots use None while others have specific values, or use a more explicit configuration structure.

Add a comment explaining the configuration:

+# Per-robot constraint limits for SolverMuJoCo initialization
+# None = use solver's automatic estimation, int = explicit limit
 nconmax = {
     "humanoid": None,
     "g1": 150,
     "h1": 150,
     "cartpole": None,
     "ant": None,
     "quadruped": None,
 }

180-180: Unused parameter should be named with underscore prefix.

The first parameter appears to be unused cached data. Consider making this more explicit by using a descriptive name or double underscore prefix.

Apply this diff for better clarity:

-    def peakmem_initialize_model_cpu(self, _, robot, num_envs):
+    def peakmem_initialize_model_cpu(self, _cached_data, robot, num_envs):
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae94ce6 and 92a9db2.

📒 Files selected for processing (1)
  • asv/benchmarks/setup/bench_model.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_model.py (2)
newton/examples/example_mujoco.py (2)
  • Example (185-320)
  • create_model_builder (291-320)
newton/_src/solvers/mujoco/solver_mujoco.py (1)
  • SolverMuJoCo (1094-2565)
🪛 Ruff (0.12.2)
asv/benchmarks/setup/bench_model.py

156-156: Local variable solver_begin is assigned to but never used

Remove assignment to unused variable solver_begin

(F841)


160-160: Undefined name solver_beg

(F821)


165-165: Undefined name timings_modelBuilder

(F821)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run GPU Benchmarks (Pull Request)
  • GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
  • GitHub Check: run-newton-tests / newton-unittests (windows-latest)
  • GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
🔇 Additional comments (3)
asv/benchmarks/setup/bench_model.py (3)

16-28: LGTM! Appropriate imports added for the new functionality.

The new imports (itertools, time, numpy) and SolverMuJoCo are correctly imported and necessary for the benchmark caching implementation.


49-72: LGTM! Well-structured setup caching with proper resource management.

The setup_cache method correctly implements kernel preloading and resource cleanup to optimize benchmark runtime. The CUDA availability check prevents runtime errors in environments without GPU support.


102-110: LGTM! Clean implementation of cached timing retrieval.

The track method correctly retrieves cached solver timing data and properly specifies the unit as milliseconds.

Comment thread asv/benchmarks/setup/bench_model.py
@adenzler-nvidia

Copy link
Copy Markdown
Member Author

@eric-heiden would you mind taking this over in the context of #548 ?

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
asv/benchmarks/setup/bench_model.py (1)

1-221: Fix undefined name and incorrect deletions in FastInitializeModel.setup_cache

  • At line 163, change timings_solver.append(solver_end - solver_beg) to use the correct variable solver_begin.
  • At lines 93, 140, 166, and 194, replace del model with del _model to match the variable initialized by builder.finalize().
♻️ Duplicate comments (2)
asv/benchmarks/setup/bench_model.py (2)

59-72: Warm-up and cleanup looks good.

Preloading kernels and freeing _model/builder between (robot, num_envs) avoids memory growth and noisy timings. Nice.


159-164: Undefined name solver_beg → use solver_begin.

Typo breaks timing; use the variable you define.

                 solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot])
                 wp.synchronize_device()
                 solver_end = time.perf_counter()
-                timings_solver.append(solver_end - solver_beg)
+                timings_solver.append(solver_end - solver_begin)
🧹 Nitpick comments (5)
asv/benchmarks/setup/bench_model.py (5)

185-196: Also delete builder in peakmem benchmark to avoid lingering allocations.

Minor hygiene; helps stabilizing RSS measurements.

     with wp.ScopedDevice("cpu"):
         builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
 
         # finalize model
         model = builder.finalize()
 
     del model
+    del builder
+    gc.collect()

65-65: Defensive access for nconmax.

Future robot additions won’t crash the benchmark if the map isn’t updated.

-            solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot])
+            solver = SolverMuJoCo(_model, ncon_per_env=nconmax.get(robot))
-                solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot])
+                solver = SolverMuJoCo(model, ncon_per_env=nconmax.get(robot))

Also applies to: 87-87, 138-138, 160-160


117-121: Consider adding timeout like KPI class for consistency.

Keeps runner behavior uniform across benchmarks.

     rounds = 1
     number = 1
     repeat = 3
     min_run_count = 1
+    timeout = 3600

23-25: Module-wide Warp config can leak to other benchmarks.

Optional: set/restore these in setup_cache to localize side effects.


59-101: DRY: factor shared benchmarking loop.

Both classes duplicate the same warm-up/measure/cleanup logic; a small helper would reduce maintenance.

If helpful, I can sketch a _collect_timings(robots, envs, repeat) helper and integrate it into both classes.

Also applies to: 132-173

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 92a9db2 and 12b12df.

📒 Files selected for processing (1)
  • asv/benchmarks/setup/bench_model.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_model.py (2)
newton/examples/example_mujoco.py (1)
  • create_model_builder (300-329)
newton/_src/solvers/mujoco/solver_mujoco.py (1)
  • SolverMuJoCo (1088-2559)
🪛 Ruff (0.12.2)
asv/benchmarks/setup/bench_model.py

140-140: Undefined name model

(F821)


159-159: Local variable solver_begin is assigned to but never used

Remove assignment to unused variable solver_begin

(F841)


163-163: Undefined name solver_beg

(F821)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
  • GitHub Check: Run GPU Benchmarks (Pull Request)
  • GitHub Check: run-newton-tests / newton-unittests (windows-latest)
  • GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
🔇 Additional comments (1)
asv/benchmarks/setup/bench_model.py (1)

52-54: Confirm skip semantics when CUDA is unavailable.

Raising NotImplementedError is fine if your run_benchmark treats it as SKIPPED; otherwise the run will error. Please confirm behavior.

Also applies to: 125-127

Comment thread asv/benchmarks/setup/bench_model.py
Signed-off-by: Alain Denzler <adenzler@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
asv/benchmarks/setup/bench_model.py (2)

59-72: Warm-up + explicit cleanup looks good; prevents memory growth across sweeps.

Preloading kernels, deleting solver/model/builder, gc.collect(), and synchronizing are correctly in place.


159-164: Fix NameError in timing: use solver_begin, not solver_beg.

This currently crashes and also triggers Ruff F821/F841. Use the correct variable.

                 solver_begin = time.perf_counter()
                 solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot])
                 wp.synchronize_device()
                 solver_end = time.perf_counter()
-                timings_solver.append(solver_end - solver_beg)
+                timings_solver.append(solver_end - solver_begin)
🧹 Nitpick comments (2)
asv/benchmarks/setup/bench_model.py (2)

185-196: Release builder and force GC in peakmem path (post-measurement cleanup).

Minor cleanup parity with other paths; avoids lingering CPU allocations after peakmem sampling.

     with wp.ScopedDevice("cpu"):
         builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)

         # finalize model
         model = builder.finalize()

     del model
+    del builder
+    gc.collect()

39-111: Optional: de-duplicate common benchmarking logic between classes.

Both classes share identical loops; extracting a small helper (e.g., _measure_timings(robots, envs, repeat)) would reduce maintenance and keep fixes (like the NameError above) in one place.

Also applies to: 113-184

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 12b12df and 2e96c89.

📒 Files selected for processing (1)
  • asv/benchmarks/setup/bench_model.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_model.py (2)
newton/examples/example_mujoco.py (2)
  • Example (191-329)
  • create_model_builder (300-329)
newton/_src/solvers/mujoco/solver_mujoco.py (1)
  • SolverMuJoCo (1088-2559)
🪛 Ruff (0.12.2)
asv/benchmarks/setup/bench_model.py

159-159: Local variable solver_begin is assigned to but never used

Remove assignment to unused variable solver_begin

(F841)


163-163: Undefined name solver_beg

(F821)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
  • GitHub Check: Run GPU Benchmarks (Pull Request)
  • GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
  • GitHub Check: run-newton-tests / newton-unittests (windows-latest)
🔇 Additional comments (4)
asv/benchmarks/setup/bench_model.py (4)

76-85: Timing instrumentation is correct.

Includes builder creation and finalize; device syncs bracket timings appropriately. Same for solver init.

Also applies to: 86-91


97-101: Returning cached medians in ms via track_ methods: LGTM.

Interface aligns with run_benchmark’s cached-arg calling convention.

Also applies to: 102-111


132-145: FastInitializeModel warm-up/cleanup mirrors KPI path correctly.

Consistent kernel preload and cleanup sequence.


52-54: Confirm ASV skip on CUDA unavailable.
No except NotImplementedError is present in newton/_src/utils/benchmark.py (around the run_benchmark definition), so verify that ASV’s runner will catch the exception raised in setup_cache and skip the benchmark when CUDA isn’t available to prevent CPU-only CI failures.

Signed-off-by: Alain Denzler <adenzler@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (5)
asv/benchmarks/setup/bench_model.py (5)

76-78: Synchronize before starting the timed region to isolate measurements.

Insert a device sync before each timed block to avoid overlap from prior work influencing the next measurement. This does not affect timing since it’s outside the measured window.

-            for _ in range(self.repeat):
-                model_builder_begin = time.perf_counter()
+            for _ in range(self.repeat):
+                wp.synchronize_device()
+                model_builder_begin = time.perf_counter()
-            for _ in range(self.repeat):
-                model_builder_begin = time.perf_counter()
+            for _ in range(self.repeat):
+                wp.synchronize_device()
+                model_builder_begin = time.perf_counter()

Also applies to: 149-151


29-36: Use a constant-style name and clarify None semantics for ncon_per_env.

Minor naming/readability nit: make it an ALL_CAPS module constant and keep usage consistent. Also add a short comment that None falls back to model.rigid_contact_max in SolverMuJoCo.

-nconmax = {
+NCON_PER_ENV = {
     "humanoid": None,
     "g1": 150,
     "h1": 150,
     "cartpole": None,
     "ant": None,
     "quadruped": None,
 }
-            solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot])
+            solver = SolverMuJoCo(_model, ncon_per_env=NCON_PER_ENV[robot])
-                solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot])
+                solver = SolverMuJoCo(model, ncon_per_env=NCON_PER_ENV[robot])
-            solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot])
+            solver = SolverMuJoCo(model, ncon_per_env=NCON_PER_ENV[robot])
-                solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot])
+                solver = SolverMuJoCo(model, ncon_per_env=NCON_PER_ENV[robot])

Also applies to: 65-66, 87-88, 138-141, 160-161


102-106: Prefer snake_case and consistent naming for track methods.

Rename for consistency with Python style and the class name: solver_mujoco.

-def track_time_initialize_solverMuJoCo(self, timings, robot, num_envs):
+def track_time_initialize_solver_mujoco(self, timings, robot, num_envs):
     return timings["solver"][(robot, num_envs)]

-track_time_initialize_solverMuJoCo.unit = "ms"
+track_time_initialize_solver_mujoco.unit = "ms"
-def track_time_initialize_solverMuJoCo(self, timings, robot, num_envs):
+def track_time_initialize_solver_mujoco(self, timings, robot, num_envs):
     return timings["solver"][(robot, num_envs)]

-track_time_initialize_solverMuJoCo.unit = "ms"
+track_time_initialize_solver_mujoco.unit = "ms"

Also applies to: 175-179


149-169: De-duplicate nearly identical benchmarking logic between classes.

The warm-up, timing, and cleanup loops are duplicated. Factor into a shared helper to reduce drift and maintenance.

Example helper (add at module scope):

def _measure_init_timings(robot_list, env_list, repeat):
    timings = {"model_builder": {}, "solver": {}}
    for robot, num_envs in itertools.product(robot_list, env_list):
        builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
        _model = builder.finalize()
        solver = SolverMuJoCo(_model, ncon_per_env=NCON_PER_ENV[robot])
        del solver, _model, builder
        gc.collect(); wp.synchronize_device()

        t_mb, t_sv = [], []
        for _ in range(repeat):
            wp.synchronize_device()
            t0 = time.perf_counter()
            builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
            model = builder.finalize(); wp.synchronize_device()
            t_mb.append(time.perf_counter() - t0)

            t1 = time.perf_counter()
            solver = SolverMuJoCo(model, ncon_per_env=NCON_PER_ENV[robot])
            wp.synchronize_device()
            t_sv.append(time.perf_counter() - t1)

            del solver, model, builder
            gc.collect(); wp.synchronize_device()

        timings["model_builder"][(robot, num_envs)] = np.median(t_mb) * 1000
        timings["solver"][(robot, num_envs)] = np.median(t_sv) * 1000
    return timings

Then in each setup_cache:

return _measure_init_timings(self.params[0], self.params[1], self.repeat)

Also applies to: 76-97, 132-173, 49-101


188-196: Free the builder in peakmem CPU path as well.

Small cleanup to avoid retaining host-side allocations in memory measurements.

 with wp.ScopedDevice("cpu"):
     builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)

     # finalize model
     model = builder.finalize()

 del model
+del builder
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e96c89 and 05f26c8.

📒 Files selected for processing (1)
  • asv/benchmarks/setup/bench_model.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_model.py (2)
newton/examples/example_mujoco.py (2)
  • Example (191-329)
  • create_model_builder (300-329)
newton/_src/solvers/mujoco/solver_mujoco.py (1)
  • SolverMuJoCo (1088-2559)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
  • GitHub Check: Run GPU Benchmarks (Pull Request)
  • GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
  • GitHub Check: run-newton-tests / newton-unittests (windows-latest)
🔇 Additional comments (2)
asv/benchmarks/setup/bench_model.py (2)

59-72: Warm-up + cleanup looks solid.

Kernel preloading plus explicit deletion and sync before measurement is the right tradeoff for stable timings.

Also applies to: 132-145


29-36: Confirm ncon_per_env values for g1/h1 are safe at high env counts.

Hard-coding 150 may under- or over-allocate depending on contact richness. Please verify against ROBOT_CONFIGS and/or empirical rigid_contact_max.

You can locate any configured per-robot contact limits and cross-check:

#!/bin/bash
# Show where ncon/contact-related caps are defined for MuJoCo examples
rg -n -C2 -S '(ROBOT_CONFIGS|nconmax|ncon_per_env|rigid_contact_max)' newton

If uncertainty remains, consider using None (fallback to model.rigid_contact_max) or a conservative per-robot cap derived from Example builders.

Comment on lines +92 to +97
del solver
del model
del builder
gc.collect()

timings["model_builder"][(robot, num_envs)] = np.median(timings_model_builder) * 1000

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Ensure GPU memory is actually reclaimed between repeats (add sync after gc).

Without a post-gc device sync, freeing large MJ/solver buffers can lag and accumulate across repeats, risking OOM at 4096/8192 envs. Add a sync after cleanup (outside the timed region).

                 del solver
                 del model
                 del builder
                 gc.collect()
+                wp.synchronize_device()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
del solver
del model
del builder
gc.collect()
timings["model_builder"][(robot, num_envs)] = np.median(timings_model_builder) * 1000
del solver
del model
del builder
gc.collect()
wp.synchronize_device()
timings["model_builder"][(robot, num_envs)] = np.median(timings_model_builder) * 1000
🤖 Prompt for AI Agents
In asv/benchmarks/setup/bench_model.py around lines 92 to 97, GPU memory from
MJ/solver buffers may not be reclaimed immediately after deleting
solver/model/builder and calling gc.collect(), so add an explicit device
synchronization call (e.g. torch.cuda.synchronize() if using PyTorch)
immediately after gc.collect() and before recording timings to ensure freed GPU
memory is actually released between repeats; place this sync outside the timed
region so it does not affect benchmark measurements.

Comment on lines +165 to +169
del solver
del model
del builder
gc.collect()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Mirror the post-gc device sync in the fast benchmark too.

Same memory pressure risk as above; add a sync here as well.

             del solver
             del model
             del builder
             gc.collect()
+            wp.synchronize_device()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
del solver
del model
del builder
gc.collect()
del solver
del model
del builder
gc.collect()
wp.synchronize_device()
🤖 Prompt for AI Agents
In asv/benchmarks/setup/bench_model.py around lines 165 to 169, after deleting
solver/model/builder and calling gc.collect() in the fast benchmark you must
mirror the post-gc device synchronization done elsewhere: add a device sync call
(e.g., call the CUDA/device synchronize function used elsewhere in the file)
immediately after gc.collect(). Ensure the sync is conditional on the
device/backend being used (only call if CUDA/accelerator is available) and
import or reference the same sync utility already used in the file to keep
behavior consistent.

@eric-heiden

Copy link
Copy Markdown
Member

Closing this PR, since #778 is adding solver initialization benchmarks (and merges this PR branch).

auto-merge was automatically disabled September 18, 2025 17:04

Pull request was closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants