SolverMuJoCo: Benchmarks for solver initialization#733
SolverMuJoCo: Benchmarks for solver initialization#733adenzler-nvidia wants to merge 33 commits into
Conversation
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
📝 WalkthroughWalkthroughAdds benchmark caching and parameterized invocation: run_benchmark can call setup_cache once to return cached_data; params may be converted to dicts when param_names provided; benchmark calls route through a wrapper that injects cached_data; model benchmarks preload SolverMuJoCo kernels per (robot, num_envs) and record median ModelBuilder and SolverMuJoCo timings. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant ASV as ASV Runner
participant RUN as run_benchmark
participant BEN as Benchmark (Kpi/Fast)
participant CUDA as CUDA
participant MB as ModelBuilder
participant SMJ as SolverMuJoCo
Note over RUN,BEN: param combination & cached setup
ASV->>RUN: start benchmark(params)
RUN->>BEN: compute param combos
RUN->>BEN: call setup_cache() -- first combo
BEN->>CUDA: check availability
alt CUDA unavailable
BEN-->>RUN: raise NotImplementedError
else CUDA available
loop for each (robot,num_envs)
BEN->>MB: build + finalize model
BEN->>SMJ: instantiate solver (preload kernels)
BEN->>CUDA: synchronize device
BEN-->>BEN: record median timings -> timings[(robot,num_envs)]
BEN->>BEN: cleanup
end
BEN-->>RUN: return cached timings
end
RUN->>BEN: _call_with_params(track_time_initialize_model, param_dict, params, cached_data)
BEN-->>RUN: return model timing (ms)
RUN->>BEN: _call_with_params(track_time_initialize_solverMuJoCo, ...)
BEN-->>RUN: return solver timing (ms)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks (2 passed, 1 warning)❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (4)
asv/benchmarks/setup/bench_solver_mujoco.py (4)
45-49: Comment does not match code behavior.
The comment says “Load a small model,” but you instantiate the solver withself._model(which may be very large). Either update the comment or actually use a small model for warm-up.Apply one of:
- Update comment (preferred, since kernel specialization often depends on shape):
- # Load a small model to cache the kernels + # Warm up: instantiate once to JIT-compile kernels for this model shape
- Or construct a tiny warm-up model (if acceptable).
16-16: Remove unused import.
gcis not used in this module.-import gc
27-35: Optional: sanity-check large param sizes.
num_envsof 4096/8192 can OOM typical 16–24GB GPUs during solver construction. If CI runs these benches, consider adding an environment toggle or reducing defaults.
52-52: Optional: align method names with PEP 8 / other benches.
Considertime_initialize_solver_mujocofor consistency withtime_initialize_model.Also applies to: 79-79
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
asv/benchmarks/setup/bench_model.py(1 hunks)asv/benchmarks/setup/bench_solver_mujoco.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_solver_mujoco.py (2)
asv/benchmarks/setup/bench_model.py (2)
setup(36-37)setup_cache(57-61)newton/examples/example_mujoco.py (1)
create_model_builder(296-325)
🪛 Ruff (0.12.2)
asv/benchmarks/setup/bench_solver_mujoco.py
53-53: Local variable solver is assigned to but never used
Remove assignment to unused variable solver
(F841)
80-80: Local variable solver is assigned to but never used
Remove assignment to unused variable solver
(F841)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run GPU Benchmarks (Pull Request)
- GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
🔇 Additional comments (1)
asv/benchmarks/setup/bench_model.py (1)
81-81: No-op whitespace change; OK to keep.
No functional impact. Safe to merge as-is.
| def setup(self, robot, num_envs): | ||
| wp.init() | ||
|
|
||
| builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) | ||
|
|
||
| # finalize model | ||
| self._model = builder.finalize() | ||
|
|
||
| # Load a small model to cache the kernels | ||
| solver = SolverMuJoCo(self._model) | ||
| del solver | ||
|
|
||
| wp.synchronize_device() | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Guard setup on CPU-only runners and finalize on CUDA explicitly.
Without a guard, setup will still instantiate the solver on CPU-only environments even though the timing method is skipped, likely breaking CI runners without GPUs. Also, make the model device explicit to avoid accidental CPU models skewing solver init times.
Apply:
def setup(self, robot, num_envs):
wp.init()
-
- builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
+ if wp.get_cuda_device_count() == 0:
+ # CPU-only: time method is skipped; avoid creating GPU-dependent resources.
+ return
+
+ builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
# finalize model
- self._model = builder.finalize()
+ self._model = builder.finalize(device="cuda:0")
- # Load a small model to cache the kernels
- solver = SolverMuJoCo(self._model)
- del solver
+ # Warm up: instantiate once to JIT-compile kernels for this model shape
+ solver = SolverMuJoCo(self._model)
+ del solver
wp.synchronize_device()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def setup(self, robot, num_envs): | |
| wp.init() | |
| builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) | |
| # finalize model | |
| self._model = builder.finalize() | |
| # Load a small model to cache the kernels | |
| solver = SolverMuJoCo(self._model) | |
| del solver | |
| wp.synchronize_device() | |
| def setup(self, robot, num_envs): | |
| wp.init() | |
| if wp.get_cuda_device_count() == 0: | |
| # CPU-only: time method is skipped; avoid creating GPU-dependent resources. | |
| return | |
| builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) | |
| # finalize model | |
| self._model = builder.finalize(device="cuda:0") | |
| # Warm up: instantiate once to JIT-compile kernels for this model shape | |
| solver = SolverMuJoCo(self._model) | |
| del solver | |
| wp.synchronize_device() |
🤖 Prompt for AI Agents
In asv/benchmarks/setup/bench_solver_mujoco.py around lines 37-50, guard the
setup so it does nothing on CPU-only runners and ensure the model is finalized
on CUDA before creating the solver: check for a CUDA device at the start of
setup (e.g., if wp.get_device_count("cuda") == 0: return), then call
builder.finalize specifying the CUDA device (e.g., finalize(device="cuda:0") or
finalize(device=wp.Device("cuda", 0))) before instantiating SolverMuJoCo, so the
solver creation only runs when CUDA is available and the model is explicitly on
the GPU.
| @skip_benchmark_if(wp.get_cuda_device_count() == 0) | ||
| def time_initialize_solverMuJoCo(self, robot, num_envs): | ||
| solver = SolverMuJoCo(self._model) | ||
| wp.synchronize_device() | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Fix Ruff F841 and ensure deterministic lifetime in the timed section.
Variable solver is unused; switch to _ and explicitly delete after sync.
Apply:
- solver = SolverMuJoCo(self._model)
- wp.synchronize_device()
+ _ = SolverMuJoCo(self._model)
+ wp.synchronize_device()
+ del _📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @skip_benchmark_if(wp.get_cuda_device_count() == 0) | |
| def time_initialize_solverMuJoCo(self, robot, num_envs): | |
| solver = SolverMuJoCo(self._model) | |
| wp.synchronize_device() | |
| @skip_benchmark_if(wp.get_cuda_device_count() == 0) | |
| def time_initialize_solverMuJoCo(self, robot, num_envs): | |
| _ = SolverMuJoCo(self._model) | |
| wp.synchronize_device() | |
| del _ |
🧰 Tools
🪛 Ruff (0.12.2)
53-53: Local variable solver is assigned to but never used
Remove assignment to unused variable solver
(F841)
🤖 Prompt for AI Agents
In asv/benchmarks/setup/bench_solver_mujoco.py around lines 51 to 55, the timed
section assigns SolverMuJoCo to an unused variable (causing Ruff F841) and
doesn't guarantee deterministic cleanup; replace the unused name with an
underscore (e.g., _ = SolverMuJoCo(self._model)), call wp.synchronize_device(),
then explicitly delete the instance with del _ immediately after sync to ensure
deterministic lifetime and satisfy the linter.
| def setup_cache(self, robot, num_envs): | ||
| wp.init() | ||
| builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) | ||
| # finalize model | ||
| self._model = builder.finalize() | ||
|
|
||
| # Load a small model to cache the kernels | ||
| solver = SolverMuJoCo(self._model) | ||
| del solver | ||
|
|
||
| wp.synchronize_device() | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Mirror the CPU guard and explicit CUDA device in setup_cache.
Avoid GPU work on CPU-only runners and make device explicit for consistency with KPI class.
Apply:
def setup_cache(self, robot, num_envs):
- wp.init()
- builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
+ wp.init()
+ if wp.get_cuda_device_count() == 0:
+ return
+ builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123)
# finalize model
- self._model = builder.finalize()
+ self._model = builder.finalize(device="cuda:0")
- # Load a small model to cache the kernels
- solver = SolverMuJoCo(self._model)
- del solver
+ # Warm up: instantiate once to JIT-compile kernels for this model shape
+ solver = SolverMuJoCo(self._model)
+ del solver
wp.synchronize_device()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def setup_cache(self, robot, num_envs): | |
| wp.init() | |
| builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) | |
| # finalize model | |
| self._model = builder.finalize() | |
| # Load a small model to cache the kernels | |
| solver = SolverMuJoCo(self._model) | |
| del solver | |
| wp.synchronize_device() | |
| def setup_cache(self, robot, num_envs): | |
| wp.init() | |
| if wp.get_cuda_device_count() == 0: | |
| return | |
| builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) | |
| # finalize model | |
| self._model = builder.finalize(device="cuda:0") | |
| # Warm up: instantiate once to JIT-compile kernels for this model shape | |
| solver = SolverMuJoCo(self._model) | |
| del solver | |
| wp.synchronize_device() |
🤖 Prompt for AI Agents
In asv/benchmarks/setup/bench_solver_mujoco.py around lines 66-77, setup_cache
currently always initializes GPU work; add the same CPU guard and explicit CUDA
device selection used in the KPI class: first check if the runtime is CPU-only
and return early (e.g. if wp.get_device().is_cpu or wp.context.is_cpu then
return), otherwise set the CUDA device explicitly (e.g. select "cuda:0" via the
same wp API your KPI class uses) before calling wp.init(), build/finalize the
model and create/delete SolverMuJoCo, then call wp.synchronize_device(); this
avoids GPU calls on CPU-only runners and makes the device explicit and
consistent.
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
…/solver-creation-benchmarks
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
asv.conf.json (1)
22-22: SHA verified; packaging metadata present in pyproject.toml
Optional:
- Add
--no-depsto your pip install line to lock transitive dependencies:- "python -m pip install git+https://github.com/google-deepmind/mujoco_warp@64bb938f87b6aaf840753ef9ddc198db655020c0", + "python -m pip install --no-deps git+https://github.com/google-deepmind/mujoco_warp@64bb938f87b6aaf840753ef9ddc198db655020c0",
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (2)
asv.conf.json(1 hunks)asv/benchmarks/setup/bench_solver_mujoco.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- asv/benchmarks/setup/bench_solver_mujoco.py
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (5)
asv/benchmarks/setup/bench_solver_mujoco.py (5)
52-56: Fix Ruff F841 in timed section and ensure deterministic teardown.@skip_benchmark_if(wp.get_cuda_device_count() == 0) def time_initialize_solverMuJoCo(self, robot, num_envs): - _solver = SolverMuJoCo(self._model) - wp.synchronize_device() + _ = SolverMuJoCo(self._model) + wp.synchronize_device() + del _
73-78: Warm-up: same deterministic lifetime pattern as above.-# Load a small model to cache the kernels -solver = SolverMuJoCo(self._model) -del solver - -wp.synchronize_device() +# Warm up: instantiate once to JIT-compile kernels for this model shape +_ = SolverMuJoCo(self._model) +wp.synchronize_device() +del _
79-83: Fix Ruff F841 in fast timed section and delete after sync.@skip_benchmark_if(wp.get_cuda_device_count() == 0) def time_initialize_solverMuJoCo(self, robot, num_envs): - _solver = SolverMuJoCo(self._model) - wp.synchronize_device() + _ = SolverMuJoCo(self._model) + wp.synchronize_device() + del _
38-45: Guard setup on CPU-only and finalize model on CUDA explicitly.Without a guard, setup still creates GPU-dependent resources on CPU-only runners; also make the model device explicit to avoid accidental CPU placement skewing results.
def setup(self, robot, num_envs): - wp.init() - - builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) + wp.init() + if wp.get_cuda_device_count() == 0: + # CPU-only: time method is skipped; avoid creating GPU-dependent resources. + return + builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) # finalize model - self._model = builder.finalize() + self._model = builder.finalize(device="cuda:0")
67-72: Mirror the CPU guard and explicit CUDA device in the fast benchmark setup.def setup(self, robot, num_envs): - wp.init() - builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) + wp.init() + if wp.get_cuda_device_count() == 0: + return + builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) # finalize model - self._model = builder.finalize() + self._model = builder.finalize(device="cuda:0")
🧹 Nitpick comments (3)
asv/benchmarks/setup/bench_solver_mujoco.py (3)
46-51: Warm-up: use throwaway name, sync, then delete for deterministic lifetime.Also fix the stale comment (“small model”) to reflect actual behavior.
-# Load a small model to cache the kernels -solver = SolverMuJoCo(self._model) -del solver - -wp.synchronize_device() +# Warm up: instantiate once to JIT-compile kernels for this model shape +_ = SolverMuJoCo(self._model) +wp.synchronize_device() +del _
62-66: Consider setting a timeout for symmetry with KPI class.Optional, but keeps benchmark configs consistent.
rounds = 1 number = 1 repeat = 3 min_run_count = 1 +timeout = 3600
90-93: Rename CLI keys for clarity (they benchmark SolverMuJoCo, not “Model”).Purely cosmetic but reduces confusion when invoking via
--bench.benchmark_list = { - "KpiInitializeModel": KpiInitializeSolverMuJoCo, - "FastInitializeModel": FastInitializeSolverMuJoCo, + "KpiInitializeSolverMuJoCo": KpiInitializeSolverMuJoCo, + "FastInitializeSolverMuJoCo": FastInitializeSolverMuJoCo, }
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
asv/benchmarks/setup/bench_solver_mujoco.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_solver_mujoco.py (3)
newton/_src/solvers/mujoco/solver_mujoco.py (1)
SolverMuJoCo(1094-2565)asv/benchmarks/setup/bench_model.py (1)
setup(38-39)newton/examples/example_mujoco.py (1)
create_model_builder(291-320)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
- GitHub Check: Run GPU Benchmarks (Pull Request)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
🔇 Additional comments (1)
asv/benchmarks/setup/bench_solver_mujoco.py (1)
29-30: Validate feasibility of large env counts across robots.8192 envs for humanoid/quadruped may OOM on common GPUs; confirm target hardware or downscale for CI. If needed, we can split heavy robots into a separate KPI suite.
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
eric-heiden
left a comment
There was a problem hiding this comment.
Thanks, LGTM! Just a small nit on a comment string.
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (5)
asv/benchmarks/setup/bench_solver_mujoco.py (5)
61-65: Free the solver deterministically after timing.Helps keep GPU memory steady across repeats.
def time_initialize_solverMuJoCo(self, robot, num_envs): - _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot]) - wp.synchronize_device() + _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot]) + wp.synchronize_device() + del _
88-91: Deterministic cleanup in fast timing too.def time_initialize_solverMuJoCo(self, robot, num_envs): - _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot]) - wp.synchronize_device() + _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot]) + wp.synchronize_device() + del _
68-69: “Fast” still feels large; confirm intent or downscale.128–256 envs for humanoid/ant/quadruped may still be heavy. Either rename to “Medium” or drop to, e.g., [64, 128].
47-60: Guard setup on CPU-only runners and finalize on CUDA explicitly.Without a guard, setup will still instantiate the solver on CPU-only CI and likely fail. Also, finalize the model on a CUDA device to avoid skewing timings with CPU models. Replace the warm-up variable and order to avoid Ruff F841 and ensure deterministic lifetime after a device sync.
def setup(self, robot, num_envs): wp.init() - - builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) + if wp.get_cuda_device_count() == 0: + # CPU-only: timing is skipped; avoid creating GPU-dependent resources. + return + + builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) # finalize model - self._model = builder.finalize() + self._model = builder.finalize(device="cuda:0") - # Load the model to cache the kernels - solver = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot]) - del solver - - wp.synchronize_device() + # Warm up: instantiate once to JIT-compile kernels for this model shape + _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot]) + wp.synchronize_device() + del _
76-87: Mirror the CPU guard and explicit CUDA device in the fast benchmark setup.Same failure mode as above; also fix the warm-up variable and ordering.
def setup(self, robot, num_envs): wp.init() - builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) + if wp.get_cuda_device_count() == 0: + return + builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) # finalize model - self._model = builder.finalize() + self._model = builder.finalize(device="cuda:0") - # Load the model to cache the kernels - solver = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot]) - del solver - - wp.synchronize_device() + # Warm up: instantiate once to JIT-compile kernels for this model shape + _ = SolverMuJoCo(self._model, ncon_per_env=nconmax[robot]) + wp.synchronize_device() + del _
🧹 Nitpick comments (4)
asv/benchmarks/setup/bench_solver_mujoco.py (4)
27-34: Revisit ncon_per_env for g1/h1 to avoid OOM at large env counts.At 4096–8192 envs, 150 contacts/env yields extremely large allocations (nworld * ncon_per_env) and may OOM on 16–24GB GPUs. Consider scaling by env count, capping, or skipping large envs for these robots.
Would you like me to propose a helper (e.g., compute_ncon_per_env(robot, num_envs)) and wire it through setup/time for safer defaults?
19-21: Avoid global config side-effects at import time.Setting wp.config globally on import can leak into other ASV modules in the same process. Move these into setup or guard with “if not already set”.
-wp.config.enable_backward = False -wp.config.quiet = True +wp.config.enable_backward = False # consider moving into setup() to limit scope +wp.config.quiet = True # or gate with an env flag if needed
71-75: Add a timeout for the fast benchmark too.Keeps runaway runs contained on constrained GPUs.
rounds = 1 number = 1 repeat = 3 min_run_count = 1 + timeout = 1200
99-102: Rename CLI keys for clarity.These benchmarks initialize the MuJoCo solver, not the model. Rename keys to match class names.
- "KpiInitializeModel": KpiInitializeSolverMuJoCo, - "FastInitializeModel": FastInitializeSolverMuJoCo, + "KpiInitializeSolverMuJoCo": KpiInitializeSolverMuJoCo, + "FastInitializeSolverMuJoCo": FastInitializeSolverMuJoCo,
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
asv/benchmarks/setup/bench_solver_mujoco.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_solver_mujoco.py (3)
newton/_src/solvers/mujoco/solver_mujoco.py (1)
SolverMuJoCo(1094-2565)asv/benchmarks/setup/bench_model.py (1)
setup(38-39)newton/examples/example_mujoco.py (1)
create_model_builder(291-320)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
- GitHub Check: Run GPU Benchmarks (Pull Request)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
🔇 Additional comments (1)
asv/benchmarks/setup/bench_solver_mujoco.py (1)
47-54:builder.finalize()already supports an optionaldeviceparameter – its signature isfinalize(self, device: Devicelike | None = None, requires_grad: bool = False), so no changes are needed.
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
|
@eric-heiden I refactored the entire thing now to minimize benchmark runtime, integrating the solver setup benchmarks with the model initialization benchmarks. I had to jump through a few hoops to make it work but I think it's worthwhile to keep the benchmark runtime low. also tagging @shi-eric for visibility, mostly to make sure I'm not making an obvious mistake here.. |
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
asv/benchmarks/setup/bench_model.py (2)
127-137: Same warm-up release issue as KPI.Delete _model/builder and GC after warm-up to avoid cumulative memory use.
Apply:
solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot]) del solver + del _model + del builder + gc.collect() wp.synchronize_device()
141-159: Same per-repeat cleanup improvements as KPI.Delete builder post-finalize and GC after releasing solver/model.
Apply:
model = builder.finalize() wp.synchronize_device() modelBuilder_end = time.perf_counter() timings_modelBuilder.append(modelBuilder_end - modelBuilder_beg) + del builder ... del solver del model + gc.collect()
🧹 Nitpick comments (4)
asv/benchmarks/setup/bench_model.py (4)
27-36: Validate and document ncon_per_env overrides.Hard-coding 150 for g1/h1 may be too low/high depending on contact-rich scenes and large num_envs. Please confirm against worst-case contacts per env, and add a short comment with rationale or derive this once from a 1-env probe.
73-92: Tighten per-repeat cleanup to avoid transient pressure.Delete builder after finalize and GC after releasing solver/model.
Apply:
model = builder.finalize() wp.synchronize_device() modelBuilder_end = time.perf_counter() timings_modelBuilder.append(modelBuilder_end - modelBuilder_beg) + del builder solver_beg = time.perf_counter() solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot]) wp.synchronize_device() solver_end = time.perf_counter() timings_solver.append(solver_end - solver_beg) del solver del model + gc.collect()
49-95: Deduplicate the timing logic.Both classes duplicate setup_cache/loop code. Factor into a small helper to reduce drift and ease maintenance.
Example (sketch):
def _compute_timings(robots, envs, repeats): ... # shared body here return timingsThen call it from each setup_cache with the class-specific params.
Also applies to: 117-164
39-48: Guard against OOM for largest configs.4096/8192 envs for humanoid/g1/h1 can exceed common GPU RAM. Consider a try/except around finalize/solver to skip combos gracefully (record None and short-circuit track_*), or downscale per-robot env counts.
Also applies to: 108-116
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
asv/benchmarks/setup/bench_model.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_model.py (2)
newton/examples/example_mujoco.py (2)
Example(185-320)create_model_builder(291-320)newton/_src/solvers/mujoco/solver_mujoco.py (1)
SolverMuJoCo(1094-2565)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run GPU Benchmarks (Pull Request)
- GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
🔇 Additional comments (5)
asv/benchmarks/setup/bench_model.py (5)
49-58: Good use of setup_cache and cached timings.ASV will pass the cache to track_* and peakmem_*; this aligns with asv’s API and reduces run time.
92-95: Median in ms is appropriate for stability.Looks good.
97-106: track_time accessors look correct.Key lookup, units, and signatures align with asv expectations.
117-126: Fast path setup mirrors KPI path correctly.CUDA gate and cache structure are consistent.
175-185: peakmem hook signature change is correct.Absorbing the cache arg with “_” keeps asv compatible; CPU-scoped finalize is fine.
… setup_cache Signed-off-by: Alain Denzler <adenzler@nvidia.com>
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
newton/_src/utils/benchmark.py (4)
20-30: Add type hints and improve docstring for better maintainability.The new helper function would benefit from type hints and a more comprehensive docstring to clarify its purpose and expected input/output types.
-def _convert_params_to_dict(params, param_names): - """Convert params to keyword arguments using param_names.""" +from typing import Any, Dict, List, Optional, Tuple, Union + +def _convert_params_to_dict( + params: Union[Any, List, Tuple], + param_names: Optional[List[str]] +) -> Optional[Dict[str, Any]]: + """Convert params to keyword arguments using param_names. + + Args: + params: Parameter values, can be a single value or tuple/list of values + param_names: List of parameter names to map to the values + + Returns: + Dictionary mapping parameter names to values, or None if no param_names + """
32-44: Add type hints and improve documentation for the parameter calling helper.The function handles complex parameter passing logic that would benefit from explicit type annotations.
-def _call_with_params(method, param_dict, params, cached_data): - """Call a method with appropriate parameters (keyword or positional).""" +def _call_with_params( + method: callable, + param_dict: Optional[Dict[str, Any]], + params: Tuple, + cached_data: Optional[Any] +) -> Any: + """Call a method with appropriate parameters (keyword or positional). + + Args: + method: The method to call + param_dict: Keyword arguments if param_names was provided, None otherwise + params: Positional arguments to pass + cached_data: Cached data from setup_cache, passed as first positional arg + + Returns: + The result of the method call + """
74-75: Consider handling the enumerate index more explicitly.While the variable
iis used correctly, consider making its purpose clearer with a more descriptive name.- for i, params in enumerate(combinations): + for combination_index, params in enumerate(combinations):Then update line 88:
- if i == 0 and hasattr(benchmark_cls, "setup_cache"): + if combination_index == 0 and hasattr(benchmark_cls, "setup_cache"):
99-99: Consider improving parameter display for better readability.When params is a tuple with complex objects, the print output might be hard to read. Consider formatting it more nicely.
- print(f"\n[Benchmark] Running {benchmark_cls.__name__}.{attr} with parameters {params}") + param_str = ", ".join(str(p) for p in params) if isinstance(params, (list, tuple)) else str(params) + print(f"\n[Benchmark] Running {benchmark_cls.__name__}.{attr} with parameters ({param_str})")
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
newton/_src/utils/benchmark.py(3 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
newton/_src/utils/benchmark.py
26-26: Use X | Y in isinstance call instead of (X, Y)
Convert to X | Y
(UP038)
29-29: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
63-63: Use X | Y in isinstance call instead of (X, Y)
Convert to X | Y
(UP038)
84-84: Use X | Y in isinstance call instead of (X, Y)
Convert to X | Y
(UP038)
85-85: for loop variable params overwritten by assignment target
(PLW2901)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
- GitHub Check: Run GPU Benchmarks (Pull Request)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
🔇 Additional comments (4)
newton/_src/utils/benchmark.py (4)
61-66: LGTM! Clever handling of parameter combinations.The logic correctly distinguishes between multiple parameter lists (requiring cartesian product) and a single parameter list, which improves flexibility for benchmark configurations.
88-91: Good implementation of the caching mechanism.The setup_cache is correctly called only once at the beginning, which aligns with the PR objective of minimizing benchmark runtime by avoiding redundant initialization.
29-29: Addstrict=Truetozip()call for Python 3.10+ compatibility.Using
strict=Trueensures that the lengths ofparam_namesandparamsmatch, catching configuration errors early.- return dict(zip(param_names, params)) + return dict(zip(param_names, params, strict=True))Note: If you need to maintain compatibility with Python < 3.10, you can add a length check instead:
if len(param_names) != len(params): raise ValueError(f"Length mismatch: {len(param_names)} param_names vs {len(params)} params") return dict(zip(param_names, params))⛔ Skipped due to learnings
Learnt from: dylanturpin PR: newton-physics/newton#635 File: newton/examples/example_anymal_c_walk_on_sand.py:316-316 Timestamp: 2025-08-25T20:21:18.407Z Learning: PEP 618 introduced the strict parameter to zip() in Python 3.10 with strict=False as the default. Adding explicit strict=False to zip() calls maintains existing behavior while satisfying linters that require explicit strict parameter specification in Python 3.10+ codebases.Learnt from: dylanturpin PR: newton-physics/newton#635 File: newton/_src/utils/gizmo.py:483-487 Timestamp: 2025-08-25T20:20:59.822Z Learning: Adding `strict=False` explicitly to zip() calls in Python version upgrade PRs is appropriate for maintaining existing behavior while satisfying linters that expect the parameter to be explicit in Python 3.10+. This follows PEP 618 where strict=False is the default, so explicit usage doesn't change runtime behavior.
88-91: Cached_data compatibility verified
setup_cachereturns a timings dict and alltrack_time_*methods accept it as their first parameter; no changes required.
| if not isinstance(params, (list, tuple)): | ||
| params = (params,) | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Variable reassignment could cause confusion.
The params variable is reassigned here, which overwrites the loop variable. This could lead to confusion about the variable's state.
- # Ensure params is always a tuple for consistent handling
- if not isinstance(params, (list, tuple)):
- params = (params,)
+ # Ensure params is always a tuple for consistent handling
+ params_tuple = params if isinstance(params, (list, tuple)) else (params,)Then update the subsequent uses of params to use params_tuple where the tuple form is needed:
- _call_with_params(instance.setup, param_dict, params, cached_data)
+ _call_with_params(instance.setup, param_dict, params_tuple, cached_data)And similarly for lines 105, 111, and 117.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if not isinstance(params, (list, tuple)): | |
| params = (params,) | |
| # Ensure params is always a tuple for consistent handling | |
| - if not isinstance(params, (list, tuple)): | |
| params_tuple = params if isinstance(params, (list, tuple)) else (params,) | |
| param_dict = dict(zip(_names, params)) | |
| _call_with_params(instance.setup, param_dict, params_tuple, cached_data) |
🧰 Tools
🪛 Ruff (0.12.2)
84-84: Use X | Y in isinstance call instead of (X, Y)
Convert to X | Y
(UP038)
85-85: for loop variable params overwritten by assignment target
(PLW2901)
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
newton/_src/utils/benchmark.py (1)
82-85: Fix: params_tuple may be undefined when params is already a list/tupleDefine params_tuple in both branches to avoid UnboundLocalError on subsequent calls.
- # Ensure params is always a tuple for consistent handling - if not isinstance(params, (list | tuple)): - params_tuple = (params,) + # Ensure params is always a tuple for consistent handling + params_tuple = tuple(params) if isinstance(params, (list | tuple)) else (params,)
🧹 Nitpick comments (3)
newton/_src/utils/benchmark.py (3)
20-30: Validate param_names/params length to prevent silent mismatchesGuard against dropped/extra parameters; avoid relying on zip(strict=False).
def _convert_params_to_dict(params, param_names): """Convert params to keyword arguments using param_names.""" if not param_names: return None # Handle single value case by wrapping in tuple if not isinstance(params, (list | tuple)): params = (params,) - return dict(zip(param_names, params, strict=False)) + if len(param_names) != len(params): + raise ValueError( + f"param_names length ({len(param_names)}) != params length ({len(params)})" + ) + return dict(zip(param_names, params))
87-90: Reduce noise: print setup_cache message only when printing resultsKeeps output quiet for programmatic runs.
- print(f"\n[Benchmark] Running {benchmark_cls.__name__}.setup_cache") + if print_results: + print(f"\n[Benchmark] Running {benchmark_cls.__name__}.setup_cache")
100-115: Gracefully skip NotImplemented benchmarks; avoid ZeroDivisionErrorCatching NotImplementedError lets the runner continue; also skip averaging when there are no samples.
if attr.startswith("time_"): # Run timing benchmarks multiple times and measure elapsed time. for _ in range(number): start = time.perf_counter() - _call_with_params(method, param_dict, params_tuple, cached_data) - t = time.perf_counter() - start - samples.append(t) + try: + _call_with_params(method, param_dict, params_tuple, cached_data) + t = time.perf_counter() - start + samples.append(t) + except NotImplementedError: + samples.clear() + break elif attr.startswith("track_"): # Run tracking benchmarks multiple times and record returned values. for _ in range(number): - val = _call_with_params(method, param_dict, params_tuple, cached_data) - samples.append(val) + try: + val = _call_with_params(method, param_dict, params_tuple, cached_data) + samples.append(val) + except NotImplementedError: + samples.clear() + break # Compute the average result. - avg = sum(samples) / len(samples) - results[(attr, params)] = avg + if not samples: + if print_results: + print(f"[Benchmark] Skipping {benchmark_cls.__name__}.{attr} (NotImplemented).") + continue + avg = sum(samples) / len(samples) + results[(attr, params)] = avg
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
newton/_src/utils/benchmark.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
newton/_src/utils/benchmark.py (2)
asv/benchmarks/compilation/bench_example_load.py (6)
setup(32-34)setup(60-62)setup(87-89)setup(114-116)setup(142-144)setup(170-172)asv/benchmarks/simulation/bench_cloth.py (2)
setup(32-34)setup(47-49)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
- GitHub Check: Run GPU Benchmarks (Pull Request)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
🔇 Additional comments (1)
newton/_src/utils/benchmark.py (1)
32-44: Ignore name-based signature checks forcached_data
Benchmark methods uniformly expect the cached data as the first positional argument (often namedtimings), so probing for a parameter literally namedcached_dataor for varargs will break the existing benchmarks. Leave the unconditional positional prepend in place.Likely an incorrect or invalid review comment.
| # If param_lists contains multiple lists, generate all combinations | ||
| # If it's a single list, just use it directly | ||
| if len(param_lists) > 1 and all(isinstance(item, (list | tuple)) for item in param_lists): | ||
| combinations = itertools.product(*param_lists) | ||
| else: | ||
| combinations = param_lists | ||
| else: |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Avoid Cartesian product for already-expanded parameter sets
Current check will incorrectly product a list-of-tuples like [('ant', 32), ('humanoid', 64)]. Gate the product by matching param_names length.
- # If param_lists contains multiple lists, generate all combinations
- # If it's a single list, just use it directly
- if len(param_lists) > 1 and all(isinstance(item, (list | tuple)) for item in param_lists):
- combinations = itertools.product(*param_lists)
- else:
- combinations = param_lists
+ # If params are given as axes (one list per param_name), take the Cartesian product.
+ # Otherwise, treat as a sequence of already-expanded parameter sets.
+ param_names_for_axes = getattr(benchmark_cls, "param_names", None)
+ if (
+ isinstance(param_lists, (list | tuple))
+ and param_names_for_axes
+ and len(param_lists) == len(param_names_for_axes)
+ and all(isinstance(item, (list | tuple)) for item in param_lists)
+ ):
+ combinations = itertools.product(*param_lists)
+ else:
+ combinations = param_lists📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # If param_lists contains multiple lists, generate all combinations | |
| # If it's a single list, just use it directly | |
| if len(param_lists) > 1 and all(isinstance(item, (list | tuple)) for item in param_lists): | |
| combinations = itertools.product(*param_lists) | |
| else: | |
| combinations = param_lists | |
| else: | |
| # If params are given as axes (one list per param_name), take the Cartesian product. | |
| # Otherwise, treat as a sequence of already-expanded parameter sets. | |
| param_names_for_axes = getattr(benchmark_cls, "param_names", None) | |
| if ( | |
| isinstance(param_lists, (list | tuple)) | |
| and param_names_for_axes | |
| and len(param_lists) == len(param_names_for_axes) | |
| and all(isinstance(item, (list | tuple)) for item in param_lists) | |
| ): | |
| combinations = itertools.product(*param_lists) | |
| else: | |
| combinations = param_lists |
🤖 Prompt for AI Agents
In newton/_src/utils/benchmark.py around lines 61 to 67, the current logic uses
itertools.product whenever param_lists has length >1 and all items are
list/tuple, which incorrectly expands already-paired parameter tuples like
[('ant', 32), ('humanoid', 64)]; change the condition to only use
itertools.product when the number of param_lists matches the number of
param_names (i.e., each param name has its own list of candidate values),
otherwise treat param_lists as already-expanded combinations and use it
directly. Ensure the check compares len(param_lists) == len(param_names) (and
optionally that each element is a sequence) before calling itertools.product.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (2)
asv/benchmarks/setup/bench_model.py (2)
64-71: Warm-up cleanup for KPI looks correct and resolves prior OOM risk.
del solver/_model/builder+gc.collect()+wp.synchronize_device()is in place. Good fix.
132-142: Free warm-up model/builder in FastInitializeModel to prevent memory growth across (robot, num_envs).
_modelandbuilderpersist after warm-up; this can accumulate. Mirror KPI cleanup and sync after deletes.- solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot]) - del solver - - wp.synchronize_device() + solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot]) + del solver + del _model + del builder + gc.collect() + wp.synchronize_device()
🧹 Nitpick comments (5)
asv/benchmarks/setup/bench_model.py (5)
23-24: Avoid global Warp config side effects; set per-benchmark.Move
wp.config.enable_backward = Falseandwp.config.quiet = Trueinto eachsetup_cacheto prevent global impact when this module is imported.Apply:
- wp.config.enable_backward = False - wp.config.quiet = TrueAnd add at the beginning of both setup methods:
@@ - def setup_cache(self): + def setup_cache(self): + wp.config.enable_backward = False + wp.config.quiet = True wp.init()
146-165: Delete builder inside the repeat loop (Fast) to keep host memory flat.Currently only
solverandmodelare deleted.del solver del model + del builder + gc.collect()
29-36: Document/verify nconmax heuristics.If 150 is empirical for g1/h1, add a brief comment; otherwise consider deriving from model or leaving None for consistency.
117-121: Align benchmark metadata across classes.Optional: add
timeouttoFastInitializeModelfor consistency with KPI.rounds = 1 number = 1 repeat = 3 min_run_count = 1 + timeout = 1200
180-191: Release builder in peakmem bench to avoid skewing measurements.Delete
builderand force GC after finalizing/del model.with wp.ScopedDevice("cpu"): builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) # finalize model model = builder.finalize() del model + del builder + gc.collect()
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
asv/benchmarks/setup/bench_model.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_model.py (2)
newton/examples/example_mujoco.py (1)
create_model_builder(291-320)newton/_src/solvers/mujoco/solver_mujoco.py (1)
SolverMuJoCo(1094-2565)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run GPU Benchmarks (Pull Request)
- GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
🔇 Additional comments (1)
asv/benchmarks/setup/bench_model.py (1)
52-54: The current directory listing isn’t required for resolving whether ASV treats NotImplementedError as a skip. Instead, confirm via ASV documentation or test harness.
Next step: search ASV source or docs for exception handling semantics of NotImplementedError.
| timings_solver = [] | ||
|
|
||
| for _ in range(self.repeat): | ||
| modelBuilder_beg = time.perf_counter() |
There was a problem hiding this comment.
These variable names could be improved, e.g. use snake case model_builder and don't abbreviate beg since we're not being charged by the character
There was a problem hiding this comment.
ack, you're totally right. Will improve this.
| timeout = 3600 | ||
|
|
||
| def setup(self, robot, num_envs): | ||
| def setup_cache(self): |
There was a problem hiding this comment.
I don't quite understand the logic for using setup_cache. A track_ benchmark already runs once total so it seems less confusing to put specific quantity you want to measure in the track_ function itself instead of doing everything in setup_cache.
I would also recommend trying to make things work with as time_ benchmarks if possible since they will also report some measure of variability (in addition to collecting multiple samples) and report them in the summary tables
There was a problem hiding this comment.
Thanks for the feedback. It feels really overengineered to me as well..
The main thing I wanted to accomplish is to avoid having to run the modelBuilder setup for each run of the SolverMuJoCo initialization because of the way it impacts the total time needed to run the benchmarks.
I would strongly prefer using time_* benchmarks, but for both time_ and track_ I cannot assume any ordering of the runs. I also cannot measure 2 things in both of these, so I need a way to combine it. setup_cache seemed to be the logical place because it only runs once, and I can forward the data to the track_ functions.
Please correct me if my assumptions are wrong here.
Thinking of it, maybe it's easiest to just leave the existing model benchmark as-is, and revert to the MuJoCo specific one that does the modelBuilder stuff in setup_cache to avoid running it for every repeat. I think we can pay that price.
There was a problem hiding this comment.
separate question you might know the answer to - is the warp cache cleared for each repeat or does it persist?
There was a problem hiding this comment.
I played some more with this. I tried to move the mujoco parts to it's own file again and cache the model. That doesn't really work because setup_cache isn't parametrizable, so I would end up implementing the same thing I did here with the combined solution.
The other option is obviously also to just forget about optimizing the runtime and revert back to the simple version that was already approved by Eric before. I'm fine with that as well.
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
asv/benchmarks/setup/bench_model.py (2)
132-141: Missing resource cleanup compared to KpiInitializeModel.The FastInitializeModel class is missing the
del _modelandgc.collect()cleanup that exists in KpiInitializeModel after the kernel preloading step. This could lead to memory accumulation across iterations.Apply this diff to add consistent resource cleanup:
# Load the model to cache the kernels solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot]) del solver + del _model + del builder + gc.collect() wp.synchronize_device()
76-100: Fix variable name inconsistencies.Several variable naming issues need to be addressed:
modelBuilder_begshould use snake_case asmodel_builder_begin- The timing collection uses inconsistent variable names
Apply this diff to fix the variable naming:
for _ in range(self.repeat): - modelBuilder_beg = time.perf_counter() + model_builder_begin = time.perf_counter() builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) # finalize model model = builder.finalize() wp.synchronize_device() - modelBuilder_end = time.perf_counter() - timings_modelBuilder.append(modelBuilder_end - modelBuilder_beg) + model_builder_end = time.perf_counter() + timings_modelBuilder.append(model_builder_end - model_builder_begin) - solver_beg = time.perf_counter() + solver_begin = time.perf_counter() solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot]) wp.synchronize_device() solver_end = time.perf_counter() - timings_solver.append(solver_end - solver_beg) + timings_solver.append(solver_end - solver_begin)
🧹 Nitpick comments (2)
asv/benchmarks/setup/bench_model.py (2)
29-36: Consider making the constraint configuration more explicit.The
nconmaxdictionary mixesNonevalues with specific integers, making the intent unclear. Consider documenting why some robots useNonewhile others have specific values, or use a more explicit configuration structure.Add a comment explaining the configuration:
+# Per-robot constraint limits for SolverMuJoCo initialization +# None = use solver's automatic estimation, int = explicit limit nconmax = { "humanoid": None, "g1": 150, "h1": 150, "cartpole": None, "ant": None, "quadruped": None, }
180-180: Unused parameter should be named with underscore prefix.The first parameter appears to be unused cached data. Consider making this more explicit by using a descriptive name or double underscore prefix.
Apply this diff for better clarity:
- def peakmem_initialize_model_cpu(self, _, robot, num_envs): + def peakmem_initialize_model_cpu(self, _cached_data, robot, num_envs):
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
asv/benchmarks/setup/bench_model.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_model.py (2)
newton/examples/example_mujoco.py (2)
Example(185-320)create_model_builder(291-320)newton/_src/solvers/mujoco/solver_mujoco.py (1)
SolverMuJoCo(1094-2565)
🪛 Ruff (0.12.2)
asv/benchmarks/setup/bench_model.py
156-156: Local variable solver_begin is assigned to but never used
Remove assignment to unused variable solver_begin
(F841)
160-160: Undefined name solver_beg
(F821)
165-165: Undefined name timings_modelBuilder
(F821)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run GPU Benchmarks (Pull Request)
- GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
🔇 Additional comments (3)
asv/benchmarks/setup/bench_model.py (3)
16-28: LGTM! Appropriate imports added for the new functionality.The new imports (
itertools,time,numpy) andSolverMuJoCoare correctly imported and necessary for the benchmark caching implementation.
49-72: LGTM! Well-structured setup caching with proper resource management.The setup_cache method correctly implements kernel preloading and resource cleanup to optimize benchmark runtime. The CUDA availability check prevents runtime errors in environments without GPU support.
102-110: LGTM! Clean implementation of cached timing retrieval.The track method correctly retrieves cached solver timing data and properly specifies the unit as milliseconds.
|
@eric-heiden would you mind taking this over in the context of #548 ? |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
asv/benchmarks/setup/bench_model.py (1)
1-221: Fix undefined name and incorrect deletions in FastInitializeModel.setup_cache
- At line 163, change
timings_solver.append(solver_end - solver_beg)to use the correct variablesolver_begin.- At lines 93, 140, 166, and 194, replace
del modelwithdel _modelto match the variable initialized bybuilder.finalize().
♻️ Duplicate comments (2)
asv/benchmarks/setup/bench_model.py (2)
59-72: Warm-up and cleanup looks good.Preloading kernels and freeing
_model/builder between (robot, num_envs) avoids memory growth and noisy timings. Nice.
159-164: Undefined namesolver_beg→ usesolver_begin.Typo breaks timing; use the variable you define.
solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot]) wp.synchronize_device() solver_end = time.perf_counter() - timings_solver.append(solver_end - solver_beg) + timings_solver.append(solver_end - solver_begin)
🧹 Nitpick comments (5)
asv/benchmarks/setup/bench_model.py (5)
185-196: Also deletebuilderin peakmem benchmark to avoid lingering allocations.Minor hygiene; helps stabilizing RSS measurements.
with wp.ScopedDevice("cpu"): builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) # finalize model model = builder.finalize() del model + del builder + gc.collect()
65-65: Defensive access fornconmax.Future robot additions won’t crash the benchmark if the map isn’t updated.
- solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot]) + solver = SolverMuJoCo(_model, ncon_per_env=nconmax.get(robot))- solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot]) + solver = SolverMuJoCo(model, ncon_per_env=nconmax.get(robot))Also applies to: 87-87, 138-138, 160-160
117-121: Consider addingtimeoutlike KPI class for consistency.Keeps runner behavior uniform across benchmarks.
rounds = 1 number = 1 repeat = 3 min_run_count = 1 + timeout = 3600
23-25: Module-wide Warp config can leak to other benchmarks.Optional: set/restore these in
setup_cacheto localize side effects.
59-101: DRY: factor shared benchmarking loop.Both classes duplicate the same warm-up/measure/cleanup logic; a small helper would reduce maintenance.
If helpful, I can sketch a
_collect_timings(robots, envs, repeat)helper and integrate it into both classes.Also applies to: 132-173
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
asv/benchmarks/setup/bench_model.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_model.py (2)
newton/examples/example_mujoco.py (1)
create_model_builder(300-329)newton/_src/solvers/mujoco/solver_mujoco.py (1)
SolverMuJoCo(1088-2559)
🪛 Ruff (0.12.2)
asv/benchmarks/setup/bench_model.py
140-140: Undefined name model
(F821)
159-159: Local variable solver_begin is assigned to but never used
Remove assignment to unused variable solver_begin
(F841)
163-163: Undefined name solver_beg
(F821)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
- GitHub Check: Run GPU Benchmarks (Pull Request)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
🔇 Additional comments (1)
asv/benchmarks/setup/bench_model.py (1)
52-54: Confirm skip semantics when CUDA is unavailable.Raising NotImplementedError is fine if your
run_benchmarktreats it as SKIPPED; otherwise the run will error. Please confirm behavior.Also applies to: 125-127
Signed-off-by: Alain Denzler <adenzler@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (2)
asv/benchmarks/setup/bench_model.py (2)
59-72: Warm-up + explicit cleanup looks good; prevents memory growth across sweeps.Preloading kernels, deleting solver/model/builder, gc.collect(), and synchronizing are correctly in place.
159-164: Fix NameError in timing: usesolver_begin, notsolver_beg.This currently crashes and also triggers Ruff F821/F841. Use the correct variable.
solver_begin = time.perf_counter() solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot]) wp.synchronize_device() solver_end = time.perf_counter() - timings_solver.append(solver_end - solver_beg) + timings_solver.append(solver_end - solver_begin)
🧹 Nitpick comments (2)
asv/benchmarks/setup/bench_model.py (2)
185-196: Release builder and force GC in peakmem path (post-measurement cleanup).Minor cleanup parity with other paths; avoids lingering CPU allocations after peakmem sampling.
with wp.ScopedDevice("cpu"): builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) # finalize model model = builder.finalize() del model + del builder + gc.collect()
39-111: Optional: de-duplicate common benchmarking logic between classes.Both classes share identical loops; extracting a small helper (e.g., _measure_timings(robots, envs, repeat)) would reduce maintenance and keep fixes (like the NameError above) in one place.
Also applies to: 113-184
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
asv/benchmarks/setup/bench_model.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_model.py (2)
newton/examples/example_mujoco.py (2)
Example(191-329)create_model_builder(300-329)newton/_src/solvers/mujoco/solver_mujoco.py (1)
SolverMuJoCo(1088-2559)
🪛 Ruff (0.12.2)
asv/benchmarks/setup/bench_model.py
159-159: Local variable solver_begin is assigned to but never used
Remove assignment to unused variable solver_begin
(F841)
163-163: Undefined name solver_beg
(F821)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
- GitHub Check: Run GPU Benchmarks (Pull Request)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
🔇 Additional comments (4)
asv/benchmarks/setup/bench_model.py (4)
76-85: Timing instrumentation is correct.Includes builder creation and finalize; device syncs bracket timings appropriately. Same for solver init.
Also applies to: 86-91
97-101: Returning cached medians in ms via track_ methods: LGTM.Interface aligns with run_benchmark’s cached-arg calling convention.
Also applies to: 102-111
132-145: FastInitializeModel warm-up/cleanup mirrors KPI path correctly.Consistent kernel preload and cleanup sequence.
52-54: Confirm ASV skip on CUDA unavailable.
Noexcept NotImplementedErroris present innewton/_src/utils/benchmark.py(around therun_benchmarkdefinition), so verify that ASV’s runner will catch the exception raised insetup_cacheand skip the benchmark when CUDA isn’t available to prevent CPU-only CI failures.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (5)
asv/benchmarks/setup/bench_model.py (5)
76-78: Synchronize before starting the timed region to isolate measurements.Insert a device sync before each timed block to avoid overlap from prior work influencing the next measurement. This does not affect timing since it’s outside the measured window.
- for _ in range(self.repeat): - model_builder_begin = time.perf_counter() + for _ in range(self.repeat): + wp.synchronize_device() + model_builder_begin = time.perf_counter()- for _ in range(self.repeat): - model_builder_begin = time.perf_counter() + for _ in range(self.repeat): + wp.synchronize_device() + model_builder_begin = time.perf_counter()Also applies to: 149-151
29-36: Use a constant-style name and clarify None semantics for ncon_per_env.Minor naming/readability nit: make it an ALL_CAPS module constant and keep usage consistent. Also add a short comment that None falls back to model.rigid_contact_max in SolverMuJoCo.
-nconmax = { +NCON_PER_ENV = { "humanoid": None, "g1": 150, "h1": 150, "cartpole": None, "ant": None, "quadruped": None, }- solver = SolverMuJoCo(_model, ncon_per_env=nconmax[robot]) + solver = SolverMuJoCo(_model, ncon_per_env=NCON_PER_ENV[robot])- solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot]) + solver = SolverMuJoCo(model, ncon_per_env=NCON_PER_ENV[robot])- solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot]) + solver = SolverMuJoCo(model, ncon_per_env=NCON_PER_ENV[robot])- solver = SolverMuJoCo(model, ncon_per_env=nconmax[robot]) + solver = SolverMuJoCo(model, ncon_per_env=NCON_PER_ENV[robot])Also applies to: 65-66, 87-88, 138-141, 160-161
102-106: Prefer snake_case and consistent naming for track methods.Rename for consistency with Python style and the class name: solver_mujoco.
-def track_time_initialize_solverMuJoCo(self, timings, robot, num_envs): +def track_time_initialize_solver_mujoco(self, timings, robot, num_envs): return timings["solver"][(robot, num_envs)] -track_time_initialize_solverMuJoCo.unit = "ms" +track_time_initialize_solver_mujoco.unit = "ms"-def track_time_initialize_solverMuJoCo(self, timings, robot, num_envs): +def track_time_initialize_solver_mujoco(self, timings, robot, num_envs): return timings["solver"][(robot, num_envs)] -track_time_initialize_solverMuJoCo.unit = "ms" +track_time_initialize_solver_mujoco.unit = "ms"Also applies to: 175-179
149-169: De-duplicate nearly identical benchmarking logic between classes.The warm-up, timing, and cleanup loops are duplicated. Factor into a shared helper to reduce drift and maintenance.
Example helper (add at module scope):
def _measure_init_timings(robot_list, env_list, repeat): timings = {"model_builder": {}, "solver": {}} for robot, num_envs in itertools.product(robot_list, env_list): builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) _model = builder.finalize() solver = SolverMuJoCo(_model, ncon_per_env=NCON_PER_ENV[robot]) del solver, _model, builder gc.collect(); wp.synchronize_device() t_mb, t_sv = [], [] for _ in range(repeat): wp.synchronize_device() t0 = time.perf_counter() builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) model = builder.finalize(); wp.synchronize_device() t_mb.append(time.perf_counter() - t0) t1 = time.perf_counter() solver = SolverMuJoCo(model, ncon_per_env=NCON_PER_ENV[robot]) wp.synchronize_device() t_sv.append(time.perf_counter() - t1) del solver, model, builder gc.collect(); wp.synchronize_device() timings["model_builder"][(robot, num_envs)] = np.median(t_mb) * 1000 timings["solver"][(robot, num_envs)] = np.median(t_sv) * 1000 return timingsThen in each setup_cache:
return _measure_init_timings(self.params[0], self.params[1], self.repeat)Also applies to: 76-97, 132-173, 49-101
188-196: Free the builder in peakmem CPU path as well.Small cleanup to avoid retaining host-side allocations in memory measurements.
with wp.ScopedDevice("cpu"): builder = Example.create_model_builder(robot, num_envs, randomize=True, seed=123) # finalize model model = builder.finalize() del model +del builder
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
asv/benchmarks/setup/bench_model.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
asv/benchmarks/setup/bench_model.py (2)
newton/examples/example_mujoco.py (2)
Example(191-329)create_model_builder(300-329)newton/_src/solvers/mujoco/solver_mujoco.py (1)
SolverMuJoCo(1088-2559)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
- GitHub Check: Run GPU Benchmarks (Pull Request)
- GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
🔇 Additional comments (2)
asv/benchmarks/setup/bench_model.py (2)
59-72: Warm-up + cleanup looks solid.Kernel preloading plus explicit deletion and sync before measurement is the right tradeoff for stable timings.
Also applies to: 132-145
29-36: Confirm ncon_per_env values for g1/h1 are safe at high env counts.Hard-coding 150 may under- or over-allocate depending on contact richness. Please verify against ROBOT_CONFIGS and/or empirical rigid_contact_max.
You can locate any configured per-robot contact limits and cross-check:
#!/bin/bash # Show where ncon/contact-related caps are defined for MuJoCo examples rg -n -C2 -S '(ROBOT_CONFIGS|nconmax|ncon_per_env|rigid_contact_max)' newtonIf uncertainty remains, consider using None (fallback to model.rigid_contact_max) or a conservative per-robot cap derived from Example builders.
| del solver | ||
| del model | ||
| del builder | ||
| gc.collect() | ||
|
|
||
| timings["model_builder"][(robot, num_envs)] = np.median(timings_model_builder) * 1000 |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Ensure GPU memory is actually reclaimed between repeats (add sync after gc).
Without a post-gc device sync, freeing large MJ/solver buffers can lag and accumulate across repeats, risking OOM at 4096/8192 envs. Add a sync after cleanup (outside the timed region).
del solver
del model
del builder
gc.collect()
+ wp.synchronize_device()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| del solver | |
| del model | |
| del builder | |
| gc.collect() | |
| timings["model_builder"][(robot, num_envs)] = np.median(timings_model_builder) * 1000 | |
| del solver | |
| del model | |
| del builder | |
| gc.collect() | |
| wp.synchronize_device() | |
| timings["model_builder"][(robot, num_envs)] = np.median(timings_model_builder) * 1000 |
🤖 Prompt for AI Agents
In asv/benchmarks/setup/bench_model.py around lines 92 to 97, GPU memory from
MJ/solver buffers may not be reclaimed immediately after deleting
solver/model/builder and calling gc.collect(), so add an explicit device
synchronization call (e.g. torch.cuda.synchronize() if using PyTorch)
immediately after gc.collect() and before recording timings to ensure freed GPU
memory is actually released between repeats; place this sync outside the timed
region so it does not affect benchmark measurements.
| del solver | ||
| del model | ||
| del builder | ||
| gc.collect() | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Mirror the post-gc device sync in the fast benchmark too.
Same memory pressure risk as above; add a sync here as well.
del solver
del model
del builder
gc.collect()
+ wp.synchronize_device()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| del solver | |
| del model | |
| del builder | |
| gc.collect() | |
| del solver | |
| del model | |
| del builder | |
| gc.collect() | |
| wp.synchronize_device() |
🤖 Prompt for AI Agents
In asv/benchmarks/setup/bench_model.py around lines 165 to 169, after deleting
solver/model/builder and calling gc.collect() in the fast benchmark you must
mirror the post-gc device synchronization done elsewhere: add a device sync call
(e.g., call the CUDA/device synchronize function used elsewhere in the file)
immediately after gc.collect(). Ensure the sync is conditional on the
device/backend being used (only call if CUDA/accelerator is available) and
import or reference the same sync utility already used in the file to keep
behavior consistent.
|
Closing this PR, since #778 is adding solver initialization benchmarks (and merges this PR branch). |
Pull request was closed
completes #591 for alpha P0 envs. Once allegro hand is up, we can include here as well.
We should figure out whether we really need all the robots here or whether it's ok to have a subset.
Newton Migration Guide
Please ensure the migration guide for warp.sim users is up-to-date with the changes made in this PR.
docs/migration.rstis up-to dateBefore your PR is "Ready for review"
newton/tests/test_examples.py)pre-commit run -aSummary by CodeRabbit
New Features
Refactor
Chores