Skip to content

Mass rearrangement of the ASV benchmark suite#475

Merged
shi-eric merged 9 commits into
newton-physics:mainfrom
shi-eric:dev/kvilella/update_benchmark_scripts
Jul 26, 2025
Merged

Mass rearrangement of the ASV benchmark suite#475
shi-eric merged 9 commits into
newton-physics:mainfrom
shi-eric:dev/kvilella/update_benchmark_scripts

Conversation

@shi-eric

@shi-eric shi-eric commented Jul 26, 2025

Copy link
Copy Markdown
Member

Description

Starting from @Kenny-Vilella's branch in #461, the benchmark files have been moved around into different subdirectories depending on the purpose.

Main ways to run:

Run everything:

asv run --launch-method spawn --durations all main^!

Run the fast benchmarks (e.g. for merge requests)

asv run --launch-method spawn --bench Fast --durations all main^!

Run the KPI-relevant benchmarks

asv run --launch-method spawn --bench Kpi --durations all main^!

Newton Migration Guide

Please ensure the migration guide for warp.sim users is up-to-date with the changes made in this MR.

  • The migration guide in docs/migration.rst is up-to date

Before your PR is "Ready for review"

  • All commits are signed-off to indicate that your contribution adheres to the Developer Certificate of Origin requirements
  • Necessary tests have been added and new examples are tested (see newton/tests/test_examples.py)
  • Documentation is up-to-date
  • Code passes formatting and linting checks with pre-commit run -a

Summary by CodeRabbit

  • New Features

    • Introduced comprehensive benchmarking modules for various simulation examples, including model initialization and simulation performance across multiple robot types and environments.
    • Added new benchmarks for MuJoCo-based environments, quadruped XPBD simulations, and example load time measurements.
  • Bug Fixes

    • Updated benchmarks to conditionally skip execution when no CUDA device is available, improving robustness.
  • Refactor

    • Replaced subprocess-based benchmarks with in-process simulation stepping for improved accuracy and maintainability.
    • Renamed and reorganized benchmark classes for clarity and consistency.
  • Chores

    • Updated linter configuration to ignore additional error codes for benchmark files.
  • Removals

    • Removed outdated and redundant benchmark files and classes for Cartpole, Cloth Manipulation, Humanoid, Quadruped, AnyMal, and selection simulation examples.

@coderabbitai

coderabbitai Bot commented Jul 26, 2025

Copy link
Copy Markdown
Contributor
📝 Walkthrough

"""

Walkthrough

This change refactors and reorganizes the ASV benchmarking suite for the Newton project. It removes several example-specific benchmark files, consolidates subprocess-based load benchmarks into a single module, introduces new in-process simulation benchmarks, and adds comprehensive benchmarking modules for model initialization and MuJoCo-based simulations. Linter configuration is also updated.

Changes

File(s) Change Summary
asv/benchmarks/examples/example_cartpole.py
asv/benchmarks/examples/example_cloth_manipulation.py
asv/benchmarks/examples/example_humanoid.py
asv/benchmarks/examples/example_quadruped.py
Removed example-specific benchmark files and all their benchmark classes/methods.
asv/benchmarks/compilation/bench_example_load.py Added; new consolidated benchmark module for subprocess-based load benchmarks across multiple Newton examples.
asv/benchmarks/setup/bench_model.py Added; new benchmark module for model initialization performance across robot types and environment counts.
asv/benchmarks/simulation/bench_anymal.py Removed FeatherstoneSolverLoad class; renamed PretrainedSimulate to FastExampleAnymalPretrained.
asv/benchmarks/simulation/bench_cloth.py Removed VBDSolverLoad; added FastExampleClothManipulation (in-process simulation benchmark); renamed VBDSolverSimulate.
asv/benchmarks/simulation/bench_mujoco.py Added; new comprehensive benchmarking module for MuJoCo-based simulation environments with "Fast" and "Kpi" classes per robot.
asv/benchmarks/simulation/bench_quadruped_xpbd.py Added; new in-process simulation benchmark for quadruped XPBD example.
asv/benchmarks/simulation/bench_selection.py Removed MuJoCoSolverLoad; replaced or renamed simulation benchmark class.
pyproject.toml Updated Ruff linter ignore rules for ASV benchmark files to include E402.

Sequence Diagram(s)

sequenceDiagram
    participant BenchmarkClass
    participant Warp
    participant ExampleScript
    participant Subprocess
    participant CUDA_Device

    BenchmarkClass->>Warp: Clear LTO and kernel caches (setup)
    alt Subprocess-based benchmark
        BenchmarkClass->>Subprocess: Run ExampleScript (1 frame, headless)
        Subprocess->>CUDA_Device: Execute simulation
        Subprocess-->>BenchmarkClass: Return output
    else In-process simulation benchmark
        BenchmarkClass->>ExampleScript: Initialize simulation (N envs)
        loop For each frame
            ExampleScript->>CUDA_Device: Step simulation
            opt Cloth solver
                ExampleScript->>ExampleScript: Rebuild BVH every 10 frames
            end
        end
        ExampleScript-->>BenchmarkClass: Complete timing
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • newton-physics/newton#361: Adds new ASV benchmark classes for running example simulations as subprocesses, similar to the consolidated approach in this PR.
  • newton-physics/newton#458: Refactors and renames example benchmark classes, standardizes method names, and reorganizes benchmarks, which aligns with the restructuring in this PR.

Suggested reviewers

  • eric-heiden
    """

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.


📜 Recent review details

Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ffb2ec8464c1779f17362edb6ed85a964c0e082 and 6972d08.

📒 Files selected for processing (12)
  • asv/benchmarks/compilation/bench_example_load.py (1 hunks)
  • asv/benchmarks/examples/example_cartpole.py (0 hunks)
  • asv/benchmarks/examples/example_cloth_manipulation.py (0 hunks)
  • asv/benchmarks/examples/example_humanoid.py (0 hunks)
  • asv/benchmarks/examples/example_quadruped.py (0 hunks)
  • asv/benchmarks/setup/bench_model.py (1 hunks)
  • asv/benchmarks/simulation/bench_anymal.py (1 hunks)
  • asv/benchmarks/simulation/bench_cloth.py (2 hunks)
  • asv/benchmarks/simulation/bench_mujoco.py (1 hunks)
  • asv/benchmarks/simulation/bench_quadruped_xpbd.py (1 hunks)
  • asv/benchmarks/simulation/bench_selection.py (1 hunks)
  • pyproject.toml (1 hunks)
💤 Files with no reviewable changes (4)
  • asv/benchmarks/examples/example_cartpole.py
  • asv/benchmarks/examples/example_cloth_manipulation.py
  • asv/benchmarks/examples/example_humanoid.py
  • asv/benchmarks/examples/example_quadruped.py
🚧 Files skipped from review as they are similar to previous changes (8)
  • pyproject.toml
  • asv/benchmarks/simulation/bench_quadruped_xpbd.py
  • asv/benchmarks/setup/bench_model.py
  • asv/benchmarks/simulation/bench_cloth.py
  • asv/benchmarks/simulation/bench_anymal.py
  • asv/benchmarks/simulation/bench_selection.py
  • asv/benchmarks/compilation/bench_example_load.py
  • asv/benchmarks/simulation/bench_mujoco.py
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@shi-eric shi-eric marked this pull request as draft July 26, 2025 00:22

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between debe21d and 6a7e0556635a66f1099ab2c602c55887f02ab016.

📒 Files selected for processing (12)
  • asv/benchmarks/compilation/bench_example_load.py (1 hunks)
  • asv/benchmarks/examples/example_cartpole.py (0 hunks)
  • asv/benchmarks/examples/example_cloth_manipulation.py (0 hunks)
  • asv/benchmarks/examples/example_humanoid.py (0 hunks)
  • asv/benchmarks/examples/example_quadruped.py (0 hunks)
  • asv/benchmarks/setup/bench_model.py (1 hunks)
  • asv/benchmarks/simulation/bench_anymal.py (1 hunks)
  • asv/benchmarks/simulation/bench_cloth.py (1 hunks)
  • asv/benchmarks/simulation/bench_mujoco.py (1 hunks)
  • asv/benchmarks/simulation/bench_quadruped_xpbd.py (1 hunks)
  • asv/benchmarks/simulation/bench_selection.py (1 hunks)
  • pyproject.toml (1 hunks)
💤 Files with no reviewable changes (4)
  • asv/benchmarks/examples/example_cartpole.py
  • asv/benchmarks/examples/example_quadruped.py
  • asv/benchmarks/examples/example_humanoid.py
  • asv/benchmarks/examples/example_cloth_manipulation.py
🧰 Additional context used
🧠 Learnings (4)
asv/benchmarks/simulation/bench_quadruped_xpbd.py (1)

Learnt from: shi-eric
PR: #461
File: asv/benchmarks/envs/example_humanoid.py:40-41
Timestamp: 2025-07-23T14:36:42.182Z
Learning: In Warp benchmarks, explicit wp.init() calls are not needed in most circumstances since the first Warp API call that requires initialization will automatically call wp.init(). Explicit wp.init() in setup() methods is helpful when the ASV benchmark is measuring a Warp API call, as wp.init() has non-trivial overhead that should be excluded from the benchmark timing.

asv/benchmarks/simulation/bench_anymal.py (1)

Learnt from: shi-eric
PR: #461
File: asv/benchmarks/envs/example_humanoid.py:40-41
Timestamp: 2025-07-23T14:36:42.182Z
Learning: In Warp benchmarks, explicit wp.init() calls are not needed in most circumstances since the first Warp API call that requires initialization will automatically call wp.init(). Explicit wp.init() in setup() methods is helpful when the ASV benchmark is measuring a Warp API call, as wp.init() has non-trivial overhead that should be excluded from the benchmark timing.

asv/benchmarks/simulation/bench_selection.py (1)

Learnt from: shi-eric
PR: #461
File: asv/benchmarks/envs/example_humanoid.py:40-41
Timestamp: 2025-07-23T14:36:42.182Z
Learning: In Warp benchmarks, explicit wp.init() calls are not needed in most circumstances since the first Warp API call that requires initialization will automatically call wp.init(). Explicit wp.init() in setup() methods is helpful when the ASV benchmark is measuring a Warp API call, as wp.init() has non-trivial overhead that should be excluded from the benchmark timing.

asv/benchmarks/setup/bench_model.py (1)

Learnt from: shi-eric
PR: #461
File: asv/benchmarks/envs/example_humanoid.py:40-41
Timestamp: 2025-07-23T14:36:42.182Z
Learning: In Warp benchmarks, explicit wp.init() calls are not needed in most circumstances since the first Warp API call that requires initialization will automatically call wp.init(). Explicit wp.init() in setup() methods is helpful when the ASV benchmark is measuring a Warp API call, as wp.init() has non-trivial overhead that should be excluded from the benchmark timing.

🧬 Code Graph Analysis (1)
asv/benchmarks/simulation/bench_cloth.py (6)
asv/benchmarks/simulation/bench_anymal.py (2)
  • setup (26-28)
  • time_simulate (31-34)
asv/benchmarks/compilation/bench_example_load.py (6)
  • setup (29-31)
  • setup (60-62)
  • setup (90-92)
  • setup (119-121)
  • setup (149-151)
  • setup (181-183)
asv/benchmarks/simulation/bench_quadruped_xpbd.py (2)
  • setup (26-28)
  • time_simulate (31-34)
asv/benchmarks/simulation/bench_selection.py (2)
  • setup (26-28)
  • time_simulate (31-34)
newton/solvers/vbd/solver_vbd.py (1)
  • rebuild_bvh (2787-2797)
newton/examples/example_robot_manipulating_cloth.py (1)
  • capture_cuda_graph (350-355)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
🔇 Additional comments (33)
pyproject.toml (1)

128-128: LGTM: Appropriate linter configuration for benchmark restructuring.

Adding "E402" to the ignore list for benchmark files is sensible given the extensive reorganization and new benchmarking patterns that may require non-standard import positioning.

asv/benchmarks/simulation/bench_anymal.py (1)

22-22: LGTM: Consistent naming convention applied.

The class rename to FastExampleAnymalPretrained follows the new naming pattern for in-process simulation benchmarks established in this refactoring.

asv/benchmarks/simulation/bench_quadruped_xpbd.py (1)

22-34: LGTM: Well-structured benchmark following established patterns.

The implementation correctly follows the standard benchmark pattern with proper CUDA device checking, appropriate scaling (200 environments, 1000 frames), and device synchronization. The absence of explicit wp.init() is appropriate as per Warp benchmarking best practices.

asv/benchmarks/simulation/bench_selection.py (1)

22-22: LGTM: Appropriate naming and configuration.

The class rename to FastExampleSelectionCartpoleMuJoCo follows the new naming convention, and the use of CUDA graphs provides appropriate performance optimization for benchmarking.

asv/benchmarks/simulation/bench_cloth.py (1)

42-42: LGTM: Consistent class renaming.

The rename to FastExampleClothSelfContactVBD follows the established naming convention for fast simulation benchmarks.

asv/benchmarks/setup/bench_model.py (7)

1-24: LGTM! Clean imports and configuration.

The license header is properly formatted, imports are appropriate, and the Warp configuration to disable backward mode is correct for benchmarking scenarios.


25-34: LGTM! Well-configured KPI benchmark class.

The parameter setup with various robot types and high environment counts (4096, 8192) is appropriate for KPI tracking. The timeout of 1800 seconds provides adequate time for initialization of large-scale environments.


35-37: LGTM! Proper Warp initialization in setup.

Based on the retrieved learnings, explicit wp.init() calls in setup methods are helpful when benchmarking Warp API calls, as wp.init() has non-trivial overhead that should be excluded from the benchmark timing.


38-45: LGTM! Proper benchmark implementation with device synchronization.

The benchmark correctly:

  • Skips when no CUDA device is available
  • Uses consistent parameters (randomize=True, seed=123) for reproducible results
  • Synchronizes the device after model finalization for accurate timing

47-55: LGTM! Consistent Fast benchmark configuration.

The FastInitializeModel class uses smaller environment counts (128, 256) appropriate for quick benchmarks, maintaining consistency with the KPI class structure.


56-66: LGTM! Consistent implementation across benchmark classes.

The setup method and time_initialize_model method mirror the KPI class implementation, ensuring consistency in benchmarking approach.


67-73: LGTM! CPU memory benchmark addition.

The CPU-specific memory benchmark provides valuable insights into memory usage patterns during model initialization, complementing the GPU timing benchmarks.

asv/benchmarks/compilation/bench_example_load.py (8)

1-22: LGTM! Clean imports and proper structure.

The license header is correctly formatted and imports are appropriate for subprocess-based benchmarking.


23-32: LGTM! Proper cache clearing in setup.

Clearing both LTO and kernel caches ensures clean benchmarking conditions for compilation/load time measurements.


33-52: LGTM! Well-structured subprocess benchmark.

The benchmark properly:

  • Skips when no CUDA device is available
  • Uses subprocess.run with proper error handling (check=True)
  • Captures both stdout and stderr for debugging
  • Uses appropriate command-line arguments for headless single-frame execution

54-83: LGTM! Consistent benchmark implementation.

The CartpoleMuJoCo benchmark follows the same pattern with appropriate command-line arguments including --no-use-cuda-graph for load testing.


85-112: LGTM! Consistent structure maintained.

The ClothManipulation benchmark maintains the same reliable pattern of cache clearing, subprocess execution, and output capture.


114-141: LGTM! Another consistent benchmark implementation.

The ClothSelfContactVBD benchmark follows the established pattern correctly.


143-173: LGTM! Proper humanoid benchmark configuration.

The HumanoidMuJoCo benchmark includes appropriate flags (--headless, --no-use-cuda-graph) and timeout configuration for this more complex simulation.


175-203: LGTM! Complete and consistent benchmark suite.

The QuadrupedXPBD benchmark completes the suite with consistent implementation and appropriate timeout settings.

asv/benchmarks/simulation/bench_mujoco.py (13)

1-25: LGTM! Proper imports and configuration.

The license header, imports, and Warp configuration are correctly set up for MuJoCo simulation benchmarking.


27-51: LGTM! Well-configured Fast benchmark with proper builder caching.

The FastAnt class properly:

  • Caches the builder to avoid redundant construction
  • Uses appropriate parameters for fast benchmarking
  • Synchronizes device after setup for accurate timing

52-57: LGTM! Proper simulation timing with device synchronization.

The timing method correctly synchronizes the device before and after simulation steps to ensure accurate GPU timing measurements.


59-71: LGTM! Efficient builder caching strategy.

The KpiAnt class uses a dictionary-based caching approach to avoid rebuilding models for different environment counts, which is more efficient than the single builder approach in Fast classes.


72-97: LGTM! Comprehensive KPI tracking with proper normalization.

The track_simulate method correctly:

  • Runs multiple samples for statistical reliability
  • Measures total time with proper device synchronization
  • Normalizes by frames, substeps, environments, and samples
  • Returns meaningful per-environment-step timing in milliseconds

99-129: LGTM! Consistent Fast benchmark implementation.

The FastCartpole class maintains the same reliable pattern as FastAnt with appropriate parameter adjustments.


131-169: LGTM! Consistent KPI benchmark with adjusted parameters.

The KpiCartpole class uses fewer frames (50 vs 100) which is appropriate for the simpler cartpole model while maintaining the same measurement methodology.


171-201: LGTM! Appropriate adjustments for more complex robot.

The FastG1 class reduces frames to 25 and repeats to 2, which is sensible for the more computationally intensive G1 humanoid robot.


203-242: LGTM! Well-configured KPI tracking for complex robot.

The KpiG1 class appropriately reduces samples to 2 and adds a 900-second timeout for the more complex G1 robot simulations.


244-274: LGTM! Consistent configuration for H1 robot.

The FastH1 class uses the same conservative parameters as G1, which is appropriate for another complex humanoid robot.


276-315: LGTM! Matching KPI configuration for H1.

The KpiH1 class maintains consistency with KpiG1 in terms of samples and timeout, ensuring fair comparison between similar complexity robots.


317-347: LGTM! Standard humanoid benchmark configuration.

The FastHumanoid class uses more frames (50) and repeats (8) compared to G1/H1, suggesting this might be a less complex humanoid model.


349-387: LGTM! Complete benchmark suite with proper normalization.

The KpiHumanoid class completes the benchmark suite with higher frame count (100) and samples (4), consistent with it being a standard humanoid model. The metric calculation maintains the same normalization approach across all KPI classes.

Comment thread asv/benchmarks/simulation/bench_cloth.py Outdated
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
@shi-eric shi-eric force-pushed the dev/kvilella/update_benchmark_scripts branch from 6a7e055 to 3ffb2ec Compare July 26, 2025 18:38

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
asv/benchmarks/simulation/bench_cloth.py (1)

42-65: Add device synchronization for accurate timing measurement.

The simulation loop lacks wp.synchronize_device() after completion, which is necessary for accurate GPU timing measurement in benchmarks.

Apply this diff to add device synchronization:

     def time_simulate(self):
         for i in range(self.num_frames):
             self.example.step()
 
             if (
                 i != 0
                 and not i % self.example.bvh_rebuild_frames
                 and self.example.use_cuda_graph
                 and self.example.solver.handle_self_contact
             ):
                 self.example.solver.rebuild_bvh(self.example.state_0)
                 with wp.ScopedCapture() as capture:
                     self.example.simulate_substeps()
                 self.example.cuda_graph = capture.graph
+        wp.synchronize_device()

Additionally, consider whether the BVH rebuilding and CUDA graph operations should be moved to setup() or measured separately to ensure consistent benchmark timing across all iterations.

♻️ Duplicate comments (1)
asv/benchmarks/simulation/bench_cloth.py (1)

23-40: Add device synchronization and move expensive operations out of timing loop.

The issues identified in the previous review are still present:

  1. Missing wp.synchronize_device() call after the simulation loop (line 35), which is needed for accurate timing measurement.
  2. BVH rebuilding and CUDA graph capture (lines 37-39) occur inside the timing loop every 10 frames, causing inconsistent benchmark results.

Apply this diff to add device synchronization:

     def time_simulate(self):
         for frame_idx in range(self.num_frames):
             self.example.step()
 
             if self.example.cloth_solver and not (frame_idx % 10):
                 self.example.cloth_solver.rebuild_bvh(self.example.state_0)
                 self.example.capture_cuda_graph()
+        wp.synchronize_device()

Consider moving BVH rebuilding and CUDA graph capture to setup() or measuring them separately for consistent timing.

🧹 Nitpick comments (1)
asv/benchmarks/setup/bench_model.py (1)

49-57: Consider adding timeout configuration for consistency.

The FastInitializeModel class is well-implemented, but it's missing a timeout configuration unlike KpiInitializeModel. Consider adding a timeout to handle potential long-running CPU model initializations.

 class FastInitializeModel:
     params = (["humanoid", "g1", "h1", "cartpole", "ant", "quadruped"], [256, 512])
     param_names = ["robot", "num_envs"]
 
     rounds = 1
     number = 1
     repeat = 3
     min_run_count = 1
+    timeout = 1800
📜 Review details

Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a7e0556635a66f1099ab2c602c55887f02ab016 and 3ffb2ec8464c1779f17362edb6ed85a964c0e082.

📒 Files selected for processing (12)
  • asv/benchmarks/compilation/bench_example_load.py (1 hunks)
  • asv/benchmarks/examples/example_cartpole.py (0 hunks)
  • asv/benchmarks/examples/example_cloth_manipulation.py (0 hunks)
  • asv/benchmarks/examples/example_humanoid.py (0 hunks)
  • asv/benchmarks/examples/example_quadruped.py (0 hunks)
  • asv/benchmarks/setup/bench_model.py (1 hunks)
  • asv/benchmarks/simulation/bench_anymal.py (1 hunks)
  • asv/benchmarks/simulation/bench_cloth.py (1 hunks)
  • asv/benchmarks/simulation/bench_mujoco.py (1 hunks)
  • asv/benchmarks/simulation/bench_quadruped_xpbd.py (1 hunks)
  • asv/benchmarks/simulation/bench_selection.py (1 hunks)
  • pyproject.toml (1 hunks)
💤 Files with no reviewable changes (4)
  • asv/benchmarks/examples/example_cloth_manipulation.py
  • asv/benchmarks/examples/example_cartpole.py
  • asv/benchmarks/examples/example_quadruped.py
  • asv/benchmarks/examples/example_humanoid.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • pyproject.toml
  • asv/benchmarks/simulation/bench_selection.py
  • asv/benchmarks/simulation/bench_anymal.py
  • asv/benchmarks/simulation/bench_quadruped_xpbd.py
  • asv/benchmarks/compilation/bench_example_load.py
  • asv/benchmarks/simulation/bench_mujoco.py
🧰 Additional context used
🧠 Learnings (1)
asv/benchmarks/setup/bench_model.py (1)

Learnt from: shi-eric
PR: #461
File: asv/benchmarks/envs/example_humanoid.py:40-41
Timestamp: 2025-07-23T14:36:42.182Z
Learning: In Warp benchmarks, explicit wp.init() calls are not needed in most circumstances since the first Warp API call that requires initialization will automatically call wp.init(). Explicit wp.init() in setup() methods is helpful when the ASV benchmark is measuring a Warp API call, as wp.init() has non-trivial overhead that should be excluded from the benchmark timing.

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
  • GitHub Check: run-newton-tests / newton-unittests (windows-latest)
🔇 Additional comments (4)
asv/benchmarks/simulation/bench_cloth.py (1)

20-20: LGTM: Import aligns with refactoring approach.

The import of ExampleClothManipulation supports the transition from subprocess-based to in-process benchmarking.

asv/benchmarks/setup/bench_model.py (3)

1-25: LGTM! Clean imports and configuration.

The license header, imports, and Warp configuration are properly set up for benchmarking. Disabling backward mode with wp.config.enable_backward = False is appropriate for performance benchmarks.


27-47: Well-structured KPI benchmark class.

The implementation correctly follows ASV benchmarking patterns and aligns with retrieved learnings about explicit wp.init() calls in setup methods to exclude initialization overhead from timing measurements. The parameter combinations and timeout are appropriate for KPI-level benchmarks.


69-79: Excellent CPU memory measurement implementation.

The peakmem_initialize_model_cpu method is well-designed with proper memory cleanup (gc.collect()), CPU-only execution scoping, and resource cleanup. The absence of CUDA device requirement is appropriate for CPU memory benchmarking.

@shi-eric shi-eric force-pushed the dev/kvilella/update_benchmark_scripts branch from 3ffb2ec to c57bd40 Compare July 26, 2025 18:54
Signed-off-by: Eric Shi <ershi@nvidia.com>
@shi-eric shi-eric force-pushed the dev/kvilella/update_benchmark_scripts branch from c57bd40 to 6972d08 Compare July 26, 2025 19:00
@shi-eric

shi-eric commented Jul 26, 2025

Copy link
Copy Markdown
Member Author

Fast tier

[100.00%] ·· ============================================================================= ================
                                               benchmark                                    total duration 
             ----------------------------------------------------------------------------- ----------------
                   simulation.bench_anymal.FastExampleAnymalPretrained.time_simulate            1.87m      
                      setup.bench_model.FastInitializeModel.time_initialize_model               1.78m      
                   simulation.bench_cloth.FastExampleClothManipulation.time_simulate            1.62m      
                              simulation.bench_mujoco.FastG1.time_simulate                      1.36m      
                             simulation.bench_mujoco.FastAnt.time_simulate                      1.05m      
                              simulation.bench_mujoco.FastH1.time_simulate                      59.9s      
                           simulation.bench_mujoco.FastHumanoid.time_simulate                   57.2s      
                 simulation.bench_quadruped_xpbd.FastExampleQuadrupedXPBD.time_simulate         52.0s      
                           simulation.bench_mujoco.FastCartpole.time_simulate                   46.8s      
              simulation.bench_selection.FastExampleSelectionCartpoleMuJoCo.time_simulate       46.4s      
                   setup.bench_model.FastInitializeModel.peakmem_initialize_model_cpu           45.2s      
                  simulation.bench_cloth.FastExampleClothSelfContactVBD.time_simulate           25.6s      
                                                <build>                                         11.3s      
                                   <setup_cache setup.bench_model:58>                           1.79s      
                                                 total                                          13.5m      
             ============================================================================= ================

Kpi tier

[100.00%] ·· ============================================================ ================
                                      benchmark                            total duration 
             ------------------------------------------------------------ ----------------
              setup.bench_model.KpiInitializeModel.time_initialize_model       31.0m      
                     simulation.bench_mujoco.KpiG1.track_simulate              9.27m      
                     simulation.bench_mujoco.KpiH1.track_simulate              8.46m      
                    simulation.bench_mujoco.KpiAnt.track_simulate              4.77m      
                  simulation.bench_mujoco.KpiCartpole.track_simulate           2.41m      
                  simulation.bench_mujoco.KpiHumanoid.track_simulate           1.91m      
                                       <build>                                 11.3s      
                                        total                                  58.1m      
             ============================================================ ================

@shi-eric shi-eric added this to the Alpha MVP milestone Jul 26, 2025
@shi-eric shi-eric marked this pull request as ready for review July 26, 2025 20:41
@shi-eric shi-eric merged commit 03b0d44 into newton-physics:main Jul 26, 2025
10 checks passed
@shi-eric shi-eric deleted the dev/kvilella/update_benchmark_scripts branch July 26, 2025 20:42
@coderabbitai coderabbitai Bot mentioned this pull request Sep 2, 2025
4 tasks
eric-heiden pushed a commit to eric-heiden/newton that referenced this pull request Jan 28, 2026
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
Signed-off-by: Eric Shi <ershi@nvidia.com>
Co-authored-by: Kenny Vilella <kvilella@nvidia.com>
vidurv-nvidia pushed a commit to vidurv-nvidia/newton that referenced this pull request Mar 6, 2026
# Description

<!--
Add a how-to guide for setting-up and explaining fabric cloning and
stage in memory.
-->

Please include a summary of the change and which issue is fixed. Please
also include relevant motivation and context.
List any dependencies that are required for this change.

Fixes # (issue)

<!-- As a practice, it is recommended to open an issue to have
discussions on the proposed pull request.
This makes it easier for the community to keep track of what is being
developed or added, and if a given feature
is demanded by more than one party. -->

## Type of change

<!-- As you go through the list, delete the ones that are not
applicable. -->

- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- This change requires a documentation update

## Screenshots

Please attach before and after screenshots of the change if applicable.

<!--
Example:

| Before | After |
| ------ | ----- |
| _gif/png before_ | _gif/png after_ |

To upload images to a PR -- simply drag and drop an image while in edit
mode and it should upload the image directly. You can then paste that
source into the above before/after sections.
-->

## Checklist

- [ ] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [ ] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

<!--
As you go through the checklist above, you can mark something as done by
putting an x character in it

For example,
- [x] I have done this task
- [ ] I have not done this task
-->

---------

Signed-off-by: rwiltz <165190220+rwiltz@users.noreply.github.com>
Signed-off-by: Kelly Guo <kellyguo123@hotmail.com>
Signed-off-by: Ashwin Varghese Kuruttukulam <123109010+ashwinvkNV@users.noreply.github.com>
Signed-off-by: Kelly Guo <kellyg@nvidia.com>
Signed-off-by: Michael Gussert <michael@gussert.com>
Signed-off-by: samibouziri <79418773+samibouziri@users.noreply.github.com>
Signed-off-by: Mayank Mittal <12863862+Mayankm96@users.noreply.github.com>
Signed-off-by: Kyle Morgenstein <34984693+KyleM73@users.noreply.github.com>
Signed-off-by: Hongyu Li <lihongyu0807@icloud.com>
Signed-off-by: Toni-SM <toni.semu@gmail.com>
Signed-off-by: James Tigue <166445701+jtigue-bdai@users.noreply.github.com>
Signed-off-by: Pascal Roth <57946385+pascal-roth@users.noreply.github.com>
Signed-off-by: Victor Khaustov <3192677+vi3itor@users.noreply.github.com>
Signed-off-by: AlvinC <alvincny529@gmail.com>
Signed-off-by: Tyler Lum <tylergwlum@gmail.com>
Signed-off-by: Miguel Alonso Jr. <76960110+miguelalonsojr@users.noreply.github.com>
Signed-off-by: renaudponcelet <renaud.poncelet@gmail.com>
Signed-off-by: matthewtrepte <mtrepte@nvidia.com>
Co-authored-by: lotusl-code <lotusl@nvidia.com>
Co-authored-by: Kelly Guo <kellyg@nvidia.com>
Co-authored-by: jaczhangnv <jaczhang@nvidia.com>
Co-authored-by: rwiltz <165190220+rwiltz@users.noreply.github.com>
Co-authored-by: Yanzi Zhu <yanziz@nvidia.com>
Co-authored-by: nv-mhaselton <mhaselton@nvidia.com>
Co-authored-by: cosmith-nvidia <141183495+cosmith-nvidia@users.noreply.github.com>
Co-authored-by: Michael Gussert <michael@gussert.com>
Co-authored-by: CY Chen <cyc@nvidia.com>
Co-authored-by: oahmednv <oahmed@Nvidia.com>
Co-authored-by: Ashwin Varghese Kuruttukulam <123109010+ashwinvkNV@users.noreply.github.com>
Co-authored-by: Rafael Wiltz <rwiltz@nvidia.com>
Co-authored-by: Peter Du <peterd@nvidia.com>
Co-authored-by: chengronglai <chengrongl@nvidia.com>
Co-authored-by: pulkitg01 <pulkitg@nvidia.com>
Co-authored-by: Connor Smith <cosmith@nvidia.com>
Co-authored-by: Ashwin Varghese Kuruttukulam <ashwinvk@nvidia.com>
Co-authored-by: Kelly Guo <kellyguo123@hotmail.com>
Co-authored-by: shauryadNv <shauryad@nvidia.com>
Co-authored-by: Mayank Mittal <12863862+Mayankm96@users.noreply.github.com>
Co-authored-by: samibouziri <79418773+samibouziri@users.noreply.github.com>
Co-authored-by: James Smith <142246516+jsmith-bdai@users.noreply.github.com>
Co-authored-by: Shundo Kishi <syundo0730@gmail.com>
Co-authored-by: Sheikh Dawood <sabdulajees@nvidia.com>
Co-authored-by: Toni-SM <aserranomuno@nvidia.com>
Co-authored-by: Gonglitian <70052908+Gonglitian@users.noreply.github.com>
Co-authored-by: James Tigue <166445701+jtigue-bdai@users.noreply.github.com>
Co-authored-by: Mayank Mittal <mittalma@leggedrobotics.com>
Co-authored-by: Kyle Morgenstein <34984693+KyleM73@users.noreply.github.com>
Co-authored-by: Johnson Sun <20457146+j3soon@users.noreply.github.com>
Co-authored-by: Pascal Roth <57946385+pascal-roth@users.noreply.github.com>
Co-authored-by: Hongyu Li <lihongyu0807@icloud.com>
Co-authored-by: Jean-Francois-Lafleche <57650687+Jean-Francois-Lafleche@users.noreply.github.com>
Co-authored-by: Wei Jinqi <changshanshi@outlook.com>
Co-authored-by: Louis LE LAY <le.lay.louis@gmail.com>
Co-authored-by: Harsh Patel <hapatel@theaiinstitute.com>
Co-authored-by: Kousheek Chakraborty <kousheekc@gmail.com>
Co-authored-by: Victor Khaustov <3192677+vi3itor@users.noreply.github.com>
Co-authored-by: AlvinC <alvincny529@gmail.com>
Co-authored-by: Felipe Mohr <50018670+felipemohr@users.noreply.github.com>
Co-authored-by: AdAstra7 <87345760+likecanyon@users.noreply.github.com>
Co-authored-by: gao <ziqi.gao@iff-extern.fraunhofer.de>
Co-authored-by: Tyler Lum <tylergwlum@gmail.com>
Co-authored-by: -T.K.- <t_k_233@outlook.com>
Co-authored-by: Clemens Schwarke <96480707+ClemensSchwarke@users.noreply.github.com>
Co-authored-by: Miguel Alonso Jr. <76960110+miguelalonsojr@users.noreply.github.com>
Co-authored-by: Miguel Alonso Jr. <miguel.alonso@nfinite.app>
Co-authored-by: renaudponcelet <renaud.poncelet@gmail.com>
Co-authored-by: Ales Borovicka <aborovicka@nvidia.com>
Co-authored-by: nv-mm <mmagdics@nvidia.com>
Co-authored-by: Antoine RICHARD <antoiner@nvidia.com>
mmacklin pushed a commit to mmacklin/newton that referenced this pull request Apr 7, 2026
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
Signed-off-by: Eric Shi <ershi@nvidia.com>
Co-authored-by: Kenny Vilella <kvilella@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants