Mass rearrangement of the ASV benchmark suite#475
Conversation
📝 Walkthrough""" WalkthroughThis change refactors and reorganizes the ASV benchmarking suite for the Newton project. It removes several example-specific benchmark files, consolidates subprocess-based load benchmarks into a single module, introduces new in-process simulation benchmarks, and adds comprehensive benchmarking modules for model initialization and MuJoCo-based simulations. Linter configuration is also updated. Changes
Sequence Diagram(s)sequenceDiagram
participant BenchmarkClass
participant Warp
participant ExampleScript
participant Subprocess
participant CUDA_Device
BenchmarkClass->>Warp: Clear LTO and kernel caches (setup)
alt Subprocess-based benchmark
BenchmarkClass->>Subprocess: Run ExampleScript (1 frame, headless)
Subprocess->>CUDA_Device: Execute simulation
Subprocess-->>BenchmarkClass: Return output
else In-process simulation benchmark
BenchmarkClass->>ExampleScript: Initialize simulation (N envs)
loop For each frame
ExampleScript->>CUDA_Device: Step simulation
opt Cloth solver
ExampleScript->>ExampleScript: Rebuild BVH every 10 frames
end
end
ExampleScript-->>BenchmarkClass: Complete timing
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. 📜 Recent review detailsConfiguration used: .coderabbit.yml 📥 CommitsReviewing files that changed from the base of the PR and between 3ffb2ec8464c1779f17362edb6ed85a964c0e082 and 6972d08. 📒 Files selected for processing (12)
💤 Files with no reviewable changes (4)
🚧 Files skipped from review as they are similar to previous changes (8)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between debe21d and 6a7e0556635a66f1099ab2c602c55887f02ab016.
📒 Files selected for processing (12)
asv/benchmarks/compilation/bench_example_load.py(1 hunks)asv/benchmarks/examples/example_cartpole.py(0 hunks)asv/benchmarks/examples/example_cloth_manipulation.py(0 hunks)asv/benchmarks/examples/example_humanoid.py(0 hunks)asv/benchmarks/examples/example_quadruped.py(0 hunks)asv/benchmarks/setup/bench_model.py(1 hunks)asv/benchmarks/simulation/bench_anymal.py(1 hunks)asv/benchmarks/simulation/bench_cloth.py(1 hunks)asv/benchmarks/simulation/bench_mujoco.py(1 hunks)asv/benchmarks/simulation/bench_quadruped_xpbd.py(1 hunks)asv/benchmarks/simulation/bench_selection.py(1 hunks)pyproject.toml(1 hunks)
💤 Files with no reviewable changes (4)
- asv/benchmarks/examples/example_cartpole.py
- asv/benchmarks/examples/example_quadruped.py
- asv/benchmarks/examples/example_humanoid.py
- asv/benchmarks/examples/example_cloth_manipulation.py
🧰 Additional context used
🧠 Learnings (4)
asv/benchmarks/simulation/bench_quadruped_xpbd.py (1)
Learnt from: shi-eric
PR: #461
File: asv/benchmarks/envs/example_humanoid.py:40-41
Timestamp: 2025-07-23T14:36:42.182Z
Learning: In Warp benchmarks, explicit wp.init() calls are not needed in most circumstances since the first Warp API call that requires initialization will automatically call wp.init(). Explicit wp.init() in setup() methods is helpful when the ASV benchmark is measuring a Warp API call, as wp.init() has non-trivial overhead that should be excluded from the benchmark timing.
asv/benchmarks/simulation/bench_anymal.py (1)
Learnt from: shi-eric
PR: #461
File: asv/benchmarks/envs/example_humanoid.py:40-41
Timestamp: 2025-07-23T14:36:42.182Z
Learning: In Warp benchmarks, explicit wp.init() calls are not needed in most circumstances since the first Warp API call that requires initialization will automatically call wp.init(). Explicit wp.init() in setup() methods is helpful when the ASV benchmark is measuring a Warp API call, as wp.init() has non-trivial overhead that should be excluded from the benchmark timing.
asv/benchmarks/simulation/bench_selection.py (1)
Learnt from: shi-eric
PR: #461
File: asv/benchmarks/envs/example_humanoid.py:40-41
Timestamp: 2025-07-23T14:36:42.182Z
Learning: In Warp benchmarks, explicit wp.init() calls are not needed in most circumstances since the first Warp API call that requires initialization will automatically call wp.init(). Explicit wp.init() in setup() methods is helpful when the ASV benchmark is measuring a Warp API call, as wp.init() has non-trivial overhead that should be excluded from the benchmark timing.
asv/benchmarks/setup/bench_model.py (1)
Learnt from: shi-eric
PR: #461
File: asv/benchmarks/envs/example_humanoid.py:40-41
Timestamp: 2025-07-23T14:36:42.182Z
Learning: In Warp benchmarks, explicit wp.init() calls are not needed in most circumstances since the first Warp API call that requires initialization will automatically call wp.init(). Explicit wp.init() in setup() methods is helpful when the ASV benchmark is measuring a Warp API call, as wp.init() has non-trivial overhead that should be excluded from the benchmark timing.
🧬 Code Graph Analysis (1)
asv/benchmarks/simulation/bench_cloth.py (6)
asv/benchmarks/simulation/bench_anymal.py (2)
setup(26-28)time_simulate(31-34)asv/benchmarks/compilation/bench_example_load.py (6)
setup(29-31)setup(60-62)setup(90-92)setup(119-121)setup(149-151)setup(181-183)asv/benchmarks/simulation/bench_quadruped_xpbd.py (2)
setup(26-28)time_simulate(31-34)asv/benchmarks/simulation/bench_selection.py (2)
setup(26-28)time_simulate(31-34)newton/solvers/vbd/solver_vbd.py (1)
rebuild_bvh(2787-2797)newton/examples/example_robot_manipulating_cloth.py (1)
capture_cuda_graph(350-355)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
🔇 Additional comments (33)
pyproject.toml (1)
128-128: LGTM: Appropriate linter configuration for benchmark restructuring.Adding "E402" to the ignore list for benchmark files is sensible given the extensive reorganization and new benchmarking patterns that may require non-standard import positioning.
asv/benchmarks/simulation/bench_anymal.py (1)
22-22: LGTM: Consistent naming convention applied.The class rename to
FastExampleAnymalPretrainedfollows the new naming pattern for in-process simulation benchmarks established in this refactoring.asv/benchmarks/simulation/bench_quadruped_xpbd.py (1)
22-34: LGTM: Well-structured benchmark following established patterns.The implementation correctly follows the standard benchmark pattern with proper CUDA device checking, appropriate scaling (200 environments, 1000 frames), and device synchronization. The absence of explicit
wp.init()is appropriate as per Warp benchmarking best practices.asv/benchmarks/simulation/bench_selection.py (1)
22-22: LGTM: Appropriate naming and configuration.The class rename to
FastExampleSelectionCartpoleMuJoCofollows the new naming convention, and the use of CUDA graphs provides appropriate performance optimization for benchmarking.asv/benchmarks/simulation/bench_cloth.py (1)
42-42: LGTM: Consistent class renaming.The rename to
FastExampleClothSelfContactVBDfollows the established naming convention for fast simulation benchmarks.asv/benchmarks/setup/bench_model.py (7)
1-24: LGTM! Clean imports and configuration.The license header is properly formatted, imports are appropriate, and the Warp configuration to disable backward mode is correct for benchmarking scenarios.
25-34: LGTM! Well-configured KPI benchmark class.The parameter setup with various robot types and high environment counts (4096, 8192) is appropriate for KPI tracking. The timeout of 1800 seconds provides adequate time for initialization of large-scale environments.
35-37: LGTM! Proper Warp initialization in setup.Based on the retrieved learnings, explicit
wp.init()calls in setup methods are helpful when benchmarking Warp API calls, aswp.init()has non-trivial overhead that should be excluded from the benchmark timing.
38-45: LGTM! Proper benchmark implementation with device synchronization.The benchmark correctly:
- Skips when no CUDA device is available
- Uses consistent parameters (randomize=True, seed=123) for reproducible results
- Synchronizes the device after model finalization for accurate timing
47-55: LGTM! Consistent Fast benchmark configuration.The FastInitializeModel class uses smaller environment counts (128, 256) appropriate for quick benchmarks, maintaining consistency with the KPI class structure.
56-66: LGTM! Consistent implementation across benchmark classes.The setup method and time_initialize_model method mirror the KPI class implementation, ensuring consistency in benchmarking approach.
67-73: LGTM! CPU memory benchmark addition.The CPU-specific memory benchmark provides valuable insights into memory usage patterns during model initialization, complementing the GPU timing benchmarks.
asv/benchmarks/compilation/bench_example_load.py (8)
1-22: LGTM! Clean imports and proper structure.The license header is correctly formatted and imports are appropriate for subprocess-based benchmarking.
23-32: LGTM! Proper cache clearing in setup.Clearing both LTO and kernel caches ensures clean benchmarking conditions for compilation/load time measurements.
33-52: LGTM! Well-structured subprocess benchmark.The benchmark properly:
- Skips when no CUDA device is available
- Uses subprocess.run with proper error handling (check=True)
- Captures both stdout and stderr for debugging
- Uses appropriate command-line arguments for headless single-frame execution
54-83: LGTM! Consistent benchmark implementation.The CartpoleMuJoCo benchmark follows the same pattern with appropriate command-line arguments including
--no-use-cuda-graphfor load testing.
85-112: LGTM! Consistent structure maintained.The ClothManipulation benchmark maintains the same reliable pattern of cache clearing, subprocess execution, and output capture.
114-141: LGTM! Another consistent benchmark implementation.The ClothSelfContactVBD benchmark follows the established pattern correctly.
143-173: LGTM! Proper humanoid benchmark configuration.The HumanoidMuJoCo benchmark includes appropriate flags (
--headless,--no-use-cuda-graph) and timeout configuration for this more complex simulation.
175-203: LGTM! Complete and consistent benchmark suite.The QuadrupedXPBD benchmark completes the suite with consistent implementation and appropriate timeout settings.
asv/benchmarks/simulation/bench_mujoco.py (13)
1-25: LGTM! Proper imports and configuration.The license header, imports, and Warp configuration are correctly set up for MuJoCo simulation benchmarking.
27-51: LGTM! Well-configured Fast benchmark with proper builder caching.The FastAnt class properly:
- Caches the builder to avoid redundant construction
- Uses appropriate parameters for fast benchmarking
- Synchronizes device after setup for accurate timing
52-57: LGTM! Proper simulation timing with device synchronization.The timing method correctly synchronizes the device before and after simulation steps to ensure accurate GPU timing measurements.
59-71: LGTM! Efficient builder caching strategy.The KpiAnt class uses a dictionary-based caching approach to avoid rebuilding models for different environment counts, which is more efficient than the single builder approach in Fast classes.
72-97: LGTM! Comprehensive KPI tracking with proper normalization.The track_simulate method correctly:
- Runs multiple samples for statistical reliability
- Measures total time with proper device synchronization
- Normalizes by frames, substeps, environments, and samples
- Returns meaningful per-environment-step timing in milliseconds
99-129: LGTM! Consistent Fast benchmark implementation.The FastCartpole class maintains the same reliable pattern as FastAnt with appropriate parameter adjustments.
131-169: LGTM! Consistent KPI benchmark with adjusted parameters.The KpiCartpole class uses fewer frames (50 vs 100) which is appropriate for the simpler cartpole model while maintaining the same measurement methodology.
171-201: LGTM! Appropriate adjustments for more complex robot.The FastG1 class reduces frames to 25 and repeats to 2, which is sensible for the more computationally intensive G1 humanoid robot.
203-242: LGTM! Well-configured KPI tracking for complex robot.The KpiG1 class appropriately reduces samples to 2 and adds a 900-second timeout for the more complex G1 robot simulations.
244-274: LGTM! Consistent configuration for H1 robot.The FastH1 class uses the same conservative parameters as G1, which is appropriate for another complex humanoid robot.
276-315: LGTM! Matching KPI configuration for H1.The KpiH1 class maintains consistency with KpiG1 in terms of samples and timeout, ensuring fair comparison between similar complexity robots.
317-347: LGTM! Standard humanoid benchmark configuration.The FastHumanoid class uses more frames (50) and repeats (8) compared to G1/H1, suggesting this might be a less complex humanoid model.
349-387: LGTM! Complete benchmark suite with proper normalization.The KpiHumanoid class completes the benchmark suite with higher frame count (100) and samples (4), consistent with it being a standard humanoid model. The metric calculation maintains the same normalization approach across all KPI classes.
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
Signed-off-by: Kenny Vilella <kvilella@nvidia.com>
6a7e055 to
3ffb2ec
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
asv/benchmarks/simulation/bench_cloth.py (1)
42-65: Add device synchronization for accurate timing measurement.The simulation loop lacks
wp.synchronize_device()after completion, which is necessary for accurate GPU timing measurement in benchmarks.Apply this diff to add device synchronization:
def time_simulate(self): for i in range(self.num_frames): self.example.step() if ( i != 0 and not i % self.example.bvh_rebuild_frames and self.example.use_cuda_graph and self.example.solver.handle_self_contact ): self.example.solver.rebuild_bvh(self.example.state_0) with wp.ScopedCapture() as capture: self.example.simulate_substeps() self.example.cuda_graph = capture.graph + wp.synchronize_device()Additionally, consider whether the BVH rebuilding and CUDA graph operations should be moved to
setup()or measured separately to ensure consistent benchmark timing across all iterations.
♻️ Duplicate comments (1)
asv/benchmarks/simulation/bench_cloth.py (1)
23-40: Add device synchronization and move expensive operations out of timing loop.The issues identified in the previous review are still present:
- Missing
wp.synchronize_device()call after the simulation loop (line 35), which is needed for accurate timing measurement.- BVH rebuilding and CUDA graph capture (lines 37-39) occur inside the timing loop every 10 frames, causing inconsistent benchmark results.
Apply this diff to add device synchronization:
def time_simulate(self): for frame_idx in range(self.num_frames): self.example.step() if self.example.cloth_solver and not (frame_idx % 10): self.example.cloth_solver.rebuild_bvh(self.example.state_0) self.example.capture_cuda_graph() + wp.synchronize_device()Consider moving BVH rebuilding and CUDA graph capture to
setup()or measuring them separately for consistent timing.
🧹 Nitpick comments (1)
asv/benchmarks/setup/bench_model.py (1)
49-57: Consider adding timeout configuration for consistency.The FastInitializeModel class is well-implemented, but it's missing a timeout configuration unlike KpiInitializeModel. Consider adding a timeout to handle potential long-running CPU model initializations.
class FastInitializeModel: params = (["humanoid", "g1", "h1", "cartpole", "ant", "quadruped"], [256, 512]) param_names = ["robot", "num_envs"] rounds = 1 number = 1 repeat = 3 min_run_count = 1 + timeout = 1800
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 6a7e0556635a66f1099ab2c602c55887f02ab016 and 3ffb2ec8464c1779f17362edb6ed85a964c0e082.
📒 Files selected for processing (12)
asv/benchmarks/compilation/bench_example_load.py(1 hunks)asv/benchmarks/examples/example_cartpole.py(0 hunks)asv/benchmarks/examples/example_cloth_manipulation.py(0 hunks)asv/benchmarks/examples/example_humanoid.py(0 hunks)asv/benchmarks/examples/example_quadruped.py(0 hunks)asv/benchmarks/setup/bench_model.py(1 hunks)asv/benchmarks/simulation/bench_anymal.py(1 hunks)asv/benchmarks/simulation/bench_cloth.py(1 hunks)asv/benchmarks/simulation/bench_mujoco.py(1 hunks)asv/benchmarks/simulation/bench_quadruped_xpbd.py(1 hunks)asv/benchmarks/simulation/bench_selection.py(1 hunks)pyproject.toml(1 hunks)
💤 Files with no reviewable changes (4)
- asv/benchmarks/examples/example_cloth_manipulation.py
- asv/benchmarks/examples/example_cartpole.py
- asv/benchmarks/examples/example_quadruped.py
- asv/benchmarks/examples/example_humanoid.py
🚧 Files skipped from review as they are similar to previous changes (6)
- pyproject.toml
- asv/benchmarks/simulation/bench_selection.py
- asv/benchmarks/simulation/bench_anymal.py
- asv/benchmarks/simulation/bench_quadruped_xpbd.py
- asv/benchmarks/compilation/bench_example_load.py
- asv/benchmarks/simulation/bench_mujoco.py
🧰 Additional context used
🧠 Learnings (1)
asv/benchmarks/setup/bench_model.py (1)
Learnt from: shi-eric
PR: #461
File: asv/benchmarks/envs/example_humanoid.py:40-41
Timestamp: 2025-07-23T14:36:42.182Z
Learning: In Warp benchmarks, explicit wp.init() calls are not needed in most circumstances since the first Warp API call that requires initialization will automatically call wp.init(). Explicit wp.init() in setup() methods is helpful when the ASV benchmark is measuring a Warp API call, as wp.init() has non-trivial overhead that should be excluded from the benchmark timing.
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
- GitHub Check: run-newton-tests / newton-unittests (windows-latest)
🔇 Additional comments (4)
asv/benchmarks/simulation/bench_cloth.py (1)
20-20: LGTM: Import aligns with refactoring approach.The import of
ExampleClothManipulationsupports the transition from subprocess-based to in-process benchmarking.asv/benchmarks/setup/bench_model.py (3)
1-25: LGTM! Clean imports and configuration.The license header, imports, and Warp configuration are properly set up for benchmarking. Disabling backward mode with
wp.config.enable_backward = Falseis appropriate for performance benchmarks.
27-47: Well-structured KPI benchmark class.The implementation correctly follows ASV benchmarking patterns and aligns with retrieved learnings about explicit
wp.init()calls in setup methods to exclude initialization overhead from timing measurements. The parameter combinations and timeout are appropriate for KPI-level benchmarks.
69-79: Excellent CPU memory measurement implementation.The
peakmem_initialize_model_cpumethod is well-designed with proper memory cleanup (gc.collect()), CPU-only execution scoping, and resource cleanup. The absence of CUDA device requirement is appropriate for CPU memory benchmarking.
3ffb2ec to
c57bd40
Compare
Signed-off-by: Eric Shi <ershi@nvidia.com>
c57bd40 to
6972d08
Compare
|
|
Signed-off-by: Kenny Vilella <kvilella@nvidia.com> Signed-off-by: Eric Shi <ershi@nvidia.com> Co-authored-by: Kenny Vilella <kvilella@nvidia.com>
# Description <!-- Add a how-to guide for setting-up and explaining fabric cloning and stage in memory. --> Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change. Fixes # (issue) <!-- As a practice, it is recommended to open an issue to have discussions on the proposed pull request. This makes it easier for the community to keep track of what is being developed or added, and if a given feature is demanded by more than one party. --> ## Type of change <!-- As you go through the list, delete the ones that are not applicable. --> - Bug fix (non-breaking change which fixes an issue) - New feature (non-breaking change which adds functionality) - Breaking change (fix or feature that would cause existing functionality to not work as expected) - This change requires a documentation update ## Screenshots Please attach before and after screenshots of the change if applicable. <!-- Example: | Before | After | | ------ | ----- | | _gif/png before_ | _gif/png after_ | To upload images to a PR -- simply drag and drop an image while in edit mode and it should upload the image directly. You can then paste that source into the above before/after sections. --> ## Checklist - [ ] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [ ] I have made corresponding changes to the documentation - [ ] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [ ] I have added my name to the `CONTRIBUTORS.md` or my name already exists there <!-- As you go through the checklist above, you can mark something as done by putting an x character in it For example, - [x] I have done this task - [ ] I have not done this task --> --------- Signed-off-by: rwiltz <165190220+rwiltz@users.noreply.github.com> Signed-off-by: Kelly Guo <kellyguo123@hotmail.com> Signed-off-by: Ashwin Varghese Kuruttukulam <123109010+ashwinvkNV@users.noreply.github.com> Signed-off-by: Kelly Guo <kellyg@nvidia.com> Signed-off-by: Michael Gussert <michael@gussert.com> Signed-off-by: samibouziri <79418773+samibouziri@users.noreply.github.com> Signed-off-by: Mayank Mittal <12863862+Mayankm96@users.noreply.github.com> Signed-off-by: Kyle Morgenstein <34984693+KyleM73@users.noreply.github.com> Signed-off-by: Hongyu Li <lihongyu0807@icloud.com> Signed-off-by: Toni-SM <toni.semu@gmail.com> Signed-off-by: James Tigue <166445701+jtigue-bdai@users.noreply.github.com> Signed-off-by: Pascal Roth <57946385+pascal-roth@users.noreply.github.com> Signed-off-by: Victor Khaustov <3192677+vi3itor@users.noreply.github.com> Signed-off-by: AlvinC <alvincny529@gmail.com> Signed-off-by: Tyler Lum <tylergwlum@gmail.com> Signed-off-by: Miguel Alonso Jr. <76960110+miguelalonsojr@users.noreply.github.com> Signed-off-by: renaudponcelet <renaud.poncelet@gmail.com> Signed-off-by: matthewtrepte <mtrepte@nvidia.com> Co-authored-by: lotusl-code <lotusl@nvidia.com> Co-authored-by: Kelly Guo <kellyg@nvidia.com> Co-authored-by: jaczhangnv <jaczhang@nvidia.com> Co-authored-by: rwiltz <165190220+rwiltz@users.noreply.github.com> Co-authored-by: Yanzi Zhu <yanziz@nvidia.com> Co-authored-by: nv-mhaselton <mhaselton@nvidia.com> Co-authored-by: cosmith-nvidia <141183495+cosmith-nvidia@users.noreply.github.com> Co-authored-by: Michael Gussert <michael@gussert.com> Co-authored-by: CY Chen <cyc@nvidia.com> Co-authored-by: oahmednv <oahmed@Nvidia.com> Co-authored-by: Ashwin Varghese Kuruttukulam <123109010+ashwinvkNV@users.noreply.github.com> Co-authored-by: Rafael Wiltz <rwiltz@nvidia.com> Co-authored-by: Peter Du <peterd@nvidia.com> Co-authored-by: chengronglai <chengrongl@nvidia.com> Co-authored-by: pulkitg01 <pulkitg@nvidia.com> Co-authored-by: Connor Smith <cosmith@nvidia.com> Co-authored-by: Ashwin Varghese Kuruttukulam <ashwinvk@nvidia.com> Co-authored-by: Kelly Guo <kellyguo123@hotmail.com> Co-authored-by: shauryadNv <shauryad@nvidia.com> Co-authored-by: Mayank Mittal <12863862+Mayankm96@users.noreply.github.com> Co-authored-by: samibouziri <79418773+samibouziri@users.noreply.github.com> Co-authored-by: James Smith <142246516+jsmith-bdai@users.noreply.github.com> Co-authored-by: Shundo Kishi <syundo0730@gmail.com> Co-authored-by: Sheikh Dawood <sabdulajees@nvidia.com> Co-authored-by: Toni-SM <aserranomuno@nvidia.com> Co-authored-by: Gonglitian <70052908+Gonglitian@users.noreply.github.com> Co-authored-by: James Tigue <166445701+jtigue-bdai@users.noreply.github.com> Co-authored-by: Mayank Mittal <mittalma@leggedrobotics.com> Co-authored-by: Kyle Morgenstein <34984693+KyleM73@users.noreply.github.com> Co-authored-by: Johnson Sun <20457146+j3soon@users.noreply.github.com> Co-authored-by: Pascal Roth <57946385+pascal-roth@users.noreply.github.com> Co-authored-by: Hongyu Li <lihongyu0807@icloud.com> Co-authored-by: Jean-Francois-Lafleche <57650687+Jean-Francois-Lafleche@users.noreply.github.com> Co-authored-by: Wei Jinqi <changshanshi@outlook.com> Co-authored-by: Louis LE LAY <le.lay.louis@gmail.com> Co-authored-by: Harsh Patel <hapatel@theaiinstitute.com> Co-authored-by: Kousheek Chakraborty <kousheekc@gmail.com> Co-authored-by: Victor Khaustov <3192677+vi3itor@users.noreply.github.com> Co-authored-by: AlvinC <alvincny529@gmail.com> Co-authored-by: Felipe Mohr <50018670+felipemohr@users.noreply.github.com> Co-authored-by: AdAstra7 <87345760+likecanyon@users.noreply.github.com> Co-authored-by: gao <ziqi.gao@iff-extern.fraunhofer.de> Co-authored-by: Tyler Lum <tylergwlum@gmail.com> Co-authored-by: -T.K.- <t_k_233@outlook.com> Co-authored-by: Clemens Schwarke <96480707+ClemensSchwarke@users.noreply.github.com> Co-authored-by: Miguel Alonso Jr. <76960110+miguelalonsojr@users.noreply.github.com> Co-authored-by: Miguel Alonso Jr. <miguel.alonso@nfinite.app> Co-authored-by: renaudponcelet <renaud.poncelet@gmail.com> Co-authored-by: Ales Borovicka <aborovicka@nvidia.com> Co-authored-by: nv-mm <mmagdics@nvidia.com> Co-authored-by: Antoine RICHARD <antoiner@nvidia.com>
Signed-off-by: Kenny Vilella <kvilella@nvidia.com> Signed-off-by: Eric Shi <ershi@nvidia.com> Co-authored-by: Kenny Vilella <kvilella@nvidia.com>
Description
Starting from @Kenny-Vilella's branch in #461, the benchmark files have been moved around into different subdirectories depending on the purpose.
Main ways to run:
Run everything:
Run the fast benchmarks (e.g. for merge requests)
Run the KPI-relevant benchmarks
Newton Migration Guide
Please ensure the migration guide for warp.sim users is up-to-date with the changes made in this MR.
docs/migration.rstis up-to dateBefore your PR is "Ready for review"
newton/tests/test_examples.py)pre-commit run -aSummary by CodeRabbit
New Features
Bug Fixes
Refactor
Chores
Removals