Skip to content

Add ring buffer support to RecorderModelAndState#789

Merged
eric-heiden merged 21 commits into
newton-physics:mainfrom
nvtw:dev/tw/ringbufer_sim_recorder
Oct 1, 2025
Merged

Add ring buffer support to RecorderModelAndState#789
eric-heiden merged 21 commits into
newton-physics:mainfrom
nvtw:dev/tw/ringbufer_sim_recorder

Conversation

@nvtw

@nvtw nvtw commented Sep 19, 2025

Copy link
Copy Markdown
Member

Description

Implements #775

Newton Migration Guide

Please ensure the migration guide for warp.sim users is up-to-date with the changes made in this PR.

  • [~] The migration guide in docs/migration.rst is up-to date

Before your PR is "Ready for review"

  • Necessary tests have been added and new examples are tested (see newton/tests/test_examples.py)
  • Documentation is up-to-date
  • Code passes formatting and linting checks with pre-commit run -a

Summary by CodeRabbit

  • New Features

    • Optional capped recording history (fixed-size ring buffer) to limit memory use while preserving backward-compatible default behavior.
    • Payload-level deduplication and pointer-based references for large arrays/meshes/objects to shrink saved files and enable cross-run restoration, including forward-reference resolution during load.
  • Tests

    • Added comprehensive tests for ring buffer behavior, capped-history save/load and playback, serialization round-trips, and backward compatibility.

@coderabbitai

coderabbitai Bot commented Sep 19, 2025

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly captures the primary change in this pull request by indicating that ring buffer support has been added to the RecorderModelAndState component, which directly reflects the implementation of the new RingBuffer-based history capping and its integration into the serialization workflow.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@nvtw nvtw requested a review from eric-heiden September 19, 2025 07:10
@nvtw

nvtw commented Sep 19, 2025

Copy link
Copy Markdown
Member Author

@eric-heiden could you test the RecorderModelAndState with RingBuffer on a real world scene to see if it offers all the functionality you need?

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (6)
newton/tests/test_recorder.py (2)

107-111: Test clear() behavior more comprehensively.

After clearing the buffer, verify that attempting to access elements raises the expected IndexError to ensure the buffer is truly empty.

 # Test clear
 rb.clear()
 test.assertEqual(len(rb), 0)
 test.assertEqual(rb.to_list(), [])
+# Verify IndexError after clear
+with test.assertRaises(IndexError):
+    _ = rb[0]

207-207: Use strict=True for zip() to catch length mismatches.

When comparing collections that should have the same length, using strict=True helps catch bugs early.

-        for original_state_data, loaded_state_data in zip(recorder.history, new_recorder.history, strict=False):
+        for original_state_data, loaded_state_data in zip(recorder.history, new_recorder.history, strict=True):
newton/_src/utils/recorder.py (4)

110-113: Add return type annotation for __iter__.

For consistency and better type hints, add a return type annotation to the iterator method.

-    def __iter__(self):
+    def __iter__(self) -> Iterator[T]:
         """Iterate over items in order (oldest to newest)."""
         for i in range(self._size):
             yield self[i]

You'll also need to import Iterator:

from typing import Generic, TypeVar, Iterator

710-713: Consider adding type hints for the union type.

To improve type checking, consider using explicit type annotations for the history attribute.

+from typing import Union
+
 class RecorderModelAndState:
     """A class to record and playback simulation model and state using JSON serialization."""
+    
+    history: Union[list[dict], RingBuffer[dict]]
 
     def __init__(self, max_history_size: int | None = None):

700-709: Validate max_history_size parameter.

The constructor should validate that max_history_size, when provided, is a positive integer.

 def __init__(self, max_history_size: int | None = None):
     """
     Initializes the Recorder.
 
     Args:
         max_history_size (int | None): Maximum number of states to keep in history.
             If None, uses unlimited history (regular list). If specified, uses a
             ring buffer that keeps only the last N states. Default is None for
             backward compatibility.
     """
+    if max_history_size is not None and max_history_size <= 0:
+        raise ValueError(f"max_history_size must be positive, got {max_history_size}")
     if max_history_size is None:
         self.history: list[dict] = []
     else:
         self.history: RingBuffer[dict] = RingBuffer(max_history_size)

49-55: Consider validating capacity parameter in RingBuffer.init.

The RingBuffer constructor should validate that capacity is a positive integer.

 def __init__(self, capacity: int = 100):
     """
     Initialize the ring buffer.
 
     Args:
         capacity (int): Maximum number of items to store. Default is 100.
     """
+    if capacity <= 0:
+        raise ValueError(f"Capacity must be positive, got {capacity}")
     self.capacity = capacity
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08ff1bf and 130cac1.

📒 Files selected for processing (2)
  • newton/_src/utils/recorder.py (5 hunks)
  • newton/tests/test_recorder.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
newton/tests/test_recorder.py (1)
newton/_src/utils/recorder.py (14)
  • RingBuffer (41-129)
  • to_list (121-123)
  • append (61-70)
  • clear (115-119)
  • from_list (125-129)
  • RecorderModelAndState (697-871)
  • record_model (766-773)
  • record (613-626)
  • record (736-747)
  • save_to_file (647-660)
  • save_to_file (792-825)
  • load_from_file (662-694)
  • load_from_file (827-871)
  • playback_model (775-790)
🪛 Ruff (0.12.2)
newton/_src/utils/recorder.py

79-79: Avoid specifying long messages outside the exception class

(TRY003)


82-82: Avoid specifying long messages outside the exception class

(TRY003)


96-96: Avoid specifying long messages outside the exception class

(TRY003)


99-99: Avoid specifying long messages outside the exception class

(TRY003)

newton/tests/test_recorder.py

37-37: Unused function argument: device

(ARG001)


81-81: Unused function argument: device

(ARG001)


118-118: Unused function argument: device

(ARG001)


141-141: Unused function argument: device

(ARG001)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run GPU Benchmarks (Pull Request)
  • GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
  • GitHub Check: run-newton-tests / newton-unittests (windows-latest)
  • GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)
🔇 Additional comments (5)
newton/tests/test_recorder.py (1)

160-213: LGTM! Comprehensive ring buffer save/load test.

Excellent test coverage for the ring buffer save/load functionality. The test properly verifies:

  • Ring buffer capacity enforcement (only keeping last 3 states)
  • Serialization to JSON format
  • Deserialization with different buffer capacity
  • Model reconstruction and validation
  • State data consistency after round-trip

The cleanup in the finally block ensures no test artifacts are left behind.

newton/_src/utils/recorder.py (4)

41-130: LGTM! Well-designed RingBuffer implementation.

The RingBuffer class provides an excellent fixed-capacity circular buffer with:

  • Efficient O(1) append operations that overwrite oldest items when full
  • Proper logical-to-physical index mapping for both unfilled and circular states
  • Comprehensive list-like interface with getitem, setitem, len, and iteration
  • Clear documentation and appropriate error handling

The design elegantly handles the transition from unfilled to circular buffer states.


700-714: LGTM! Clean implementation of optional history size limit.

The RecorderModelAndState constructor elegantly handles backward compatibility by:

  • Defaulting to None for unlimited history (preserving existing behavior)
  • Using a regular list when max_history_size is None
  • Using RingBuffer when a capacity is specified

This design ensures existing code continues to work unchanged while enabling the new memory-limited feature.


813-815: LGTM! Proper serialization handling for RingBuffer.

The save logic correctly converts RingBuffer to a list for serialization, ensuring compatibility with both JSON and CBOR2 formats. This approach maintains the serialization format while supporting the new ring buffer feature.


864-871: LGTM! Proper deserialization handling for different history types.

The load logic correctly handles both regular list and RingBuffer history containers:

  • For RingBuffer: Uses from_list() to properly populate the circular buffer
  • For regular list: Direct assignment as before

This ensures loaded states respect the configured history capacity.

Comment thread newton/tests/test_recorder.py
@nvtw nvtw self-assigned this Sep 19, 2025
@preist-nvidia preist-nvidia linked an issue Sep 22, 2025 that may be closed by this pull request
@nvtw

nvtw commented Sep 25, 2025

Copy link
Copy Markdown
Member Author

I tested recording of 8192 G1 robots with the ringbuffer and it seems to work. Recording is fast. Saving is quite slow. I had to add an ArrayCache class to avoid serializing the same array over and over (e. g. all G1 use the same mesh but the numpy array is in multiple locations but pointing to the same data - so the serializer now recognizes this and saves it only once). The gl_viewer only manages 0.4 FPS for me during replay, but at least one can see the scene.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 130cac1 and e5e378b.

📒 Files selected for processing (2)
  • newton/_src/utils/recorder.py (26 hunks)
  • newton/examples/robot/example_robot_g1.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
newton/examples/robot/example_robot_g1.py (2)
newton/_src/utils/recorder.py (6)
  • RecorderModelAndState (920-1104)
  • record_model (995-1002)
  • record (836-849)
  • record (965-976)
  • save_to_file (870-883)
  • save_to_file (1021-1056)
newton/examples/__init__.py (3)
  • create_parser (104-135)
  • init (138-181)
  • run (36-48)
🪛 Ruff (0.13.1)
newton/_src/utils/recorder.py

79-79: Avoid specifying long messages outside the exception class

(TRY003)


82-82: Avoid specifying long messages outside the exception class

(TRY003)


96-96: Avoid specifying long messages outside the exception class

(TRY003)


99-99: Avoid specifying long messages outside the exception class

(TRY003)


183-183: Avoid specifying long messages outside the exception class

(TRY003)


196-196: Unpacked variable existing_value is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


198-200: Avoid specifying long messages outside the exception class

(TRY003)


204-204: Avoid specifying long messages outside the exception class

(TRY003)


215-215: Avoid specifying long messages outside the exception class

(TRY003)


271-271: Do not catch blind exception: Exception

(BLE001)


272-272: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


279-279: Do not catch blind exception: Exception

(BLE001)


280-280: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


280-280: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


365-365: Avoid specifying long messages outside the exception class

(TRY003)


493-493: Unused function argument: path

(ARG001)


740-740: Avoid specifying long messages outside the exception class

(TRY003)


768-768: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run GPU Benchmarks (Pull Request)
  • GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
  • GitHub Check: run-newton-tests / newton-unittests (windows-latest)
  • GitHub Check: run-newton-tests / newton-unittests (ubuntu-latest)

Comment thread newton/_src/utils/recorder.py
Comment thread newton/examples/robot/example_robot_g1.py Outdated
Comment thread newton/examples/robot/example_robot_g1.py Outdated
@nvtw

nvtw commented Sep 30, 2025

Copy link
Copy Markdown
Member Author

@eric-heiden can we merge or is more testing required?

Comment thread newton/_src/utils/recorder.py Outdated
Comment thread newton/_src/utils/recorder.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
newton/_src/utils/recorder.py (1)

49-56: Validate capacity parameter.

This issue was flagged in a previous review but remains unaddressed. Passing capacity <= 0 will cause an IndexError on the first append (line 69) when accessing self._buffer[self._start] on an empty list. Since this is a public API, fail fast with clear validation.

Apply this diff:

 def __init__(self, capacity: int = 100):
     """
     Initialize the ring buffer.

     Args:
         capacity (int): Maximum number of items to store. Default is 100.
     """
+    if capacity <= 0:
+        raise ValueError("RingBuffer capacity must be a positive integer")
     self.capacity = capacity
     self._buffer: list[T] = []
     self._start = 0  # Index of the oldest item
     self._size = 0  # Current number of items
🧹 Nitpick comments (1)
newton/_src/utils/recorder.py (1)

268-281: Narrow exception handling to specific types.

The broad Exception catches at lines 271 and 279 are flagged by static analysis. While fallbacks are appropriate, catching all exceptions can hide unexpected errors. Narrow to specific exceptions like AttributeError, TypeError, and KeyError.

Apply this diff:

 def _warp_key(x) -> int:
-    try:
+    try:
         base = int(x.ptr)
-    except Exception:
+    except (AttributeError, TypeError):
         base = int(id(x))
     return _WARP_TAG + base


 def _mesh_key_from_vertices(vertices: np.ndarray, fallback_obj=None) -> int:
     try:
         base = _ptr_key_from_numpy(vertices)
-    except Exception:
+    except (AttributeError, KeyError, TypeError):
         base = int(id(fallback_obj)) if fallback_obj is not None else int(id(vertices))
     return _MESH_TAG + base
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48546c3 and 8908bb0.

📒 Files selected for processing (1)
  • newton/_src/utils/recorder.py (25 hunks)
🧰 Additional context used
🪛 Ruff (0.13.1)
newton/_src/utils/recorder.py

79-79: Avoid specifying long messages outside the exception class

(TRY003)


82-82: Avoid specifying long messages outside the exception class

(TRY003)


96-96: Avoid specifying long messages outside the exception class

(TRY003)


99-99: Avoid specifying long messages outside the exception class

(TRY003)


196-198: Avoid specifying long messages outside the exception class

(TRY003)


203-203: Avoid specifying long messages outside the exception class

(TRY003)


215-215: Avoid specifying long messages outside the exception class

(TRY003)


271-271: Do not catch blind exception: Exception

(BLE001)


272-272: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


279-279: Do not catch blind exception: Exception

(BLE001)


280-280: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


280-280: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


365-365: Avoid specifying long messages outside the exception class

(TRY003)


493-493: Unused function argument: path

(ARG001)


745-745: Avoid specifying long messages outside the exception class

(TRY003)


774-774: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run GPU Benchmarks (Pull Request)
  • GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
  • GitHub Check: run-newton-tests / newton-unittests (windows-latest)
🔇 Additional comments (11)
newton/_src/utils/recorder.py (11)

132-226: LGTM!

The ArrayCache implementation correctly addresses previous review feedback by using dict.get() to avoid repeated dictionary lookups and returning direct access from _index_to_entry. The deduplication logic properly handles both serialization (assigning new indices) and deserialization (validating explicit indices) paths.


284-349: LGTM!

The cache-aware serialization correctly differentiates between first occurrences (full payload + cache_index) and subsequent references (cache_index only). The deduplication logic properly integrates with ArrayCache.


352-400: LGTM!

The forward reference handling via __cache_ref__ correctly defers resolution when cache entries haven't been populated yet. The cache registration at line 392 properly associates the deserialized array with its cache index.


403-489: LGTM!

The cache parameter is correctly threaded through all recursive serialization paths, maintaining consistent deduplication context.


492-556: LGTM!

The cache-aware serialization for warp arrays and meshes correctly implements the first-occurrence/reference pattern. The intentional cache=None at lines 512 and 518 ensures warp-level deduplication remains authoritative (as noted in the comment). The unused path parameter flagged by static analysis is part of the required callback signature.


559-632: LGTM!

The optimizations using sentinel pattern (line 595) and cached length (line 609) effectively reduce redundant operations. The lazy path construction improves performance for deep object graphs.


635-701: LGTM!

The cache parameter is correctly threaded through all recursive deserialization paths, enabling consistent reference resolution.


727-833: LGTM!

The two-phase deserialization correctly handles forward references: first pass creates objects and registers them in cache, second pass (_resolve_cache_refs) resolves any deferred references. The optimizations using single dict lookups (lines 741, 758, 799, 820) improve performance. The static analysis warnings about long error messages are pedantic—the messages are clear and informative.


933-955: LGTM!

The optional max_history_size parameter correctly enables ring buffer mode while maintaining backward compatibility (None defaults to unlimited list). The conditional history type assignment (lines 943-946) is clean and well-documented.

Note: The streaming state variables (lines 949-954) are initialized but not used in the provided code. If streaming functionality was removed or is pending, consider cleaning up these unused attributes.


1031-1067: LGTM!

The serialization correctly handles both history types (line 1053) and uses a single ArrayCache (line 1056) for cross-document deduplication. This enables efficient storage when multiple states share the same mesh or array data.


1068-1114: LGTM!

The deserialization correctly mirrors the serialization approach with a single ArrayCache (line 1103) and properly handles loading into both RingBuffer and list history types (lines 1109-1114). The conditional loading preserves the original history container type.

@eric-heiden eric-heiden left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@eric-heiden eric-heiden merged commit d6c1984 into newton-physics:main Oct 1, 2025
13 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request Jan 20, 2026
4 tasks
eric-heiden added a commit to eric-heiden/newton that referenced this pull request Jan 28, 2026
Co-authored-by: Eric Heiden <eheiden@nvidia.com>
@coderabbitai coderabbitai Bot mentioned this pull request Feb 13, 2026
4 tasks
mmacklin pushed a commit to mmacklin/newton that referenced this pull request Apr 7, 2026
Co-authored-by: Eric Heiden <eheiden@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement RingBuffer for simulation recorder

2 participants