Skip to content

Fix garbage collector issues in Kamino RCM solver#2575

Merged
nvtw merged 1 commit into
newton-physics:mainfrom
nvtw:dev/tw3/fix_kamino_gc_issue
Apr 27, 2026
Merged

Fix garbage collector issues in Kamino RCM solver#2575
nvtw merged 1 commit into
newton-physics:mainfrom
nvtw:dev/tw3/fix_kamino_gc_issue

Conversation

@nvtw

@nvtw nvtw commented Apr 24, 2026

Copy link
Copy Markdown
Member

Description

Fix garbage collector issues in Kamino RCM solver

Checklist

  • New or existing tests cover these changes
  • The documentation is up to date with these changes
  • CHANGELOG.md has been updated (if user-facing change)

Test plan

Manual testing

Bug fix

Steps to reproduce:

Hard to reproduce. GC frees memory but only in CPU mode. There is some inconsistency between cuda and CPU mode about GC handles are tracked.

Minimal reproduction:

import newton

# Code that demonstrates the bug

Summary by CodeRabbit

  • Refactor
    • Improved internal memory management for linear algebra solvers by optimizing scratch buffer allocation and lifetime handling during matrix reordering operations. Enhanced robustness of buffer handling in solver callbacks to ensure correct memory access patterns.

@nvtw nvtw self-assigned this Apr 24, 2026
@nvtw nvtw requested a review from vastsoun April 24, 2026 13:16
@coderabbitai

coderabbitai Bot commented Apr 24, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

The changes refactor scratch buffer ownership in RCM (Reverse Cuthill-McKee) reordering: the LLTBlockedRCMSolver now allocates and owns dedicated batched-RCM scratch buffers during initialization, passing them to create_rcm_batch_launch which accepts caller-allocated scratch instead of allocating internally.

Changes

Cohort / File(s) Summary
Solver Scratch Allocation & Threading
newton/_src/solvers/kamino/_src/linalg/factorize/llt_blocked_rcm_solver.py
Solver allocates dedicated batched-RCM scratch storage during _allocate_impl and threads it into the RCM launch callback creation via _ensure_reorder_launches_bound, ensuring callback-captured inputs include scratch buffers.
RCM Batch Launch Refactor
newton/_src/solvers/kamino/_src/linalg/factorize/rcm_batch.py
create_rcm_batch_launch signature updated to accept caller-allocated scratch dict parameter (placed after vio: wp.array); removes internal scratch allocation and updates docstring with explicit lifetime requirements for Warp CPU backend compatibility.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Kamino Cholesky solver with RCM support #2554: Directly modifies the same RCM batching and LLTBlockedRCMSolver code paths, changing create_rcm_batch_launch to accept caller-owned scratch buffers and updating LLTBlockedRCMSolver to allocate and thread that scratch into recorded RCM launches.

Suggested reviewers

  • vastsoun
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix garbage collector issues in Kamino RCM solver' accurately and specifically describes the main change. It references the correct component (Kamino RCM solver) and the primary issue being addressed (garbage collector issues).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
newton/_src/solvers/kamino/_src/linalg/factorize/llt_blocked_rcm_solver.py (1)

222-274: ⚠️ Potential issue | 🟡 Minor

Reset stale _reorder_callback and _reorder_attached_to after re-allocation.

_allocate_impl allocates new _P, _rcm_scratch, and other buffers but does not reset _reorder_callback / _reorder_attached_to. If _allocate_impl is called a second time and a later _factorize_impl receives an A object that is is-identical to the one seen before, _ensure_reorder_launches_bound will return early (line 302). The stale callback will replay and write into the previously-allocated buffers, now deallocated or repurposed. This is a pre-existing hazard for _P, but widens the captured surface to _rcm_scratch.

Suggested defensive reset
             self._rcm_scratch = _rcm_batch.allocate_rcm_batch_scratch(
                 total_vec=info.total_vec_size,
                 num_blocks=info.num_blocks,
                 device=self._device,
             )

+        # Any previously-recorded reorder callback captured the old _P /
+        # _rcm_scratch; force a rebind on the next factorize.
+        self._reorder_callback = None
+        self._reorder_attached_to = None
+
         # The batched-RCM launch callback (``self._reorder_callback``) is
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@newton/_src/solvers/kamino/_src/linalg/factorize/llt_blocked_rcm_solver.py`
around lines 222 - 274, Allocate_impl currently recreates buffers like _P and
_rcm_scratch but does not clear the stale reorder state; reset _reorder_callback
and _reorder_attached_to to None (or their initial empty state) at the end of
_allocate_impl so subsequent calls won't early-return in
_ensure_reorder_launches_bound and replay callbacks into newly
allocated/repurposed buffers; update _allocate_impl (reference symbols:
_allocate_impl, _reorder_callback, _reorder_attached_to, _P, _rcm_scratch,
_ensure_reorder_launches_bound, _factorize_impl) to defensively clear those
fields after allocating the new buffers.
🧹 Nitpick comments (1)
newton/_src/solvers/kamino/_src/linalg/factorize/rcm_batch.py (1)

339-373: Ownership change and docstring look good.

Hoisting scratch allocation to the caller is the right fix: with record_cmd=True, the Warp CPU backend does not keep strong Python references to recorded inputs, so locally-allocated scratch was being reclaimed as soon as create_rcm_batch_launch returned. Calling out that contract explicitly in the docstring is very helpful.

Optional nit: tightening the annotation to dict[str, wp.array] (or a small TypedDict/dataclass returned by allocate_rcm_batch_scratch) would make the expected keys self-documenting and catch a missing "degree"/"level"/… at type-check time rather than at launch time.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@newton/_src/solvers/kamino/_src/linalg/factorize/rcm_batch.py` around lines
339 - 373, The scratch parameter for create_rcm_batch_launch is currently
untyped as dict; tighten it to a specific mapping so missing keys are caught
earlier: change the annotation to dict[str, wp.array] or better, define a small
TypedDict or dataclass (e.g., ScratchBuffers or AllocateRCMScratchTypedDict)
that reflects the keys produced by allocate_rcm_batch_scratch (include expected
keys like "degree", "level", etc.), update create_rcm_batch_launch's signature
to accept that TypedDict/dataclass, and use that type in callers and the
allocate_rcm_batch_scratch return so the contract is explicit and
type-checkable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@newton/_src/solvers/kamino/_src/linalg/factorize/llt_blocked_rcm_solver.py`:
- Around line 222-274: Allocate_impl currently recreates buffers like _P and
_rcm_scratch but does not clear the stale reorder state; reset _reorder_callback
and _reorder_attached_to to None (or their initial empty state) at the end of
_allocate_impl so subsequent calls won't early-return in
_ensure_reorder_launches_bound and replay callbacks into newly
allocated/repurposed buffers; update _allocate_impl (reference symbols:
_allocate_impl, _reorder_callback, _reorder_attached_to, _P, _rcm_scratch,
_ensure_reorder_launches_bound, _factorize_impl) to defensively clear those
fields after allocating the new buffers.

---

Nitpick comments:
In `@newton/_src/solvers/kamino/_src/linalg/factorize/rcm_batch.py`:
- Around line 339-373: The scratch parameter for create_rcm_batch_launch is
currently untyped as dict; tighten it to a specific mapping so missing keys are
caught earlier: change the annotation to dict[str, wp.array] or better, define a
small TypedDict or dataclass (e.g., ScratchBuffers or
AllocateRCMScratchTypedDict) that reflects the keys produced by
allocate_rcm_batch_scratch (include expected keys like "degree", "level", etc.),
update create_rcm_batch_launch's signature to accept that TypedDict/dataclass,
and use that type in callers and the allocate_rcm_batch_scratch return so the
contract is explicit and type-checkable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: e402c4f7-8a7f-4367-a2fc-eb33d9e8ac9d

📥 Commits

Reviewing files that changed from the base of the PR and between f9a39d7 and 0201c67.

📒 Files selected for processing (2)
  • newton/_src/solvers/kamino/_src/linalg/factorize/llt_blocked_rcm_solver.py
  • newton/_src/solvers/kamino/_src/linalg/factorize/rcm_batch.py

@codecov

codecov Bot commented Apr 24, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!

@vastsoun vastsoun left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nvtw nvtw added this pull request to the merge queue Apr 27, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 27, 2026
@nvtw nvtw added this pull request to the merge queue Apr 27, 2026
Merged via the queue into newton-physics:main with commit 91d21b1 Apr 27, 2026
25 checks passed
preist-nvidia added a commit to preist-nvidia/newton that referenced this pull request May 4, 2026
The pre-release audit found user-facing commits that landed since v1.1.0
without an [Unreleased] entry. Backfill five of them here. Kamino-related
commits (newton-physics#2280, newton-physics#2517, newton-physics#2554, newton-physics#2575) are intentionally omitted: Kamino is
still alpha and not currently tracked in CHANGELOG.

- Changed: contact-conversion overhead reduction in SolverMuJoCo
  ([newton-physicsGH-2393](newton-physics#2393))
- Fixed: target_voxel_size ignored on the texture-SDF path
  ([newton-physicsGH-2464](newton-physics#2464))
- Fixed: Newton-to-mujoco-warp material-combination mismatch
  ([newton-physicsGH-2439](newton-physics#2439))
- Fixed: eq_objtype mismatch on joint equality / mimic constraints in
  SolverMuJoCo ([newton-physicsGH-2459](newton-physics#2459))
- Fixed: _tiled_sum_kernel launch-dim handling under warp-lang 1.13
  ([newton-physicsGH-2546](newton-physics#2546))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants