Skip to content

fix(onboard): write fallback fingerprint when git source is unavailable (#4623)#4706

Merged
jyaunches merged 3 commits into
NVIDIA:mainfrom
kagura-agent:fix/model-router-fingerprint-fallback
Jun 9, 2026
Merged

fix(onboard): write fallback fingerprint when git source is unavailable (#4623)#4706
jyaunches merged 3 commits into
NVIDIA:mainfrom
kagura-agent:fix/model-router-fingerprint-fallback

Conversation

@kagura-agent

@kagura-agent kagura-agent commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Summary

When NemoClaw is installed from a release tarball or npm pack instead of a git clone, getModelRouterSourceFingerprint() returns null because there's no .git directory to hash. This causes two problems:

  1. Missing fingerprint file: writeModelRouterInstalledFingerprint() was only called when sourceFingerprint was non-null, so no fingerprint file was written at all.
  2. Reinstall loop: isManagedModelRouterCurrent() short-circuited on null sourceFingerprint, returning false and triggering a full reinstall on every onboard.

Changes

  • Write a fallback fingerprint (install:<nemoclaw-version>) when git source fingerprint is unavailable, using the NemoClaw version from package.json as a stable token.
  • Update isManagedModelRouterCurrent() to accept install:-prefixed fingerprints when source fingerprint is null, preventing unnecessary reinstalls within the same NemoClaw version.
  • Add test coverage verifying both the fallback file creation and the reinstall-avoidance behavior.

Testing

  • All 5 model router tests pass
  • New test verifies isManagedModelRouterCurrent() returns true on subsequent checks with fallback fingerprint

Closes #4623

Signed-off-by: Kagura kagura-agent@users.noreply.github.com

@copy-pr-bot

copy-pr-bot Bot commented Jun 3, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Install now writes a non-null effective fingerprint (either the computed source fingerprint or install:<package.json version>), and the managed check accepts installed fingerprints starting with install: when source hashing is unavailable. A new integration test verifies the fallback is written and recognized.

Changes

Model Router Fingerprint Fallback

Layer / File(s) Summary
Install + managed verification
src/lib/onboard/model-router.ts
installModelRouterCommand writes an effectiveFingerprint as sourceFingerprint ?? install:<version-from-ROOT-package.json>. isManagedModelRouterCurrent is exported and now exact-matches the installed fingerprint to sourceFingerprint when available, otherwise treats the router as managed if the installed fingerprint exists and starts with install:.
Integration test for fallback
test/onboard-model-router.test.ts
New Vitest integration test stubs git fingerprinting to be unavailable, runs setupInference, asserts .nemoclaw-source-fingerprint exists in the managed venv and matches ^install:.+$ (and is not a pure long timestamp), and verifies isManagedModelRouterCurrent(...) returns true in that scenario.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 In a venv where git went mute and still,
I left an install: seed, a tiny will.
When hashing fails and silence grows,
a little token marks what the venv knows.
🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the core fix: ensuring a fallback fingerprint is written when git source is unavailable, addressing the missing .fingerprint file issue.
Linked Issues check ✅ Passed The PR fulfills the linked issue #4623 requirement: it ensures the .fingerprint file is created during Model Router venv setup with a reliable fallback token (install:) and makes isManagedModelRouterCurrent recognize it.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the linked issue: modifying fingerprint logic in model-router.ts and adding a test verifying fingerprint creation when git source is unavailable.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard/model-router.ts`:
- Around line 269-270: isManagedModelRouterCurrent currently returns false when
getModelRouterSourceFingerprint() is null, causing repeated reinstalls; modify
isManagedModelRouterCurrent so that if sourceFingerprint is null it treats an
existing installed fingerprint/venv as current instead of forcing reinstall.
Concretely, in isManagedModelRouterCurrent, call
readModelRouterInstalledFingerprint(venvDir) and if sourceFingerprint is null
return Boolean(installedFingerprint) (or check venv/existence) otherwise keep
the existing equality check (installedFingerprint === sourceFingerprint); update
any callers that rely on the boolean result accordingly.

In `@test/onboard-model-router.test.ts`:
- Around line 988-1007: The test is fragile because runner.runCapture stub makes
git commands return empty, causing getModelRouterSourceFingerprint to call
hashModelRouterSourceTree and read the real llm-router files; fix by stubbing or
forcing hashModelRouterSourceTree to return null (or a stable value) for the
test so the fingerprint path becomes deterministic—e.g., in the test replace/spy
on hashModelRouterSourceTree (or modify getModelRouterSourceFingerprint
behavior) to return null or a fixed "null fingerprint" value instead of walking
the real filesystem, and keep runner.runCapture unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 16dc7852-51df-458f-9749-8c8b53ac053c

📥 Commits

Reviewing files that changed from the base of the PR and between 325ed77 and 4e48c36.

📒 Files selected for processing (2)
  • src/lib/onboard/model-router.ts
  • test/onboard-model-router.test.ts

Comment thread src/lib/onboard/model-router.ts Outdated
Comment thread test/onboard-model-router.test.ts
@wscurran wscurran added area: cli Command line interface, flags, terminal UX, or output bug-fix PR fixes a bug or regression labels Jun 3, 2026
@wscurran

wscurran commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

✨ Thanks for submitting this detailed PR about fixing the Model Router venv missing fingerprint file issue, which breaks the intended venv version tracking and rebuild detection. This proposes a way to address the bug in the Model Router provider that affects the onboard process and defaults to an incorrect venv setup.


Related open issues:

1 similar comment
@wscurran

wscurran commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

✨ Thanks for submitting this detailed PR about fixing the Model Router venv missing fingerprint file issue, which breaks the intended venv version tracking and rebuild detection. This proposes a way to address the bug in the Model Router provider that affects the onboard process and defaults to an incorrect venv setup.


Related open issues:

@cv cv added v0.0.59 Release target v0.0.60 Release target and removed v0.0.59 Release target labels Jun 4, 2026
@kagura-agent kagura-agent force-pushed the fix/model-router-fingerprint-fallback branch from adaf95b to cd5c0f9 Compare June 5, 2026 13:18
@prekshivyas prekshivyas self-assigned this Jun 5, 2026

@prekshivyas prekshivyas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this — the file-existence fix is real and the change is harmless, but I don't think it delivers the reinstall-avoidance it advertises, and the test gives false confidence. I verified by running the test and tracing the path.

The fallback only fires when sourceFingerprint is null, and in that exact case reinstalls still happen.

isManagedModelRouterCurrent is:

const sourceFingerprint = getModelRouterSourceFingerprint(routerDir);
return Boolean(sourceFingerprint && readModelRouterInstalledFingerprint(venvDir) === sourceFingerprint);

It short-circuits on a falsy sourceFingerprint. So:

  1. ?? install:${Date.now()} fires only when sourceFingerprint === null.
  2. On the next onboard in the same environment, getModelRouterSourceFingerprint returns null again → isManagedModelRouterCurrent returns false → the model-router venv reinstalls. The install:<ts> content is never compared (short-circuited before readInstalled).
  3. Date.now() makes it a write-only token — even if the check were reached, a fresh timestamp could never match the stored one.

So the PR description's "No spurious reinstalls within the same NemoClaw version" doesn't hold for the very scenario the fallback targets. The root cause of the false is the null sourceFingerprint short-circuit, not the missing file — writing the file doesn't change the reinstall behavior.

The new test doesn't catch this — it asserts only fpExists and fpContent =~ /^install:\d+$/. It never calls isManagedModelRouterCurrent or runs onboarding twice, so reinstall-avoidance is unverified.

Requesting changes — pick one:

  1. Narrow the scope: keep the file-existence fix, but correct the description to drop the reinstall-avoidance claim and file a follow-up for the reinstall loop. (If some consumer just needs the file present, this is fine on its own.)
  2. Complete the fix: use a stable token (e.g. install:<nemoclaw-version> rather than Date.now()) AND teach isManagedModelRouterCurrent to accept that token when sourceFingerprint is null, then add a test that calls it twice and asserts the second run does not reinstall.

Happy to re-review either way.

@kagura-agent

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review @prekshivyas! You're absolutely right — the Date.now() token was write-only and never prevented reinstalls.

I went with option 2 (complete the fix):

  • Changed the fallback token from install:${Date.now()} to install:${version} using the NemoClaw version from package.json — stable across runs in the same version
  • Updated isManagedModelRouterCurrent to accept an existing install:-prefixed fingerprint when sourceFingerprint is null, so the venv is treated as current
  • Updated the test to verify both the stable format and the reinstall-avoidance behavior (isManagedModelRouterCurrent returns true when the fallback fingerprint exists)

Ready for re-review!

@kagura-agent

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough trace, @prekshivyas — you're right on both points.

The Date.now() token is write-only and isManagedModelRouterCurrent short-circuits before it's ever compared, so the reinstall-avoidance claim doesn't hold.

I'll go with option 1 (narrow scope): keep the file-existence fix, correct the description to drop the reinstall-avoidance claim, and file a follow-up issue for the actual reinstall loop fix (isManagedModelRouterCurrent accepting a stable fallback token when sourceFingerprint is null).

Will push the update shortly.

@kagura-agent

Copy link
Copy Markdown
Contributor Author

Apologies for the duplicate — please disregard the "option 1" comment above. The pushed code follows option 2 (complete the fix): stable install:<version> token + isManagedModelRouterCurrent updated to accept it when sourceFingerprint is null, plus a test verifying reinstall-avoidance. Ready for re-review!

@cv cv added v0.0.61 Release target and removed v0.0.60 Release target labels Jun 6, 2026
@kagura-agent kagura-agent force-pushed the fix/model-router-fingerprint-fallback branch from 5e5c8c1 to b811e31 Compare June 7, 2026 00:12
@kagura-agent

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review @prekshivyas — you're absolutely right that the original fix didn't deliver the reinstall-avoidance it advertised. I've pushed changes implementing your Option 2:

  1. Stable token: Replaced Date.now() with the NemoClaw version from package.json (install:<version>), so the fingerprint is deterministic within the same NemoClaw version.

  2. Short-circuit fix: Updated isManagedModelRouterCurrent() to handle the null sourceFingerprint case — when git isn't available, it now accepts an existing install:-prefixed fingerprint instead of short-circuiting to false.

  3. Test coverage: The test now calls isManagedModelRouterCurrent() directly to verify it returns true on a subsequent check when the install fingerprint exists but source fingerprint is unavailable.

All 5 model router tests pass locally. The branch is also rebased on latest main.

@cv cv added v0.0.62 Release target and removed v0.0.61 Release target labels Jun 8, 2026

@prekshivyas prekshivyas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed against the current head — the rework fully addresses my earlier concern. Verified by reading + building + running the tests (5/5 pass):

  • Stable token: install:${version} now uses the NemoClaw package.json version, not Date.now().
  • Reinstall loop is actually fixed: isManagedModelRouterCurrent no longer short-circuits on a null source fingerprint — when source is unhashable it accepts an existing install:-prefixed fingerprint and returns true, so a second onboard doesn't reinstall.
  • Test proves it: the fallback test calls isManagedModelRouterCurrent, asserts it's true when source is unavailable but the install: file exists, and asserts the token is not a timestamp (guards the old regression).

Approving on the code. One thing to clear before merge, @kagura-agent: dco-check is red — the merge commit 7a89059f ("Merge branch 'main'…") has no Signed-off-by trailer (the fix commit b811e312 is signed off, just not the merge). Re-sign that merge commit (or redo the merge with -s) and CI goes green.

@prekshivyas

Copy link
Copy Markdown
Contributor

Correction to my note above: dco-check validates the PR description, not the merge commit. @kagura-agent please add a Signed-off-by: Your Name <email> line to the PR description and it'll go green. Approval stands on the code.

@kagura-agent

Copy link
Copy Markdown
Contributor Author

Thanks @prekshivyas! The Signed-off-by line is already in the PR description — looks like the dco-check ran before it was added. Could you re-trigger the check? I don't have permissions to rerun workflows.

@kagura-agent kagura-agent force-pushed the fix/model-router-fingerprint-fallback branch from 7a89059 to 7c9f38e Compare June 9, 2026 03:11
@kagura-agent

Copy link
Copy Markdown
Contributor Author

Rebased onto latest main to fix the DCO check failure (the previous merge commit lacked a sign-off). The commit itself already had the correct DCO trailer.

@kagura-agent

Copy link
Copy Markdown
Contributor Author

All CI checks are green now (including dco-check) after the rebase. Ready for merge whenever convenient — thanks @prekshivyas for the thorough review! 🙏

@kagura-agent kagura-agent force-pushed the fix/model-router-fingerprint-fallback branch from 7c9f38e to 0054338 Compare June 9, 2026 16:14
@kagura-agent

Copy link
Copy Markdown
Contributor Author

Rebased onto latest main to resolve merge conflicts. Single commit cherry-picked cleanly — only the test file had a minor conflict (upstream added assertions to the existing test; resolved by keeping both the updated existing test and the new fallback fingerprint test).

… in model router check (NVIDIA#4623)

Address review feedback from prekshivyas:

1. Replace Date.now() with NemoClaw version as the fallback fingerprint
   token (install:<version> instead of install:<timestamp>), making it
   stable across onboard runs within the same NemoClaw version.

2. Teach isManagedModelRouterCurrent() to accept install:-prefixed
   fingerprints when sourceFingerprint is null (no git). This prevents
   the short-circuit that previously caused reinstalls on every onboard
   even when the fallback fingerprint file existed.

3. Export isManagedModelRouterCurrent and add a test that verifies
   the function returns true on a subsequent check when the install:
   fingerprint is present but git source is unavailable.

Signed-off-by: kagura-agent <kagura-agent@users.noreply.github.com>
Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>
Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>
@kagura-agent kagura-agent force-pushed the fix/model-router-fingerprint-fallback branch from 561ce86 to 70c1fcd Compare June 9, 2026 18:08
@jyaunches jyaunches merged commit c29408c into NVIDIA:main Jun 9, 2026
30 checks passed
jyaunches pushed a commit that referenced this pull request Jun 10, 2026
## Summary
- Add v0.0.62 release notes from Discussion #5100 and link release
highlights to the relevant docs pages.
- Document the release's GPU sandbox recreation, sandbox-side local
inference verification, and Hermes dashboard port guard in the command
and inference references.
- Refresh generated NemoClaw user skills for the release-prep docs set.

## Source Summary
- #4956 -> `docs/reference/commands.mdx`: Document CDI-first Docker GPU
recreation behavior for Linux Docker-driver sandboxes.
- #5024 -> `docs/inference/use-local-inference.mdx`: Document
sandbox-runtime verification of the `inference.local` local inference
route.
- #5018 -> `docs/reference/commands.mdx`: Document Jetson/Tegra
device-node group propagation for sandbox CUDA initialization.
- #5012, #4763, #4706, #5030, #5015 -> `docs/about/release-notes.mdx`:
Summarize onboarding and recovery reliability fixes, including the
reserved Hermes API port guard.
- #5017 and #5043 -> `docs/about/release-notes.mdx`,
`docs/reference/commands.mdx`: Summarize mutable OpenClaw config
recovery and host-side `agents list` coverage.
- #5010 and #5016 -> `docs/about/release-notes.mdx`: Summarize Hermes
upstream metadata visibility and WhatsApp QR rendering reliability.
- #5045 and prior source docs in the v0.0.62 range -> `.agents/skills/`:
Refresh generated user-skill references from the current docs source.

## Skipped
- #5019 -> skipped for new prose because it touched
`openclaw-sandbox-permissive.yaml`, which matches `docs/.docs-skip`.
Existing source docs remain the source for generated skill
synchronization.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `npm run docs` (passes; Fern reports 0 errors and 1 hidden warning)
- Pre-commit hooks passed during commit, including docs-to-skills
verification, markdown lint, gitleaks, and skills YAML tests.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Added `nemoclaw <name> agents list` command.
* v0.0.62 release notes added summarizing onboarding and recovery
improvements.

* **Bug Fixes**
* Improved GPU sandbox onboarding reliability (NVIDIA CDI path,
Jetson/Tegra device handling).
* Better local inference verification and recovery for Linux
Docker-driver GPU sandboxes.
  * Quieter/earlier handling of onboarding drift and port collisions.

* **Documentation**
* Expanded GPU passthrough, inference verification, writable paths
(`/dev/pts`), port 8642 restriction, and command examples.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: cli Command line interface, flags, terminal UX, or output bug-fix PR fixes a bug or regression v0.0.62 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[All Platforms][Inference] Model Router venv missing fingerprint file

5 participants