fix(onboard): write fallback fingerprint when git source is unavailable (#4623)#4706
Conversation
📝 WalkthroughWalkthroughInstall now writes a non-null effective fingerprint (either the computed source fingerprint or ChangesModel Router Fingerprint Fallback
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard/model-router.ts`:
- Around line 269-270: isManagedModelRouterCurrent currently returns false when
getModelRouterSourceFingerprint() is null, causing repeated reinstalls; modify
isManagedModelRouterCurrent so that if sourceFingerprint is null it treats an
existing installed fingerprint/venv as current instead of forcing reinstall.
Concretely, in isManagedModelRouterCurrent, call
readModelRouterInstalledFingerprint(venvDir) and if sourceFingerprint is null
return Boolean(installedFingerprint) (or check venv/existence) otherwise keep
the existing equality check (installedFingerprint === sourceFingerprint); update
any callers that rely on the boolean result accordingly.
In `@test/onboard-model-router.test.ts`:
- Around line 988-1007: The test is fragile because runner.runCapture stub makes
git commands return empty, causing getModelRouterSourceFingerprint to call
hashModelRouterSourceTree and read the real llm-router files; fix by stubbing or
forcing hashModelRouterSourceTree to return null (or a stable value) for the
test so the fingerprint path becomes deterministic—e.g., in the test replace/spy
on hashModelRouterSourceTree (or modify getModelRouterSourceFingerprint
behavior) to return null or a fixed "null fingerprint" value instead of walking
the real filesystem, and keep runner.runCapture unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 16dc7852-51df-458f-9749-8c8b53ac053c
📒 Files selected for processing (2)
src/lib/onboard/model-router.tstest/onboard-model-router.test.ts
|
✨ Thanks for submitting this detailed PR about fixing the Model Router venv missing fingerprint file issue, which breaks the intended venv version tracking and rebuild detection. This proposes a way to address the bug in the Model Router provider that affects the onboard process and defaults to an incorrect venv setup. Related open issues: |
1 similar comment
|
✨ Thanks for submitting this detailed PR about fixing the Model Router venv missing fingerprint file issue, which breaks the intended venv version tracking and rebuild detection. This proposes a way to address the bug in the Model Router provider that affects the onboard process and defaults to an incorrect venv setup. Related open issues: |
adaf95b to
cd5c0f9
Compare
prekshivyas
left a comment
There was a problem hiding this comment.
Thanks for this — the file-existence fix is real and the change is harmless, but I don't think it delivers the reinstall-avoidance it advertises, and the test gives false confidence. I verified by running the test and tracing the path.
The fallback only fires when sourceFingerprint is null, and in that exact case reinstalls still happen.
isManagedModelRouterCurrent is:
const sourceFingerprint = getModelRouterSourceFingerprint(routerDir);
return Boolean(sourceFingerprint && readModelRouterInstalledFingerprint(venvDir) === sourceFingerprint);It short-circuits on a falsy sourceFingerprint. So:
?? install:${Date.now()}fires only whensourceFingerprint === null.- On the next onboard in the same environment,
getModelRouterSourceFingerprintreturns null again →isManagedModelRouterCurrentreturnsfalse→ the model-router venv reinstalls. Theinstall:<ts>content is never compared (short-circuited beforereadInstalled). Date.now()makes it a write-only token — even if the check were reached, a fresh timestamp could never match the stored one.
So the PR description's "No spurious reinstalls within the same NemoClaw version" doesn't hold for the very scenario the fallback targets. The root cause of the false is the null sourceFingerprint short-circuit, not the missing file — writing the file doesn't change the reinstall behavior.
The new test doesn't catch this — it asserts only fpExists and fpContent =~ /^install:\d+$/. It never calls isManagedModelRouterCurrent or runs onboarding twice, so reinstall-avoidance is unverified.
Requesting changes — pick one:
- Narrow the scope: keep the file-existence fix, but correct the description to drop the reinstall-avoidance claim and file a follow-up for the reinstall loop. (If some consumer just needs the file present, this is fine on its own.)
- Complete the fix: use a stable token (e.g.
install:<nemoclaw-version>rather thanDate.now()) AND teachisManagedModelRouterCurrentto accept that token whensourceFingerprintis null, then add a test that calls it twice and asserts the second run does not reinstall.
Happy to re-review either way.
|
Thanks for the thorough review @prekshivyas! You're absolutely right — the I went with option 2 (complete the fix):
Ready for re-review! |
|
Thanks for the thorough trace, @prekshivyas — you're right on both points. The I'll go with option 1 (narrow scope): keep the file-existence fix, correct the description to drop the reinstall-avoidance claim, and file a follow-up issue for the actual reinstall loop fix ( Will push the update shortly. |
|
Apologies for the duplicate — please disregard the "option 1" comment above. The pushed code follows option 2 (complete the fix): stable |
5e5c8c1 to
b811e31
Compare
|
Thanks for the thorough review @prekshivyas — you're absolutely right that the original fix didn't deliver the reinstall-avoidance it advertised. I've pushed changes implementing your Option 2:
All 5 model router tests pass locally. The branch is also rebased on latest |
prekshivyas
left a comment
There was a problem hiding this comment.
Re-reviewed against the current head — the rework fully addresses my earlier concern. Verified by reading + building + running the tests (5/5 pass):
- Stable token:
install:${version}now uses the NemoClawpackage.jsonversion, notDate.now(). - Reinstall loop is actually fixed:
isManagedModelRouterCurrentno longer short-circuits on a null source fingerprint — when source is unhashable it accepts an existinginstall:-prefixed fingerprint and returnstrue, so a second onboard doesn't reinstall. - Test proves it: the fallback test calls
isManagedModelRouterCurrent, asserts it'struewhen source is unavailable but theinstall:file exists, and asserts the token is not a timestamp (guards the old regression).
Approving on the code. One thing to clear before merge, @kagura-agent: dco-check is red — the merge commit 7a89059f ("Merge branch 'main'…") has no Signed-off-by trailer (the fix commit b811e312 is signed off, just not the merge). Re-sign that merge commit (or redo the merge with -s) and CI goes green.
|
Correction to my note above: |
|
Thanks @prekshivyas! The |
7a89059 to
7c9f38e
Compare
|
Rebased onto latest main to fix the DCO check failure (the previous merge commit lacked a sign-off). The commit itself already had the correct DCO trailer. |
|
All CI checks are green now (including dco-check) after the rebase. Ready for merge whenever convenient — thanks @prekshivyas for the thorough review! 🙏 |
7c9f38e to
0054338
Compare
|
Rebased onto latest main to resolve merge conflicts. Single commit cherry-picked cleanly — only the test file had a minor conflict (upstream added assertions to the existing test; resolved by keeping both the updated existing test and the new fallback fingerprint test). |
… in model router check (NVIDIA#4623) Address review feedback from prekshivyas: 1. Replace Date.now() with NemoClaw version as the fallback fingerprint token (install:<version> instead of install:<timestamp>), making it stable across onboard runs within the same NemoClaw version. 2. Teach isManagedModelRouterCurrent() to accept install:-prefixed fingerprints when sourceFingerprint is null (no git). This prevents the short-circuit that previously caused reinstalls on every onboard even when the fallback fingerprint file existed. 3. Export isManagedModelRouterCurrent and add a test that verifies the function returns true on a subsequent check when the install: fingerprint is present but git source is unavailable. Signed-off-by: kagura-agent <kagura-agent@users.noreply.github.com> Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>
Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>
561ce86 to
70c1fcd
Compare
## Summary - Add v0.0.62 release notes from Discussion #5100 and link release highlights to the relevant docs pages. - Document the release's GPU sandbox recreation, sandbox-side local inference verification, and Hermes dashboard port guard in the command and inference references. - Refresh generated NemoClaw user skills for the release-prep docs set. ## Source Summary - #4956 -> `docs/reference/commands.mdx`: Document CDI-first Docker GPU recreation behavior for Linux Docker-driver sandboxes. - #5024 -> `docs/inference/use-local-inference.mdx`: Document sandbox-runtime verification of the `inference.local` local inference route. - #5018 -> `docs/reference/commands.mdx`: Document Jetson/Tegra device-node group propagation for sandbox CUDA initialization. - #5012, #4763, #4706, #5030, #5015 -> `docs/about/release-notes.mdx`: Summarize onboarding and recovery reliability fixes, including the reserved Hermes API port guard. - #5017 and #5043 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`: Summarize mutable OpenClaw config recovery and host-side `agents list` coverage. - #5010 and #5016 -> `docs/about/release-notes.mdx`: Summarize Hermes upstream metadata visibility and WhatsApp QR rendering reliability. - #5045 and prior source docs in the v0.0.62 range -> `.agents/skills/`: Refresh generated user-skill references from the current docs source. ## Skipped - #5019 -> skipped for new prose because it touched `openclaw-sandbox-permissive.yaml`, which matches `docs/.docs-skip`. Existing source docs remain the source for generated skill synchronization. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `npm run docs` (passes; Fern reports 0 errors and 1 hidden warning) - Pre-commit hooks passed during commit, including docs-to-skills verification, markdown lint, gitleaks, and skills YAML tests. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added `nemoclaw <name> agents list` command. * v0.0.62 release notes added summarizing onboarding and recovery improvements. * **Bug Fixes** * Improved GPU sandbox onboarding reliability (NVIDIA CDI path, Jetson/Tegra device handling). * Better local inference verification and recovery for Linux Docker-driver GPU sandboxes. * Quieter/earlier handling of onboarding drift and port collisions. * **Documentation** * Expanded GPU passthrough, inference verification, writable paths (`/dev/pts`), port 8642 restriction, and command examples. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>
Summary
When NemoClaw is installed from a release tarball or
npm packinstead of a git clone,getModelRouterSourceFingerprint()returnsnullbecause there's no.gitdirectory to hash. This causes two problems:writeModelRouterInstalledFingerprint()was only called whensourceFingerprintwas non-null, so no fingerprint file was written at all.isManagedModelRouterCurrent()short-circuited on nullsourceFingerprint, returningfalseand triggering a full reinstall on every onboard.Changes
install:<nemoclaw-version>) when git source fingerprint is unavailable, using the NemoClaw version frompackage.jsonas a stable token.isManagedModelRouterCurrent()to acceptinstall:-prefixed fingerprints when source fingerprint is null, preventing unnecessary reinstalls within the same NemoClaw version.Testing
isManagedModelRouterCurrent()returnstrueon subsequent checks with fallback fingerprintCloses #4623
Signed-off-by: Kagura kagura-agent@users.noreply.github.com