feat(hermes): add provider onboarding foundation#3320
Conversation
Replay the Hermes Provider onboarding foundation from NousResearch PR #3237 while keeping durable credential ownership in OpenShell provider storage. Co-authored-by: Shannon Sands <shannon.sands.1979@gmail.com> Signed-off-by: Shannon Sands <shannon.sands.1979@gmail.com> Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR adds Hermes Provider onboarding (OAuth device-code and API-key), Nous model recommendation fetching and UI integration, OpenShell provider registration/upsert, session persistence of Hermes auth method, rebuild credential preflight awareness for Hermes, Slack allowed-user config, tests, and installer/docs updates. ChangesHermes Provider OAuth & API-Key Onboarding
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/inference/model-prompts.ts`:
- Around line 194-225: The prompt currently maps Enter to "Other..." when the
saved default model is outside the visibleOptions, because defaultChoice
math/regression causes the parsed index to fall through to manual entry; fix by,
after computing index (the parsed choice), add an early branch that if index ===
defaultIndex then return defaultModel (before the visibleOptions bounds checks
and the "Other..." branch). Update the code paths around deps.promptFn parsing
(the index variable) and the subsequent conditional sequence so
promptFullRemoteModelList/promptManualModelId are only reached when the user
explicitly selected those, while Enter (or a number equal to defaultIndex+1)
returns defaultModel.
In `@src/lib/oauth-device-code.ts`:
- Around line 7-10: Update the module header in oauth-device-code.ts to correct
the persistence contract: state explicitly that Hermes OAuth/API-key material
MUST NOT be durably persisted to host-side NemoClaw storage (e.g., ~/.nemoclaw);
instead describe that onboarding mints short-lived agent keys for the OpenShell
inference provider from ephemeral tokens and that the sandbox only ever receives
the OpenShell inference placeholder (never raw Hermes/Nous OAuth tokens or API
keys). Ensure the header replaces the current wording that implies refresh
tokens are stored under ~/.nemoclaw with the new invariant forbidding durable
host persistence.
- Around line 125-137: postForm currently calls fetchImpl without an AbortSignal
so long-running sockets can hang; implement a reusable timeout helper that
returns an AbortController/AbortSignal (e.g., createRequestTimeout(timeoutMs):
{signal, clear}) and use it in postForm to pass signal into fetchImpl and clear
the timeout after completion, and also update mintAgentKeyWithAccessToken to use
the same helper so token polling/refresh requests are bounded by timeoutSeconds;
reference postForm and mintAgentKeyWithAccessToken when applying the change and
ensure existing fetch headers/behavior are preserved and the timeout is cleaned
up on success or error.
In `@src/lib/onboard.ts`:
- Around line 6646-6653: The hermesAuthMethod variable can retain stale Hermes
data when the user backs out to pick another provider; when handling the Hermes
branch (the selected.key === "hermesProvider" block) and the user chooses
BACK_TO_SELECTION (before continue selectionLoop), explicitly reset
hermesAuthMethod (e.g., set to undefined or null) so it cannot leak into
non-Hermes selections, and likewise ensure any code path that switches away from
Hermes clears hermesAuthMethod to avoid returning Hermes metadata for
OpenAI/Ollama selections.
- Around line 1532-1557: When running non-interactively, promptHermesAuthMethod
currently defaults to OAuth even if a Nous API key is present; change
promptHermesAuthMethod so that if getRequestedHermesAuthMethod() returns null
but an API key env var is exported (e.g. process.env.NOUS_API_KEY or
process.env.NEMOCLAW_PROVIDER_KEY), the default method becomes
HERMES_AUTH_METHOD_API_KEY instead of HERMES_AUTH_METHOD_OAUTH; update the
non-interactive branch to choose method = requested || (apiKeyPresent ?
HERMES_AUTH_METHOD_API_KEY : HERMES_AUTH_METHOD_OAUTH) and keep the note(...)
call using hermesAuthMethodLabel(method).
In `@test/hermes-provider-foundation.test.ts`:
- Around line 44-48: The spawnSync invocations in the test (the call using
spawnSync(process.execPath, [scriptPath], { cwd: repoRoot, encoding: "utf-8",
env: buildHermeticEnv(tmpDir) })) can hang; add a bounded timeout option to
their options object (e.g., timeout: 30000) so the child is killed after a
reasonable period and the test fails deterministically; apply the same change to
the other spawnSync usages in this file (the other result = spawnSync(...)
occurrences) ensuring the timeout value is included in each options literal (and
optionally set killSignal or cwd/encoding/env unchanged).
- Around line 10-17: The helper buildHermeticEnv currently only strips DISCORD_*
and TELEGRAM_* but still inherits ambient NEMOCLAW_* and other credential/config
vars that make tests flaky; update buildHermeticEnv to remove any env keys
matching NEMOCLAW_* plus common credential/config prefixes (e.g. NEMOCLAW_,
AWS_, GCP_, GOOGLE_, GCLOUD_, AZURE_, and keys ending with _CREDENTIALS or
containing "SECRET" or "TOKEN") by iterating Object.keys(env) and deleting keys
that match those patterns so the spawned helper runs with a fully hermetic
environment.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 786260ff-9c33-4727-b255-b3c1adf894a9
📒 Files selected for processing (27)
agents/hermes/config/messaging-config.tsinstall.shscripts/install.shsrc/lib/actions/sandbox/rebuild.tssrc/lib/hermes-provider-auth.test.tssrc/lib/hermes-provider-auth.tssrc/lib/inference/config.test.tssrc/lib/inference/config.tssrc/lib/inference/model-prompts.test.tssrc/lib/inference/model-prompts.tssrc/lib/inference/nous-models.test.tssrc/lib/inference/nous-models.tssrc/lib/messaging-channel-config.test.tssrc/lib/oauth-device-code.test.tssrc/lib/oauth-device-code.tssrc/lib/onboard.tssrc/lib/onboard/providers.tssrc/lib/sandbox-channels.test.tssrc/lib/sandbox-channels.tssrc/lib/security/redact.tssrc/lib/state/onboard-session.test.tssrc/lib/state/onboard-session.tssrc/lib/state/sandbox-session.test.tssrc/lib/state/sandbox-session.tstest/generate-hermes-config.test.tstest/hermes-provider-foundation.test.tstest/rebuild-credential-preflight.test.ts
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
src/lib/agent/defs.ts (1)
88-89: ⚡ Quick winTighten
inferenceProviderOptionstype to non-optionalThis accessor always returns an array (Line 388-390), so keeping it optional in
AgentDefinitionweakens the contract and introduces unnecessaryundefinedchecks for callers.💡 Suggested type update
- readonly inferenceProviderOptions?: string[]; + readonly inferenceProviderOptions: string[];🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/agent/defs.ts` around lines 88 - 89, The AgentDefinition interface currently declares readonly inferenceProviderOptions?: string[] but the accessor always returns an array, so change the type to readonly inferenceProviderOptions: string[] to tighten the contract; update any constructors/defaults or places that create AgentDefinition instances (e.g., the factory or new AgentDefinition(...) calls and any object literals) to provide an empty array when none are specified, and remove redundant undefined checks where callers use inferenceProviderOptions, ensuring code paths that assumed undefined now treat it as an empty array.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/agent/defs.ts`:
- Around line 262-270: readInference currently tolerates malformed manifest
fields and silently returns undefined/filtered arrays; change it to fail fast by
validating the shape of the "inference" object and throwing a descriptive Error
when fields are wrong. Specifically, inside readInference (which uses
readObject/readString/readStringArray), check that provider_type is a string and
provider_options is an array of strings; if either check fails throw an Error
referencing the ManifestRecord (or record.id if available) and the offending key
(provider_type or provider_options) so callers can detect bad manifests
immediately. Ensure AgentInference is only returned when both validations pass.
In `@src/lib/onboard.ts`:
- Around line 6699-6710: The call to nousModels.getHermesProviderModelOptions()
can reject and currently aborts onboarding; wrap that call in a try/catch, log
or surface a non-fatal warning, and on failure set hermesProviderModels to a
safe fallback (e.g., undefined or an empty array) so
promptRemoteModel(remoteConfig.label, selected.key, defaultModel, null, {
otherShowsFullList: true, remoteModelOptions: { [selected.key]:
hermesProviderModels }, topLevelModelLimit: 10 }) still runs and falls back to
the documented full/custom model behavior; ensure the catch does not rethrow and
that promptRemoteModel receives a well-formed remoteModelOptions (omit the key
or pass an empty list) so the existing fallback paths remain available.
- Around line 7480-7538: The code only calls
ensureHermesProviderApiKeyCredentials/ensureHermesProviderOAuthCredentials when
isHermesProviderRegistered(...) is false, which leaves existing Hermes
registrations stale when the auth method or NOUS_API_KEY changes; update the
logic so credential reconciliation always runs (or runs when the
resolvedHermesAuthMethod or credentialEnv differs) by invoking
ensureHermesProviderApiKeyCredentials or ensureHermesProviderOAuthCredentials
regardless of isHermesProviderRegistered, using resolvedHermesAuthMethod,
resolveHermesNousApiKey, and runOpenshell to refresh credentials; keep the
existing error handling (hermesAuthMethodLabel, process.exit, retry) and only
skip re-applying the provider registration when you can detect no change.
In `@test/onboard.test.ts`:
- Around line 2793-2827: The test currently deletes process.env.NOUS_API_KEY and
process.env.OPENAI_API_KEY which makes the "not forwarded" assertion weak;
instead seed sentinel values (e.g., process.env.NOUS_API_KEY = "SENTINEL_NOUS"
and process.env.OPENAI_API_KEY = "SENTINEL_OPENAI") before requiring/setuping
the script that calls setupInference, then update the assertions that inspect
the captured commands (via parseStdoutJson, CommandEntry and the commands array)
to assert that none of the command.env values or command.command/argv strings
contain those sentinel tokens (and still assert no provider create/update
occurred and expected commands exist). This ensures secrets were present in the
environment but never forwarded to child commands.
---
Nitpick comments:
In `@src/lib/agent/defs.ts`:
- Around line 88-89: The AgentDefinition interface currently declares readonly
inferenceProviderOptions?: string[] but the accessor always returns an array, so
change the type to readonly inferenceProviderOptions: string[] to tighten the
contract; update any constructors/defaults or places that create AgentDefinition
instances (e.g., the factory or new AgentDefinition(...) calls and any object
literals) to provide an empty array when none are specified, and remove
redundant undefined checks where callers use inferenceProviderOptions, ensuring
code paths that assumed undefined now treat it as an empty array.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8ee0793f-9f4b-4893-a97a-6a8360308618
📒 Files selected for processing (7)
agents/hermes/manifest.yamlsrc/lib/agent/defs.test.tssrc/lib/agent/defs.tssrc/lib/inference/model-prompts.test.tssrc/lib/inference/model-prompts.tssrc/lib/onboard.tstest/onboard.test.ts
✅ Files skipped from review due to trivial changes (1)
- agents/hermes/manifest.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
- src/lib/inference/model-prompts.test.ts
- src/lib/inference/model-prompts.ts
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
|
CodeRabbit feedback response for latest push
Local validation run before push:
|
Selective E2E Results — ✅ All requested jobs passedRun: 25679224745
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/lib/inference/config.ts (1)
47-47: ⚡ Quick winMake the Hermes default model explicit instead of index-based.
Line 47 ties behavior to array order; reordering the curated list would silently change onboarding defaults.
Proposed refactor
-export const HERMES_PROVIDER_MODEL_OPTIONS = [ - "moonshotai/kimi-k2.6", +export const DEFAULT_HERMES_PROVIDER_MODEL = "moonshotai/kimi-k2.6"; +export const HERMES_PROVIDER_MODEL_OPTIONS = [ + DEFAULT_HERMES_PROVIDER_MODEL, "xiaomi/mimo-v2.5-pro", "xiaomi/mimo-v2.5", "tencent/hy3-preview", "anthropic/claude-opus-4.7", "anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "anthropic/claude-sonnet-4.5", "anthropic/claude-haiku-4.5", "openai/gpt-5.5", "openai/gpt-5.4-mini", "openai/gpt-5.3-codex", "google/gemini-3-pro-preview", "google/gemini-3-flash-preview", "google/gemini-3.1-pro-preview", "google/gemini-3.1-flash-lite-preview", "qwen/qwen3.5-plus-02-15", "qwen/qwen3.5-35b-a3b", "stepfun/step-3.5-flash", "minimax/minimax-m2.7", "minimax/minimax-m2.5", "minimax/minimax-m2.5:free", "z-ai/glm-5.1", "z-ai/glm-5v-turbo", "z-ai/glm-5-turbo", "x-ai/grok-4.20-beta", "nvidia/nemotron-3-super-120b-a12b", "arcee-ai/trinity-large-thinking", "openai/gpt-5.5-pro", "openai/gpt-5.4-nano", ] as const; -export const DEFAULT_HERMES_PROVIDER_MODEL = HERMES_PROVIDER_MODEL_OPTIONS[0];🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/inference/config.ts` at line 47, DEFAULT_HERMES_PROVIDER_MODEL is set from HERMES_PROVIDER_MODEL_OPTIONS[0], which couples the default to array order; change it to an explicit, named default value instead (e.g., the intended model string or an exported constant) rather than indexing the array. Replace the assignment to DEFAULT_HERMES_PROVIDER_MODEL so it references the explicit model identifier you want as the default (or uses a find by model name on HERMES_PROVIDER_MODEL_OPTIONS if you must derive it), and keep the symbol names DEFAULT_HERMES_PROVIDER_MODEL and HERMES_PROVIDER_MODEL_OPTIONS to locate/verify the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/lib/inference/config.ts`:
- Line 47: DEFAULT_HERMES_PROVIDER_MODEL is set from
HERMES_PROVIDER_MODEL_OPTIONS[0], which couples the default to array order;
change it to an explicit, named default value instead (e.g., the intended model
string or an exported constant) rather than indexing the array. Replace the
assignment to DEFAULT_HERMES_PROVIDER_MODEL so it references the explicit model
identifier you want as the default (or uses a find by model name on
HERMES_PROVIDER_MODEL_OPTIONS if you must derive it), and keep the symbol names
DEFAULT_HERMES_PROVIDER_MODEL and HERMES_PROVIDER_MODEL_OPTIONS to locate/verify
the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1ae78979-7a5b-4809-ad20-7273a1acef38
📒 Files selected for processing (4)
docs/reference/commands.mdsrc/lib/actions/inference-set.test.tssrc/lib/inference/config.tssrc/lib/onboard/providers.ts
✅ Files skipped from review due to trivial changes (2)
- src/lib/actions/inference-set.test.ts
- docs/reference/commands.md
## Summary Refreshes the release-prep docs for v0.0.39 based on changes merged since the Friday 4pm doc refresh. Updates the source docs, bumps the docs version metadata, and regenerates the NemoClaw user skills from the refreshed docs. ## Changes - #3314 -> `docs/get-started/prerequisites.md`, `docs/get-started/quickstart.md`, `docs/reference/troubleshooting.md`: Documents installer Docker setup, Docker group activation, and retry guidance. - #3317 -> `docs/get-started/quickstart.md`, `docs/reference/commands.md`: Documents the DGX Spark and DGX Station express install prompt and `NEMOCLAW_NO_EXPRESS`. - #3328 and #3329 -> `docs/security/best-practices.md`, `docs/deployment/sandbox-hardening.md`: Updates sandbox capability hardening docs for the stricter bounding-set and `setpriv` step-down behavior. - #3330, #3335, and #3346 -> `docs/inference/use-local-inference.md`: Documents Windows-host Ollama relaunch behavior, NIM key passthrough, early health-fail diagnostics, and mixed-GPU preflight detail. - #2406, #2883, #3001, #3244, #3267, #3318, #3320, and #3354 -> `docs/about/release-notes.md`: Adds the v0.0.39 release-prep section while keeping the v0.0.38 release notes intact. - Advances the release-prep docs metadata from v0.0.38 to v0.0.39. - Regenerates `.agents/skills/nemoclaw-user-*` from the updated source docs. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [x] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes - [ ] `npm test` passes - [ ] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [x] `make docs` builds without warnings (doc changes only) - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes v0.0.39 * **New Features** * Host alias management commands for easier configuration * Sandbox GPU control options during onboarding * Update command with check and confirmation modes * **Documentation** * Enhanced Linux installer guidance with Docker and group membership handling * Expanded troubleshooting for permission and connectivity issues * Improved capability-dropping security documentation * Updated inference model switching commands * Brev environment-specific troubleshooting * **Improvements** * DGX Spark/Station express install flow * Windows Ollama relay and health-check enhancements * NVIDIA NIM preflight GPU reporting [](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3375) <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
Replays the Hermes Provider onboarding foundation from partner PR #3237 in a branch based on current
main, with the credential boundary adjusted to match NemoClaw/OpenShell security invariants.This keeps bare
nemoclaw onboardon the existing OpenClaw default while allowingnemohermes onboard/nemoclaw onboard --agent hermesto select Hermes Provider, choose Nous Portal OAuth or Nous API Key, pick from Nous-recommended models with a full/custom fallback, and register Hermes inference through OpenShell.Security / Architecture
~/.nemoclaw.NOUS_API_KEY/NEMOCLAW_PROVIDER_KEYfrom the current process and registers it into OpenShell;NOUS_API_KEYis intentionally not added to NemoClaw's legacy host credential store allowlist.NOUS_API_KEYbefore destructive rebuild work.hermesAuthMethod; invalid auth methods are rejected/normalized.Credit
Based on the Hermes Provider foundation work from @shannonsands / NousResearch in #3237. The replay commit includes Shannon as co-author and retains Shannon's signed-off-by from the original contribution.
Verification
npm run build:clinpm run typecheck:cligit diff --cached --checknpx vitest run src/lib/hermes-provider-auth.test.ts src/lib/oauth-device-code.test.ts src/lib/inference/config.test.ts src/lib/inference/model-prompts.test.ts src/lib/inference/nous-models.test.ts src/lib/state/onboard-session.test.ts src/lib/messaging-channel-config.test.ts src/lib/sandbox-channels.test.ts src/lib/state/sandbox-session.test.ts test/hermes-provider-foundation.test.ts test/rebuild-credential-preflight.test.ts test/generate-hermes-config.test.tsnpx vitest run src/lib/oauth-device-code.test.tsSigned-off-by: Shannon Sands shannon.sands.1979@gmail.com
Signed-off-by: Aaron Erickson aerickson@nvidia.com
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Docs