
[Ubuntu 22.04][Onboard] nemohermes re-onboard re-asks all 5 messaging per-channel prompts; credentials.json never written; "Messaging: none" on Run 2 review #3581

@hulynn

Description

The wizard accepts all five per-channel values on the first run and prints "✓ saved" after each,
but a second `nemohermes onboard --recreate-sandbox` re-asks every one of them,
the second-run review screen shows "Messaging: none",
and ~/.nemoclaw/credentials.json is never written. The spec (T6002672) expects "already set" or equivalent skip messages on the second run and an 8-key credentials.json.
Environment
Device:        Brev shell `nemoclaw-0514` (shadeform-managed; host brev-w4rqzli3u)
OS:            Ubuntu 22.04.5 LTS, kernel 6.8.0-90-generic
Architecture:  x86_64
GPU:           NVIDIA H100 PCIe (81559 MiB)
Node.js:       v22.22.3
npm:           10.9.8
Docker:        29.1.3 (server)
OpenShell CLI: openshell 0.0.39
NemoClaw:      v0.0.43
NemoHermes:    v0.0.43
OpenClaw:      Not reached (sandbox build failed at GPU patch step — see related bug section)
Docker nvidia runtime: registered (Default Runtime: nvidia)
nvidia-container-toolkit: 1.19.0-1
Steps to Reproduce
Pre-conditions:
  - NemoClaw + NemoHermes v0.0.43 installed (curl|bash; license accepted)
  - ~/.nemoclaw/credentials.json absent (confirm with ls)
  - Export three bot tokens (placeholder values are fine; the test verifies prompt
    persistence, not message delivery):
        export TELEGRAM_BOT_TOKEN="123456789:AAAAAAAAAAAAAAAAAAAAAAAAAAAA****"
        export DISCORD_BOT_TOKEN="MTAxMjM0NTY3ODkwMTIzNDU2Nzg5MA.Gabcde.AAAAAAAA..."
        export SLACK_BOT_TOKEN="xoxb-1234567890123-1234567890123-AbCdEfGhIjKlMnOp..."

Run 1:
  1. nemohermes onboard --recreate-sandbox
  2. Walk through wizard: provider 1 (NVIDIA Endpoints) → API key → default model
     (Nemotron 3 Super 120B) → sandbox name "hermes" → Apply.
  3. At [5/8] Messaging channels, all three channels are auto-toggled ON
     (env tokens detected). Press Enter to advance to per-channel prompts.
  4. Fill the 5 spec-listed prompts in wizard order:
        a) Telegram "Reply only when @mentioned? [Y/n]:"             → y
        b) Telegram "User ID (for DM access)":                       → 12345,67890
        c) Discord "Server ID":                                      → 11111
        d) Discord "Reply only when @mentioned? [Y/n]:"              → n
        e) Discord "User ID (optional guild allowlist)":             → 22222
        (Slack also asks for an App Token and Member IDs; these are not in the spec, but the wizard requires them.)
  5. Wizard prints "✓ saved" after each input.
  6. Inspect host state:
        ls -la ~/.nemoclaw/credentials.json
        python3 -c "import json; print(json.load(open('/home/.../onboard-session.json')).get('messagingChannels'))"

Run 2:
  7. Clean up failed sandbox: openshell sandbox delete hermes
  8. nemohermes onboard --recreate-sandbox (same env)
  9. Repeat the same wizard sequence and observe the messaging step.
Expected Result
Per T6002672 spec:
  1) Run 1 prompts for all five values; onboard completes.
  2) ~/.nemoclaw/credentials.json contains:
       TELEGRAM_BOT_TOKEN, TELEGRAM_ALLOWED_IDS, TELEGRAM_REQUIRE_MENTION,
       DISCORD_BOT_TOKEN, DISCORD_SERVER_ID, DISCORD_USER_ID, DISCORD_REQUIRE_MENTION,
       SLACK_BOT_TOKEN
  3) Run 2 reports "already set" (or equivalent skip) on each of the five
     per-channel prompts; the same values are NOT re-asked.
  FAIL signal per spec: any prompt re-asks a value entered in Run 1.
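
The credentials.json expectation in item 2 can be checked mechanically. The following is a minimal sketch, assuming the file is a flat JSON object keyed by the eight names listed above; that layout is an assumption, since the spec excerpt names the keys but not the file structure. The helper name `missing_credential_keys` is hypothetical.

```python
import json
from pathlib import Path

# The eight keys listed in the expected result above.
EXPECTED_KEYS = {
    "TELEGRAM_BOT_TOKEN", "TELEGRAM_ALLOWED_IDS", "TELEGRAM_REQUIRE_MENTION",
    "DISCORD_BOT_TOKEN", "DISCORD_SERVER_ID", "DISCORD_USER_ID",
    "DISCORD_REQUIRE_MENTION", "SLACK_BOT_TOKEN",
}


def missing_credential_keys(path):
    """Return the expected keys absent from the file, sorted.

    A missing file counts as every key missing. Assumes the file is a
    flat JSON object keyed by name (not confirmed by the spec excerpt).
    """
    p = Path(path).expanduser()
    if not p.is_file():
        return sorted(EXPECTED_KEYS)
    data = json.loads(p.read_text())
    return sorted(EXPECTED_KEYS - set(data))


if __name__ == "__main__":
    missing = missing_credential_keys("~/.nemoclaw/credentials.json")
    print("PASS" if not missing else f"FAIL, missing: {missing}")
```

On the host described in this report the file does not exist, so the check would list all eight keys as missing.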
Actual Result
Run 1 — credentials.json NEVER written:
  $ ls -la ~/.nemoclaw/credentials.json
  ls: cannot access ...: No such file or directory

  $ python3 -c "import json, os; d=json.load(open(os.path.expanduser('~/.nemoclaw/onboard-session.json'))); print(d.get('messagingChannels'), d.get('messagingConfig'))"
  ['telegram','discord','slack']  None

  The five per-channel values are NOT on the host filesystem. Instead they are
  baked into the sandbox IMAGE build args (visible in the Dockerfile build log):
    ARG NEMOCLAW_MESSAGING_CHANNELS_B64=WyJkaXNjb3JkIiwic2x******
    ARG NEMOCLAW_MESSAGING_ALLOWED_IDS_B64=eyJ0ZWxlZ3JhbSI6******wIl0...
    ARG NEMOCLAW_DISCORD_GUILDS_B64=eyIxMTExMSI6eyJyZXF1aXJl****2UsIn...
    ARG NEMOCLAW_TELEGRAM_CONFIG_B64=eyJyZXF1aXJl*****J1ZX0=
  This is a different persistence model than the spec assumes.
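
For triage, the build-arg values can be decoded to confirm what the wizard actually stored. This is a sketch only: the `_B64` suffix suggests base64, and the payload is assumed here to be JSON, which the build log does not state. The values in the log above are masked, so the sample below is constructed locally and is not taken from the log. Both helper names are hypothetical.

```python
import base64
import json


def decode_build_arg(value):
    """Decode a base64 build-arg value, assuming the payload is JSON."""
    return json.loads(base64.b64decode(value))


def encode_build_arg(obj):
    """Inverse helper, used here only to build an illustrative sample."""
    raw = json.dumps(obj, separators=(",", ":")).encode()
    return base64.b64encode(raw).decode()


if __name__ == "__main__":
    # Hypothetical sample, NOT copied from the Dockerfile build log.
    sample = encode_build_arg({"channel": "example", "ids": ["1", "2"]})
    print(sample)
    print(decode_build_arg(sample))
```

Running the decoder on the unmasked ARG values from a real build log would show whether the five Run 1 inputs reached the image.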

Run 2 — Review configuration shows "Messaging: none":
    Provider:      nvidia-prod
    Model:         nvidia/nemotron-3-super-120b-a12b
    API key:       NVIDIA_API_KEY (staged for OpenShell gateway registration)
    Web search:    disabled
    Messaging:     none                    ← KEY EVIDENCE: Run 1 config not remembered
    Sandbox name:  hermes

Run 2 — All 5 spec-listed per-channel prompts re-asked verbatim:

  Prompt                                | Run 1 input    | Run 2 wizard behavior
  --------------------------------------+----------------+----------------------
  telegram Reply only when @mentioned?  | y              | RE-ASKED
  telegram User ID (allowlist)          | 12345,67890    | RE-ASKED
  discord Server ID                     | 11111          | RE-ASKED
  discord Reply only when @mentioned?   | n              | RE-ASKED
  discord User ID                       | 22222          | RE-ASKED
  (extra) Slack App Token               | xapp-1-...     | RE-ASKED
  (extra) Slack Member IDs              | U01ABC..,U04.. | RE-ASKED

The "✓ telegram — already configured" header on Run 2 refers ONLY to the
bot TOKEN env var being present, NOT to the channel's full configuration
having survived from Run 1.
Logs
Run 1 wizard transcripts: /tmp/hermes-run1.log
Run 2 wizard transcripts: /tmp/hermes-run2b.log
Failure diagnostics:      ~/.nemoclaw/onboard-failures/2026-05-15T09-41-20-223Z-hermes-docker-gpu-patch/
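
The Run 2 transcript can be scanned for the prompt text to flag the spec's FAIL signal without reading the log by hand. A minimal sketch, assuming the transcript contains the prompt strings verbatim as quoted in the reproduction steps; if a fixed wizard prints the prompt text alongside an "already set" notice, this check would need refining. The function names are hypothetical.

```python
from pathlib import Path

# Prompt fragments taken from the reproduction steps in this report.
PROMPTS = [
    "Reply only when @mentioned?",
    "User ID (for DM access)",
    "Server ID",
    "User ID (optional guild allowlist)",
]


def count_prompts(transcript_text):
    """Count how many times each prompt fragment occurs in a transcript."""
    return {p: transcript_text.count(p) for p in PROMPTS}


def reasked_prompts(run2_text):
    """Return the prompt fragments that still appear in the Run 2 transcript."""
    return [p for p, n in count_prompts(run2_text).items() if n > 0]


if __name__ == "__main__":
    log = Path("/tmp/hermes-run2b.log")
    if log.is_file():
        hits = reasked_prompts(log.read_text(errors="replace"))
        print("FAIL, re-asked:" if hits else "PASS", hits)
```

"Reply only when @mentioned?" is shared by Telegram and Discord, so a count of 2 for that fragment means both channels re-asked it.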

Two interpretations — needs PM/Eng triage:

(a) Product bug: the wizard's persistence layer for messaging config does not
    survive a re-onboard. The "✓ saved" messages are misleading because the
    values only land in the sandbox image build args (and even those are lost
    once `--recreate-sandbox` rebuilds the image from scratch on Run 2).
    To meet the spec, the wizard must persist the 5 per-channel values to a
    location that survives between onboards (credentials.json on host, OR
    a gateway-side store the wizard re-reads on launch).

(b) Spec is stale: the persistence model has intentionally shifted from
    host-side credentials.json to image-time build args. In that case, the
    T6002672 verification approach needs updating — e.g. assert that the
    NEMOCLAW_*_B64 ARGs inside the latest sandbox image match the entered
    values, AND make the wizard report "already set" on Run 2 by reading
    those ARG values from the existing image before recreating it.

Either way, the on-screen behavior contradicts the spec's FAIL criterion
("any prompt re-asks for a value the user already entered in run 1") — all
5 prompts re-ask.
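
To illustrate what interpretation (a) asks for, here is a minimal sketch of "ask only for what is not already saved". It is not the wizard's actual code: the function names, the store path, and the flat JSON layout are all hypothetical, and a real implementation would also need to handle file permissions for secrets.

```python
import json
import os
import tempfile
from pathlib import Path


def load_saved(path):
    """Load previously saved answers; an absent file means nothing is saved."""
    p = Path(path)
    return json.loads(p.read_text()) if p.is_file() else {}


def collect_answers(path, keys, ask):
    """Ask only for keys not already saved, then write the merged result.

    `ask` is a callable that takes a key and returns the user's answer.
    Returns (answers, skipped_keys).
    """
    saved = load_saved(path)
    skipped = [k for k in keys if k in saved]
    for k in keys:
        if k not in saved:
            saved[k] = ask(k)
    Path(path).write_text(json.dumps(saved, indent=2))
    return saved, skipped


if __name__ == "__main__":
    # Hypothetical store location in a temp dir, for demonstration only.
    store = os.path.join(tempfile.mkdtemp(), "messaging.json")
    keys = ["TELEGRAM_ALLOWED_IDS", "DISCORD_SERVER_ID"]
    _, skipped_run1 = collect_answers(store, keys, lambda k: "value-for-" + k)
    _, skipped_run2 = collect_answers(store, keys, lambda k: "value-for-" + k)
    print("Run 1 skipped:", skipped_run1)
    print("Run 2 skipped:", skipped_run2)
```

Under interpretation (b) the same shape would apply, with `load_saved` reading the values back from the existing image's build args instead of a host file.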

Notes on related findings (separate issues, mentioned for triage context, not duplicates):
  - Docker GPU patch failed in Run 1 with "OpenShell supervisor did not
    reconnect to the GPU-enabled container." Different from the AMD CDI
    spec bug (6126101 / 6110214) — `--gpus all` mode select succeeded but
    supervisor reconnect timed out. Container stuck in Restarting loop.
    Should be filed separately if not already tracked.
  - Wizard preflight UX improvement (positive): when sandbox→gateway is
    blocked by UFW on the 172.18.0.0/16 bridge, the wizard now prints the
    exact `sudo ufw allow ...` remediation command. This is a big improvement
    over the earlier cryptic "auth proxy unreachable" error.

Bug Details

  Field       | Value
  ------------+------------------------------------------------------------------------
  Priority    | Unprioritized
  Action      | Dev - Open - To fix
  Disposition | Open issue
  Module      | Machine Learning - NemoClaw
  Keyword     | NemoClaw, NemoClaw_CLI&UX, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Onboard

[NVB#6180486]

Metadata

    Labels

    NV QA (Bugs found by the NVIDIA QA Team)
    platform: brev (Affects Brev hosted development environments)
    platform: ubuntu (Affects Ubuntu Linux environments)
