Description
The wizard accepts all five per-channel values on the first run and prints "✓ saved" after each,
but a second `nemohermes onboard --recreate-sandbox` run re-asks every one of them,
the second-run review screen shows "Messaging: none",
and ~/.nemoclaw/credentials.json is never written. The spec expects Run 2 to print "already set" (or skip) for each value, and expects an 8-key credentials.json.
Environment
Device: Brev shell `nemoclaw-0514` (shadeform-managed; host brev-w4rqzli3u)
OS: Ubuntu 22.04.5 LTS, kernel 6.8.0-90-generic
Architecture: x86_64
GPU: NVIDIA H100 PCIe (81559 MiB)
Node.js: v22.22.3
npm: 10.9.8
Docker: 29.1.3 (server)
OpenShell CLI: openshell 0.0.39
NemoClaw: v0.0.43
NemoHermes: v0.0.43
OpenClaw: Not reached (sandbox build failed at the GPU patch step; see "Notes on related findings" below)
Docker nvidia runtime: registered (Default Runtime: nvidia)
nvidia-container-toolkit: 1.19.0-1
Steps to Reproduce
Pre-conditions:
- NemoClaw + NemoHermes v0.0.43 installed (curl|bash; license accepted)
- ~/.nemoclaw/credentials.json absent (confirm with ls)
- Export three bot tokens (placeholder format OK; test verifies prompt persistence
not message delivery):
export TELEGRAM_BOT_TOKEN="123456789:AAAAAAAAAAAAAAAAAAAAAAAAAAAA****"
export DISCORD_BOT_TOKEN="MTAxMjM0NTY3ODkwMTIzNDU2Nzg5MA.Gabcde.AAAAAAAA..."
export SLACK_BOT_TOKEN="xoxb-1234567890123-1234567890123-AbCdEfGhIjKlMnOp..."
Run 1:
1. nemohermes onboard --recreate-sandbox
2. Walk through wizard: provider 1 (NVIDIA Endpoints) → API key → default model
(Nemotron 3 Super 120B) → sandbox name "hermes" → Apply.
3. At [5/8] Messaging channels, all three channels are auto-toggled ON
(env tokens detected). Press Enter to advance to per-channel prompts.
4. Fill the 5 spec-listed prompts in wizard order:
a) Telegram "Reply only when @mentioned? [Y/n]:" → y
b) Telegram "User ID (for DM access)": → 12345,67890
c) Discord "Server ID": → 11111
d) Discord "Reply only when @mentioned? [Y/n]:" → n
e) Discord "User ID (optional guild allowlist)": → 22222
(Slack also prompts for an App Token and Member IDs; these are not in the spec, but the wizard requires them.)
5. Wizard prints "✓ saved" after each input.
6. Inspect host state:
ls -la ~/.nemoclaw/credentials.json
python3 -c "import json; print(json.load(open('/home/.../onboard-session.json')).get('messagingChannels'))"
Run 2:
7. Clean up failed sandbox: openshell sandbox delete hermes
8. nemohermes onboard --recreate-sandbox (same env)
9. Repeat the same wizard sequence and observe the messaging step.
Expected Result
Per the T6002672 spec:
1) Run 1 prompts for all five values; onboard completes.
2) ~/.nemoclaw/credentials.json contains:
TELEGRAM_BOT_TOKEN, TELEGRAM_ALLOWED_IDS, TELEGRAM_REQUIRE_MENTION,
DISCORD_BOT_TOKEN, DISCORD_SERVER_ID, DISCORD_USER_ID, DISCORD_REQUIRE_MENTION,
SLACK_BOT_TOKEN
3) Run 2 reports "already set" (or equivalent skip) on each of the five
per-channel prompts; the same values are NOT re-asked.
FAIL signal per spec: any prompt re-asks a value entered in Run 1.
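Expected result 2 can be checked mechanically. The sketch below assumes credentials.json is a flat JSON object keyed by variable name; the file was never written in this run, so its real layout is unconfirmed. `missing_credential_keys` is a hypothetical helper, not part of NemoClaw or NemoHermes.

```python
import json
import os

# The eight keys the T6002672 spec expects in ~/.nemoclaw/credentials.json.
EXPECTED_KEYS = {
    "TELEGRAM_BOT_TOKEN",
    "TELEGRAM_ALLOWED_IDS",
    "TELEGRAM_REQUIRE_MENTION",
    "DISCORD_BOT_TOKEN",
    "DISCORD_SERVER_ID",
    "DISCORD_USER_ID",
    "DISCORD_REQUIRE_MENTION",
    "SLACK_BOT_TOKEN",
}


def missing_credential_keys(path="~/.nemoclaw/credentials.json"):
    """Return the sorted list of expected keys absent from the credentials file.

    Assumes a flat JSON object. A missing file counts as every key missing,
    which is what Run 1 produced.
    """
    full_path = os.path.expanduser(path)  # open() does not expand "~" by itself
    if not os.path.exists(full_path):
        return sorted(EXPECTED_KEYS)
    with open(full_path, encoding="utf-8") as fh:
        data = json.load(fh)
    return sorted(EXPECTED_KEYS - set(data))


if __name__ == "__main__":
    missing = missing_credential_keys()
    print("PASS" if not missing else "FAIL, missing: " + ", ".join(missing))
```

Run on the Run 1 host state, this would report all eight keys missing, because the file does not exist.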
Actual Result
Run 1 — credentials.json NEVER written:
$ ls -la ~/.nemoclaw/credentials.json
ls: cannot access ...: No such file or directory
$ python3 -c "import json, os; d=json.load(open(os.path.expanduser('~/.nemoclaw/onboard-session.json'))); print(d.get('messagingChannels'), d.get('messagingConfig'))"
['telegram','discord','slack'] None
The five per-channel values are NOT on the host filesystem. Instead they are
baked into the sandbox IMAGE build args (visible in the Dockerfile build log):
ARG NEMOCLAW_MESSAGING_CHANNELS_B64=WyJkaXNjb3JkIiwic2x******
ARG NEMOCLAW_MESSAGING_ALLOWED_IDS_B64=eyJ0ZWxlZ3JhbSI6******wIl0...
ARG NEMOCLAW_DISCORD_GUILDS_B64=eyIxMTExMSI6eyJyZXF1aXJl****2UsIn...
ARG NEMOCLAW_TELEGRAM_CONFIG_B64=eyJyZXF1aXJl*****J1ZX0=
This is a different persistence model than the spec assumes.
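The ARG values above are masked, so they cannot be decoded as shown. Their visible prefixes do decode to JSON fragments (for example, `WyJkaXNjb3JkIiwic2x` decodes to `["discord","sl`). Assuming each full value is standard base64 of a UTF-8 JSON document, a tester could recover the entered values from an unmasked build log with a sketch like this one. `decode_b64_build_args` is a hypothetical helper, and the usage line builds a synthetic value instead of reusing a masked one.

```python
import base64
import json
import re

# Matches build-log lines such as "ARG NEMOCLAW_TELEGRAM_CONFIG_B64=...",
# tolerating any step prefix the builder prints before "ARG".
ARG_LINE = re.compile(r"\bARG\s+(NEMOCLAW_\w+_B64)=(\S+)")


def decode_b64_build_args(log_text):
    """Return {arg_name: decoded JSON} for every NEMOCLAW_*_B64 ARG line.

    Assumes each value is base64-encoded UTF-8 JSON; masked values will not decode.
    """
    decoded = {}
    for line in log_text.splitlines():
        match = ARG_LINE.search(line)
        if match:
            name, value = match.groups()
            decoded[name] = json.loads(base64.b64decode(value).decode("utf-8"))
    return decoded


if __name__ == "__main__":
    # Synthetic example value, not taken from the real build log.
    channels = base64.b64encode(json.dumps(["telegram", "discord", "slack"]).encode()).decode()
    print(decode_b64_build_args(f"ARG NEMOCLAW_MESSAGING_CHANNELS_B64={channels}"))
    # prints {'NEMOCLAW_MESSAGING_CHANNELS_B64': ['telegram', 'discord', 'slack']}
```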
Run 2 — Review configuration shows "Messaging: none":
Provider: nvidia-prod
Model: nvidia/nemotron-3-super-120b-a12b
API key: NVIDIA_API_KEY (staged for OpenShell gateway registration)
Web search: disabled
Messaging: none ← KEY EVIDENCE: Run 1 config not remembered
Sandbox name: hermes
Run 2 — All 5 spec-listed per-channel prompts re-asked verbatim:
Prompt                               | Run 1 input    | Run 2 wizard behavior
-------------------------------------+----------------+----------------------
telegram Reply only when @mentioned? | y              | RE-ASKED
telegram User ID (allowlist)         | 12345,67890    | RE-ASKED
discord Server ID                    | 11111          | RE-ASKED
discord Reply only when @mentioned?  | n              | RE-ASKED
discord User ID                      | 22222          | RE-ASKED
(extra) Slack App Token              | xapp-1-...     | RE-ASKED
(extra) Slack Member IDs             | U01ABC..,U04.. | RE-ASKED
The "✓ telegram — already configured" header on Run 2 refers ONLY to the
bot TOKEN env var being present, NOT to the channel's full configuration
having survived from Run 1.
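The spec's FAIL criterion reduces to a one-line check over the five spec-listed prompts. The sketch below encodes the Run 2 results from the table above, omitting the Slack rows because the spec does not list them; `spec_verdict` is a hypothetical name.

```python
# Run 2 wizard behavior for the five spec-listed prompts, copied from the table above.
RUN2_BEHAVIOR = {
    "telegram Reply only when @mentioned?": "RE-ASKED",
    "telegram User ID (allowlist)": "RE-ASKED",
    "discord Server ID": "RE-ASKED",
    "discord Reply only when @mentioned?": "RE-ASKED",
    "discord User ID": "RE-ASKED",
}


def spec_verdict(run2_behavior):
    """Apply the T6002672 FAIL criterion: any re-asked Run 1 value fails the test."""
    reasked = [prompt for prompt, behavior in run2_behavior.items() if behavior == "RE-ASKED"]
    return ("FAIL" if reasked else "PASS"), reasked


if __name__ == "__main__":
    verdict, reasked = spec_verdict(RUN2_BEHAVIOR)
    print(verdict, len(reasked))  # prints "FAIL 5" for the Run 2 results above
```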
Logs
Run 1 wizard transcripts: /tmp/hermes-run1.log
Run 2 wizard transcripts: /tmp/hermes-run2b.log
Failure diagnostics: ~/.nemoclaw/onboard-failures/2026-05-15T09-41-20-223Z-hermes-docker-gpu-patch/
Two interpretations — needs PM/Eng triage:
(a) Product bug: the wizard's persistence layer for messaging config does not
survive a re-onboard. The "✓ saved" messages are misleading because the
values only land in the sandbox image build args (and even those are lost
once `--recreate-sandbox` rebuilds the image from scratch on Run 2).
To meet the spec, the wizard must persist the 5 per-channel values to a
location that survives between onboards (credentials.json on host, OR
a gateway-side store the wizard re-reads on launch).
(b) Spec is stale: the persistence model has intentionally shifted from
host-side credentials.json to image-time build args. In that case, the
T6002672 verification approach needs updating — e.g. assert that the
NEMOCLAW_*_B64 ARGs inside the latest sandbox image match the entered
values, AND make the wizard report "already set" on Run 2 by reading
those ARG values from the existing image before recreating it.
Either way, the on-screen behavior contradicts the spec's FAIL criterion
("any prompt re-asks for a value the user already entered in run 1") — all
5 prompts re-ask.
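For interpretation (a), the behavior the spec expects can be sketched as "read persisted values first, ask only for what is missing, and merge new answers back". This assumes a flat JSON credentials file, the spec's key names, and a prompt-to-key mapping inferred from the prompt wording rather than taken from NemoHermes source. All function names are hypothetical.

```python
import json
import os

# Hypothetical prompt -> credentials.json key mapping. The key names come from the
# T6002672 spec; pairing them with these wizard prompts is an assumption.
PROMPT_KEYS = {
    "telegram Reply only when @mentioned?": "TELEGRAM_REQUIRE_MENTION",
    "telegram User ID (for DM access)": "TELEGRAM_ALLOWED_IDS",
    "discord Server ID": "DISCORD_SERVER_ID",
    "discord Reply only when @mentioned?": "DISCORD_REQUIRE_MENTION",
    "discord User ID (optional guild allowlist)": "DISCORD_USER_ID",
}


def load_persisted(path):
    """Read the credentials file; a missing file means nothing has been persisted yet."""
    full_path = os.path.expanduser(path)
    if not os.path.exists(full_path):
        return {}
    with open(full_path, encoding="utf-8") as fh:
        return json.load(fh)


def plan_prompts(persisted):
    """Pair each prompt with 'already set' (value survived) or 'ask'."""
    return [
        (prompt, "already set" if persisted.get(key) not in (None, "") else "ask")
        for prompt, key in PROMPT_KEYS.items()
    ]


def persist(path, new_values):
    """Merge new values into the file so the next onboard can skip those prompts."""
    full_path = os.path.expanduser(path)
    merged = {**load_persisted(full_path), **new_values}
    directory = os.path.dirname(full_path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    tmp_path = full_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(merged, fh, indent=2)
    os.chmod(tmp_path, 0o600)  # owner read/write only; the file also holds bot tokens
    os.replace(tmp_path, full_path)  # atomic on POSIX: no half-written file on a crash
```

With this shape, Run 1 would call `persist` after each "✓ saved", and Run 2 would print "already set" for every prompt whose key survived.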
Notes on related findings (separate issues, mentioned for triage context, not duplicates):
- Docker GPU patch failed in Run 1 with "OpenShell supervisor did not
reconnect to the GPU-enabled container." This differs from the AMD CDI
spec bug (6126101 / 6110214): `--gpus all` mode selection succeeded, but the
supervisor reconnect timed out and the container was stuck in a Restarting loop.
This should be filed separately if it is not already tracked.
- Wizard preflight UX improvement (positive): when sandbox→gateway traffic is
blocked by UFW on the 172.18.0.0/16 bridge, the wizard now prints the
exact `sudo ufw allow ...` remediation command. This is a big improvement over
the earlier, cryptic "auth proxy unreachable" error.
Bug Details
| Field | Value |
|-------|-------|
| Priority | Unprioritized |
| Action | Dev - Open - To fix |
| Disposition | Open issue |
| Module | Machine Learning - NemoClaw |
| Keyword | NemoClaw, NemoClaw_CLI&UX, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Onboard |
[NVB#6180486]