[bug] Slack credential not injected into sandbox env after messaging-provider rebuild (NemoClaw 0.0.51, OpenShell 0.0.44) #4274

@marqueswarren

Summary

After adding a Slack messaging channel to an existing sandbox via nemoclaw onboard --name <sandbox>, the sandbox image is rebuilt (expected), but the resulting sandbox container has no SLACK_BOT_TOKEN or SLACK_APP_TOKEN in its process environment. /sandbox/.openclaw/openclaw.json retains the literal placeholder strings xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN and xapp-OPENSHELL-RESOLVE-ENV-SLACK_APP_TOKEN — the credential injection from the OpenShell provider pipeline never occurs.

The Slack subprocess starts but fails to authenticate (Bolt's auth.test call returns invalid_auth, or the process exits silently), and the agent is unreachable via Slack DM despite a successful onboard.


Environment

| Component | Version |
| --- | --- |
| NemoClaw CLI | 0.0.51 (installed via official installer https://www.nvidia.com/nemoclaw.sh) |
| OpenShell | 0.0.44 |
| OpenClaw (inside sandbox) | 2026.5.18+ |
| Host OS | macOS 15 (aarch64, Apple Silicon M4) |
| Container runtime | Colima (Docker socket at ~/.colima/docker.sock) |
| Docker engine | Colima-provided (Linux VM) |
| Sandbox name | cruz-secure |
| Agent type | OpenClaw (JS/TS runtime, NOT Hermes-agent) |
| Inference provider | Anthropic (anthropic/claude-opus-4-7) |

Installation method: NemoClaw installed globally from source (~/.nemoclaw/source/) via npm link. Installed version at time of incident: 0.0.51 (maintained release, not dev build).

Version history note (relevant for reproducibility): Today's session involved multiple partial-failure attempts during a NemoClaw upgrade from 0.0.32 → 0.0.51 + OpenShell 0.0.31 → 0.0.44 + sandbox destroy/recreate cycles. The credential injection failure was observed in the final successful nemoclaw onboard --name cruz-secure run that produced a Phase: Ready sandbox. See §Suspected Cause for why prior partial-failure history may or may not be relevant.


Steps to Reproduce

⚠️ Unconfirmed reproducibility on a clean-slate install; see §Unconfirmed Status below. These steps reconstruct the path that produced the bug. Running the diagnostics (§Diagnostic Commands) on a clean-slate sandbox, as planned in §Unconfirmed Status, will reveal whether the issue is universal or session-specific.

  1. Install NemoClaw 0.0.51 and OpenShell 0.0.44 on macOS Apple Silicon with Colima.

  2. Start Colima and verify the OpenShell gateway is running:

    colima start
    launchctl list | grep ai.openshell.gateway  # → shows PID

  3. Create (or recreate) a sandbox with Slack channel via the interactive wizard. Do not use --non-interactive or --fresh; Slack requires interactive onboard (step 5/8):

    # If existing sandbox: nemoclaw onboard --name <sandbox> (no flags)
    # If new sandbox: nemoclaw onboard (then choose name)
    nemoclaw onboard --name <sandbox-name>
    # Walk through:
    #   Step 1: confirm name
    #   Step 2: Anthropic provider
    #   Step 3: anthropic/claude-opus-4-7
    #   Step 4: skip Brave (n)
    #   Step 5: select Slack → enter xoxb- token → enter xapp- token
    #   Step 6: observe "Creating sandbox" or "Recreating to ensure credentials flow..."
    #   Step 7: confirm policy presets
    #   Step 8: confirm

    Watch for step 6/8 output:

    Sandbox '<name>' exists but messaging providers are not attached.
    Recreating to ensure credentials flow through the provider pipeline.
    

    This triggers an image rebuild.

  4. Wait for Phase: Ready. Verify status:

    nemoclaw <sandbox-name> status
    # Expected: Phase: Ready, Inference: healthy

  5. Check for credential injection failure:

    # Check 1 — placeholder substitution in openclaw.json:
    nemoclaw <sandbox-name> exec -- cat /sandbox/.openclaw/openclaw.json | grep -E "Token|RESOLVE-ENV"
    # FAIL: shows literal "xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN"
    # PASS: shows actual xoxb-... token value
    
    # Check 2 — process environment:
    OPENCLAW_PID=$(nemoclaw <sandbox-name> exec -- pgrep -f 'openclaw gateway' | head -1)
    nemoclaw <sandbox-name> exec -- cat /proc/${OPENCLAW_PID}/environ | tr "\0" "\n" | grep -iE "SLACK"
    # FAIL: empty output — no SLACK_BOT_TOKEN in process env
    # PASS: shows SLACK_BOT_TOKEN=xoxb-...
    
    # Check 3 — Slack authentication outcome:
    # sh -c makes the log glob expand inside the sandbox rather than on the host
    nemoclaw <sandbox-name> exec -- sh -c 'tail /tmp/openclaw-998/openclaw-*.log' | grep -iE "slack|invalid_auth|not_authed"

  6. Expected behavior (§Expected vs Actual):

    • SLACK_BOT_TOKEN and SLACK_APP_TOKEN are injected into the sandbox container environment by OpenShell
    • The OpenClaw gateway starts and Bolt authenticates (auth.test returns ok: true)
    • Sending a DM to the agent from an allowlisted Slack user ID produces a response
  7. Actual behavior:

    • openclaw.json retains literal xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN and xapp-OPENSHELL-RESOLVE-ENV-SLACK_APP_TOKEN placeholders
    • SLACK_BOT_TOKEN and SLACK_APP_TOKEN absent from openclaw process environment
    • Bolt never authenticates; no response via Slack DM

Expected vs Actual Behavior

|  | Expected | Actual |
| --- | --- | --- |
| /sandbox/.openclaw/openclaw.json botToken field | Real xoxb-... token | "xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN" (literal placeholder) |
| /sandbox/.openclaw/openclaw.json appToken field | Real xapp-... token | "xapp-OPENSHELL-RESOLVE-ENV-SLACK_APP_TOKEN" (literal placeholder) |
| SLACK_BOT_TOKEN in openclaw process env | xoxb-... injected by OpenShell | Not set (empty or absent) |
| SLACK_APP_TOKEN in openclaw process env | xapp-... injected by OpenShell | Not set (empty or absent) |
| Bolt startup | auth.test returns ok: true | Fails with invalid_auth or never starts |
| Slack DM delivery | Agent responds to allowlisted user DMs | Silent; agent unreachable |

Diagnostic Commands

Run these from the host after the affected sandbox is Phase: Ready:

# 1. Check for unresolved RESOLVE-ENV placeholders in openclaw.json
#    FAIL condition: output contains literal "OPENSHELL-RESOLVE-ENV"
nemoclaw <name> exec -- cat /sandbox/.openclaw/openclaw.json | grep -E "Token|RESOLVE-ENV"

# 2. Check sandbox process env for Slack credentials
#    FAIL condition: no output (SLACK_BOT_TOKEN not in env)
nemoclaw <name> exec -- sh -c 'cat /proc/$(pgrep -f "openclaw gateway" | head -1)/environ | tr "\0" "\n" | grep -iE "SLACK"'

# 3. Check registered OpenShell providers (from host — does the gateway know about slack-bridge?)
openshell provider get <sandbox-name>-slack-bridge 2>&1
openshell provider get <sandbox-name>-slack-app 2>&1
# FAIL condition: "provider not found" / non-zero exit

# 4. Check sandbox provider attachment (are providers linked to this sandbox?)
openshell sandbox get <sandbox-name> 2>&1 | grep -i provider
# FAIL condition: slack-bridge / slack-app NOT listed

# 5. Boot log (check nemoclaw-start.sh credential refresh output)
nemoclaw <name> exec -- cat /tmp/nemoclaw-start.log | grep -iE "provider|credential|refresh|SLACK|resolve"
# Look for: "[config] Refreshed provider placeholders from OpenShell runtime env"
# If absent: runtime env injection may have been skipped
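
The output of diagnostics 1 and 2 can also be checked programmatically. The following is a minimal sketch, assuming the openclaw.json text and the raw /proc/<pid>/environ dump have already been captured as strings. The function names and the regular expression are hypothetical helpers written for this report, not part of NemoClaw or OpenShell, and the second helper matches the SLACK_ prefix rather than repeating the case-insensitive grep above.

```typescript
// Hypothetical helpers for reading the output of diagnostics 1 and 2.
// Names and regex are illustrative; they are not NemoClaw or OpenShell APIs.

/** Returns every unresolved Slack placeholder found in openclaw.json text. */
function findUnresolvedPlaceholders(configText: string): string[] {
  // Matches e.g. "xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN".
  const pattern = /x(?:oxb|app)-OPENSHELL-RESOLVE-ENV-[A-Z0-9_]+/g;
  return configText.match(pattern) ?? [];
}

/** Parses a NUL-separated environ dump and returns the SLACK_* variable names. */
function slackKeysInEnviron(environ: string): string[] {
  return environ
    .split("\0")
    .map((entry) => {
      const eq = entry.indexOf("=");
      return eq === -1 ? entry : entry.slice(0, eq);
    })
    .filter((key) => key.startsWith("SLACK_"));
}
```

A non-empty result from findUnresolvedPlaceholders corresponds to the FAIL condition of diagnostic 1, and an empty result from slackKeysInEnviron corresponds to the FAIL condition of diagnostic 2.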

Suspected Cause

Based on source-level investigation of ~/.nemoclaw/source/ (0.0.51):

How credential injection is designed to work

  1. During nemoclaw onboard, the user enters tokens interactively → saveCredential(ch.envKey, token) → process.env[ch.envKey] = token (in-memory only; nothing written to disk per src/lib/credentials/store.ts comment: "Nothing is written to disk.")

  2. upsertMessagingProviders() calls openshell provider create --name <sandbox>-slack-bridge --type generic --credential SLACK_BOT_TOKEN with { SLACK_BOT_TOKEN: token } in env. This registers the actual token with the OpenShell gateway.

  3. openshell sandbox create is called with --provider <sandbox>-slack-bridge --provider <sandbox>-slack-app. OpenShell is then responsible for injecting SLACK_BOT_TOKEN and SLACK_APP_TOKEN into the sandbox container environment at runtime.

  4. openclaw.json is baked at Docker image build time with literal placeholder xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN (from scripts/generate-openclaw-config.py, _placeholder() function). This placeholder is never replaced by refresh_openclaw_provider_placeholders() in nemoclaw-start.sh — that function only handles openshell:resolve:env:* prefixed values. The Slack placeholder is Bolt-regex-compatible by design and is resolved by the L7 proxy at egress, not by env var substitution.

  5. The sandbox's openclaw process reads SLACK_BOT_TOKEN from its environment (injected by OpenShell in step 3) and passes it to Bolt. Bolt authenticates.
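
The difference between the two placeholder families in step 4 can be sketched as follows. This only illustrates the string shapes quoted in this report; slackPlaceholder and isRewrittenAtStartup are hypothetical names, and the real logic lives in scripts/generate-openclaw-config.py and nemoclaw-start.sh, which may differ in detail.

```typescript
// Sketch of the two placeholder families described in step 4.
// String shapes come from this report; the helper names are hypothetical.

const REFRESH_PREFIX = "openshell:resolve:env:";

/** Builds the Bolt-shaped Slack placeholder for a given env key. */
function slackPlaceholder(envKey: "SLACK_BOT_TOKEN" | "SLACK_APP_TOKEN"): string {
  const tokenPrefix = envKey === "SLACK_BOT_TOKEN" ? "xoxb" : "xapp";
  return `${tokenPrefix}-OPENSHELL-RESOLVE-ENV-${envKey}`;
}

/** True when a config value has the prefix that the startup refresh step handles. */
function isRewrittenAtStartup(value: string): boolean {
  return value.startsWith(REFRESH_PREFIX);
}
```

Under these assumptions, isRewrittenAtStartup(slackPlaceholder("SLACK_BOT_TOKEN")) is false, which matches the statement that the startup refresh leaves the Slack placeholder untouched.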

Where this breaks in the observed failure

Hypothesis A (highest probability): The "messaging providers not attached" rebuild path (step 6/8 of the wizard) destroys and recreates the sandbox image. At the point of rebuild, getMessagingToken("SLACK_BOT_TOKEN") is called to populate messagingTokenDefs. If the NemoClaw onboard process does not have SLACK_BOT_TOKEN in process.env at that moment — which can happen if:

  • The token was entered in a previous NemoClaw session that crashed/exited before completing (in-memory credentials do not survive process restart)
  • Or the "messaging providers not attached" trigger fires in a new wizard invocation (not the same session that captured the tokens)

...then upsertMessagingProviders() receives token: null for slack, skips provider registration, the sandbox is created without --provider <sandbox>-slack-bridge, and OpenShell has no credential to inject at runtime.
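
Hypothesis A can be modelled in a few lines. This is a simplified sketch, not the NemoClaw source: resolveToken and planProviderArgs are hypothetical stand-ins for the lookup and provider-registration steps, the environment is passed in explicitly instead of reading process.env, and the pairing of SLACK_APP_TOKEN with the slack-app provider is an assumption based on the provider names used in the diagnostic commands.

```typescript
// Simplified model of Hypothesis A. This is NOT the NemoClaw source;
// resolveToken and planProviderArgs are hypothetical stand-ins.

type SessionEnv = Record<string, string | undefined>;

// Provider-name suffixes as they appear in this report's diagnostic commands.
// The SLACK_APP_TOKEN -> slack-app pairing is an assumption.
const SLACK_PROVIDERS: Array<{ envKey: string; suffix: string }> = [
  { envKey: "SLACK_BOT_TOKEN", suffix: "slack-bridge" },
  { envKey: "SLACK_APP_TOKEN", suffix: "slack-app" },
];

/** In-memory lookup only: returns null when the key is unset or blank. */
function resolveToken(env: SessionEnv, envKey: string): string | null {
  const value = (env[envKey] ?? "").trim();
  return value.length > 0 ? value : null;
}

/** Returns the --provider arguments that would reach sandbox creation. */
function planProviderArgs(sandbox: string, env: SessionEnv): string[] {
  const args: string[] = [];
  for (const { envKey, suffix } of SLACK_PROVIDERS) {
    if (resolveToken(env, envKey) === null) continue; // provider silently skipped
    args.push("--provider", `${sandbox}-${suffix}`);
  }
  return args;
}
```

With both tokens present the model yields two --provider arguments; with an empty environment it yields none, so the sandbox would be created with nothing for OpenShell to inject.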

Hypothesis B (lower probability but NemoClaw version-specific): Even with correct provider registration, a regression in NemoClaw 0.0.51's openshell sandbox create pipeline fails to pass or attach the registered providers to the sandbox. The provider exists in the gateway registry but is not linked to the sandbox container's env injection.

Source location for Hypothesis A:

  • src/lib/onboard.ts ~line 3075: token: getMessagingToken("SLACK_BOT_TOKEN")
  • src/lib/onboard/messaging-token.ts: getMessagingToken = normalizeCredentialValue(process.env[envKey]) || getCredential(envKey) || null
  • src/lib/credentials/store.ts line 164: getCredential() reads process.env only — no persistent storage, no gateway-retrieval fallback

The in-memory-only credential design is intentional for security, but creates a failure mode when a rebuild is triggered in a session that doesn't have the credentials staged. The wizard collects tokens at step 5/8 → sets process.env[ch.envKey] → but if the rebuild then runs in a context where step 5/8 was already completed in a prior session, the env is empty.
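
One way to make this failure mode visible earlier, offered only as a sketch, is to fail fast when a rebuild is about to run without the tokens staged. Nothing like this is confirmed to exist in 0.0.51; missingSlackKeys and assertSlackTokensStaged are hypothetical names, and the environment is again passed in explicitly.

```typescript
// Hypothetical guard, not present in the NemoClaw source as far as this
// report knows: refuse to rebuild when the Slack tokens are not in memory.

const REQUIRED_SLACK_KEYS = ["SLACK_BOT_TOKEN", "SLACK_APP_TOKEN"];

/** Lists the Slack env keys that are unset or blank in the given environment. */
function missingSlackKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_SLACK_KEYS.filter((key) => (env[key] ?? "").trim() === "");
}

/** Throws instead of letting the rebuild continue without credentials. */
function assertSlackTokensStaged(env: Record<string, string | undefined>): void {
  const missing = missingSlackKeys(env);
  if (missing.length > 0) {
    throw new Error(
      `Slack rebuild requested but tokens are not staged in this session: ${missing.join(", ")}`
    );
  }
}
```

A guard of this kind would turn the silent "Phase: Ready but unreachable" outcome into an explicit prompt to re-enter the tokens.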

Source location for Hypothesis B:

  • src/lib/onboard.ts ~line 3602: upsertMessagingProviders(messagingTokenDefs) returns provider names
  • These names are pushed to createArgs as --provider <name>
  • The openshell-sandbox-create pipeline in OpenShell 0.0.44 is the black box to check
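
The diagnostics can help separate the two hypotheses. The sketch below maps the outcomes of diagnostics 2, 3, and 4 onto a verdict. It is a reading aid under stated assumptions, not a definitive test: the Observations fields and the triage function are hypothetical, and a provider left registered by an earlier session could make Hypothesis A look like Hypothesis B.

```typescript
// Hypothetical triage helper mapping diagnostics 2-4 onto the two hypotheses.
// Field and function names are illustrative only.

interface Observations {
  tokenInProcessEnv: boolean;  // diagnostic 2: SLACK_BOT_TOKEN present in environ
  providerRegistered: boolean; // diagnostic 3: `openshell provider get` succeeded
  providerAttached: boolean;   // diagnostic 4: provider listed on the sandbox
}

type Verdict = "healthy" | "hypothesis-A" | "hypothesis-B" | "inconclusive";

function triage(o: Observations): Verdict {
  if (o.tokenInProcessEnv) return "healthy";
  // Registration never happened: consistent with a null token at rebuild time.
  if (!o.providerRegistered) return "hypothesis-A";
  // Registered in the gateway but not linked to the sandbox.
  if (!o.providerAttached) return "hypothesis-B";
  // Registered and attached, yet nothing injected: neither hypothesis as stated.
  return "inconclusive";
}
```

For example, a "provider not found" result from diagnostic 3 combined with an empty diagnostic 2 yields "hypothesis-A" in this model.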

Workarounds Tried

None. Deferred to sterling-secure clean-slate diagnostic (same NemoClaw 0.0.51 path, no prior partial-failure history). See §Unconfirmed Status.


Unconfirmed Status

This issue is filed as "possible bug, verification pending."

The failure may be:

  1. Universal NemoClaw 0.0.51 bug — triggered by any nemoclaw onboard that produces the "messaging providers not attached" rebuild message → reproducible on clean slate
  2. Session-state corruption — specific to today's partial-failure session (multiple destroy/rebuild attempts, process restarts mid-onboard) that left process.env without tokens during the final rebuild → not reproducible on clean slate

Verification plan: Spin up a new sandbox (sterling-secure) on the same system using the same NemoClaw 0.0.51 + OpenShell 0.0.44 + Anthropic provider path, without any prior partial-failure history. If nemoclaw <name> exec -- cat /sandbox/.openclaw/openclaw.json | grep RESOLVE-ENV shows unresolved placeholders on sterling-secure → confirmed NemoClaw 0.0.51 bug. If sterling-secure resolves correctly → cruz-specific state corruption.

We will update this issue with the result before formally requesting a fix.


Additional Context

Messaging channel add = interactive nemoclaw onboard only (UX gap)

There is no nemoclaw <sandbox> channels add slack subcommand in 0.0.51. The only path for adding a messaging channel to an existing sandbox is re-running the full nemoclaw onboard wizard. This is a UX limitation that compounds the credential injection failure — the wizard has no way to recover credentials from the gateway on a fresh session, forcing the user to re-enter tokens. If the re-entered tokens are then lost to the rebuild path described above, the result is a sandbox that "successfully" onboards but cannot authenticate.

request_body_credential_rewrite interaction

NemoClaw 0.0.51 upstream added request_body_credential_rewrite: true to all Slack policy preset REST endpoints in nemoclaw-blueprint/policies/presets/slack.yaml. This is a parallel credential-rewrite mechanism at the HTTP body level. It is a separate mechanism from env injection and is not suspected as the cause, but the interaction is worth confirming: if env injection is broken, request_body_credential_rewrite will also receive the literal placeholder string xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN instead of the real token, and will be unable to rewrite it.

Version compatibility note

This issue appeared in the context of upgrading from NemoClaw 0.0.32 → 0.0.51 and OpenShell 0.0.31 → 0.0.44 on the same system. The NEMOCLAW_ACCEPT_EXPERIMENTAL_OPENSHELL_UPGRADE=1 flag was required for the OpenShell binary swap (see related experience: pre-0.0.37 → 0.0.44 upgrade requires experimental flag in the installer). The NemoClaw upgrade itself was clean; this credential injection failure appears to be specific to the messaging channel add + sandbox rebuild path in 0.0.51.


Labels (suggested)

bug, slack, credentials, onboard, sandbox-rebuild


End of issue draft.
Return to: RUNBOOK-OPENCLAW-AGENT-IN-NEMOCLAW-COLIMA.md §Common Pitfall #7 — credential injection bug (OQ-1)


Verification on clean-slate sterling-secure spin-up pending; will update with results.
