[bug] Slack credential not injected into sandbox env after messaging-provider rebuild (NemoClaw 0.0.51, OpenShell 0.0.44) #4274

### Summary

After adding a Slack messaging channel to an existing sandbox via `nemoclaw onboard --name <sandbox>`, the sandbox image is rebuilt (expected), but the resulting sandbox container has **no `SLACK_BOT_TOKEN` or `SLACK_APP_TOKEN` in its process environment**. `/sandbox/.openclaw/openclaw.json` retains the literal placeholder strings `xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN` and `xapp-OPENSHELL-RESOLVE-ENV-SLACK_APP_TOKEN` — the credential injection from the OpenShell provider pipeline never occurs.

The Slack subprocess starts but fails to authenticate (Bolt's `auth.test` call returns `invalid_auth`, or the process exits silently). As a result, the agent is unreachable via Slack DM even though onboarding reported success.

---

### Environment

| Component | Version |
|---|---|
| NemoClaw CLI | 0.0.51 (installed via official installer `https://www.nvidia.com/nemoclaw.sh`) |
| OpenShell | 0.0.44 |
| OpenClaw (inside sandbox) | 2026.5.18+ |
| Host OS | macOS 15 (aarch64 — Apple Silicon M4) |
| Container runtime | Colima (Docker socket at `~/.colima/docker.sock`) |
| Docker engine | Colima-provided (Linux VM) |
| Sandbox name | `cruz-secure` |
| Agent type | OpenClaw (JS/TS runtime, NOT Hermes-agent) |
| Inference provider | Anthropic (`anthropic/claude-opus-4-7`) |

**Installation method:** NemoClaw installed globally from source (`~/.nemoclaw/source/`) via `npm link`. Installed version at time of incident: 0.0.51 (maintained release, not dev build).

**Version history note (relevant for reproducibility):** The session in which the bug was observed included several partially failed attempts: a NemoClaw upgrade (0.0.32 → 0.0.51), an OpenShell upgrade (0.0.31 → 0.0.44), and multiple sandbox destroy/recreate cycles. The credential injection failure was observed in the final successful `nemoclaw onboard --name cruz-secure` run, which produced a `Phase: Ready` sandbox. See §Suspected Cause for why the prior partial-failure history may or may not be relevant.

---

### Steps to Reproduce

**⚠️ Reproducibility on a clean-slate install is unconfirmed (see §Unconfirmed Status below).** These steps reconstruct the path that produced the bug. The verification plan in §Unconfirmed Status will show whether the issue is universal or session-specific.

1. Install NemoClaw 0.0.51 and OpenShell 0.0.44 on macOS Apple Silicon with Colima.

2. Start Colima and verify the OpenShell gateway is running:
   ```bash
   colima start
   launchctl list | grep ai.openshell.gateway  # → shows PID
   ```

3. Create (or recreate) a sandbox with Slack channel via the interactive wizard. **Do not use `--non-interactive` or `--fresh` — Slack requires interactive onboard (step 5/8):**
   ```bash
   # If existing sandbox: nemoclaw onboard --name <sandbox> (no flags)
   # If new sandbox: nemoclaw onboard (then choose name)
   nemoclaw onboard --name <sandbox-name>
   # Walk through:
   #   Step 1: confirm name
   #   Step 2: Anthropic provider
   #   Step 3: anthropic/claude-opus-4-7
   #   Step 4: skip Brave (n)
   #   Step 5: select Slack → enter xoxb- token → enter xapp- token
   #   Step 6: observe "Creating sandbox" or "Recreating to ensure credentials flow..."
   #   Step 7: confirm policy presets
   #   Step 8: confirm
   ```
   Watch for step 6/8 output:
   ```
   Sandbox '<name>' exists but messaging providers are not attached.
   Recreating to ensure credentials flow through the provider pipeline.
   ```
   This triggers an image rebuild.

4. Wait for `Phase: Ready`. Verify status:
   ```bash
   nemoclaw <sandbox-name> status
   # Expected: Phase: Ready, Inference: healthy
   ```

5. Check for credential injection failure:
   ```bash
   # Check 1 — placeholder substitution in openclaw.json:
   nemoclaw <sandbox-name> exec -- cat /sandbox/.openclaw/openclaw.json | grep -E "Token|RESOLVE-ENV"
   # FAIL: shows literal "xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN"
   # PASS: shows actual xoxb-... token value

   # Check 2 — process environment:
   OPENCLAW_PID=$(nemoclaw <sandbox-name> exec -- pgrep -f 'openclaw gateway' | head -1)
   nemoclaw <sandbox-name> exec -- cat /proc/${OPENCLAW_PID}/environ | tr "\0" "\n" | grep -iE "SLACK"
   # FAIL: empty output — no SLACK_BOT_TOKEN in process env
   # PASS: shows SLACK_BOT_TOKEN=xoxb-...

   # Check 3 — Slack authentication outcome:
   # The glob is quoted inside sh -c so it expands in the sandbox, not on the host.
   nemoclaw <sandbox-name> exec -- sh -c 'tail /tmp/openclaw-998/openclaw-*.log' | grep -iE "slack|invalid_auth|not_authed"
   ```

6. **Expected behavior (§Expected vs Actual):**
   - `SLACK_BOT_TOKEN` and `SLACK_APP_TOKEN` are injected into the sandbox container environment by OpenShell
   - The OpenClaw gateway starts and Bolt authenticates (`auth.test` → `ok: true`)
   - Sending a DM to the agent from an allowlisted Slack user ID produces a response

7. **Actual behavior:**
   - `openclaw.json` retains literal `xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN` and `xapp-OPENSHELL-RESOLVE-ENV-SLACK_APP_TOKEN` placeholders
   - `SLACK_BOT_TOKEN` and `SLACK_APP_TOKEN` absent from openclaw process environment
   - Bolt never authenticates; no response via Slack DM

---

### Expected vs Actual Behavior

| | Expected | Actual |
|---|---|---|
| `/sandbox/.openclaw/openclaw.json` `botToken` field | Actual `xoxb-...` token | `"xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN"` (literal placeholder) |
| `/sandbox/.openclaw/openclaw.json` `appToken` field | Actual `xapp-...` token | `"xapp-OPENSHELL-RESOLVE-ENV-SLACK_APP_TOKEN"` (literal placeholder) |
| `SLACK_BOT_TOKEN` in openclaw process env | `xoxb-...` injected by OpenShell | Not set — empty/absent |
| `SLACK_APP_TOKEN` in openclaw process env | `xapp-...` injected by OpenShell | Not set — empty/absent |
| Bolt startup | `auth.test` returns `ok: true` | Fails with `invalid_auth` or never starts |
| Slack DM delivery | Agent responds to allowlisted user DMs | Silent — agent unreachable |

---

### Diagnostic Commands

Run these from the host after the affected sandbox is `Phase: Ready`:

```bash
# 1. Check for unresolved RESOLVE-ENV placeholders in openclaw.json
#    FAIL condition: output contains literal "OPENSHELL-RESOLVE-ENV"
nemoclaw <name> exec -- cat /sandbox/.openclaw/openclaw.json | grep -E "Token|RESOLVE-ENV"

# 2. Check sandbox process env for Slack credentials
#    FAIL condition: no output (SLACK_BOT_TOKEN not in env)
#    The [ ] in the pattern keeps pgrep -f from matching this sh -c command line itself.
nemoclaw <name> exec -- sh -c 'cat /proc/$(pgrep -f "openclaw[ ]gateway" | head -1)/environ | tr "\0" "\n" | grep -iE "SLACK"'

# 3. Check registered OpenShell providers (from host — does the gateway know about slack-bridge?)
openshell provider get <sandbox-name>-slack-bridge 2>&1
openshell provider get <sandbox-name>-slack-app 2>&1
# FAIL condition: "provider not found" / non-zero exit

# 4. Check sandbox provider attachment (are providers linked to this sandbox?)
openshell sandbox get <sandbox-name> 2>&1 | grep -i provider
# FAIL condition: slack-bridge / slack-app NOT listed

# 5. Boot log (check nemoclaw-start.sh credential refresh output)
nemoclaw <name> exec -- cat /tmp/nemoclaw-start.log | grep -iE "provider|credential|refresh|SLACK|resolve"
# Look for: "[config] Refreshed provider placeholders from OpenShell runtime env"
# If absent: runtime env injection may have been skipped
```
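
When pasting diagnostic output into this issue, it helps to avoid printing the token values themselves. The following is a minimal sketch of a host-side helper for check 2. `report_slack_env` is a hypothetical name and is not part of NemoClaw or OpenShell; it only assumes a NUL-separated `/proc/<pid>/environ` dump on stdin.

```bash
# Hypothetical helper (not part of NemoClaw or OpenShell).
# Reads a NUL-separated /proc/<pid>/environ dump on stdin and reports, by name
# only, whether each Slack variable is present with a non-empty value.
report_slack_env() {
  # Convert NUL separators to newlines so grep can match one variable per line.
  env_dump=$(tr '\0' '\n')
  missing=0
  for name in SLACK_BOT_TOKEN SLACK_APP_TOKEN; do
    # "=." requires at least one character after the equals sign.
    if printf '%s\n' "$env_dump" | grep -q "^${name}=."; then
      echo "${name}: set"
    else
      echo "${name}: missing"
      missing=1
    fi
  done
  # Exit status 1 when either variable is missing, 0 when both are set.
  return "$missing"
}
```

One way to use it would be to pipe the raw dump from check 2 into the function, for example `nemoclaw <name> exec -- sh -c 'cat /proc/$(pgrep -f "openclaw gateway" | head -1)/environ' | report_slack_env`, so that only `set` / `missing` lines end up in the report.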

---

### Suspected Cause

Based on source-level investigation of `~/.nemoclaw/source/` (0.0.51):

#### How credential injection is designed to work

1. During `nemoclaw onboard`, the user enters tokens interactively → `saveCredential(ch.envKey, token)` → `process.env[ch.envKey] = token` (in-memory only; nothing written to disk per `src/lib/credentials/store.ts` comment: *"Nothing is written to disk."*)

2. `upsertMessagingProviders()` calls `openshell provider create --name <sandbox>-slack-bridge --type generic --credential SLACK_BOT_TOKEN` with `{ SLACK_BOT_TOKEN: token }` in env. This registers the actual token with the OpenShell gateway.

3. `openshell sandbox create` is called with `--provider <sandbox>-slack-bridge --provider <sandbox>-slack-app`. OpenShell is then responsible for injecting `SLACK_BOT_TOKEN` and `SLACK_APP_TOKEN` into the sandbox container environment at runtime.

4. `openclaw.json` is baked at Docker image build time with literal placeholder `xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN` (from `scripts/generate-openclaw-config.py`, `_placeholder()` function). This placeholder is **never replaced by `refresh_openclaw_provider_placeholders()`** in `nemoclaw-start.sh` — that function only handles `openshell:resolve:env:*` prefixed values. The Slack placeholder is Bolt-regex-compatible by design and is resolved by the L7 proxy at egress, not by env var substitution.

5. The sandbox's openclaw process reads `SLACK_BOT_TOKEN` from its environment (injected by OpenShell in step 3) and passes it to Bolt. Bolt authenticates.
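
The distinction drawn in step 4 between the two placeholder shapes can be sketched as a small classifier. This is an illustration only, not code from `nemoclaw-start.sh`: the function name `classify_placeholder` and its output labels are hypothetical, and only the two placeholder shapes are taken from the description above.

```bash
# Hypothetical illustration of the two placeholder shapes described in step 4.
# Prints which mechanism is expected to resolve a given config value.
classify_placeholder() {
  case "$1" in
    openshell:resolve:env:*)
      # Shape handled by refresh_openclaw_provider_placeholders() at startup.
      echo "startup-refresh" ;;
    xoxb-OPENSHELL-RESOLVE-ENV-*|xapp-OPENSHELL-RESOLVE-ENV-*)
      # Bolt-regex-compatible shape; left in the file and resolved at egress.
      echo "egress-proxy" ;;
    *)
      # Anything else is treated as a literal value.
      echo "literal" ;;
  esac
}
```

Under this reading, the Slack `botToken` and `appToken` values fall into the second branch, which is why the startup refresh would leave them untouched in `openclaw.json`.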

#### Where this breaks in the observed failure

**Hypothesis A (highest probability):** The "messaging providers not attached" rebuild path (step 6/8 of the wizard) destroys and recreates the sandbox image. At the point of rebuild, `getMessagingToken("SLACK_BOT_TOKEN")` is called to populate `messagingTokenDefs`. The NemoClaw onboard process may not have `SLACK_BOT_TOKEN` in `process.env` at that moment if either of the following holds:
  - The token was entered in a *previous NemoClaw session* that crashed or exited before completing (in-memory credentials do not survive a process restart)
  - The "messaging providers not attached" trigger fires in a *new* wizard invocation (not the same session that captured the tokens)

  In that case, `upsertMessagingProviders()` receives `token: null` for Slack and skips provider registration, the sandbox is created **without** `--provider <sandbox>-slack-bridge`, and OpenShell has no credential to inject at runtime.

**Hypothesis B (lower probability but NemoClaw version-specific):** Even with correct provider registration, a regression in NemoClaw 0.0.51's `openshell sandbox create` pipeline fails to pass or attach the registered providers to the sandbox. The provider exists in the gateway registry but is not linked to the sandbox container's env injection.

**Source location for Hypothesis A:**
- `src/lib/onboard.ts` ~line 3075: `token: getMessagingToken("SLACK_BOT_TOKEN")`
- `src/lib/onboard/messaging-token.ts`: `getMessagingToken = normalizeCredentialValue(process.env[envKey]) || getCredential(envKey) || null`
- `src/lib/credentials/store.ts` line 164: `getCredential()` reads `process.env` only — no persistent storage, no gateway-retrieval fallback

The in-memory-only credential design is intentional for security, but creates a failure mode when a rebuild is triggered in a session that doesn't have the credentials staged. The wizard collects tokens at step 5/8 → sets `process.env[ch.envKey]` → but if the rebuild then runs in a context where step 5/8 was already completed in a prior session, the env is empty.
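
The failure mode in Hypothesis A can be modeled in a few lines of shell. This is a simplified sketch of the lookup chain as described above, not a port of the TypeScript source: `get_messaging_token` and `provider_args_for` are hypothetical names, and value normalization (`normalizeCredentialValue`) is not modeled.

```bash
# Hypothetical model of the lookup: the process environment is the only source,
# so an unset variable yields an empty result with no fallback behind it.
get_messaging_token() {
  # printenv prints the value of an exported variable and fails if it is unset.
  printenv "$1" 2>/dev/null || true
}

# Emits a --provider argument only when a token was found for the channel.
# Arguments: sandbox name, env key, provider suffix.
provider_args_for() {
  sandbox="$1"
  env_key="$2"
  suffix="$3"
  token=$(get_messaging_token "$env_key")
  if [ -n "$token" ]; then
    echo "--provider ${sandbox}-${suffix}"
  fi
  # With an empty token nothing is printed: the sandbox would be created
  # without the provider attached, which matches the observed symptom.
}
```

In a fresh shell where `SLACK_BOT_TOKEN` was never exported, `provider_args_for cruz-secure SLACK_BOT_TOKEN slack-bridge` prints nothing. After `export SLACK_BOT_TOKEN=...`, the same call prints the `--provider` argument. A new wizard invocation is in the first situation unless the tokens are re-entered.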

**Source location for Hypothesis B:**
- `src/lib/onboard.ts` ~line 3602: `upsertMessagingProviders(messagingTokenDefs)` returns provider names
- These names are pushed to `createArgs` as `--provider <name>`
- The openshell-sandbox-create pipeline in OpenShell 0.0.44 is the black box to check

---

### Workarounds Tried

None yet. Workarounds are deferred until the clean-slate diagnostic on `sterling-secure` completes (same NemoClaw 0.0.51 path, no prior partial-failure history). See §Unconfirmed Status.

---

### Unconfirmed Status

**This issue is filed as "possible bug, verification pending."**

The failure may be:
1. **Universal NemoClaw 0.0.51 bug** — triggered by any `nemoclaw onboard` that produces the "messaging providers not attached" rebuild message → reproducible on clean slate
2. **Session-state corruption** — specific to today's partial-failure session (multiple destroy/rebuild attempts, process restarts mid-onboard) that left `process.env` without tokens during the final rebuild → not reproducible on clean slate

**Verification plan:** Spin up a new sandbox (`sterling-secure`) on the same system using the same NemoClaw 0.0.51 + OpenShell 0.0.44 + Anthropic provider path, **without any prior partial-failure history**. If `nemoclaw <name> exec -- cat /sandbox/.openclaw/openclaw.json | grep RESOLVE-ENV` shows unresolved placeholders on sterling-secure → confirmed NemoClaw 0.0.51 bug. If sterling-secure resolves correctly → cruz-specific state corruption.
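
The decision rule in the verification plan can be written down as a small filter. This is a sketch under the plan's own criterion: `clean_slate_verdict` is a hypothetical name, and it only looks for the `RESOLVE-ENV` marker in config text on stdin. Because the design notes in §Suspected Cause say the file placeholder is resolved at egress rather than rewritten on disk, pairing this with the process-environment check from §Diagnostic Commands may give a more reliable signal.

```bash
# Hypothetical helper applying the verification plan's criterion to the
# openclaw.json text of the clean-slate sandbox, supplied on stdin.
clean_slate_verdict() {
  if grep -q 'RESOLVE-ENV'; then
    echo "unresolved placeholders: consistent with a NemoClaw 0.0.51 bug"
    return 1
  fi
  echo "no placeholders: consistent with cruz-secure-specific state"
}
```

It could be fed from the command in the plan, for example `nemoclaw sterling-secure exec -- cat /sandbox/.openclaw/openclaw.json | clean_slate_verdict`.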

**We will update this issue with the result before formally requesting a fix.**

---

### Additional Context

#### Messaging channel add = interactive `nemoclaw onboard` only (UX gap)

There is no `nemoclaw <sandbox> channels add slack` subcommand in 0.0.51. The only path for adding a messaging channel to an existing sandbox is re-running the full `nemoclaw onboard` wizard. This is a UX limitation that compounds the credential injection failure — the wizard has no way to recover credentials from the gateway on a fresh session, forcing the user to re-enter tokens. If the re-entered tokens are then lost to the rebuild path described above, the result is a sandbox that "successfully" onboards but cannot authenticate.

#### `request_body_credential_rewrite` interaction

NemoClaw 0.0.51 upstream added `request_body_credential_rewrite: true` to all Slack policy preset REST endpoints in `nemoclaw-blueprint/policies/presets/slack.yaml`. This is a parallel credential-rewrite mechanism at the HTTP body level. It is not related to the env injection failure but is worth confirming: if env injection is broken, `request_body_credential_rewrite` will also receive the literal placeholder string `xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN` instead of the real token, and will be unable to rewrite it.

#### Version compatibility note

This issue appeared in the context of upgrading from NemoClaw 0.0.32 → 0.0.51 and OpenShell 0.0.31 → 0.0.44 on the same system. The `NEMOCLAW_ACCEPT_EXPERIMENTAL_OPENSHELL_UPGRADE=1` flag was required for the OpenShell binary swap (see related experience: pre-0.0.37 → 0.0.44 upgrade requires experimental flag in the installer). The NemoClaw upgrade itself was clean; this credential injection failure appears to be specific to the messaging channel add + sandbox rebuild path in 0.0.51.

---

### Labels (suggested)

`bug`, `slack`, `credentials`, `onboard`, `sandbox-rebuild`

---


*Verification on clean-slate sterling-secure spin-up pending; will update with results.*
