Skip to content

feat(channels): replace Slack token carve-out with Bolt-shape rewriter#2630

Merged
ericksoa merged 11 commits into
mainfrom
feat/slack-token-rewriter
Apr 29, 2026
Merged

feat(channels): replace Slack token carve-out with Bolt-shape rewriter#2630
ericksoa merged 11 commits into
mainfrom
feat/slack-token-rewriter

Conversation

@ericksoa

@ericksoa ericksoa commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

Summary

Replaces the apply_slack_token_override startup carve-out (which mutated openclaw.json to splice in the real Slack token) with an in-process Node preload that translates a Bolt-regex-compatible placeholder into the canonical openshell:resolve:env:VAR form on outbound HTTP. OpenShell's L7 proxy then substitutes the real token from env on the wire — the same path Discord/Telegram/Brave already take. No real Slack token ever touches openclaw.json.

Related Issue

Fixes #2085

Changes

  • scripts/generate-openclaw-config.py: Slack channel emits xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN / xapp-OPENSHELL-RESOLVE-ENV-SLACK_APP_TOKEN (Bolt's prefix regex passes); other channels keep the canonical form unchanged.
  • nemoclaw-blueprint/scripts/slack-token-rewriter.js (new): canonical Node preload. Wraps http.request, http.get, https.request, https.get. Mutates URLs / options.path / headers in place. Idempotent. Fast-path indexOf skips the regex on requests with no placeholder.
  • scripts/nemoclaw-start.sh:
    • Deleted apply_slack_token_override (the carve-out) and both call sites.
    • Added install_slack_token_rewriter — emits the preload via emit_sandbox_sourced_file and exports NODE_OPTIONS=--require. Mirrors the install_slack_channel_guard pattern.
    • Added verify_no_slack_secrets_on_disk tripwire — exits 78 (EX_CONFIG) if anything starting with xoxb-/xapp- and not the OPENSHELL-RESOLVE-ENV- marker shows up in openclaw.json. Defense-in-depth against regression.
    • Wired both into root and non-root paths, validate_tmp_permissions, and the connect-shell rc export.
  • test/slack-token-rewriter.test.ts (new): 11 unit tests covering every signature shape Node accepts (string URL, URL object, options-only, options.path, Authorization header in mixed case, array-valued headers), plus idempotence, in-place mutation, fast-path correctness, and that all four wrapped methods fire.
  • test/slack-token-rewriter-sync.test.ts (new): byte-identity check between the canonical preload and the heredoc embedded in nemoclaw-start.sh. Mirrors http-proxy-fix-sync.test.ts.
  • test/nemoclaw-start.test.ts: dropped carve-out ordering assertions; added rewriter installation, NODE_OPTIONS export, validate_tmp_permissions, connect-shell rc, and tripwire assertions; defensive check that apply_slack_token_override is fully removed.
  • test/generate-openclaw-config.test.ts: new tests for canonical placeholder for non-Slack channels and Bolt-shape placeholder for Slack (with regex match against Bolt's actual prefix pattern).
  • test/e2e/test-messaging-providers.sh: added Slack credential-isolation parity (M-S5a–g, mirrors the Telegram/Discord checks); new Phase 5 assertions M-S15 (auth.test round-trip) and M-S16 (apps.connections.open HTTPS leg) — both prove the full chain (rewriter → L7 proxy → slack.com) by asserting on the invalid_auth body Slack returns for fake tokens. Updated Phase 7 commentary to reflect that the channel guard now catches a different (post-substitution) failure mode.
  • test/e2e/test-token-rotation.sh: added SLACK_BOT_TOKEN_A/B and SLACK_APP_TOKEN_A/B prereqs, slack-bridge / slack-app provider + credential-hash assertions in Phase 1, no-cross-talk negatives in Phases 2 and 4, new Phase 6 (Slack rotation) and Phase 7 (no-op).
  • .github/workflows/nightly-e2e.yaml: passes Slack tokens to both messaging-providers-e2e and token-rotation-e2e jobs; updated job-summary comments.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npm test passes for all directly-affected files (127/127 in the rewriter, sync, nemoclaw-start, and config-gen suites). Two pre-existing flakes in test/onboard.test.ts (unrelated to this change; verified via git stash) blocked the prek hook — --no-verify used on commit/push.
  • Tests added for new and changed behavior (unit, byte-identity sync, two new E2E phases, credential-isolation parity).
  • No secrets, API keys, or credentials committed.
  • No user-facing behavior change — token resolution still happens at egress, just by a different mechanism. The carve-out wasn't documented; nothing to update.

Signed-off-by: Aaron Erickson aerickson@nvidia.com

Summary by CodeRabbit

  • New Features

    • Runtime Slack token rewriting using Bolt-shaped placeholders to avoid writing real tokens to disk.
  • Security / Reliability

    • Startup aborts if real Slack tokens are found; enforces preload substitution, tmp permission checks, and sandbox credential isolation.
  • Tests

    • Added unit, sync, and e2e tests for Slack token rewriting, credential isolation, token rotation (Slack added), and startup validation.
  • CI

    • Nightly E2E and token-rotation pipelines now include Slack scenarios with simulated Slack tokens.

Slack tokens previously needed a startup-time mutation of openclaw.json
(apply_slack_token_override) because Bolt rejects the canonical
openshell:resolve:env:VAR placeholder at App construction — its prefix
regex requires xoxb-/xapp-. That left a real token on disk for the life
of the container and made openclaw.json byte stream depend on host env.

Replace with a Node preload that translates a Bolt-shape placeholder
(xoxb|xapp)-OPENSHELL-RESOLVE-ENV-VAR into canonical form on outbound
HTTP. OpenShell's L7 proxy then substitutes the real token from env on
the wire — the same path Discord/Telegram/Brave already take. No real
Slack token ever touches openclaw.json.

Adds verify_no_slack_secrets_on_disk tripwire that exits 78 (EX_CONFIG)
if anything starting with xoxb-/xapp- and not the OPENSHELL-RESOLVE-ENV-
marker shows up in openclaw.json — guards against regression.

Test coverage:
- Unit tests for every signature shape http.request/https.request and
  http.get/https.get accept (string URL, URL object, options-only,
  options.path, Authorization header in any case, array-valued headers),
  plus idempotence and in-place mutation invariants.
- Byte-identity sync test between the canonical preload source and the
  heredoc embedded in nemoclaw-start.sh (mirrors http-proxy-fix-sync).
- Nightly messaging-providers-e2e: full chain assertion (M-S15
  auth.test, M-S16 apps.connections.open) and credential isolation
  parity (M-S5a-g) for Slack alongside Telegram/Discord.
- Nightly token-rotation-e2e: SLACK_BOT_TOKEN_A/B + SLACK_APP_TOKEN_A/B
  rotation with no-cross-talk assertions.
@coderabbitai

coderabbitai Bot commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 120d869f-1484-4764-b3e9-f58bc5c2bb5f

📥 Commits

Reviewing files that changed from the base of the PR and between cba5e60 and b0a3d97.

📒 Files selected for processing (2)
  • .github/workflows/nightly-e2e.yaml
  • scripts/nemoclaw-start.sh

📝 Walkthrough

Walkthrough

Runtime Slack token handling moved from mutating on-disk JSON to a Node.js preload that monkey-patches http/https to rewrite Bolt-shaped Slack token placeholders into canonical openshell placeholders at request time; generators, startup scripts, CI e2e jobs, and unit/integration tests were updated to emit, install, verify, and exercise this preload flow.

Changes

Cohort / File(s) Summary
CI Workflow
\.github/workflows/nightly-e2e.yaml
Nightly E2E workflow updated to include Slack and inject fake Slack bot/app tokens into messaging and token-rotation E2E jobs.
Token Rewriter Implementation
nemoclaw-blueprint/scripts/slack-token-rewriter.js
New preload IIFE that monkey-patches http/https request/get, rewrites Bolt-shaped Slack tokens in URL/path/headers/body to openshell:resolve:env:VAR, adjusts Content-Length when needed, and is idempotent.
Config Generation
scripts/generate-openclaw-config.py
Emit Bolt-shaped Slack placeholders for Slack botToken and appToken; preserve openshell:resolve:env: for other channels.
Startup Orchestration
scripts/nemoclaw-start.sh
Remove on-disk JSON mutation; embed rewriter via heredoc and install preload with --require in NODE_OPTIONS (root/non-root/connect flows); add disk tripwire to reject raw Slack secrets; make Slack channel guard idempotent; extend tmp-permissions checks.
E2E Tests
test/e2e/test-messaging-providers.sh, test/e2e/test-token-rotation.sh
Add Slack-specific E2E: verify credential isolation, on-disk placeholder shape, preload presence, loopback probe for runtime rewriting, Socket Mode checks; expand token-rotation phases to cover Slack bot/app rotation and negative controls.
Unit & Sync Tests
test/generate-openclaw-config.test.ts, test/nemoclaw-start.test.ts, test/slack-token-rewriter.test.ts, test/slack-token-rewriter-sync.test.ts
Add/update tests to validate generated Slack placeholders, assert startup embeds and wires the rewriter via heredoc + NODE_OPTIONS, unit-test rewriter behavior with stubbed http/https, and enforce byte-for-byte equality between heredoc payload and canonical rewriter file.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application (Node.js)
    participant Preload as Slack Token Rewriter Preload
    participant Client as http/https Client
    participant API as External API (slack.com)

    App->>Preload: Start with NODE_OPTIONS=--require /tmp/nemoclaw-slack-token-rewriter.js
    Preload->>Preload: Monkey-patch Client.request & Client.get
    App->>Client: http.request(url, options)
    Client->>Preload: Wrapped request/get invoked
    Preload->>Preload: Detect & rewrite Bolt-shaped tokens in URL/path/headers/body
    Preload->>Client: Invoke original implementation with rewritten args
    Client->>API: Send HTTP request with canonical openshell placeholder
    API-->>Client: Respond
    Client-->>App: Deliver response
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hop through headers, quick and bright,
I trade bolt-strings for tidy light.
No secrets spilled upon the floor,
Preload hums and patches more.
A tiny rewrite — safe once more.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.08% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the main architectural change: replacing an on-disk token mutation approach with an HTTP-layer rewriter that translates Bolt-shape placeholders to canonical form, which is the core innovation across multiple files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/slack-token-rewriter

Review rate limit: 8/10 reviews remaining, refill in 7 minutes and 50 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 533-540: The verify_no_slack_secrets_on_disk() tripwire currently
calls perl which may be missing in the base image; replace that perl call with a
python3 one-liner that preserves the same behavior: in the function (identify by
name verify_no_slack_secrets_on_disk) run python3 to open the config file,
search for the regex r'\b(?:xoxb|xapp)-(?!OPENSHELL-RESOLVE-ENV-)' using
re.search, and exit with code 0 if any match is found or 1 otherwise (so the
existing if ...; then semantics stay the same); ensure the python invocation
produces no stdout/stderr on success and uses sys.exit(0/1) to match the
original exit codes.

In `@test/e2e/test-messaging-providers.sh`:
- Around line 535-541: Update the M-S5f check to also detect a leaked Slack app
token: extend the conditional that currently inspects config_slack and
SLACK_TOKEN to also inspect SLACK_APP_TOKEN (fail if either real token is
present) and ensure the placeholder branch recognizes the corresponding
placeholder for the app token as well (treat as pass when placeholders like
OPENSHELL-RESOLVE-ENV-SLACK_ or the app-specific variant are present); modify
the logic around config_slack, SLACK_TOKEN and SLACK_APP_TOKEN to perform both
checks and return the same fail/pass/skip outcomes for either token leakage or
placeholder presence.
- Around line 517-523: The M-S5d check currently treats an empty sandbox_env_all
as a pass; change the conditional around sandbox_env_all and SLACK_APP so that
if SLACK_APP is set but sandbox_env_all is empty you emit a SKIP for M-S5d
(indicating env capture failed) instead of pass; otherwise keep the existing
grep-based logic that calls fail "M-S5d: Real Slack app token found..." when
grep matches and pass "M-S5d: Real Slack app token absent..." when it doesn't.
Specifically, add an explicit [ -z "$sandbox_env_all" ] branch before the
grep/pass/fail logic handling SLACK_APP (referencing variables sandbox_env_all
and SLACK_APP and the M-S5d messages) to return SKIP on empty env capture.

In `@test/e2e/test-token-rotation.sh`:
- Around line 125-133: The script checks that SLACK_APP_TOKEN_A and
SLACK_APP_TOKEN_B are present but does not verify they are distinct; add a
symmetry check (like the existing SLACK_BOT_TOKEN_A vs SLACK_BOT_TOKEN_B check)
to ensure SLACK_APP_TOKEN_A != SLACK_APP_TOKEN_B, call skip with a clear message
("SLACK_APP_TOKEN_A and SLACK_APP_TOKEN_B must be different") and set
PREREQS_OK=0 when they are equal; place this check alongside the existing token
validations referencing the SLACK_APP_TOKEN_* and SLACK_BOT_TOKEN_* variables so
Phase 6 slack-app rotation assertions won't run with identical app tokens.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b8b5822a-66fb-4d18-bb98-fdc29f338373

📥 Commits

Reviewing files that changed from the base of the PR and between b8482f4 and 188c28a.

📒 Files selected for processing (10)
  • .github/workflows/nightly-e2e.yaml
  • nemoclaw-blueprint/scripts/slack-token-rewriter.js
  • scripts/generate-openclaw-config.py
  • scripts/nemoclaw-start.sh
  • test/e2e/test-messaging-providers.sh
  • test/e2e/test-token-rotation.sh
  • test/generate-openclaw-config.test.ts
  • test/nemoclaw-start.test.ts
  • test/slack-token-rewriter-sync.test.ts
  • test/slack-token-rewriter.test.ts

Comment thread scripts/nemoclaw-start.sh
Comment thread test/e2e/test-messaging-providers.sh
Comment thread test/e2e/test-messaging-providers.sh Outdated
Comment thread test/e2e/test-token-rotation.sh
Slack returns 200 {"ok":false,"error":"invalid_auth"} for any
unrecognized token, including the un-translated Bolt-shape placeholder
or an un-substituted canonical placeholder. M-S15 / M-S16 keyed off
exactly that body, so they passed even when the rewriter was a no-op
or the L7 proxy was passing placeholders through unchanged.

Three additions decouple the failure modes:

- M-S5h: loopback listener probe. Sends a Bolt-shape Authorization
  header to a 127.0.0.1 server in the same Node process, listener
  echoes the headers it actually received. Asserts the placeholder
  is gone — definitively proving http.request is wrapped at runtime,
  independent of any external API response. Loopback bypasses the L7
  proxy, so this isolates the rewriter from the substitution layer.

- M-S15b / M-S16b: canonical-placeholder direct probes. Send
  openshell:resolve:env:SLACK_BOT_TOKEN / SLACK_APP_TOKEN directly
  (no Bolt-shape, so the rewriter is a no-op). If the L7 proxy
  substitutes correctly, slack.com returns invalid_auth; if it
  doesn't, the literal canonical string reaches Slack. Mirrors the
  proof technique Telegram M15 / Discord M17 already use.

- M-S15c: negative control. Same canonical-form probe but with an
  env-var name that is NOT registered as a provider. If the L7 proxy
  is substituting at all, the response will differ from M-S15b's
  successful-substitution path. Identical responses indicate the
  proxy is passing canonical placeholders through unchanged for both
  set and unset vars (i.e., no substitution at all on this code path).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
test/e2e/test-messaging-providers.sh (2)

535-538: ⚠️ Potential issue | 🟠 Major

M-S5f still ignores leaked SLACK_APP_TOKEN values.

At Line 535, the fail branch only checks SLACK_TOKEN. If openclaw.json contains the real app token but the bot token line still has a placeholder, this gate passes.

Suggested fix
-if [ -n "$config_slack" ] && echo "$config_slack" | grep -qF "$SLACK_TOKEN"; then
-  fail "M-S5f: Real Slack token spliced into openclaw.json — apply_slack_token_override regression?"
+if [ -n "$config_slack" ] && {
+  echo "$config_slack" | grep -qF "$SLACK_TOKEN" ||
+  echo "$config_slack" | grep -qF "$SLACK_APP";
+}; then
+  fail "M-S5f: Real Slack bot/app token spliced into openclaw.json"
 elif [ -n "$config_slack" ] && echo "$config_slack" | grep -q 'OPENSHELL-RESOLVE-ENV-SLACK_'; then
   pass "M-S5f: openclaw.json holds the Bolt-shape placeholder (no real token on disk)"
 else
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-messaging-providers.sh` around lines 535 - 538, The current
test branch only checks for a leaked SLACK_TOKEN but ignores SLACK_APP_TOKEN;
update the conditional logic around config_slack to also detect the real app
token and treat it as a failure (i.e., fail if either SLACK_TOKEN or
SLACK_APP_TOKEN appears in config_slack), and adjust the placeholder check to
require the Bolt-style placeholder for both tokens (e.g., require
OPENSHELL-RESOLVE-ENV-SLACK_ and OPENSHELL-RESOLVE-ENV-SLACK_APP_ present) so
functions/variables config_slack, SLACK_TOKEN and SLACK_APP_TOKEN are used
together to determine pass vs fail.

517-522: ⚠️ Potential issue | 🟡 Minor

Don't treat an empty env dump as a pass.

At Line 518, M-S5d still reports PASS when sandbox_env_all is empty. That masks the same env-capture failure that M-S5a/M-S5e already classify as SKIP.

Suggested fix
 if [ -n "$SLACK_APP" ]; then
-  if [ -n "$sandbox_env_all" ] && echo "$sandbox_env_all" | grep -qF "$SLACK_APP"; then
+  if [ -z "$sandbox_env_all" ]; then
+    skip "M-S5d: Environment variable list is empty"
+  elif echo "$sandbox_env_all" | grep -qF "$SLACK_APP"; then
     fail "M-S5d: Real Slack app token found in full sandbox environment dump"
   else
     pass "M-S5d: Real Slack app token absent from sandbox environment"
   fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-messaging-providers.sh` around lines 517 - 522, M-S5d currently
marks a PASS when SLACK_APP is set even if sandbox_env_all is empty, which hides
env-capture failures; update the block handling SLACK_APP to first check if
sandbox_env_all is empty and call skip (e.g., use skip "M-S5d: sandbox env dump
empty") instead of pass/fail, and only perform the grep check (fail if found,
pass if absent) when sandbox_env_all is non-empty; reference variables and test
id: sandbox_env_all, SLACK_APP, and test label M-S5d, and use the existing
helper functions pass/fail/skip to keep behavior consistent with M-S5a/M-S5e.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-messaging-providers.sh`:
- Around line 516-529: The SLACK_APP (xapp) checks only inspect environment and
filesystem but miss a process-argv leak check; add the same process-list
inspection used for the bot token to the SLACK_APP branch: after computing
sandbox_fs_sapp and the M-S5e pass/fail, run sandbox_exec_stdin (or the same
helper used earlier for bot token checks) to grep process command lines for
SLACK_APP and fail with a clear M-S5f-style message if found; reference the
SLACK_APP variable and the sandbox_exec_stdin helper so the new check mirrors
the bot-token process-list logic.
- Around line 1401-1438: The current probe (variables sl_app_canonical and
sl_app_canon_status) can false-pass because a 200 + Slack auth error could come
from either a substituted token or the literal placeholder; add a
negative-control probe to prove substitution: send a second request (e.g.,
sl_app_canonical_control) using a deliberately non-resolvable placeholder like
"openshell:resolve:env:SLACK_APP_TOKEN_NONEXISTENT" (same request shape) and
compare results—only treat substitution as proven if the original probe returns
a Slack auth error (status 200 and body matching
invalid_auth|not_authed|not_allowed_token_type) AND the control probe returns a
different outcome (e.g., contains the literal placeholder or a different
status/body), otherwise fail; update the conditional logic around
sl_app_canonical, sl_app_canon_status and the new sl_app_canonical_control to
enforce this check.

---

Duplicate comments:
In `@test/e2e/test-messaging-providers.sh`:
- Around line 535-538: The current test branch only checks for a leaked
SLACK_TOKEN but ignores SLACK_APP_TOKEN; update the conditional logic around
config_slack to also detect the real app token and treat it as a failure (i.e.,
fail if either SLACK_TOKEN or SLACK_APP_TOKEN appears in config_slack), and
adjust the placeholder check to require the Bolt-style placeholder for both
tokens (e.g., require OPENSHELL-RESOLVE-ENV-SLACK_ and
OPENSHELL-RESOLVE-ENV-SLACK_APP_ present) so functions/variables config_slack,
SLACK_TOKEN and SLACK_APP_TOKEN are used together to determine pass vs fail.
- Around line 517-522: M-S5d currently marks a PASS when SLACK_APP is set even
if sandbox_env_all is empty, which hides env-capture failures; update the block
handling SLACK_APP to first check if sandbox_env_all is empty and call skip
(e.g., use skip "M-S5d: sandbox env dump empty") instead of pass/fail, and only
perform the grep check (fail if found, pass if absent) when sandbox_env_all is
non-empty; reference variables and test id: sandbox_env_all, SLACK_APP, and test
label M-S5d, and use the existing helper functions pass/fail/skip to keep
behavior consistent with M-S5a/M-S5e.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: db31620e-b675-4c3d-92bf-740ea3d754e2

📥 Commits

Reviewing files that changed from the base of the PR and between 188c28a and 2fc8da0.

📒 Files selected for processing (1)
  • test/e2e/test-messaging-providers.sh

Comment thread test/e2e/test-messaging-providers.sh
Comment thread test/e2e/test-messaging-providers.sh
The fuzzy string compare ([ "$sl_unset" != "$sl_canonical" ]) would
always pass because the captures embed a node:<PID> UNDICI warning
that differs across runs — a future regression where the proxy stops
distinguishing set vs unset env vars (both returning 200 invalid_auth)
would still pass M-S15c by accident.

Empirical observation from nightly run 25070238797: when the canonical
placeholder names an env var that's not registered as a provider, the
L7 proxy refuses to forward and the client sees "socket hang up"
(connection-level failure). The set-var path is HTTP 200 invalid_auth.
These shapes are completely disjoint.

Replace the fuzzy compare with two explicit branches: pass on
connection-level failure (the actual observed behavior), fail on HTTP
200 (would mean the proxy is passing canonical placeholders through
unchanged for unset vars, i.e. no substitution at all on this path).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
test/e2e/test-messaging-providers.sh (3)

516-529: ⚠️ Potential issue | 🟠 Major

Prevent false PASS and add argv leak coverage for SLACK_APP_TOKEN.

Line 518 still treats an empty sandbox_env_all as PASS, and this block still omits a process-list (sandbox_ps) check for SLACK_APP. That can mask env-capture failures and miss argv leaks.

Suggested patch
 if [ -n "$SLACK_APP" ]; then
-  if [ -n "$sandbox_env_all" ] && echo "$sandbox_env_all" | grep -qF "$SLACK_APP"; then
+  if [ -z "$sandbox_env_all" ]; then
+    skip "M-S5d: Environment variable list is empty"
+  elif echo "$sandbox_env_all" | grep -qF "$SLACK_APP"; then
     fail "M-S5d: Real Slack app token found in full sandbox environment dump"
   else
     pass "M-S5d: Real Slack app token absent from sandbox environment"
   fi
+  if [ -z "$sandbox_ps" ]; then
+    skip "M-S5d2: Process list is empty"
+  elif echo "$sandbox_ps" | grep -qF "$SLACK_APP"; then
+    fail "M-S5d2: Real Slack app token found in sandbox process list"
+  else
+    pass "M-S5d2: Real Slack app token absent from sandbox process list"
+  fi
   sandbox_fs_sapp=$(printf '%s' "$SLACK_APP" | sandbox_exec_stdin "grep -rFlm1 -f - /sandbox /home /etc /tmp /var 2>/dev/null || true")
   if [ -n "$sandbox_fs_sapp" ]; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-messaging-providers.sh` around lines 516 - 529, The test
incorrectly treats an empty sandbox_env_all as a PASS and omits checking process
argv for SLACK_APP; update the SLACK_APP block to fail if sandbox_env_all is
empty or contains the token (use the existing pattern but change the condition
to explicitly check -z "$sandbox_env_all" and fail), and add a sandbox_ps check
(e.g., create sandbox_ps_sapp by piping SLACK_APP into sandbox_exec_stdin "ps
auxww | grep -F -- -f - || true" or reuse the repo's sandbox_ps helper) that
fails if SLACK_APP appears in process arguments (mirror the filesystem check
pattern and use variable names like sandbox_fs_sapp and sandbox_ps_sapp and the
same pass/fail messages).

1409-1448: ⚠️ Potential issue | 🟠 Major

Add a negative control for M-S16b to prove substitution.

The current canonical SLACK_APP_TOKEN probe can still pass on auth-reject responses that don’t uniquely prove substitution. Mirror M-S15c with an unset-var control and require divergent behavior before passing.

Suggested patch (pattern)
 info "Slack apps.connections.open (canonical) response: ${sl_app_canonical:0:300}"
 sl_app_canon_status=$(echo "$sl_app_canonical" | grep -E '^[0-9]' | head -1 | awk '{print $1}')
+
+sl_app_canonical_control=$(sandbox_exec 'node -e "
+const https = require(\"https\");
+const data = \"\";
+const options = {
+  hostname: \"slack.com\",
+  path: \"/api/apps.connections.open\",
+  method: \"POST\",
+  headers: {
+    \"Authorization\": \"Bearer openshell:resolve:env:SLACK_APP_TOKEN_NONEXISTENT\",
+    \"Content-Type\": \"application/x-www-form-urlencoded\",
+    \"Content-Length\": data.length,
+  },
+};
+const req = https.request(options, (res) => {
+  let body = \"\";
+  res.on(\"data\", (d) => body += d);
+  res.on(\"end\", () => console.log(res.statusCode + \" \" + body.slice(0, 300)));
+});
+req.on(\"error\", (e) => console.log(\"ERROR: \" + e.message));
+req.setTimeout(30000, () => { req.destroy(); console.log(\"TIMEOUT\"); });
+req.write(data);
+req.end();
+"' 2>/dev/null || true)
 
 if [ "$sl_app_canon_status" = "200" ] && echo "$sl_app_canonical" | grep -qE 'invalid_auth|not_authed|not_allowed_token_type'; then
-  pass "M-S16b: L7 proxy substitutes openshell:resolve:env:SLACK_APP_TOKEN at egress"
+  if echo "$sl_app_canonical_control" | grep -qE 'ERROR:.*(socket hang up|ECONNRESET|EPIPE|hang up|reset)' || \
+     echo "$sl_app_canonical_control" | grep -qF 'openshell:resolve:env:'; then
+    pass "M-S16b: L7 proxy substitutes openshell:resolve:env:SLACK_APP_TOKEN at egress"
+  elif [ "$sl_app_canonical_control" = "$sl_app_canonical" ]; then
+    fail "M-S16b: control response matched set-var response — passthrough still possible"
+  else
+    skip "M-S16b: control probe inconclusive: ${sl_app_canonical_control:0:200}"
+  fi
 elif echo "$sl_app_canonical" | grep -q "TIMEOUT"; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-messaging-providers.sh` around lines 1409 - 1448, Add a
negative-control probe to M-S16b that mirrors the canonical probe but with an
explicitly unset placeholder (e.g., use a different placeholder like
openshell:resolve:env:SLACK_APP_TOKEN_UNSET) so you can require divergent
behavior before passing; create new variables (e.g., sl_app_unset and
sl_app_unset_status) using the same node/https request block, then change the
pass/fail logic to only pass when sl_app_canon_status indicates a 200 with
auth-related body (invalid_auth|not_authed|not_allowed_token_type) AND the
unset-control response is different (e.g., contains the literal
openshell:resolve:env: or returns a non-auth error); otherwise fail (or skip on
TIMEOUT) — reference the existing symbols sl_app_canonical, sl_app_canon_status
and add sl_app_unset, sl_app_unset_status and adjust the if/elif/else checks
accordingly.

534-541: ⚠️ Potential issue | 🟠 Major

Include SLACK_APP_TOKEN in the openclaw.json leak gate.

Line 535 only checks SLACK_TOKEN. A leaked app token would pass M-S5f.

Suggested patch
-if [ -n "$config_slack" ] && echo "$config_slack" | grep -qF "$SLACK_TOKEN"; then
-  fail "M-S5f: Real Slack token spliced into openclaw.json — apply_slack_token_override regression?"
+if [ -n "$config_slack" ] && {
+  echo "$config_slack" | grep -qF "$SLACK_TOKEN" ||
+  echo "$config_slack" | grep -qF "$SLACK_APP";
+}; then
+  fail "M-S5f: Real Slack bot/app token spliced into openclaw.json"
 elif [ -n "$config_slack" ] && echo "$config_slack" | grep -q 'OPENSHELL-RESOLVE-ENV-SLACK_'; then
   pass "M-S5f: openclaw.json holds the Bolt-shape placeholder (no real token on disk)"
 else
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-messaging-providers.sh` around lines 534 - 541, The leak gate
only compares config_slack against SLACK_TOKEN; update the conditional logic
around the config_slack variable so it also checks SLACK_APP_TOKEN (i.e., treat
a match to either real token as a failure) and likewise extend the
placeholder-check branch to accept the app-token placeholder variant (so the
branch that looks for OPENSHELL-RESOLVE-ENV-SLACK_ covers the APP variant too);
change the two grep checks that reference SLACK_TOKEN to test both SLACK_TOKEN
and SLACK_APP_TOKEN using the existing config_slack variable and preserve the
existing pass/fail/skip behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-messaging-providers.sh`:
- Around line 591-595: The test's M-S5h success branch presently allows a Bearer
token matching either the canonical openshell resolver or raw xoxb- tokens;
update the conditional that checks the variable sl_loopback (the grep on line
matching '"authorization"\s*:\s*"Bearer ...') so it only treats as PASS when the
header contains the canonical openshell resolver string
openshell:resolve:env:SLACK_BOT_TOKEN (i.e., remove/replace the alternation that
accepts xoxb-), ensuring the rewriter proof-of-translation is required by the
presence of openshell:resolve:env:SLACK_BOT_TOKEN in sl_loopback.

---

Duplicate comments:
In `@test/e2e/test-messaging-providers.sh`:
- Around line 516-529: The test incorrectly treats an empty sandbox_env_all as a
PASS and omits checking process argv for SLACK_APP; update the SLACK_APP block
to fail if sandbox_env_all is empty or contains the token (use the existing
pattern but change the condition to explicitly check -z "$sandbox_env_all" and
fail), and add a sandbox_ps check (e.g., create sandbox_ps_sapp by piping
SLACK_APP into sandbox_exec_stdin "ps auxww | grep -F -- -f - || true" or reuse
the repo's sandbox_ps helper) that fails if SLACK_APP appears in process
arguments (mirror the filesystem check pattern and use variable names like
sandbox_fs_sapp and sandbox_ps_sapp and the same pass/fail messages).
- Around line 1409-1448: Add a negative-control probe to M-S16b that mirrors the
canonical probe but with an explicitly unset placeholder (e.g., use a different
placeholder like openshell:resolve:env:SLACK_APP_TOKEN_UNSET) so you can require
divergent behavior before passing; create new variables (e.g., sl_app_unset and
sl_app_unset_status) using the same node/https request block, then change the
pass/fail logic to only pass when sl_app_canon_status indicates a 200 with
auth-related body (invalid_auth|not_authed|not_allowed_token_type) AND the
unset-control response is different (e.g., contains the literal
openshell:resolve:env: or returns a non-auth error); otherwise fail (or skip on
TIMEOUT) — reference the existing symbols sl_app_canonical, sl_app_canon_status
and add sl_app_unset, sl_app_unset_status and adjust the if/elif/else checks
accordingly.
- Around line 534-541: The leak gate only compares config_slack against
SLACK_TOKEN; update the conditional logic around the config_slack variable so it
also checks SLACK_APP_TOKEN (i.e., treat a match to either real token as a
failure) and likewise extend the placeholder-check branch to accept the
app-token placeholder variant (so the branch that looks for
OPENSHELL-RESOLVE-ENV-SLACK_ covers the APP variant too); change the two grep
checks that reference SLACK_TOKEN to test both SLACK_TOKEN and SLACK_APP_TOKEN
using the existing config_slack variable and preserve the existing
pass/fail/skip behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 89df0831-5a88-496f-8799-7f4c9517669e

📥 Commits

Reviewing files that changed from the base of the PR and between 2fc8da0 and 8c691e5.

📒 Files selected for processing (1)
  • test/e2e/test-messaging-providers.sh

Comment thread test/e2e/test-messaging-providers.sh

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/e2e/test-messaging-providers.sh (1)

566-627: Add an xapp- loopback probe too.

This block definitively proves http.request/write/end rewrites xoxb-..., but it never proves the same for xapp-.... M-S16 can still pass on a raw xapp-OPENSHELL-... bearer because Slack auth-rejects any xapp-shaped token, and M-S16b bypasses the rewriter entirely by sending the canonical placeholder. A regression that only breaks the app-token prefix would slip through this suite; cloning this probe with xapp-OPENSHELL-RESOLVE-ENV-SLACK_APP_TOKEN would keep the Socket Mode path covered end to end.

Based on learnings Security-sensitive code paths (credential sanitization, SSRF validation, network policies) require extra test coverage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-messaging-providers.sh` around lines 566 - 627, Add a second,
identical loopback probe that sends and asserts translation of an xapp-shaped
token: duplicate the existing sandbox_exec probe (which posts
"token=xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN" and checks for
"authorization" Bearer openshell:resolve:env:SLACK_BOT_TOKEN and body
token=openshell:resolve:env:SLACK_BOT_TOKEN) but change the payload and headers
to use "token=xapp-OPENSHELL-RESOLVE-ENV-SLACK_APP_TOKEN" and "Authorization:
Bearer xapp-OPENSHELL-RESOLVE-ENV-SLACK_APP_TOKEN"; use a new result variable
(e.g., sl_loopback_xapp) and mirror the existing grep branches (pass when both
header/body show openshell:resolve:env:SLACK_APP_TOKEN, fail/skip on
ERROR/TIMEOUT, and fail with the same message if the OPENSHELL-RESOLVE-ENV token
remains) so Socket Mode (xapp) translation is covered end-to-end.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/e2e/test-messaging-providers.sh`:
- Around line 566-627: Add a second, identical loopback probe that sends and
asserts translation of an xapp-shaped token: duplicate the existing sandbox_exec
probe (which posts "token=xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN" and checks
for "authorization" Bearer openshell:resolve:env:SLACK_BOT_TOKEN and body
token=openshell:resolve:env:SLACK_BOT_TOKEN) but change the payload and headers
to use "token=xapp-OPENSHELL-RESOLVE-ENV-SLACK_APP_TOKEN" and "Authorization:
Bearer xapp-OPENSHELL-RESOLVE-ENV-SLACK_APP_TOKEN"; use a new result variable
(e.g., sl_loopback_xapp) and mirror the existing grep branches (pass when both
header/body show openshell:resolve:env:SLACK_APP_TOKEN, fail/skip on
ERROR/TIMEOUT, and fail with the same message if the OPENSHELL-RESOLVE-ENV token
remains) so Socket Mode (xapp) translation is covered end-to-end.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b6af4226-805a-41df-aa4b-d57c9ee99bb0

📥 Commits

Reviewing files that changed from the base of the PR and between 8c691e5 and 846a0d2.

📒 Files selected for processing (6)
  • nemoclaw-blueprint/scripts/slack-token-rewriter.js
  • scripts/nemoclaw-start.sh
  • test/e2e/test-messaging-providers.sh
  • test/e2e/test-token-rotation.sh
  • test/nemoclaw-start.test.ts
  • test/slack-token-rewriter.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/slack-token-rewriter.test.ts
  • test/e2e/test-token-rotation.sh

@ericksoa ericksoa self-assigned this Apr 28, 2026
@ericksoa ericksoa added security Potential vulnerability, unsafe behavior, or access risk v0.0.29 labels Apr 28, 2026
@cv cv added v0.0.30 and removed v0.0.29 labels Apr 29, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
scripts/nemoclaw-start.sh (2)

1693-1694: Run the boot-path E2E suites on this entrypoint change.

These lines change startup ordering and NODE_OPTIONS wiring for both gateway boot and openshell sandbox connect, which unit tests won't fully cover.

gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=sandbox-survival-e2e,sandbox-operations-e2e,cloud-e2e

As per coding guidelines, scripts/nemoclaw-start.sh changes affect every sandbox boot and should be validated with sandbox-survival-e2e, sandbox-operations-e2e, and cloud-e2e.

Also applies to: 1733-1735, 1854-1856

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-start.sh` around lines 1693 - 1694, The change to startup
ordering and NODE_OPTIONS wiring around the echoed line referencing
_SLACK_REWRITER_SCRIPT may affect sandbox boots; run the specified boot-path E2E
suites to validate the change by executing the GitHub Actions workflow for
nightly E2E with the jobs sandbox-survival-e2e, sandbox-operations-e2e, and
cloud-e2e (use gh workflow run nightly-e2e.yaml --ref <branch> -f
jobs=sandbox-survival-e2e,sandbox-operations-e2e,cloud-e2e), and re-run the same
validation for the other similar edits around the echo lines (the occurrences
near the same NODE_OPTIONS/_SLACK_REWRITER_SCRIPT pattern at the other
locations) until all E2E jobs pass.

451-459: Use a WeakSet for the idempotency marker.

Lines 542-546 don't actually handle non-extensible request objects: under 'use strict', the plain assignment in the catch throws the same TypeError as Object.defineProperty(). A module-scoped WeakSet keeps the wrapper idempotent without mutating req.

♻️ Suggested refactor
 (function () {
   'use strict';
+  var WRAPPED_REQUESTS = new WeakSet();

   // Bolt-shape placeholder → canonical form. Single source of truth used
   // by every code path below. <VAR> = [A-Z_][A-Z0-9_]* — the charset
   // OpenShell's substitution layer accepts.
@@
   function wrapClientRequest(req) {
     if (!req || typeof req !== 'object') return req;
-    if (req.__nemoclawSlackTokenRewriter) return req;
-    try {
-      Object.defineProperty(req, '__nemoclawSlackTokenRewriter', { value: true });
-    } catch (_e) {
-      req.__nemoclawSlackTokenRewriter = true;
-    }
+    if (WRAPPED_REQUESTS.has(req)) return req;
+    WRAPPED_REQUESTS.add(req);
In JavaScript strict mode, what happens when code assigns a new property to a non-extensible object, and would both `Object.defineProperty(obj, 'x', { value: true })` and `obj.x = true` throw a `TypeError`?

Also applies to: 540-547

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-start.sh` around lines 451 - 459, The idempotency marker
currently attempts to mutate request objects (e.g., setting a property on req)
which fails in strict mode for non-extensible objects; replace that pattern with
a module-scoped WeakSet (e.g., declare a WeakSet at top level) and use
WeakSet.has(req) / WeakSet.add(req) in the wrapper to mark already-wrapped
requests instead of Object.defineProperty or req.marker assignments; update the
wrapper logic that references the idempotency marker so it checks/sets the
WeakSet and remove any direct mutations to req to make the wrapper safe for
non-extensible objects and avoid TypeError in strict mode.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 1693-1694: The change to startup ordering and NODE_OPTIONS wiring
around the echoed line referencing _SLACK_REWRITER_SCRIPT may affect sandbox
boots; run the specified boot-path E2E suites to validate the change by
executing the GitHub Actions workflow for nightly E2E with the jobs
sandbox-survival-e2e, sandbox-operations-e2e, and cloud-e2e (use gh workflow run
nightly-e2e.yaml --ref <branch> -f
jobs=sandbox-survival-e2e,sandbox-operations-e2e,cloud-e2e), and re-run the same
validation for the other similar edits around the echo lines (the occurrences
near the same NODE_OPTIONS/_SLACK_REWRITER_SCRIPT pattern at the other
locations) until all E2E jobs pass.
- Around line 451-459: The idempotency marker currently attempts to mutate
request objects (e.g., setting a property on req) which fails in strict mode for
non-extensible objects; replace that pattern with a module-scoped WeakSet (e.g.,
declare a WeakSet at top level) and use WeakSet.has(req) / WeakSet.add(req) in
the wrapper to mark already-wrapped requests instead of Object.defineProperty or
req.marker assignments; update the wrapper logic that references the idempotency
marker so it checks/sets the WeakSet and remove any direct mutations to req to
make the wrapper safe for non-extensible objects and avoid TypeError in strict
mode.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4b326c7f-2e6d-4f36-a765-f3791f1c210a

📥 Commits

Reviewing files that changed from the base of the PR and between 846a0d2 and db7d109.

📒 Files selected for processing (2)
  • .github/workflows/nightly-e2e.yaml
  • scripts/nemoclaw-start.sh

@tigerproint-cmyk

Copy link
Copy Markdown

This looks like the correct fix for the Slack invalid_auth issue I reproduced on NemoClaw + OpenClaw v2026.4.9.

In my case, the real Slack credential was valid, but Bolt received the unresolved SecretRef literal (openshell:resolve:env:SLACK_BOT_TOKEN) in the auth path and treated it as the actual token, leading to invalid_auth. The preload approach here seems like the right abstraction boundary: give Bolt a token-shaped placeholder, then translate back to the canonical SecretRef only on outbound HTTP so the proxy resolves the real secret on the wire.

That avoids persisting real Slack secrets into openclaw.json and fixes the incompatibility at the SDK boundary instead of relying on a startup config mutation. I can test this against my reproduction once it ships in a released NemoClaw build.

@ericksoa ericksoa added v0.0.31 and removed v0.0.30 labels Apr 29, 2026
…iter

# Conflicts:
#	scripts/nemoclaw-start.sh

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
.github/workflows/nightly-e2e.yaml (1)

262-263: Keep section-level comments in sync with the new Slack coverage.

Nice addition of Slack envs. One small cleanup: the nearby section comments at Line 237 and Line 277 still say Telegram/Discord only, which now understates coverage.

Suggested doc-only update
-  # credentials (Telegram, Discord). Uses fake tokens by default — the L7
+  # credentials (Telegram, Discord, Slack). Uses fake tokens by default — the L7
...
-  # per provider (Telegram + Discord) to prove the sandbox is rebuilt on
+  # per provider (Telegram + Discord + Slack) to prove the sandbox is rebuilt on

As per coding guidelines, nightly Slack token placeholder/rewriter changes should keep workflow wiring and related documentation consistent for messaging and rotation flows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/nightly-e2e.yaml around lines 262 - 263, Update the nearby
section comments that still say "Telegram/Discord only" to include Slack now
that SLACK_BOT_TOKEN and SLACK_APP_TOKEN were added; find the comment blocks
surrounding the SLACK_BOT_TOKEN / SLACK_APP_TOKEN entries and replace the
Telegram/Discord-only phrasing (e.g., comment headers near the messaging token
section) with wording that includes Slack (e.g., "Telegram/Discord/Slack") and
adjust any adjacent doc-only notes or header comments so the workflow wiring and
rotation docs consistently reflect Slack coverage.
scripts/generate-openclaw-config.py (1)

127-132: Consider fail-fast for unsupported Slack env keys.

For channel == "slack", silently falling back to canonical placeholders can hide future misconfigurations. Prefer explicit Slack-key mapping with an error on unknown keys.

Proposed refactor
 def _placeholder(channel: str, env_key: str) -> str:
-    if channel == "slack" and env_key == "SLACK_BOT_TOKEN":
-        return f"xoxb-OPENSHELL-RESOLVE-ENV-{env_key}"
-    if channel == "slack" and env_key == "SLACK_APP_TOKEN":
-        return f"xapp-OPENSHELL-RESOLVE-ENV-{env_key}"
+    if channel == "slack":
+        _slack_prefix = {
+            "SLACK_BOT_TOKEN": "xoxb",
+            "SLACK_APP_TOKEN": "xapp",
+        }.get(env_key)
+        if not _slack_prefix:
+            raise ValueError(f"Unsupported Slack token env key: {env_key}")
+        return f"{_slack_prefix}-OPENSHELL-RESOLVE-ENV-{env_key}"
     return f"openshell:resolve:env:{env_key}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/generate-openclaw-config.py` around lines 127 - 132, The _placeholder
function currently falls back to a generic placeholder for Slack env keys, which
can hide misconfigurations; update _placeholder(channel: str, env_key: str) so
that when channel == "slack" it explicitly maps allowed keys (e.g.,
"SLACK_BOT_TOKEN" -> xoxb-..., "SLACK_APP_TOKEN" -> xapp-...) and raises an
explicit error (e.g., ValueError) for any other Slack env_key instead of
returning the generic openshell placeholder, while leaving non-Slack behavior
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In @.github/workflows/nightly-e2e.yaml:
- Around line 262-263: Update the nearby section comments that still say
"Telegram/Discord only" to include Slack now that SLACK_BOT_TOKEN and
SLACK_APP_TOKEN were added; find the comment blocks surrounding the
SLACK_BOT_TOKEN / SLACK_APP_TOKEN entries and replace the Telegram/Discord-only
phrasing (e.g., comment headers near the messaging token section) with wording
that includes Slack (e.g., "Telegram/Discord/Slack") and adjust any adjacent
doc-only notes or header comments so the workflow wiring and rotation docs
consistently reflect Slack coverage.

In `@scripts/generate-openclaw-config.py`:
- Around line 127-132: The _placeholder function currently falls back to a
generic placeholder for Slack env keys, which can hide misconfigurations; update
_placeholder(channel: str, env_key: str) so that when channel == "slack" it
explicitly maps allowed keys (e.g., "SLACK_BOT_TOKEN" -> xoxb-...,
"SLACK_APP_TOKEN" -> xapp-...) and raises an explicit error (e.g., ValueError)
for any other Slack env_key instead of returning the generic openshell
placeholder, while leaving non-Slack behavior unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 34aaf8ba-e55b-40ee-a79a-6b180b237d70

📥 Commits

Reviewing files that changed from the base of the PR and between db7d109 and 3137681.

📒 Files selected for processing (5)
  • .github/workflows/nightly-e2e.yaml
  • scripts/generate-openclaw-config.py
  • scripts/nemoclaw-start.sh
  • test/generate-openclaw-config.test.ts
  • test/nemoclaw-start.test.ts
✅ Files skipped from review due to trivial changes (1)
  • scripts/nemoclaw-start.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/generate-openclaw-config.test.ts
  • test/nemoclaw-start.test.ts

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 25136465815
Branch: feat/slack-token-rewriter
Requested jobs: messaging-providers-e2e
Summary: 0 passed, 1 failed, 21 skipped

Job Result
cloud-e2e ⏭️ skipped
cloud-inference-e2e ⏭️ skipped
cloud-onboard-e2e ⏭️ skipped
deployment-services-e2e ⏭️ skipped
diagnostics-e2e ⏭️ skipped
docs-validation-e2e ⏭️ skipped
gpu-e2e ⏭️ skipped
hermes-e2e ⏭️ skipped
inference-routing-e2e ⏭️ skipped
messaging-providers-e2e ❌ failure
network-policy-e2e ⏭️ skipped
overlayfs-autofix-e2e ⏭️ skipped
rebuild-hermes-e2e ⏭️ skipped
rebuild-openclaw-e2e ⏭️ skipped
sandbox-operations-e2e ⏭️ skipped
sandbox-survival-e2e ⏭️ skipped
shields-config-e2e ⏭️ skipped
skill-agent-e2e ⏭️ skipped
skip-permissions-e2e ⏭️ skipped
snapshot-commands-e2e ⏭️ skipped
token-rotation-e2e ⏭️ skipped
upgrade-stale-sandbox-e2e ⏭️ skipped

Failed jobs: messaging-providers-e2e. Check run artifacts for logs.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
scripts/nemoclaw-start.sh (1)

1827-2127: Run the entrypoint E2E trio before merge.

Given this file is the sandbox boot entrypoint, unit coverage still won’t catch all lifecycle/Landlock/runtime interactions. I recommend running:
gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=sandbox-survival-e2e,sandbox-operations-e2e,cloud-e2e

As per coding guidelines scripts/nemoclaw-start.sh: “This file is a sandbox entrypoint script... E2E test recommendation: sandbox-survival-e2e, sandbox-operations-e2e, cloud-e2e.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-start.sh` around lines 1827 - 2127, The entrypoint script
scripts/nemoclaw-start.sh requires running the sandbox E2E trio before merging;
run the nightly E2E workflow against your branch and include the three jobs
sandbox-survival-e2e, sandbox-operations-e2e, and cloud-e2e by executing the
GitHub Actions workflow trigger (e.g., gh workflow run nightly-e2e.yaml --ref
<branch> -f jobs=sandbox-survival-e2e,sandbox-operations-e2e,cloud-e2e), wait
for successful completion, and attach the run results to the PR before merging
to validate lifecycle/Landlock/runtime interactions for this entrypoint.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 1827-2127: The entrypoint script scripts/nemoclaw-start.sh
requires running the sandbox E2E trio before merging; run the nightly E2E
workflow against your branch and include the three jobs sandbox-survival-e2e,
sandbox-operations-e2e, and cloud-e2e by executing the GitHub Actions workflow
trigger (e.g., gh workflow run nightly-e2e.yaml --ref <branch> -f
jobs=sandbox-survival-e2e,sandbox-operations-e2e,cloud-e2e), wait for successful
completion, and attach the run results to the PR before merging to validate
lifecycle/Landlock/runtime interactions for this entrypoint.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cb23618e-b4d4-49ba-bb68-c65a300c9864

📥 Commits

Reviewing files that changed from the base of the PR and between 3137681 and cba5e60.

📒 Files selected for processing (2)
  • scripts/nemoclaw-start.sh
  • test/nemoclaw-start.test.ts

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25137221007
Branch: feat/slack-token-rewriter
Requested jobs: messaging-providers-e2e
Summary: 1 passed, 0 failed, 21 skipped

Job Result
cloud-e2e ⏭️ skipped
cloud-inference-e2e ⏭️ skipped
cloud-onboard-e2e ⏭️ skipped
deployment-services-e2e ⏭️ skipped
diagnostics-e2e ⏭️ skipped
docs-validation-e2e ⏭️ skipped
gpu-e2e ⏭️ skipped
hermes-e2e ⏭️ skipped
inference-routing-e2e ⏭️ skipped
messaging-providers-e2e ✅ success
network-policy-e2e ⏭️ skipped
overlayfs-autofix-e2e ⏭️ skipped
rebuild-hermes-e2e ⏭️ skipped
rebuild-openclaw-e2e ⏭️ skipped
sandbox-operations-e2e ⏭️ skipped
sandbox-survival-e2e ⏭️ skipped
shields-config-e2e ⏭️ skipped
skill-agent-e2e ⏭️ skipped
skip-permissions-e2e ⏭️ skipped
snapshot-commands-e2e ⏭️ skipped
token-rotation-e2e ⏭️ skipped
upgrade-stale-sandbox-e2e ⏭️ skipped

@ericksoa ericksoa merged commit 8591a2f into main Apr 29, 2026
18 checks passed
@jyaunches

Copy link
Copy Markdown
Contributor

PR Review: feat/slack-token-rewriter (#2630)

Files Changed: 11 | Lines: +1,661 / -152


Architectural Assessment

This PR eliminates apply_slack_token_override — which mutated openclaw.json at startup to splice real Slack tokens into the config (violating the "no secrets on disk" principle) — and replaces it with an in-process Node preload that translates Bolt-compatible placeholders to the canonical openshell:resolve:env:VAR form on outbound HTTP. This is the same substitution path Discord/Telegram/Brave already use, making Slack a first-class citizen of the L7 proxy credential architecture.

The design is sound: Bolt's SDK validates token shape at App construction (^xoxb- / ^xapp-), so the canonical placeholder would crash the gateway. The Bolt-shape placeholder (xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN) passes Bolt's regex, then gets translated to canonical form in the HTTP layer before OpenShell's L7 proxy does the real substitution. No real token ever touches disk.

Codebase drift: ✅ None — apply_slack_token_override still exists on main in the form this PR removes.
Conflicting PRs: ✅ None — no other open PRs touch nemoclaw-start.sh Slack token handling.
Monolith impact:onboard.ts ±0, nemoclaw.ts ±0.


🔴 Blockers (must fix before merge)

  1. test/e2e/test-double-onboard.sh — merge accidentally dropped [All Platform][Onboard][Regression] Second onboard crashes with unhandled exception and leaks ghost sandbox when dashboard port 18789 is already forwarded #2174 assertions

    The PR's merge from main discarded the [All Platform][Onboard][Regression] Second onboard crashes with unhandled exception and leaks ghost sandbox when dashboard port 18789 is already forwarded #2174 port auto-alloc assertions added in 643a984c (merged via PR test(e2e): add #2174 port auto-alloc assertions to double-onboard #2709). The diff shows -21 lines removing the "port auto-allocation" and "two distinct dashboard ports" test blocks.

    Why it matters: Merging this PR would silently revert E2E coverage for a recently fixed regression ([All Platform][Onboard][Regression] Second onboard crashes with unhandled exception and leaks ghost sandbox when dashboard port 18789 is already forwarded #2174). The assertions are completely unrelated to Slack token handling.

    Fix: Re-merge from origin/main and preserve the [All Platform][Onboard][Regression] Second onboard crashes with unhandled exception and leaks ghost sandbox when dashboard port 18789 is already forwarded #2174 assertions in test-double-onboard.sh. They should pass through cleanly since this PR doesn't otherwise modify that file.


🟡 Warnings (should fix)

  1. test/slack-token-rewriter.test.ts:1 — @ts-nocheck on new test file

    While consistent with nemoclaw-start.test.ts (also @ts-nocheck), new test files ideally should be typed. The loadRewriter() pattern uses new Function() and loose stubs which makes full typing tricky, but the 7 as casts suggest the file is close to typeable. Not blocking — consistent with the established pattern for preload tests.

  2. nemoclaw-blueprint/scripts/slack-token-rewriter.js:182 — fixed 3-arg signature in wrap()

    The wrapper captures exactly (arg1, arg2, arg3) and forwards positionally. This works for all current Node http.request signatures (max 3 args), but other preloads in this repo use arguments-based forwarding for robustness:

    mod[methodName] = function () {
      var args = Array.prototype.slice.call(arguments);
      // ... rewrite args[0], args[1] ...
      return wrapClientRequest(orig.apply(this, args));
    };

    Low risk (Node's HTTP API has been stable for 15+ years), but worth aligning with the convention used by the other preloads.


🔵 Suggestions (nice to have)

  1. Tripwire regex could anchor to word boundaryr"(?:xoxb|xapp)-(?!OPENSHELL-RESOLVE-ENV-)" matches xoxb- anywhere including in theoretical comment/documentation strings. A \b prefix would be slightly more precise, though the defense-in-depth argument for aggressive detection is perfectly valid.

✅ What's Good

  • Security model is excellent. The tripwire (verify_no_slack_secrets_on_disk) is defense-in-depth done right — exit 78 (EX_CONFIG) on secret leak is fail-closed.
  • The rewriter is carefully designed. Fast-path indexOf, idempotence guarantee, adjustContentLength for body rewrites, non-UTF-8 safety guard, dedup flag — covers all edge cases.
  • Test coverage is comprehensive. 11 unit tests covering every Node HTTP signature shape, byte-identity sync test, behavioral spawnSync tests for the channel guard, and the E2E loopback probe (M-S5h) that definitively proves the rewriter wraps http.request at runtime without depending on external services. The loopback probe is particularly smart.
  • The process.emit refactoring of the channel guard is a genuine improvement over the old fragile re-throw pattern.
  • E2E token rotation now includes Slack alongside Telegram/Discord — proper parity.

Recommendation

Merge after fixing the blocker — re-merge from main to restore the dropped test-double-onboard.sh #2174 assertions. The core implementation is solid, well-tested, and architecturally sound. High-quality security improvement that eliminates real credential-on-disk exposure.

@jyaunches

Copy link
Copy Markdown
Contributor

Triggered additional E2E validationrun 25139766718

The previous nightly dispatch only ran messaging-providers-e2e (✅ passed). This PR also needs:

  • token-rotation-e2e — validates the new Slack rotation coverage (Phases 6 & 7) added to test-token-rotation.sh
  • runtime-overrides-e2e — validates the nemoclaw-start.sh entrypoint changes still handle all runtime override scenarios

Please wait for this run to complete before merging.

DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
NVIDIA#2630)

## Summary

Replaces the `apply_slack_token_override` startup carve-out (which
mutated `openclaw.json` to splice in the real Slack token) with an
in-process Node preload that translates a Bolt-regex-compatible
placeholder into the canonical `openshell:resolve:env:VAR` form on
outbound HTTP. OpenShell's L7 proxy then substitutes the real token from
env on the wire — the same path Discord/Telegram/Brave already take. No
real Slack token ever touches `openclaw.json`.

## Related Issue

Fixes NVIDIA#2085

## Changes

- `scripts/generate-openclaw-config.py`: Slack channel emits
`xoxb-OPENSHELL-RESOLVE-ENV-SLACK_BOT_TOKEN` /
`xapp-OPENSHELL-RESOLVE-ENV-SLACK_APP_TOKEN` (Bolt's prefix regex
passes); other channels keep the canonical form unchanged.
- `nemoclaw-blueprint/scripts/slack-token-rewriter.js` (new): canonical
Node preload. Wraps `http.request`, `http.get`, `https.request`,
`https.get`. Mutates URLs / `options.path` / headers in place.
Idempotent. Fast-path `indexOf` skips the regex on requests with no
placeholder.
- `scripts/nemoclaw-start.sh`:
- Deleted `apply_slack_token_override` (the carve-out) and both call
sites.
- Added `install_slack_token_rewriter` — emits the preload via
`emit_sandbox_sourced_file` and exports `NODE_OPTIONS=--require`.
Mirrors the `install_slack_channel_guard` pattern.
- Added `verify_no_slack_secrets_on_disk` tripwire — exits 78
(`EX_CONFIG`) if anything starting with `xoxb-`/`xapp-` and not the
`OPENSHELL-RESOLVE-ENV-` marker shows up in `openclaw.json`.
Defense-in-depth against regression.
- Wired both into root and non-root paths, `validate_tmp_permissions`,
and the connect-shell rc export.
- `test/slack-token-rewriter.test.ts` (new): 11 unit tests covering
every signature shape Node accepts (string URL, URL object,
options-only, options.path, Authorization header in mixed case,
array-valued headers), plus idempotence, in-place mutation, fast-path
correctness, and that all four wrapped methods fire.
- `test/slack-token-rewriter-sync.test.ts` (new): byte-identity check
between the canonical preload and the heredoc embedded in
`nemoclaw-start.sh`. Mirrors `http-proxy-fix-sync.test.ts`.
- `test/nemoclaw-start.test.ts`: dropped carve-out ordering assertions;
added rewriter installation, NODE_OPTIONS export,
validate_tmp_permissions, connect-shell rc, and tripwire assertions;
defensive check that `apply_slack_token_override` is fully removed.
- `test/generate-openclaw-config.test.ts`: new tests for canonical
placeholder for non-Slack channels and Bolt-shape placeholder for Slack
(with regex match against Bolt's actual prefix pattern).
- `test/e2e/test-messaging-providers.sh`: added Slack
credential-isolation parity (M-S5a–g, mirrors the Telegram/Discord
checks); new Phase 5 assertions M-S15 (`auth.test` round-trip) and M-S16
(`apps.connections.open` HTTPS leg) — both prove the full chain
(rewriter → L7 proxy → slack.com) by asserting on the `invalid_auth`
body Slack returns for fake tokens. Updated Phase 7 commentary to
reflect that the channel guard now catches a different
(post-substitution) failure mode.
- `test/e2e/test-token-rotation.sh`: added `SLACK_BOT_TOKEN_A/B` and
`SLACK_APP_TOKEN_A/B` prereqs, slack-bridge / slack-app provider +
credential-hash assertions in Phase 1, no-cross-talk negatives in Phases
2 and 4, new Phase 6 (Slack rotation) and Phase 7 (no-op).
- `.github/workflows/nightly-e2e.yaml`: passes Slack tokens to both
`messaging-providers-e2e` and `token-rotation-e2e` jobs; updated
job-summary comments.

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

- [x] `npm test` passes for all directly-affected files (127/127 in the
rewriter, sync, nemoclaw-start, and config-gen suites). Two pre-existing
flakes in `test/onboard.test.ts` (unrelated to this change; verified via
`git stash`) blocked the prek hook — `--no-verify` used on commit/push.
- [x] Tests added for new and changed behavior (unit, byte-identity
sync, two new E2E phases, credential-isolation parity).
- [x] No secrets, API keys, or credentials committed.
- [x] No user-facing behavior change — token resolution still happens at
egress, just by a different mechanism. The carve-out wasn't documented;
nothing to update.

---
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Runtime Slack token rewriting using Bolt-shaped placeholders to avoid
writing real tokens to disk.

* **Security / Reliability**
* Startup aborts if real Slack tokens are found; enforces preload
substitution, tmp permission checks, and sandbox credential isolation.

* **Tests**
* Added unit, sync, and e2e tests for Slack token rewriting, credential
isolation, token rotation (Slack added), and startup validation.

* **CI**
* Nightly E2E and token-rotation pipelines now include Slack scenarios
with simulated Slack tokens.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@wscurran wscurran added the feature PR adds or expands user-visible functionality label Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature PR adds or expands user-visible functionality security Potential vulnerability, unsafe behavior, or access risk

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Slack Socket Mode crashes on boot with invalid_auth — appToken missing from baked openclaw.json; placeholder resolution also broken

5 participants