Skip to content

fix(cli): reuse websocket for logs --follow#56475

Open
huntharo wants to merge 13 commits intoopenclaw:mainfrom
huntharo:codex/logs-follow-persistent-ws
Open

fix(cli): reuse websocket for logs --follow#56475
huntharo wants to merge 13 commits intoopenclaw:mainfrom
huntharo:codex/logs-follow-persistent-ws

Conversation

@huntharo
Copy link
Copy Markdown
Member

@huntharo huntharo commented Mar 28, 2026

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: openclaw logs --follow reused the one-shot CLI gateway helper, so each poll opened a new websocket, sent logs.tail, and closed immediately.
  • Why it matters: gateway logs looked noisy and misleading, and follow mode paid unnecessary handshake overhead on every interval.
  • What changed: I added an additive logs.subscribe / logs.unsubscribe gateway capability with live logs.appended events, bounded in-memory buffering, and slow-subscriber disconnect protection.
  • What changed: the CLI now discovers that capability from hello-ok and uses streamed follow when available, with the persistent-websocket logs.tail polling path kept as fallback for older gateways.
  • What did NOT change (scope boundary): I kept logs.tail intact for compatibility and bootstrap/fallback behavior.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #
  • This PR fixes a bug or regression

Root Cause / Regression History (if applicable)

For bug fixes or regressions, explain why this happened, not just what changed. Otherwise write N/A. If the cause is unclear, write Unknown.

  • Root cause: follow mode called callGatewayFromCli() inside its polling loop, and that helper creates a fresh GatewayClient for one request and then stops it. The gateway also had no log subscription capability, so the CLI could not switch to a push model even when keeping a socket open.
  • Missing detection / guardrail: there was no regression coverage for either long-lived follow transports or an advertised streamed log-follow capability.
  • Prior context (git blame, prior PR, issue, or refactor if known): existing CLI behavior, not a newly introduced refactor in this PR.
  • Why this regressed now: it did not newly regress; this was the shipped behavior.
  • If unknown, what was ruled out: I ruled out the gateway talking to itself. The reconnect loop came from the CLI client path.

Regression Test Plan (if applicable)

For bug fixes or regressions, name the smallest reliable test coverage that should have caught this. Otherwise write N/A.

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/cli/logs-cli.test.ts, src/gateway/log-stream.test.ts
  • Scenario the test should lock in: the CLI chooses streaming only when the gateway advertises it, and a subscribed operator connection receives initial log tail data plus live logs.appended events.
  • Why this is the smallest reliable guardrail: the CLI selection logic is best covered in a unit test, while the gateway-side buffering and event delivery path needs a live websocket seam test.
  • Existing test that already covers this (if any): none
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

openclaw logs --follow now prefers a streamed websocket follow mode on gateways that advertise logs.subscribe. Older gateways keep working through the fallback persistent-websocket polling path.

Diagram (if applicable)

Before:
[logs --follow poll] -> [open websocket] -> [logs.tail] -> [close websocket]

After:
[start logs --follow] -> [hello-ok capability check]
                         -> [logs.subscribe if supported] -> [logs.appended events]
                         -> [else persistent logs.tail polling]

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) Yes
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: I added new gateway RPC methods and a new websocket event, but gated client usage on advertised method discovery and kept logs.tail as the compatibility fallback. Live delivery uses bounded queues and disconnects slow subscribers instead of blocking log writes.

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: local Node 22+ / pnpm workspace
  • Model/provider: N/A
  • Integration/channel (if any): gateway websocket CLI path
  • Relevant config (redacted): local loopback gateway defaults

Steps

  1. Connect a CLI client and inspect hello-ok gateway methods/events.
  2. Run openclaw logs --follow.
  3. Verify streamed gateways use logs.subscribe and receive logs.appended, while older gateways continue on the polling fallback.

Expected

  • Follow mode should use advertised streaming when available, without blocking gateway log writes on subscriber delivery.

Actual

  • Before this change, follow mode only had polling behavior and no capability-gated streaming option.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: I ran pnpm test -- src/cli/logs-cli.test.ts, pnpm test -- src/gateway/log-stream.test.ts, pnpm test -- src/gateway/method-scopes.test.ts, pnpm test -- src/gateway/call.test.ts, pnpm test -- src/gateway/server-methods/server-methods.test.ts, and pnpm build.
  • Edge cases checked: I verified streamed logs are advertised in hello-ok, logs.subscribe returns the initial tail plus live events, logs.unsubscribe stops delivery, and the CLI falls back when streaming is not advertised.
  • What you did not verify: I did not run a long live manual soak test with an intentionally stalled websocket subscriber.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps:

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

  • Risk: a stalled subscriber could cause unbounded in-memory growth or block log writes.
    • Mitigation: the gateway uses bounded per-subscriber queues, a bounded recent-log buffer, async flush loops, and disconnect-on-overflow behavior.
  • Risk: new streaming methods could break older gateways or older clients.
    • Mitigation: the CLI only uses streaming when the gateway advertises logs.subscribe, and logs.tail remains unchanged as the fallback path.

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime cli CLI command changes size: M maintainer Maintainer-authored PR labels Mar 28, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0dfac15748

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/cli/logs-cli.ts Outdated
Comment thread src/gateway/call.ts Outdated
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 28, 2026

Greptile Summary

This PR replaces the previous per-poll websocket approach in openclaw logs --follow with a persistent-websocket model that uses a new logs.subscribe / logs.unsubscribe / logs.appended gateway capability when the gateway advertises it, while preserving the existing logs.tail polling path as a fallback for older gateways.

Key changes:

  • New gateway capability (log-stream.ts, server-methods/logs.ts): logs.subscribe returns an initial tail and then pushes logs.appended events over the same websocket; bounded per-subscriber queues and slow-subscriber disconnect protection guard against stalled clients.
  • CLI follow mode (logs-cli.ts): The follow loop detects logs.subscribe / logs.unsubscribe in hello-ok capabilities, uses streaming when available, and falls back to persistent-socket polling otherwise. Retryable startup and stream-disconnect errors are handled in connectFollowLogsClientWithRetry and waitForFollowClientReconnect with appropriate user-facing messages.
  • call.ts refactor: resolveGatewayClientConnectionWithScopes is extracted so both one-shot callers and the new persistent follow client can share credential/scope resolution. applyBackendGatewayIdentityDefaults ensures non-CLI call paths receive GATEWAY_CLIENT / BACKEND identity by default instead of the old CLI fallbacks.
  • Protocol: LogsSubscribeParams / LogsSubscribeResult added to the TypeBox schema, Swift models, and scope registry; logs.subscribe and logs.unsubscribe are added to BASE_METHODS so they are advertised in hello-ok.
  • Tests: New unit tests cover streaming vs. polling capability selection, startup retry, stream-disconnect reconnect, unsubscribe race condition, and scope/identity defaults for all gateway call wrappers.

Confidence Score: 5/5

Safe to merge; all remaining findings are P2 style suggestions that do not affect correctness or reliability.

The implementation is correct, backward-compatible, and thoroughly tested (streaming path, polling fallback, startup retry, stream-disconnect reconnect, capability negotiation, and unsubscribe race). The prior P1 concerns from earlier review threads — unhandled rejection on onConnectError and the requestTimeoutMs propagation change — have both been addressed. The only remaining finding is a P2 style note about dead CLI-identity defaults in resolveGatewayClientConnectionWithScopes that could mislead future callers of the new resolveGatewayClientConnection export.

src/gateway/call.ts — the ?? GATEWAY_CLIENT_NAMES.CLI / ?? GATEWAY_CLIENT_MODES.CLI fallbacks in resolveGatewayClientConnectionWithScopes are dead code for all current callers but remain as the default for the new exported resolveGatewayClientConnection.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/gateway/call.ts
Line: 1811-1815

Comment:
**Dead CLI defaults in `resolveGatewayClientConnectionWithScopes`**

The `?? GATEWAY_CLIENT_NAMES.CLI` and `?? GATEWAY_CLIENT_MODES.CLI` fallbacks on lines 1811–1815 are now dead code for every existing internal caller:

- `callGatewayScoped`, `callGatewayLeastPrivilege`, and `callGateway` all call `applyBackendGatewayIdentityDefaults(opts)` before reaching this function, which fills in `GATEWAY_CLIENT` / `BACKEND` so the `??` branches never fire.
- `callGatewayCli` (and `createFollowLogsClient`) explicitly supply `GATEWAY_CLIENT_NAMES.CLI` / `GATEWAY_CLIENT_MODES.CLI`.

The risk is that the newly exported `resolveGatewayClientConnection` does **not** apply `applyBackendGatewayIdentityDefaults` internally. Any future caller that omits `clientName`/`mode` will silently receive CLI identity, while `resolveGatewayCallScopes` (which does default to BACKEND) would pick least-privilege scopes — an identity/scope mismatch. A brief comment, or applying the backend defaults inside `resolveGatewayClientConnection` itself, would prevent future surprises.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (2): Last reviewed commit: "Gateway: address remaining log follow re..." | Re-trigger Greptile

Comment thread src/cli/logs-cli.ts
Comment on lines +107 to +109
const resetReady = () => {
ready = createDeferred<void>();
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Unhandled rejection on reconnect onConnectError during poll delay

When the websocket drops and the follow loop is in the delay(interval) sleep between polls, onClose fires first and calls resetReady(), creating a fresh ready deferred. The client then schedules a reconnect (after backoffMs, starting at 1 000 ms). If that reconnect attempt fails quickly (e.g. "connection refused") before delay(interval) finishes, onConnectError rejects ready.promise, and then onClose calls resetReady() again — replacing ready with a new deferred.

Because waitUntilReady() hasn't been called yet (the follow loop is still sleeping), nothing ever awaits the rejected promise. In Node.js 15+ the process will crash with UnhandledPromiseRejection.

The simplest guard is to silence the old promise before replacing it:

const resetReady = () => {
  ready.promise.catch(() => {}); // prevent unhandled rejection if nobody is awaiting it
  ready = createDeferred<void>();
};

Alternatively, store the last connect error and re-throw it on the next waitUntilReady() call so the caller still sees the error.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cli/logs-cli.ts
Line: 107-109

Comment:
**Unhandled rejection on reconnect `onConnectError` during poll delay**

When the websocket drops and the follow loop is in the `delay(interval)` sleep between polls, `onClose` fires first and calls `resetReady()`, creating a fresh `ready` deferred. The client then schedules a reconnect (after `backoffMs`, starting at 1 000 ms). If that reconnect attempt fails quickly (e.g. "connection refused") before `delay(interval)` finishes, `onConnectError` rejects `ready.promise`, and then `onClose` calls `resetReady()` again — replacing `ready` with a new deferred.

Because `waitUntilReady()` hasn't been called yet (the follow loop is still sleeping), nothing ever awaits the rejected promise. In Node.js 15+ the process will crash with `UnhandledPromiseRejection`.

The simplest guard is to silence the old promise before replacing it:

```typescript
const resetReady = () => {
  ready.promise.catch(() => {}); // prevent unhandled rejection if nobody is awaiting it
  ready = createDeferred<void>();
};
```

Alternatively, store the last connect error and re-throw it on the next `waitUntilReady()` call so the caller still sees the error.

How can I resolve this? If you propose a fix, please make it concise.

Comment thread src/gateway/call.ts
token: resolvedCredentials.token,
password: resolvedCredentials.password,
tlsFingerprint,
requestTimeoutMs: timeoutMs,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 requestTimeoutMs now propagates into one-shot client options

Before this refactor, executeGatewayRequestWithScopes never set requestTimeoutMs when constructing the GatewayClient, so the client used its 30 s default as a fallback for any request that didn't supply its own timeoutMs. Now requestTimeoutMs: timeoutMs is included in clientOptions from resolveGatewayClientConnectionWithScopes, meaning one-shot callers that pass neither opts.timeoutMs to their client.request() call nor an explicit timeout to callGateway()/callGatewayCli() will now see a 10 s per-request fallback (the default from resolveGatewayCallTimeout) instead of 30 s.

In practice this is harmless because the safeTimerTimeoutMs outer timer already fires at 10 s, but it changes observable behavior for any caller that relied on the 30 s default per-request timeout. If this is intentional, a brief comment here would clarify it for future readers.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/call.ts
Line: 905

Comment:
**`requestTimeoutMs` now propagates into one-shot client options**

Before this refactor, `executeGatewayRequestWithScopes` never set `requestTimeoutMs` when constructing the `GatewayClient`, so the client used its 30 s default as a fallback for any request that didn't supply its own `timeoutMs`. Now `requestTimeoutMs: timeoutMs` is included in `clientOptions` from `resolveGatewayClientConnectionWithScopes`, meaning one-shot callers that pass neither `opts.timeoutMs` to their `client.request()` call nor an explicit timeout to `callGateway()`/`callGatewayCli()` will now see a 10 s per-request fallback (the default from `resolveGatewayCallTimeout`) instead of 30 s.

In practice this is harmless because the `safeTimerTimeoutMs` outer timer already fires at 10 s, but it changes observable behavior for any caller that relied on the 30 s default per-request timeout. If this is intentional, a brief comment here would clarify it for future readers.

How can I resolve this? If you propose a fix, please make it concise.

@openclaw-barnacle openclaw-barnacle Bot added app: web-ui App: web-ui size: XL and removed size: M labels Mar 28, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 74c8efc97a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/gateway/log-stream.ts Outdated
Comment thread src/gateway/server-methods/logs.ts Outdated
@huntharo huntharo force-pushed the codex/logs-follow-persistent-ws branch 2 times, most recently from 03eba53 to 981c108 Compare April 3, 2026 00:51
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 981c108227

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/cli/logs-cli.ts
Comment on lines +252 to +255
function isRetryableFollowStartupError(err: unknown): boolean {
const message = (err instanceof Error ? err.message : String(err)).toLowerCase();
return (
message.includes("connect failed") ||
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Treat stream-disconnect errors as retryable in follow mode

createFollowLogsClient pushes "gateway log stream disconnected" into the follow queue on socket close, but isRetryableFollowStartupError does not match that message (nor "gateway not connected" from client.request). In logs --follow, a transient gateway restart therefore falls into the non-retryable path and exits with a fatal gateway error instead of re-waiting/re-subscribing. This breaks long-running follow sessions whenever the websocket drops briefly.

Useful? React with 👍 / 👎.

@openclaw-barnacle openclaw-barnacle Bot added the app: macos App: macos label Apr 3, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9787127265

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/cli/logs-cli.ts Outdated
Comment thread src/gateway/log-stream.ts
Comment on lines +86 to +87
} catch {
unsubscribe(subscriber.connId);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Surface poll errors instead of silently dropping log subscribers

The catch-all in pollSubscriber unsubscribes on any exception without notifying the client or closing the socket. A transient file race (for example, ENOENT between stat and open during rotation) will permanently remove that subscriber while the websocket stays connected, leaving follow clients blocked waiting for logs.appended that never arrive. This should either retry on the next poll or close/error the socket so clients can recover.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7c20a5bb95

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/cli/logs-cli.ts Outdated
@huntharo huntharo self-assigned this Apr 3, 2026
@huntharo huntharo force-pushed the codex/logs-follow-persistent-ws branch from 7c20a5b to b7a4ca1 Compare April 3, 2026 04:26
@huntharo huntharo changed the title fix(cli): reuse websocket for logs --follow fix(cli): reuse websocket for logs --follow Apr 3, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7a4ca1b7f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/cli/logs-cli.ts Outdated
Comment thread src/gateway/log-tail.ts Outdated
@huntharo huntharo force-pushed the codex/logs-follow-persistent-ws branch from b7a4ca1 to 8f24f3d Compare April 3, 2026 16:47
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8f24f3d286

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/gateway/log-stream.ts Outdated
if (!result.reset && result.lines.length === 0) {
return;
}
subscriber.socket.send(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Re-check subscriber membership before sending appended logs

pollSubscriber can continue past await resolveLogFile(...) / await readLogSlice(...) and call socket.send(...) even after logs.unsubscribe removed that connId from subscribers. In that race (for example, unsubscribe during slow filesystem reads), the client can still receive a logs.appended event after unsubscribe has already succeeded, which violates the method contract and can produce confusing extra lines. Add a post-read guard (e.g., ensure the current map entry still matches this subscriber or mark inactive on unsubscribe) before sending.

Useful? React with 👍 / 👎.

@huntharo huntharo force-pushed the codex/logs-follow-persistent-ws branch from 8f24f3d to fa050d3 Compare April 3, 2026 17:01
@huntharo
Copy link
Copy Markdown
Member Author

huntharo commented Apr 3, 2026

@greptile-apps - Can you re-review latest?

@huntharo huntharo force-pushed the codex/logs-follow-persistent-ws branch from 14f9116 to e127e81 Compare April 3, 2026 17:58
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e127e813af

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/gateway/log-stream.ts Outdated
Comment on lines +93 to +97
} catch {
unsubscribe(subscriber.connId);
try {
if (subscriber.socket.readyState === WebSocket.OPEN) {
subscriber.socket.close(1011, "log stream error");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip socket close for stale unsubscribed pollers

pollSubscriber now checks subscriber identity before sending events, but the catch path still unconditionally closes subscriber.socket. If logs.unsubscribe (or a replacement logs.subscribe) wins while a poll is in flight and that stale poll later throws (for example during log-file rotation races), this closes the still-active websocket unexpectedly and drops unrelated RPC traffic on that connection. Gate the close path on subscribers.get(subscriber.connId) === subscriber (and active status) before closing.

Useful? React with 👍 / 👎.

@huntharo huntharo force-pushed the codex/logs-follow-persistent-ws branch from e127e81 to 22946ea Compare April 9, 2026 00:11
@aisle-research-bot
Copy link
Copy Markdown

aisle-research-bot Bot commented Apr 9, 2026

🔒 Aisle Security Analysis

We found 4 potential security issue(s) in this PR:

# Severity Title
1 🟠 High Terminal escape injection via unsanitized gateway-provided filename and error reason in logs CLI
2 🟡 Medium Potential ReDoS / CPU DoS in broad OSC-stripping regex used on untrusted log text
3 🟡 Medium Per-IP log stream subscription limit bypass via untrusted proxy headers (clientIp becomes undefined)
4 🔵 Low Information disclosure: absolute log file path returned to clients in logs.subscribe and logs.appended events
1. 🟠 Terminal escape injection via unsanitized gateway-provided filename and error reason in logs CLI
Property Value
Severity High
CWE CWE-117
Location src/cli/logs-cli.ts:542-545

Description

The logs CLI prints some gateway-controlled strings to the terminal without applying sanitizeForLog, allowing terminal escape injection (e.g., OSC-52 clipboard writes, window title changes) when --json is not used.

Affected paths:

  • Gateway-provided log filename payload.file is printed directly:

    • The gateway/server can control payload.file in logs.tail / logs.subscribe responses.
    • It is written to stdout as text without sanitization, so ANSI/OSC sequences or control characters will be interpreted by the user's terminal.
  • Gateway close/error reason is interpolated into the Reason: line without sanitization:

    • WebSocket close reason or other error messages can be influenced by the remote gateway.
    • The first line of the error (summarizeRetryErrorText(err)) is printed verbatim.

Vulnerable code (text mode):

if (first && payload.file) {
  const prefix = pretty ? colorize(rich, theme.muted, "Log file:") : "Log file:";
  if (!logLine(`${prefix} ${payload.file}`)) {
    return false;
  }
}
...
if (errorText.trim()) {
  if (!errorLine(colorize(rich, theme.muted, `Reason: ${summarizeRetryErrorText(err)}`))) {
    return;
  }
}

Even though individual log lines are sanitized via sanitizeForLog, these additional output paths bypass it, leaving a practical injection vector when a user runs openclaw logs in a TTY.

Recommendation

Sanitize all gateway-controlled strings before writing to TTY output in text mode.

Suggested fix (apply sanitizeForLog to payload.file and to error reason text before interpolation):

import { sanitizeForLog } from "../terminal/ansi.js";

...
if (first && payload.file) {
  const safeFile = sanitizeForLog(payload.file);
  const prefix = pretty ? colorize(rich, theme.muted, "Log file:") : "Log file:";
  if (!logLine(`${prefix} ${safeFile}`)) return false;
}

...
if (errorText.trim()) {
  const safeReason = sanitizeForLog(summarizeRetryErrorText(err));
  if (!errorLine(colorize(rich, theme.muted, `Reason: ${safeReason}`))) return;
}

Also consider applying similar sanitization to any other non-JSON notices that may include remote-provided data (e.g., close reasons, retry notices) and/or centralizing this in the writer layer for stdout/stderr when process.stdout.isTTY.

2. 🟡 Potential ReDoS / CPU DoS in broad OSC-stripping regex used on untrusted log text
Property Value
Severity Medium
CWE CWE-1333
Location src/terminal/ansi.ts:2-15

Description

stripAnsi() now uses a broad OSC-matching pattern:

  • Pattern: \x1b\][\s\S]*?(?:\x07|\x1b\\)
  • Applied globally (/g) via input.replace(ANSI_OSC_REGEX, "")
  • When input contains many ESC ] starters without a terminating BEL (\x07) or ST (ESC \\), the regex engine may repeatedly scan forward to the end of the string for each starter, causing quadratic-ish behavior on crafted inputs.

This matters because sanitizeForLog() calls stripAnsi() on log fields that can include attacker-controlled content (e.g., logs streamed from a gateway). A malicious log line (or repeated log entries) could trigger excessive CPU usage in the CLI process, resulting in a denial of service.

Vulnerable code:

const ANSI_OSC_PATTERN = "\\x1b\\][\\s\\S]*?(?:\\x07|\\x1b\\\\)";
const ANSI_OSC_REGEX = new RegExp(ANSI_OSC_PATTERN, "g");

export function stripAnsi(input: string): string {
  return input.replace(ANSI_OSC_REGEX, "").replace(ANSI_CSI_REGEX, "");
}

Recommendation

Avoid regex constructs that can rescan large tails for each potential start.

Safer options:

  1. Use a linear-time parser to strip escape sequences (recommended). Scan the string once, and when you see ESC ], advance until you find BEL or ST; if no terminator, stop stripping at end.

  2. If keeping regex, bound the match and/or pre-check for terminators to avoid repeated scans. For example, limit maximum OSC payload length to something reasonable for logs:

// Limit OSC payload to e.g. 8KB to avoid pathological scans
const ANSI_OSC_REGEX = /\x1b\][\s\S]{0,8192}?(?:\x07|\x1b\\)/g;

export function stripAnsi(input: string): string {
  return input.replace(ANSI_OSC_REGEX, "").replace(ANSI_CSI_REGEX, "");
}
  1. Add a fast-path: if the string does not contain "\x1b]", skip the OSC regex.

Also consider adding a unit/perf test that feeds a long string containing many "\x1b]" without terminators to ensure runtime stays linear.

3. 🟡 Per-IP log stream subscription limit bypass via untrusted proxy headers (clientIp becomes undefined)
Property Value
Severity Medium
CWE CWE-770
Location src/gateway/log-stream.ts:125-136

Description

The new live log streaming implementation attempts to cap subscriptions per IP (MAX_LOG_STREAM_SUBSCRIBERS_PER_IP), but the per-IP check is only applied when gatewayClient.clientIp is set.

In the websocket handshake, reportedClientIp is explicitly set to undefined when proxy headers are present from an untrusted remote address:

  • An attacker can include X-Forwarded-For / X-Real-IP headers while connecting directly (i.e., from an address not in gateway.trustedProxies).
  • This sets hasUntrustedProxyHeaders=true, causing reportedClientIp to become undefined.
  • GatewayWsClient.clientIp is then undefined, so createGatewayLogStream().subscribe() will skip the per-IP subscription cap entirely.

Impact:

  • A single client (or multiple clients from the same IP) can exceed the intended per-IP limit (4) and consume up to the global subscriber cap (16), increasing CPU/file I/O due to 1s polling per subscriber.

Vulnerable code paths:

  • Source of clientIp becoming undefined on untrusted proxy headers: reportedClientIp = isLocalClient || hasUntrustedProxyHeaders ? undefined : ...
  • Sink where per-IP cap is skipped when clientIp is falsy: const clientIp = gatewayClient.clientIp?.trim(); if (clientIp) { ... enforce limit ... }

Recommendation

Ensure an IP (or other stable identifier) is always available for rate limiting, even when proxy headers are untrusted.

Suggested approach:

  • Keep the existing reportedClientIp behavior for trust/local detection.
  • But also store a separate rateLimitIp based on the actual TCP peer address (or resolveClientIp result with untrusted headers ignored).
  • Use that value for subscription caps.

Example (conceptual):

// during handshake
const rateLimitIp = remoteAddr; // or resolveClientIp({remoteAddr, forwardedFor: undefined, realIp: undefined, ...})
nextClient.clientIp = rateLimitIp;

Or change subscribe() to fall back:

const clientIp = (gatewayClient.clientIp ?? gatewayClient.socket._socket?.remoteAddress)?.trim();
if (clientIp) {// enforce per-IP cap
}

Additionally, consider enforcing per-connection and per-auth-identity limits (device/client id) to reduce reliance on IP address alone.

4. 🔵 Information disclosure: absolute log file path returned to clients in logs.subscribe and logs.appended events
Property Value
Severity Low
CWE CWE-200
Location src/gateway/server-methods/logs.ts:107-113

Description

The gateway exposes the server's resolved log file path (likely an absolute path under the host filesystem) to remote clients.

  • resolveLogFile() returns a filesystem path on the gateway host.
  • logs.subscribe includes this path in the JSON-RPC response.
  • The log streaming mechanism (logs.appended broadcast) also includes the full path.

This discloses host filesystem layout and potentially sensitive installation/user information (e.g., usernames, container mount points). While not always critical, it can materially aid attackers in follow-on attacks and is generally unnecessary for functionality (clients can treat the file value as an opaque token or omit it entirely).

Vulnerable code:

respond(true, { subscribed: true, file, ...result }, undefined);

and

params.broadcastToConnIds("logs.appended", { file, ...result }, ...);

Recommendation

Do not expose real filesystem paths to clients.

Options:

  1. Omit file from all client-facing payloads and use server-side state to track rolling changes.
  2. If a client needs a stable identifier, return an opaque token (e.g., a hash or monotonically increasing generation id) rather than a path.

Example:

// server-side
const file = await resolveLogFile(configuredFile);
const fileId = createHash('sha256').update(file).digest('hex');

respond(true, { subscribed: true, fileId, ...resultWithoutPath }, undefined);// when comparing previous file:
previousFileId: p.fileId

Also update the protocol schemas to reflect the new fileId field and remove/avoid returning file paths.


Analyzed PR: #56475 at commit 5a65fad

Last updated on: 2026-04-09T01:14:25Z

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 22946ea45c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/cli/logs-cli.ts
Comment on lines +75 to +78
if (!payload || typeof payload !== "object") {
throw new Error("Unexpected logs.tail response");
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Return logs.tail payload from fetchLogs

fetchLogs validates the logs.tail response but never returns it, so callers receive undefined. In normal openclaw logs (non-follow) flow this makes emitLogsPayload crash when it reads payload.lines, and the command falls into the generic gateway-error path even when the gateway is healthy. Return the validated payload so non-follow log tailing still works.

Useful? React with 👍 / 👎.

Comment on lines +87 to 88
params.logStreamStop?.();
} catch {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep task-registry shutdown hook in close handler

This shutdown section now invokes logStreamStop but no longer calls stopTaskRegistryMaintenance, even though that callback is still passed by startGatewayServer. That leaves task-registry maintenance running after gateway shutdown/restart in-process, so background sweeps can continue mutating task state outside the server lifecycle. Keep the existing task-maintenance stop call alongside the new log-stream stop.

Useful? React with 👍 / 👎.

Comment on lines +781 to +782
LogsSubscribeParams,
LogsSubscribeResult,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Re-export exec-approval param types from protocol index

The protocol/index.ts type export list dropped ExecApprovalGetParams, ExecApprovalRequestParams, and ExecApprovalResolveParams while adding log-subscribe types. That creates an unintended breaking change for downstream code importing those exec-approval types from this public barrel. Keep the log-subscribe exports additive and restore the removed exec-approval type exports.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

respond(true, result, undefined);

P1 Badge Return resolved file path in logs.tail response

The handler responds with result only, but LogsTailResult requires a file field and callers rely on that value to track log-file rollovers across cursored polls. Dropping file is a wire-contract regression that can break downstream clients and makes rotation detection unreliable when the caller cannot echo previousFile back on the next request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/gateway/server-methods/logs.ts Outdated
Comment on lines 37 to 38
limit: p.limit,
maxBytes: p.maxBytes,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Default logs.tail limits before calling readLogSlice

logs.tail now forwards p.limit/p.maxBytes directly even though both params are optional, so valid calls like logs.tail {} pass undefined into readLogSlice. That function clamps numeric inputs and eventually computes a NaN buffer length, which throws and returns UNAVAILABLE instead of tail data for clients that omit these optional fields. Preserve the old behavior by applying defaults (like DEFAULT_LIMIT/DEFAULT_MAX_BYTES) before invoking readLogSlice.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5a65fad828

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

limit,
maxBytes,
});
respond(true, result, undefined);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Include resolved file in logs.tail responses

logs.tail now responds with result from readLogSlice, but readLogSlice does not include a file field, so successful responses violate the declared LogsTailResult contract and can break typed clients that decode file as required (including the updated Swift model) or depend on it for file-rotation state. Returning { file, ...result } is needed to keep the wire payload backward-compatible.

Useful? React with 👍 / 👎.

Comment thread src/cli/logs-cli.ts
Comment on lines +70 to +74
const payload = await callGatewayFromCli(
"logs.tail",
opts,
{ file, cursor, limit, maxBytes },
{ progress: showProgress },
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restore loopback pairing fallback for logs.tail fetches

fetchLogs now directly awaits callGatewayFromCli("logs.tail", ...) without the previous pairing-required fallback path, so when a local loopback gateway requests pairing, openclaw logs exits through the generic gateway error flow instead of reading the configured local log file. This is a user-facing regression for unpaired local setups where logs were previously still available.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

app: macos App: macos app: web-ui App: web-ui cli CLI command changes gateway Gateway runtime maintainer Maintainer-authored PR size: XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant