fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock by 1998xiangcai-cmd · Pull Request #42027 · openclaw/openclaw

1998xiangcai-cmd · 2026-03-10T09:59:10Z

Summary

Describe the problem and fix in 2�C4 bullets:

Problem: fix(exec): resolve login shell PATH when exec host defaults to "sandbox" without sandbox runtime #41549 left exec in a phantom-sandbox state that skipped login-shell PATH recovery, [UX] Surface layer-specific diagnostics for Control UI auth/origin vs remote CDP reachability in WSL2 + Windows setups #41553 made WSL2 + Windows browser failures hard to triage across Control UI/auth/origin/CDP layers, and [Bug]: `Recurring cron jobs with isolated agentTurn + LLM time out when force-run, while equivalent one-shot at jobs succeed #41558 let detached cron force-runs block on their own cron lane until timeout.
Why it matters: these failures look like broken tooling even when the underlying install, browser profile, or model path is otherwise valid.
What changed: exec now probes login-shell PATH when sandbox host falls back locally; browser status now emits layered control-ui/profile/CDP diagnostics and cleaner HTML upstream errors; cron force-runs now use a detached manual lane so nested cron work can still reuse the cron execution lane.
What did NOT change (scope boundary): this does not add new browser capabilities, does not change sandbox policy beyond PATH recovery for local fallback, and does not redesign cron scheduling semantics.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

User-visible / Behavior Changes

Exec tool can now find login-shell-managed binaries when the default sandbox host falls back to direct local execution.
openclaw browser status now surfaces layered diagnostics for Control UI auth/origin configuration, browser profile attach mode, and remote/local CDP reachability.
Browser HTTP errors now summarize proxy HTML error pages instead of dumping raw markup.
Detached cron.run --force no longer self-deadlocks when isolated cron work reuses the cron execution lane.

Security Impact (required)

New permissions/capabilities? (Yes/No) No
Secrets/tokens handling changed? (Yes/No) No
New/changed network calls? (Yes/No) No
Command/tool execution surface changed? (Yes/No) Yes
Data access scope changed? (Yes/No) Yes
If any Yes, explain risk + mitigation:
- Exec PATH recovery now also runs for local fallback from the default sandbox host, but only when no sandbox runtime exists and no explicit PATH override was provided.
- Browser CLI status now reads the existing redacted config.get snapshot through the authenticated gateway path to derive operator diagnostics; it does not expose unredacted secrets.

Repro + Verification

Environment

OS: Windows workstation, repo validation run locally
Runtime/container: Node 22 / pnpm
Model/provider: N/A for repro validation
Integration/channel (if any): Browser control + cron
Relevant config (redacted): default exec host, browser remote CDP profile, recurring isolated cron jobs

Steps

Leave tools.exec.host unset, disable sandbox runtime, and run an exec command that depends on login-shell PATH.
Run openclaw browser status against a remote CDP profile in a WSL2 + Windows style setup.
Force-run a recurring isolated cron job whose nested work reuses the cron lane.

Expected

Exec finds login-shell-managed binaries.
Browser status points to the failing layer instead of a generic connection failure.
Force-run completes instead of timing out behind its own lane.

Actual

Covered by the targeted tests below; all now pass.

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios: pnpm check; pnpm test -- --run src/agents/bash-tools.exec.path.test.ts src/browser/status-diagnostics.test.ts src/browser/server.status-diagnostics.test.ts src/cli/browser-cli-manage.status-diagnostics.test.ts src/browser/client.test.ts src/cron/service.issue-regressions.test.ts; pnpm build.
Edge cases checked: phantom sandbox PATH fallback, remote/local CDP diagnostics ordering, HTML proxy error summarization, detached cron force-run lane reuse.
What you did not verify: live remote CDP against a real Windows browser host, or a live LLM-backed recurring cron on production credentials.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? (Yes/No) Yes
Config/env changes? (Yes/No) No
Migration needed? (Yes/No) No
If yes, exact upgrade steps:

Failure Recovery (if this breaks)

How to disable/revert this change quickly: revert the three commits in this PR.
Files/config to restore: src/agents/bash-tools.exec.ts, src/browser/*status*, src/cli/browser-cli-*, src/cron/service/ops.ts, src/process/lanes.ts, gateway lane config wiring.
Known bad symptoms reviewers should watch for: missing PATH recovery on macOS launch agents, noisy browser diagnostics ordering, cron force-runs stuck until timeout.

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

Risk: Browser status now includes more diagnostics and merges CLI-derived gateway config checks with server-side CDP checks.
- Mitigation: diagnostics are additive, optional in the payload, redacted, and covered by route + CLI tests.
Risk: The new detached cron manual lane could drift from cron concurrency settings.
- Mitigation: gateway startup and hot-reload now set CommandLane.Cron and CommandLane.CronManual together, with regression coverage for the deadlock case.

greptile-apps · 2026-03-10T10:04:39Z

Greptile Summary

This PR addresses three targeted bug fixes: (1) login-shell PATH recovery now triggers when host === "sandbox" falls back to direct local execution (not just host === "gateway"), (2) openclaw browser status gains layered diagnostics covering Control UI auth/origin config, profile attach-mode, and CDP reachability, with HTML proxy error pages now summarized instead of dumped raw, and (3) cron.run --force is moved to a new CommandLane.CronManual lane so isolated cron jobs can still enqueue on CommandLane.Cron without self-deadlocking.

Key changes:

bash-tools.exec.ts: extends host === "gateway" PATH probe guard to also cover host === "sandbox" when no sandbox runtime exists
status-diagnostics.ts (new): pure-logic module for deriveBrowserStatusDiagnostics, deriveGatewayControlUiDiagnostics, and combineBrowserStatusDiagnostics; no network calls, well-typed, well-tested
client-fetch.ts: summarizeBrowserServiceHttpError correctly identifies and summarizes nginx/proxy HTML error pages; the /<[^>]+>/g tag-stripping regex can be tricked by unencoded > in HTML attribute values and the fallback branch (no <title> or <h1>) has no length cap — minor robustness concerns worth addressing
cron/service/ops.ts + process/lanes.ts + gateway wiring: CronManual lane cleanly separates manual force-runs from scheduled cron execution; both startup and hot-reload set CronManual concurrency in lockstep with Cron, preventing config drift
Test coverage is thorough: unit tests for all three diagnostic paths, a regression test that directly exercises the deadlock scenario, and a server integration test for CDP reachability diagnostics

Confidence Score: 4/5

Safe to merge with one minor robustness improvement suggested for HTML error message handling.
All three fixes are narrowly scoped, backward-compatible, and covered by regression tests. The exec PATH change is guarded by existing conditions. Browser diagnostics are additive and optional. The cron lane split resolves a deadlock without altering scheduling semantics. One style/robustness concern identified in stripHtmlErrorText: the tag-stripping regex can be confused by unencoded > in attribute values, and the fallback path has no length cap, but this is a minor improvement opportunity rather than a blocking issue for merge.
src/browser/client-fetch.ts — consider applying the suggested robustness improvement to the stripHtmlErrorText function (cap output length, improve regex handling of attributes with > characters)

_{Last reviewed commit: 3e8ab2b}

greptile-apps · 2026-03-10T10:04:42Z

+function stripHtmlErrorText(input: string): string {
+  return input
+    .replace(/<script[\s\S]*?<\/script>/gi, " ")
+    .replace(/<style[\s\S]*?<\/style>/gi, " ")
+    .replace(/<[^>]+>/g, " ")
+    .replace(/\s+/g, " ")
+    .trim();


> in attribute values can defeat the tag-stripping regex.

/<[^>]+>/g stops at the first > it encounters, so an attribute value that contains an unencoded > (e.g., aria-label="a > b", which some proxy error pages include) will leave the tail of the attribute and the closing > in the output. This is the fallback path that fires when neither <title> nor <h1> is found, so the resulting "summary" could contain raw attribute fragments and could be unboundedly long if the page body has significant text content.

Consider using a tighter pattern and capping the output length in the fallback:

Suggested change

function stripHtmlErrorText(input: string): string {

return input

.replace(/<script[\s\S]*?<\/script>/gi, " ")

.replace(/<style[\s\S]*?<\/style>/gi, " ")

.replace(/<[^>]+>/g, " ")

.replace(/\s+/g, " ")

.trim();

function stripHtmlErrorText(input: string): string {

return input

.replace(/<script[\s\S]*?<\/script>/gi, " ")

.replace(/<style[\s\S]*?<\/style>/gi, " ")

.replace(/<[^>]*>/g, " ")

.replace(/&[a-z]+;/gi, " ")

.replace(/\s+/g, " ")

.trim()

.slice(0, 200);

}

Note: [^>]* (zero-or-more instead of one-or-more) is a minor alignment fix, but the more actionable change is adding .slice(0, 200) so that an HTML page without a title or heading can't push a multi-kilobyte text dump into the thrown error message.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/browser/client-fetch.ts Line: 157-163 Comment: `>` in attribute values can defeat the tag-stripping regex. `/<[^>]+>/g` stops at the first `>` it encounters, so an attribute value that contains an unencoded `>` (e.g., `aria-label="a > b"`, which some proxy error pages include) will leave the tail of the attribute and the closing `>` in the output. This is the fallback path that fires when neither `<title>` nor `<h1>` is found, so the resulting "summary" could contain raw attribute fragments and could be unboundedly long if the page body has significant text content. Consider using a tighter pattern and capping the output length in the fallback: ```suggestion function stripHtmlErrorText(input: string): string { return input .replace(/<script[\s\S]*?<\/script>/gi, " ") .replace(/<style[\s\S]*?<\/style>/gi, " ") .replace(/<[^>]*>/g, " ") .replace(/&[a-z]+;/gi, " ") .replace(/\s+/g, " ") .trim() .slice(0, 200); } ``` Note: `[^>]*` (zero-or-more instead of one-or-more) is a minor alignment fix, but the more actionable change is adding `.slice(0, 200)` so that an HTML page without a title or heading can't push a multi-kilobyte text dump into the thrown error message. How can I resolve this? If you propose a fix, please make it concise.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e8ab2b689

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-10T10:06:55Z

+    };
+  };
+}> {
+  const payload = await callGatewayFromCli("config.get", opts, {});


Reuse short status timeout for config snapshot call

The new browser status path now always does config.get via callGatewayConfigSnapshot, but this helper does not pass a bounded timeout and therefore inherits the CLI default timeout from callGatewayFromCli (30s), unlike fetchBrowserStatus which explicitly uses 1500ms; when config.get is slow or partially unavailable, openclaw browser status can hang for tens of seconds before falling back to existing diagnostics, which is a responsiveness regression introduced by this change.

Useful? React with 👍 / 👎.

Owlock · 2026-03-24T20:36:29Z

Additional real-world validation from today (Mar 24): this PR’s layered diagnostics would have helped a lot in a WSL2 + Windows CDP outage triggered by virtualization mode toggling.

Repro chain:

bcdedit /set hypervisorlaunchtype off (game mode) -> reboot
bcdedit /set hypervisorlaunchtype auto -> reboot
Chrome CDP still healthy on Windows localhost (127.0.0.1:9222), but unreachable from WSL host endpoint (172.30.144.1:9222) until bridge recovery

Recovery that restored remote tabs:

ensure iphlpsvc running
recreate portproxy (0.0.0.0:9222 -> 127.0.0.1:9222)
allow inbound TCP 9222 in firewall
verify both /json/version checks (localhost and WSL-reachable host address)

I posted full field notes in #41553 as follow-up evidence. +1 on this direction.

openclaw-barnacle · 2026-04-27T04:33:37Z

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

clawsweeper · 2026-04-27T05:13:34Z

Codex review: needs changes before merge.

Summary
The PR updates exec PATH recovery, browser status/error diagnostics, CLI status rendering, and cron force-run lane scheduling with regression tests across those areas.

Reproducibility: yes. by source inspection. The active browser plugin status route/type/CLI lack layered diagnostics and client-fetch still throws raw non-rate-limit response text; the exec and cron failures are not reproduced because current main solves them through host=auto gateway fallback and CronNested.

Next step before merge
A narrow replacement can port the remaining browser diagnostics and bounded HTML-error handling to extensions/browser while leaving current main exec and cron behavior untouched.

Security
Cleared: No concrete supply-chain, secret-handling, permission, or command-execution security regression was found in the proposed diff.

Review findings

[P2] Port browser diagnostics to the active plugin route — src/browser/routes/basic.ts:74-79
[P2] Reuse the status timeout for config snapshots — src/cli/browser-cli-shared.ts:87
[P3] Cap HTML fallback summaries — src/browser/client-fetch.ts:156-163

Review details

Best possible solution:

Replace or split the PR: land a narrow browser plugin patch for layered status diagnostics and bounded HTML upstream-error summaries, leaving current main exec auto-routing and CronNested behavior unchanged.

Do we have a high-confidence way to reproduce the issue?

Yes, by source inspection. The active browser plugin status route/type/CLI lack layered diagnostics and client-fetch still throws raw non-rate-limit response text; the exec and cron failures are not reproduced because current main solves them through host=auto gateway fallback and CronNested.

Is this the best way to solve the issue?

No, not as submitted. The browser direction is useful, but the patch targets obsolete core browser paths and carries two boundedness/timeout issues, so a narrow extensions/browser replacement is the maintainable path.

Full review comments:

[P2] Port browser diagnostics to the active plugin route — src/browser/routes/basic.ts:74-79
Current main serves browser status from extensions/browser, but this PR adds the diagnostics payload under the old src/browser route. As submitted, the active browser plugin would still return no layered diagnostics after the browser move, so the remaining user-visible fix needs to be ported to the plugin paths.
Confidence: 0.9
[P2] Reuse the status timeout for config snapshots — src/cli/browser-cli-shared.ts:87
The new status path fetches config.get without the short timeout used for the status request. If the gateway config path is slow, openclaw browser status can wait on the default RPC timeout before falling back, which is a responsiveness regression.
Confidence: 0.86
[P3] Cap HTML fallback summaries — src/browser/client-fetch.ts:156-163
When an HTML error page has no title or h1, the fallback strips tags and returns the full remaining body. Large proxy pages can still produce multi-kilobyte browser errors, so the fallback should sanitize and cap the text before throwing it.
Confidence: 0.78

Overall correctness: patch is incorrect
Overall confidence: 0.87

Acceptance criteria:

pnpm test extensions/browser/src/browser/client.test.ts extensions/browser/src/browser/status-diagnostics.test.ts extensions/browser/src/browser/routes/basic.status-diagnostics.test.ts extensions/browser/src/cli/browser-cli-manage.test.ts extensions/browser/src/cli/browser-cli-manage.timeout-option.test.ts extensions/browser/src/cli/browser-cli-manage.status-diagnostics.test.ts
pnpm exec oxfmt --check --threads=1 extensions/browser/src/browser/client-fetch.ts extensions/browser/src/browser/routes/basic.ts extensions/browser/src/browser/status-diagnostics.ts extensions/browser/src/cli/browser-cli-manage.ts extensions/browser/src/cli/browser-cli-shared.ts CHANGELOG.md
pnpm check:changed in Testbox before handoff if runtime/source files are changed

What I checked:

active_browser_route: Current main builds browser status in the browser plugin route and returns the basic status object without a diagnostics payload. (extensions/browser/src/browser/routes/basic.ts:59, 89a15fddaf84)
active_browser_status_type: The active BrowserStatus type has status fields but no diagnostics field for JSON or CLI consumers. (extensions/browser/src/browser/client.types.ts:10, 89a15fddaf84)
active_browser_cli_gap: The active openclaw browser status command fetches and prints status fields only; it does not fetch config state or render layered diagnostics. (extensions/browser/src/cli/browser-cli-manage.ts:297, 89a15fddaf84)
html_error_gap: For non-rate-limit browser HTTP failures, current main still throws the raw response body or status string. (extensions/browser/src/browser/client-fetch.ts:264, 89a15fddaf84)
exec_mainline_solution: Current main resolves implicit host=auto to gateway when no sandbox exists, and the existing login-shell PATH probe runs on gateway execution. (src/agents/bash-tools.exec-runtime.ts:274, 89a15fddaf84)
cron_mainline_solution: Current main maps cron-owned inner agent work from the outer cron lane to cron-nested and applies cron concurrency to that lane. (src/agents/lanes.ts:18, 89a15fddaf84)

Likely related people:

@steipete: Current checkout blame for the active browser extension files and cron lane helpers points to Peter Steinberger at the shallow boundary commit, and the timeline also routed the prior review to this maintainer. (role: current maintainer / active-path owner; confidence: medium; commits: 585ce38015ef; files: extensions/browser/src/browser/routes/basic.ts, extensions/browser/src/browser/client-fetch.ts, extensions/browser/src/cli/browser-cli-manage.ts)
@vincentkoc: Recent visible main history and changelog entries show ongoing plugin/externalization and gateway-platform maintenance adjacent to the browser plugin surface, and the prior review timeline also routed this PR to that maintainer. (role: adjacent plugin/platform maintainer; confidence: low; commits: 89a15fddaf84, b8f6e16ba5ec; files: extensions/browser, src/plugins, src/gateway)

Remaining risk / open question:

A replacement must keep browser config diagnostics redacted, bounded, and timeout-limited so browser status stays responsive.
The WSL2 + Windows remote CDP field scenario still needs targeted validation after the browser plugin port.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 89a15fddaf84.

1998xiangcai-cmd added 3 commits March 10, 2026 17:45

fix(exec): resolve login-shell PATH for phantom sandbox fallback

7ea4d85

ux(browser): surface layered control-ui and CDP diagnostics

25deb76

fix(cron): avoid force-run cron lane self-deadlock

3e8ab2b

openclaw-barnacle Bot added gateway Gateway runtime cli CLI command changes agents Agent runtime and tooling size: L labels Mar 10, 2026

greptile-apps Bot reviewed Mar 10, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Mar 10, 2026

View reviewed changes

clawsweeper Bot mentioned this pull request Apr 26, 2026

[UX] Surface layer-specific diagnostics for Control UI auth/origin vs remote CDP reachability in WSL2 + Windows setups #41553

Closed

openclaw-barnacle Bot added the stale Marked as stale due to inactivity label Apr 27, 2026

BingqingLyu mentioned this pull request Apr 27, 2026

fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock BingqingLyu/openclaw#355

Open

20 tasks

openclaw-barnacle Bot removed the stale Marked as stale due to inactivity label Apr 28, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock#42027

fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock#42027
1998xiangcai-cmd wants to merge 3 commits intoopenclaw:mainfrom
1998xiangcai-cmd:xiangcai/exec-browser-cron-fixes-41549-41553-41558

1998xiangcai-cmd commented Mar 10, 2026

Uh oh!

greptile-apps Bot commented Mar 10, 2026

Uh oh!

greptile-apps Bot Mar 10, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Mar 10, 2026

Uh oh!

Owlock commented Mar 24, 2026

Uh oh!

openclaw-barnacle Bot commented Apr 27, 2026

Uh oh!

clawsweeper Bot commented Apr 27, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

1998xiangcai-cmd commented Mar 10, 2026

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

User-visible / Behavior Changes

Security Impact (required)

Repro + Verification

Environment

Steps

Expected

Actual

Evidence

Human Verification (required)

Review Conversations

Compatibility / Migration

Failure Recovery (if this breaks)

Risks and Mitigations

Uh oh!

greptile-apps Bot commented Mar 10, 2026

Greptile Summary

Confidence Score: 4/5

Uh oh!

greptile-apps Bot Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Owlock commented Mar 24, 2026

Uh oh!

openclaw-barnacle Bot commented Apr 27, 2026

Uh oh!

clawsweeper Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

clawsweeper Bot commented Apr 27, 2026 •

edited

Loading