[codex] Add generic OpenClaw code mode#80600
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fd8ec3e1ee
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
|
|
||
| function rejectsModuleAccess(code: string): boolean { | ||
| return /(^|[^\w$])import\s*(?:\(|[\s{*]|\w)|(^|[^\w$])require\s*\(/u.test(code); |
There was a problem hiding this comment.
Narrow module-access regex to avoid blocking valid code
The new module-access guard rejects many safe snippets because the import branch matches any identifier that starts with import (for example const important = 1 or even string content containing important). In rejectsModuleAccess, import\s*(?:\(|[\s{*]|\w) allows \w immediately after import, so regular code is incorrectly treated as forbidden module access and exec returns code mode module access is disabled. This makes code mode fail on common user programs and should be restricted to actual import syntax (or replaced with a parser-based check).
Useful? React with 👍 / 👎.
|
Codex review: needs real behavior proof before merge. Summary Reproducibility: yes. for the review findings: source inspection shows maxOutputBytes is not available inside the worker before returning values/output, and the module regex still matches unmasked template literal text. I did not run the PR branch because this review is read-only. Real behavior proof Next step before merge Security Review findings
Review detailsBest possible solution: Keep the feature opt-in, but enforce all resource limits inside the worker before postMessage, fix source scanning false positives, add real-run proof, and require maintainer/security sign-off before merge. Do we have a high-confidence way to reproduce the issue? Yes for the review findings: source inspection shows maxOutputBytes is not available inside the worker before returning values/output, and the module regex still matches unmasked template literal text. I did not run the PR branch because this review is read-only. Is this the best way to solve the issue? No, not as-is. The direction is plausible, but the maintainable path is to move serialized output enforcement into the worker, replace or improve the source scanner, then validate a real code-mode run before merge. Full review comments:
Overall correctness: patch is incorrect Security concerns:
Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 3f80f889fab7. |
akrimm702
left a comment
There was a problem hiding this comment.
I found one issue that looks worth addressing before merging.
| const output = takeOutput(vm); | ||
| const resultHandle = getResultHandle(vm); | ||
| try { | ||
| if ( |
There was a problem hiding this comment.
[P2] Fail instead of suspending with no bridge requests
When Code Mode user code awaits a promise that is not backed by a bridge call, for example await new Promise(() => {}), this branch returns waiting with an empty pendingRequests list. The wait tool then restores the snapshot, reaches the same pending-promise/no-request state again, and returns another empty waiting result. At that point the model has no actionable pending tool call and can only remain stuck until the snapshot expires.
Please only suspend when there is at least one host bridge request to settle. Otherwise, fail cleanly with a timeout or unresolved-pending-promise error.
1bb1d29 to
f61a088
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f61a088a9d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const ready = await waitForPending(state.pending, state.config.timeoutMs); | ||
| if (!ready) { |
There was a problem hiding this comment.
Guard code-mode waits against concurrent resume races
runWait reads shared state from activeRuns, then awaits waitForPending(...) before removing the run from the map; two parallel wait calls for the same runId can both pass this point and both resume the same snapshot. In that scenario, the guest program is resumed twice and nested tool calls can execute twice, which can duplicate side effects on external tools. Mark the run as in-progress (or remove it) before the await, with rollback if still waiting.
Useful? React with 👍 / 👎.
|
Tested this against the built worker ( Clawsweeper P1 and P2 both reproduce live. Timeout error message is empty. Triggered the QuickJS interrupt handler with {
"status": "failed",
"error": " at <anonymous> (openclaw-code-mode:user.js:1:1)\n at <eval> (openclaw-code-mode:user.js:3:1)\n",
"code": "internal_error",
"output": []
}No "timeout exceeded" string, just an empty stack frame. Operators triaging a code-mode failure will not have an obvious signal that the cause was the timeout vs a generic JS error. Probably worth raising a typed error in the
Heads up on CI: the Scope: focus of my read was parent-process state management, ctx scoping, and worker control flow against the built artifact. Did not exercise memory limits, the snapshot/output byte caps, the TypeScript transpile path, or the full pi-coding-agent integration with a live model. Re-review progress:
|
pashpashpash
left a comment
There was a problem hiding this comment.
looks fine to me as it keeps a clean separation from codex harness mode (codex harness has its own code mode that we use).
as long as this pr keeps doing that, and generic code mode stuff doesn't leak into codex harness mode - i'm happy
|
Re-reviewed after the approval. I do not think this is merge-ready yet. The approval looks scoped to keeping generic OpenClaw code mode separate from Codex harness mode. That separation is important, but it does not clear the current runtime/code-execution blockers. Latest merge-base recheck:
Current blockers I still see:
Local verification while re-reviewing:
Recommendation: keep the approval as signal for the Codex-harness separation, but do not merge until the unresolved runtime safety threads, failing CI/proof gate, and current-main merge conflict are addressed. |
f61a088 to
d8d8f75
Compare
Dependency Changes DetectedThis PR changes dependency-related files. Maintainers should confirm these changes are intentional. Changed files:
Maintainer follow-up:
|
|
No dependency changes detected. Learn more about Socket for GitHub. 👍 No dependency changes detected in pull request |
d8d8f75 to
bd4da03
Compare
|
Verification for current head
Re-review progress:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bd4da03e68
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
| continue; | ||
| } | ||
| if (char === "'" || char === '"') { |
There was a problem hiding this comment.
Mask template literals before module-access scan
maskCodeLiteralsAndComments only strips ' and " strings, so backtick template literals are scanned as code by rejectsModuleAccess. That makes safe snippets like const msg = `import docs later`; fail with code mode module access is disabled even though they never import a module. Fresh evidence: this exact snippet still matches the current regex path because the template literal text is left unmasked. Please mask template-literal contents (while still scanning ${...} expressions) before running the module-access regex.
Useful? React with 👍 / 👎.
|
Landed via rebase merge onto
|
Summary
Adds experimental OpenClaw code mode: an opt-in agent tool mode where the provider sees only
execandwait, while enabled OpenClaw/plugin/MCP/client tools remain available through a run-scoped hidden catalog bridge.This is OpenClaw-owned and generic. It is separate from Codex Code mode, does not import Codex internals, and keeps nested tool calls inside the existing OpenClaw approval, hook, session, transcript, and accounting paths.
What Changed
src/agents/code-mode.tsplus a QuickJS-WASI worker entry insrc/agents/code-mode.worker.ts.execcells, with lazy TypeScript transpilation and bounded timeout, output, snapshot, pending-call, and catalog-search limits.tools.search(...),tools.call(...),text(...),json(...), andyield_control(...).tools.codeMode.enabled === true, leaving onlyexecandwaitprovider-visible.tools.codeModeand documents the feature atdocs/reference/code-mode.md.Runtime Contract
Code mode activates only when explicitly enabled and when the run installs the code-mode controls. Plain config objects such as
tools.codeMode: { timeoutMs: 5000 }do not implicitly enable it.When active, the model can search and call hidden enabled tools from bounded guest code, but those calls still route through OpenClaw's existing tool execution path. Raw no-tool runs,
disableTools, and empty allowlists do not trigger code-mode provider-surface enforcement.Suspended runs are bounded before persistence, and completed outputs are bounded before returning to the model. The worker URL resolver handles built package layouts so
dist/agents/code-mode.jsresolves the siblingdist/agents/code-mode.worker.jsentry.Verification
pnpm deadcode:dependenciespnpm lint --threads=8pnpm check:test-typespnpm test src/agents/code-mode.test.ts src/agents/pi-embedded-runner/openai-stream-wrappers.test.ts src/agents/pi-embedded-runner/run/attempt.test.tspnpm buildpnpm check:docsgit diff --check origin/main...HEADReal behavior proof
Behavior addressed: Opt-in OpenClaw code mode exposes only
execandwaitto the OpenAI Responses model while nested OpenClaw tools remain available behind the hidden catalog bridge.Real environment tested: Local OpenClaw embedded Pi agent on macOS from rebased PR head
bd4da03e68dfeb4e8a899a4d059e9f2769184290, isolatedOPENCLAW_STATE_DIR, OpenAI Responses provider, liveopenai/gpt-5.5, realOPENAI_API_KEYfrom the shell.Exact steps or command run after this patch: Ran
pnpm openclaw agent --local --agent main --session-id code-mode-live-ship-<timestamp> --model openai/gpt-5.5 --thinking low --json --timeout 240withtools.allow=["session_status"],tools.codeMode.enabled=true,OPENCLAW_AGENT_HARNESS_FALLBACK=none,OPENCLAW_DEBUG_CODE_MODE=1, andOPENCLAW_DEBUG_MODEL_PAYLOAD=tools.Evidence after fix: Copied live terminal output from the run:
Observed result after fix: Provider payload debug showed
tools=count=2 names=exec,wait; the saved transcripts contained provider-visibleexecandwaitplus nestedsession_status; the assistant returned exactlyCODEMODE_LIVE_SHIP_OK.What was not tested: No additional live providers beyond OpenAI Responses/
gpt-5.5in this final ship rerun; local focused tests andcheck:changedcover the non-live runtime hardening paths.