
feat(exec): add normalized auto mode #70543

Merged
jesse-merhi merged 19 commits into openclaw:main from vincentkoc:codex/exec-auto-mode
May 29, 2026

Conversation

@vincentkoc

@vincentkoc vincentkoc commented Apr 23, 2026

Member

Summary

Adds normalized tools.exec.mode support for host exec policy and maps tools.exec.mode: "auto" to Guardian-reviewed Codex app-server execution.

The native auto path uses deterministic allowlist/safe-bin matches directly. On approval misses, it asks the model-backed exec reviewer for a one-shot allow decision. Anything else routes to human approval.
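
The routing described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the type names, `routeAutoExec`, and the exact-string allowlist match are all assumptions made for the example (the real matcher works on parsed commands and safe-bin rules).

```typescript
// Hypothetical types for illustration; not taken from the OpenClaw source.
type ReviewerVerdict = "allow-once" | "ask";
type ExecRoute = "run" | "run-once" | "human-approval";

function routeAutoExec(
  command: string,
  allowlist: ReadonlySet<string>,
  reviewer: (command: string) => ReviewerVerdict,
): ExecRoute {
  // Deterministic allowlist match: run without consulting the reviewer.
  if (allowlist.has(command)) return "run";
  // Approval miss: ask the model-backed reviewer for a one-shot decision.
  const verdict = reviewer(command);
  // Only an explicit allow-once is honoured; anything else goes to a human.
  return verdict === "allow-once" ? "run-once" : "human-approval";
}
```

Note that the reviewer is only consulted on a miss, so an allowlisted command never depends on model output.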

Scope Boundary

Included:

  • tools.exec.mode values: deny, allowlist, ask, auto, full.
  • Native gateway and node exec auto-review on approval misses.
  • Codex app-server mapping from OpenClaw auto to Guardian/auto_review approvals.
  • Plugin SDK subpath for exec approval policy helpers so the Codex plugin does not import core internals.
  • Docs/schema/baseline updates for the new config surface.
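
As a rough sketch of the new config surface, the five mode values listed above could be validated like this. The names `EXEC_MODES`, `ExecModeValue`, and `parseExecMode` are hypothetical and chosen for the example; the PR's actual schema code is not shown in this thread.

```typescript
// The five values come from the PR summary; the helper itself is illustrative.
const EXEC_MODES = ["deny", "allowlist", "ask", "auto", "full"] as const;
type ExecModeValue = (typeof EXEC_MODES)[number];

// Returns the mode when the input is one of the known values, else undefined.
function parseExecMode(value: unknown): ExecModeValue | undefined {
  return typeof value === "string" && (EXEC_MODES as readonly string[]).includes(value)
    ? (value as ExecModeValue)
    : undefined;
}
```

Returning `undefined` for unknown input leaves it to the caller to decide whether to fall back to legacy fields or report a validation error.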

Not included:

  • No session-persisted execMode.
  • No execOverrides.mode.
  • No slash /exec mode=... directive surface.
  • No auto-reply/session patch persistence changes.
  • No unrelated bug-fix backlog items.

Behavior Notes

  • If native auto-review returns allow-once, OpenClaw records and resolves a suppressed one-shot approval.
  • If native auto-review returns ask, malformed output, or anything unsupported, OpenClaw routes to human approval.
  • If auto-review asks for human approval, askFallback=full does not execute the command.
  • Native Codex conversation binding rejects touched exec policies that require interactive human approval, because a bound conversation cannot route the approval back to a human.
  • Direct node system.run auto-review requires the prepared systemRunPlan before approved scoped execution.
  • Security-audit suppression edits stay on the human approval path in auto mode.
  • Sandboxed exec still works when global tools.exec.mode: "auto" is configured and tools.exec.host stays on the default auto route.
  • Legacy security=full with ask=on-miss keeps full-execution behavior; only security=full with ask=always prompts.
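
The fail-closed handling of reviewer output in the notes above might look something like the sketch below. It assumes, purely for illustration, that the reviewer replies with a JSON object carrying a `decision` field; the real wire format and the name `resolveAutoReviewOutcome` are not taken from the PR.

```typescript
type AutoReviewOutcome = "allow-once" | "human-approval";

// Assumed reviewer reply shape: {"decision": "allow-once" | "ask" | ...}.
function resolveAutoReviewOutcome(rawReviewerOutput: string): AutoReviewOutcome {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawReviewerOutput);
  } catch {
    // Malformed output never grants execution.
    return "human-approval";
  }
  const decision =
    typeof parsed === "object" && parsed !== null
      ? (parsed as { decision?: unknown }).decision
      : undefined;
  // Only the exact allow-once decision is accepted; "ask" and any
  // unsupported value fall through to human approval.
  return decision === "allow-once" ? "allow-once" : "human-approval";
}
```

Because the default branch is human approval, adding a new reviewer decision later cannot silently widen what runs unattended.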

Verification

Release-candidate proof was completed on commit 7f39268bed3920270c02d3c72ffa10ee22138802.

  • Structured auto-review completed with zero accepted/actionable findings after the final changes.
  • Focused regression coverage passed for native exec auto-review, node system.run, gateway approval routing, sandbox exec fallback, config/schema behavior, Codex app-server policy mapping, SDK export, doctor/security warnings, and the release-matrix regressions found while validating the PR.
  • Full Release Validation run 26626309934 completed successfully.
  • The full validation umbrella reported these jobs green: Docker runtime image verification, release package artifact preparation, normal full CI, plugin prerelease validation, product performance evidence, release/live/Docker/QA validation, package Telegram E2E, and final validation aggregation.
  • Public child workflow evidence: CI 26626597167, Plugin Prerelease 26626597790, OpenClaw Release Checks 26626596624, OpenClaw Performance 26626596215, and NPM Telegram Beta E2E 26626763921 all completed successfully.

Real Behavior Proof

Behavior addressed: Native OpenClaw tools.exec.mode: "auto" and Codex app-server Guardian-reviewed approvals now share one normalized config surface.

Real environment tested: OpenClaw release-candidate validation in GitHub Actions on commit 7f39268bed3920270c02d3c72ffa10ee22138802, including normal CI, plugin prerelease, live provider/release checks, Docker/package acceptance, cross-platform release checks, product performance evidence, and Telegram package E2E. Focused local regression coverage also passed before the release-candidate matrix was dispatched.

Exact steps or command run after this patch: Ran final structured auto-review to zero accepted/actionable findings, then ran the full release validation matrix to completion on the PR head SHA.

Evidence after fix: Full Release Validation run 26626309934 completed successfully. Its child workflows for CI, Plugin Prerelease, OpenClaw Release Checks, OpenClaw Performance, and NPM Telegram Beta E2E all completed successfully.

Observed result after fix: Auto exec mode policy normalization, auto-review allow paths, auto-review-to-human paths, fallback-full blocking, sandbox exec with global auto mode, Codex app-server policy mapping, node system.run approval-plan enforcement, security-suppression human review, legacy full/on-miss reporting, config/schema behavior, and release-candidate live/Docker/package/Telegram paths are covered by passing proof.

What was not tested: No intentional release-candidate proof gaps remain from this validation pass. Manual exploratory UI review was not separately performed beyond the automated release/live matrix.

Compatibility / Migration

Existing tools.exec.security and tools.exec.ask continue to work. New configs should prefer tools.exec.mode for the normalized policy. Explicit lower-scope legacy policy values still take precedence where they already apply.
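
For orientation, the two config styles might look like this. The objects are written as TypeScript literals only for illustration; the on-disk config format is not shown in this thread, and the variable names are hypothetical. The two objects are not claimed to be equivalent policies.

```typescript
// New normalized style: a single mode value under tools.exec.
const normalizedStyleConfig = {
  tools: { exec: { mode: "auto" } },
};

// Legacy style: separate security and ask knobs, which continue to work.
const legacyStyleConfig = {
  tools: { exec: { security: "allowlist", ask: "on-miss" } },
};
```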

@aisle-research-bot

aisle-research-bot Bot commented Apr 23, 2026


🔒 Aisle Security Analysis

We found 1 potential security issue in this PR:

# Severity Title
1 🟠 High Legacy execSecurity/execAsk overrides can be silently bypassed by persisted execMode
1. 🟠 Legacy execSecurity/execAsk overrides can be silently bypassed by persisted execMode
Property Value
Severity High
CWE CWE-284
Location src/agents/exec-defaults.ts:37-42

Description

resolveLayeredExecMode gives sessionEntry.execMode absolute precedence over legacy policy fields (execSecurity/execAsk). Because resolveExecModePolicy fully overrides security/ask whenever mode is set, a session (or agent/global config) that has an old/persisted execMode value can silently negate later attempts to restrict execution via legacy fields.

Impact:

  • If an operator/admin (or older client) sets execSecurity="deny" (or otherwise tightens policy) without also clearing execMode, the effective policy will still be derived from execMode (e.g., full/auto), allowing execution when it should be blocked.
  • This creates a policy-precedence “lock-in” where a previously set execMode can prevent future lockdown via legacy knobs, potentially leading to unintended command execution.

Vulnerable code:

if (params.sessionEntry?.execMode) {
  return params.sessionEntry.execMode as ExecMode;
}

This occurs before checking legacy overrides, and later:

  • resolveExecModePolicy({ mode, security: rawSecurity, ask: rawAsk }) ignores rawSecurity/rawAsk when mode is set.

Recommendation

Make legacy policy fields take precedence over mode when both are present, or enforce mutual exclusivity.

Option A (prefer legacy overrides):

function resolveLayeredExecMode({ sessionEntry, agentExec, globalExec }: Params): ExecMode | undefined {
  // If legacy knobs are set, ignore mode (forces resolveExecModePolicy to use security/ask)
  if (sessionEntry?.execSecurity !== undefined || sessionEntry?.execAsk !== undefined) {
    return undefined;
  }
  if (sessionEntry?.execMode) return sessionEntry.execMode as ExecMode;
  if (agentExec?.security !== undefined || agentExec?.ask !== undefined) return undefined;
  if (agentExec?.mode) return agentExec.mode;
  return globalExec?.mode;
}

Option B (reject mixed settings): when applying patches/config, if execSecurity/execAsk is set, automatically clear execMode (and vice versa), or return a validation error.

Also apply the same precedence rule anywhere else mode is resolved (e.g., node-host policy evaluation).
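
Option B could be sketched as a small same-scope validation step. This is one possible shape under stated assumptions: `ExecPolicyScope` and `findMixedExecPolicyError` are hypothetical names, and the fields are typed as plain strings rather than the project's real enums.

```typescript
// Hypothetical per-scope view of the exec policy fields (session, agent, or global).
interface ExecPolicyScope {
  mode?: string;
  security?: string;
  ask?: string;
}

// Returns an error message when a scope mixes mode with legacy fields, else undefined.
function findMixedExecPolicyError(scope: ExecPolicyScope, label: string): string | undefined {
  const hasLegacy = scope.security !== undefined || scope.ask !== undefined;
  if (scope.mode !== undefined && hasLegacy) {
    return `${label}: set either mode or security/ask, not both`;
  }
  return undefined;
}
```

Rejecting the mixed case at write time avoids having to define a precedence rule at all, at the cost of a stricter config contract for existing operators.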


Analyzed PR: #70543 at commit 7b119e6

Last updated on: 2026-04-23T08:29:27Z

@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation app: web-ui App: web-ui gateway Gateway runtime agents Agent runtime and tooling extensions: codex size: L maintainer Maintainer-authored PR labels Apr 23, 2026
@greptile-apps

greptile-apps Bot commented Apr 23, 2026

Contributor

Greptile Summary

This PR introduces a unified tools.exec.mode config surface (deny | allowlist | ask | auto | full) that normalizes the previously separate OpenClaw exec-approval knobs and Codex app-server Guardian config. It wires a new defaultExecAutoReviewer into the gateway and node-host paths, and maps mode=auto to Codex's guardian policy when no explicit Codex-side override is present.

  • P1 security bug in src/infra/exec-auto-review.ts: sed is placed in READ_ONLY_BINARIES but sed -i mutates files; the auto-reviewer will allow-once in-place edits at risk level "low", bypassing human approval under mode=auto.
  • P2: modePolicy.autoReview is computed but never forwarded in src/node-host/invoke-system-run.ts, so mode=auto has no auto-review effect on the embedded/pi-runner execution path.

Confidence Score: 4/5

Safe to merge after resolving the sed read-only misclassification, which allows file-mutating commands to bypass human review under mode=auto.

One P1 security defect: sed is incorrectly treated as read-only, meaning sed -i commands are auto-approved without human review. The rest of the normalization plumbing, layered override logic, Codex Guardian mapping, session/directive propagation, and tests are well-structured. The autoReview gap in invoke-system-run.ts is P2.

src/infra/exec-auto-review.ts — sed in READ_ONLY_BINARIES needs -i/--in-place flag guard or removal before this ships with mode=auto enabled.

Security Review

  • File mutation via sed -i auto-approved (src/infra/exec-auto-review.ts): sed is listed in READ_ONLY_BINARIES but supports in-place file editing via the -i flag. The reviewer's compound-operator and dangerous-token guards do not catch this, so mode=auto will silently approve file-mutating sed invocations without human review.

Reviews (1): Last reviewed commit: "feat(exec): add normalized auto mode" | Re-trigger Greptile

Comment thread src/infra/exec-auto-review.ts Outdated
"ls",
"pwd",
"rg",
"sed",

Contributor


P1 security sed is not read-only — -i flag enables in-place file mutation

sed is included in READ_ONLY_BINARIES, but sed -i 's/old/new/' config.txt silently rewrites files on disk. Because commandLooksCompound and hasDangerousToken both miss it (no compound operator, and sed is absent from the dangerous-token regex), isReadOnlyCommand returns true for any sed invocation and the auto-reviewer returns allow-once (risk: "low"), bypassing human review for a file-mutating command.

The git entry is correctly guarded via a subcommand allowlist; sed needs equivalent argv inspection (e.g. reject if any arg matches -i or --in-place) or should be removed from the set until that guard is added.
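
A minimal sketch of the suggested argv inspection, assuming the arguments after `sed` are already tokenized. The function name `sedArgsLookReadOnly` is hypothetical. The check deliberately over-rejects: any short-option cluster containing `i` or `I` is treated as in-place (covering `-i`, `-i.bak`, `-ni`, and BSD's `-I`), and any long option starting with `--i` is rejected so that abbreviated forms of `--in-place` are caught as well. It does not inspect the sed script itself, so script commands that write files (such as `w`) would need a separate guard.

```typescript
// Returns false when the sed arguments could request in-place editing.
function sedArgsLookReadOnly(args: readonly string[]): boolean {
  for (const arg of args) {
    // Long form: --in-place, --in-place=SUFFIX, and unambiguous abbreviations.
    if (arg.startsWith("--i")) return false;
    // Short-option clusters such as -i, -i.bak, -ni, or -I.
    if (/^-[^-]/.test(arg) && /i/i.test(arg)) return false;
  }
  return true;
}
```

A rejected invocation would simply fall back to human approval rather than being denied outright.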


Comment thread src/node-host/invoke-system-run.ts Outdated
Comment on lines 373 to 391
  ): Promise<SystemRunPolicyPhase | null> {
    const cfg = await loadSystemRunConfig(opts);
    const agentExec = resolveAgentExecConfig(cfg, parsed.agentId);
-   const configuredSecurity = opts.resolveExecSecurity(
-     agentExec?.security ?? cfg.tools?.exec?.security,
-   );
-   const configuredAsk = opts.resolveExecAsk(agentExec?.ask ?? cfg.tools?.exec?.ask);
+   const globalExec = cfg.tools?.exec;
+   const mode =
+     agentExec?.mode ??
+     (agentExec?.security !== undefined || agentExec?.ask !== undefined
+       ? undefined
+       : globalExec?.mode);
+   const modePolicy = resolveExecModePolicy({
+     mode,
+     security: opts.resolveExecSecurity(agentExec?.security ?? globalExec?.security),
+     ask: opts.resolveExecAsk(agentExec?.ask ?? globalExec?.ask),
+   });
+   const configuredSecurity = modePolicy.security;
+   const configuredAsk = modePolicy.ask;
    const approvals = resolveExecApprovals(parsed.agentId, {
      security: configuredSecurity,
      ask: configuredAsk,

Contributor


P2 modePolicy.autoReview computed but never consumed

resolveExecModePolicy returns an autoReview boolean, but only modePolicy.security and modePolicy.ask are forwarded from evaluateSystemRunPolicyPhase. Setting tools.exec.mode="auto" on the embedded/pi-runner path will correctly adjust security and ask values, but the auto-reviewer will never fire for commands that miss the allowlist on this host. If this is an intentional scope boundary for this PR, a comment documenting the gap would help future contributors.
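
One way to close the gap is to carry the flag through the policy phase alongside the other two fields. The sketch below uses hypothetical, simplified types (`ModePolicySketch`, `PolicyPhaseSketch`) and is not the shape of the real `SystemRunPolicyPhase`.

```typescript
// Simplified stand-ins for the resolved mode policy and the phase result.
interface ModePolicySketch {
  security: string;
  ask: string;
  autoReview: boolean;
}

interface PolicyPhaseSketch {
  security: string;
  ask: string;
  autoReview: boolean;
}

// Forward all three fields so later phases can decide whether to invoke the reviewer.
function toPolicyPhase(modePolicy: ModePolicySketch): PolicyPhaseSketch {
  return {
    security: modePolicy.security,
    ask: modePolicy.ask,
    autoReview: modePolicy.autoReview,
  };
}
```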


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b119e64fb


Comment thread src/infra/exec-auto-review.ts Outdated
"ls",
"pwd",
"rg",
"sed",


P1 Badge Remove sed from read-only auto-review allowlist

The auto reviewer currently classifies every sed invocation as read-only, but sed supports in-place writes (for example, -i) without needing shell composition tokens. In tools.exec.mode="auto", an allowlist miss can therefore be auto-approved even when the command mutates files, which violates the intended “read-only inspection command” boundary and skips human approval for state-changing operations.


Comment thread src/infra/exec-auto-review.ts Outdated
]);

const GIT_READ_ONLY_SUBCOMMANDS = new Set([
"branch",


P1 Badge Validate git branch arguments before auto-approving

The reviewer marks git branch as read-only based only on subcommand name, but git branch mutates repository refs when given create/delete flags or branch names (for example, git branch feature-x, git branch -D old). With mode=auto, these approval misses can be auto-approved as low risk and execute without operator confirmation, allowing unintended repo mutations.
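
A conservative guard along these lines might accept only listing-style invocations. The sketch assumes the arguments after `git branch` are already tokenized; `GIT_BRANCH_LISTING_FLAGS` and `gitBranchArgsLookReadOnly` are hypothetical names, and the flag set is a deliberately small subset of what git accepts.

```typescript
// Flags that only list or describe branches. Any positional argument or
// unknown flag is treated as potentially mutating.
const GIT_BRANCH_LISTING_FLAGS = new Set([
  "-a", "--all",
  "-r", "--remotes",
  "-v", "-vv", "--verbose",
  "--list",
  "--show-current",
]);

// True only when every argument is a known listing flag (or there are none).
function gitBranchArgsLookReadOnly(argsAfterBranch: readonly string[]): boolean {
  return argsAfterBranch.every((arg) => GIT_BRANCH_LISTING_FLAGS.has(arg));
}
```

This rejects some harmless forms too, such as `--list` followed by a pattern, which is an acceptable trade-off when the fallback is a human prompt.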


@vincentkoc vincentkoc self-assigned this May 18, 2026
@vincentkoc vincentkoc force-pushed the codex/exec-auto-mode branch from 7b119e6 to 05b0336 Compare May 18, 2026 04:41
@vincentkoc vincentkoc requested a review from a team as a code owner May 18, 2026 04:41
@openclaw-barnacle openclaw-barnacle Bot added cli CLI command changes size: XL and removed size: L labels May 18, 2026
@clawsweeper

clawsweeper Bot commented May 18, 2026

Contributor

Codex review: needs maintainer review before merge. Reviewed May 29, 2026, 10:03 AM ET / 14:03 UTC.

Summary
Adds normalized tools.exec.mode and tools.exec.reviewer, wires native gateway/node exec auto-review plus Codex Guardian mapping, and updates docs, schema, SDK/protocol exports, tests, and release validation surfaces.

Reproducibility: not applicable. This is a feature/config PR, not a bug report. The behavior is source-reviewable in the PR head and supported by CI/release validation evidence in the PR body.

Review metrics: 2 noteworthy metrics.

  • Public exec config surfaces: 4 added. tools.exec.mode, tools.exec.reviewer, tools.exec.reviewer.model, and tools.exec.reviewer.timeoutMs affect operator config and upgrade expectations.
  • Public SDK/protocol surfaces: 3 added. One plugin SDK subpath plus two gateway approval request params become compatibility contracts for plugins and clients.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster ✨ media proof bonus
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Record maintainer/security acceptance of the exec auto-review boundary and new public contracts.
  • Let the current-head broad CI finish cleanly before landing.

Risk before merge

  • [P2] tools.exec.mode=auto intentionally lets a model-backed reviewer convert approval misses into one-shot host execution, so prompt-injection handling, allow-once scope, timeout behavior, and human fallback boundaries need explicit maintainer/security acceptance.
  • [P1] The new tools.exec.mode / tools.exec.reviewer config and same-scope mixed-policy rejection are compatibility-sensitive for existing operator configs, upgrades, and documented legacy security/ask behavior.
  • [P1] The new plugin SDK subpath and gateway approval params become public plugin/client contracts once shipped, so maintainers should be comfortable supporting them.
  • [P1] Release-candidate validation is strong but was recorded on 7f39268; the current head has proof/dependency guard success, while broader current-head CI was still finishing during this review snapshot.

Maintainer options:

  1. Accept The New Exec Boundary (recommended)
    Maintainers can land after recording acceptance of model-backed allow-once exec review, fallback behavior, and current-head validation for the public surfaces.
  2. Tighten Before Merge
    If the security boundary is not acceptable as-is, narrow auto behavior or make the strictest behavior the default before exposing the config surface.
  3. Pause Public Contract Expansion
    If the SDK/protocol/config contract is not ready to support, pause or close this branch and split out a smaller internal-only proof path.

Next step before merge

  • [P2] The remaining action is maintainer/security acceptance of protected, public config/API/security-boundary changes, not a narrow automated repair.

Security
Cleared: No concrete implementation security bug was found in this pass; the remaining security concern is maintainer acceptance of the intentional auto-exec review boundary.

Review details

Best possible solution:

Land this only after maintainers explicitly accept the exec auto-review security boundary and the new public config, SDK, and protocol contracts, with current-head checks green.

Do we have a high-confidence way to reproduce the issue?

Not applicable: this is a feature/config PR, not a bug report. The behavior is source-reviewable in the PR head and supported by CI/release validation evidence in the PR body.

Is this the best way to solve the issue?

Unclear as a product/security decision: the implementation path is coherent, but maintainers still need to accept the normalized config contract and model-backed exec approval boundary before merge.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against af3e354ff8c8.

Label changes

Label changes:

  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (linked_artifact): The PR body includes structured real behavior proof via release-candidate validation, and the current head has a successful Real behavior proof job.
  • remove status: ⏳ waiting on author: Current PR status label is status: 👀 ready for maintainer look.

Label justifications:

  • P2: This is a broad but non-emergency feature/config change with security and compatibility implications that needs normal maintainer review.
  • merge-risk: 🚨 compatibility: The PR adds public config, rejects same-scope mixed exec policy fields, and changes upgrade-sensitive exec policy precedence.
  • merge-risk: 🚨 auth-provider: The auto reviewer can use model/provider configuration and Codex app-server mappings that affect auth/model routing for exec review.
  • merge-risk: 🚨 security-boundary: The core change intentionally lets model-backed review approve one-shot host exec without a human prompt under mode=auto.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (linked_artifact): The PR body includes structured real behavior proof via release-candidate validation, and the current head has a successful Real behavior proof job.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes structured real behavior proof via release-candidate validation, and the current head has a successful Real behavior proof job.
Evidence reviewed

What I checked:

  • Repository policy applied: Read the full root AGENTS.md plus scoped guides for docs, extensions, agents, gateway, plugin SDK, server methods, embedded runner, agent tools, and scripts; their config, SDK, plugin-boundary, and security-boundary guidance drove the review risks. (AGENTS.md:1)
  • Protected maintainer PR: Live PR metadata shows the PR is open, authored by a MEMBER, mergeable but unstable, with head 49dcf43 and 91 changed files. (49dcf436e94e)
  • New exec config surface: ExecToolConfig now exposes mode plus reviewer model/timeout configuration, making this a public operator config change. (src/config/types.tools.ts:299, 49dcf436e94e)
  • Mode policy mapping: resolveExecPolicyForMode maps auto to allowlist/on-miss with autoReview: true, while full keeps no-prompt host execution. (src/infra/exec-approvals.ts:119, 49dcf436e94e)
  • Gateway auto-review boundary: Gateway auto-review only runs for a bound single parsed command, skips security-audit suppression edits, records allow-once decisions, and otherwise falls back to human approval. (src/agents/bash-tools.exec-host-gateway.ts:451, 49dcf436e94e)
  • Node auto-review boundary: Direct system.run auto-review is gated on autoReview, non-always ask mode, successful analysis, a bound argv, a prepared systemRunPlan, no inline-eval hit, no suppression edit, and no security=deny. (src/node-host/invoke-system-run.ts:612, 49dcf436e94e)
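
The node-side gate in the last bullet can be read as a single conjunction of conditions. The sketch below restates it with hypothetical field names (`NodeAutoReviewGate` is not a type from the PR), and the `ask` and `security` fields are typed as plain strings for simplicity.

```typescript
// Hypothetical snapshot of the facts the node host knows before auto-review.
interface NodeAutoReviewGate {
  autoReview: boolean;        // mode policy enabled auto-review
  ask: string;                // resolved ask mode, e.g. "on-miss" or "always"
  analysisOk: boolean;        // command analysis succeeded
  hasBoundArgv: boolean;      // a concrete argv is bound to the request
  hasSystemRunPlan: boolean;  // prepared systemRunPlan is present
  inlineEvalHit: boolean;     // command matched an inline-eval pattern
  isSuppressionEdit: boolean; // command edits security-audit suppressions
  security: string;           // resolved security level, e.g. "allowlist" or "deny"
}

// Auto-review is attempted only when every precondition holds.
function nodeAutoReviewEligible(gate: NodeAutoReviewGate): boolean {
  return (
    gate.autoReview &&
    gate.ask !== "always" &&
    gate.analysisOk &&
    gate.hasBoundArgv &&
    gate.hasSystemRunPlan &&
    !gate.inlineEvalHit &&
    !gate.isSuppressionEdit &&
    gate.security !== "deny"
  );
}
```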

Likely related people:

  • steipete: Recent current-main commits touch gateway exec approval routing, Codex app-server runtime seams, plugin SDK exports, and exec trust hardening across the same decision surfaces. (role: recent area contributor; confidence: high; commits: bb46b79d3c14, 524185a68ea2, 5b79ab090168; files: src/agents/bash-tools.exec-host-gateway.ts, src/node-host/invoke-system-run.ts, extensions/codex/src/app-server/config.ts)
  • vincentkoc: Current-main history shows recent exec default, node-host, Codex auth, and plugin SDK work, and this PR also carries Vincent as the member author and co-author on the branch stack. (role: feature owner / adjacent contributor; confidence: high; commits: 74e7b8d47b18, 4d6593642e5a, 5ef812293b08; files: src/agents/exec-defaults.ts, src/node-host/invoke-system-run.ts, extensions/codex/src/app-server/config.ts)
  • joshavant: Josh authored the PR commits and also appears in current-main Codex execution hardening and native tool policy commits, so he is relevant beyond only opening the branch. (role: adjacent owner / branch implementer with prior main history; confidence: medium; commits: ba06376c7955, e57b137aef41, 49dcf436e94e; files: extensions/codex/src/app-server/config.ts, extensions/codex/src/conversation-binding.ts, src/agents/exec-auto-reviewer.ts)
  • amittell: Recent main history changed nested approval metadata and async follow-up behavior in the gateway exec approval path touched by this PR. (role: recent adjacent contributor; confidence: medium; commits: 34c441c746c8; files: src/agents/bash-tools.exec-host-gateway.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. P2 Normal backlog priority with limited blast radius. impact:security Security boundary, credential, authz, sandbox, or sensitive-data risk. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels May 18, 2026
@vincentkoc vincentkoc force-pushed the codex/exec-auto-mode branch from 05b0336 to 1d45724 Compare May 18, 2026 07:28
@clawsweeper clawsweeper Bot added status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed impact:security Security boundary, credential, authz, sandbox, or sensitive-data risk. labels May 18, 2026
@clawsweeper

clawsweeper Bot commented May 21, 2026

Contributor

ClawSweeper PR egg

✨ Hatched: 🌱 uncommon Gilded Branchling

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🌱 uncommon.
Trait: stacks clean commits.
Image traits: location flaky test forest; accessory little merge flag; palette plum, gold, and soft gray; mood sleepy but ready; pose standing beside its cracked shell; shell frosted glass shell; lighting bright celebratory glints; background gentle dashboard dots.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@github-actions

Contributor

Dependency Changes Detected

This PR changes dependency-related files. Maintainers should confirm these changes are intentional.

Changed files:

  • package.json
  • packages/plugin-sdk/package.json

Maintainer follow-up:

  • Review whether the dependency changes are intentional.
  • Inspect resolved package deltas when lockfile, shrinkwrap, or workspace dependency policy changes are present.
  • Treat package-lock.json and npm-shrinkwrap.json diffs as security-review surfaces.
  • Run pnpm deps:changes:report -- --base-ref origin/main --markdown /tmp/dependency-changes.md --json /tmp/dependency-changes.json locally for detailed release-style evidence.

@joshavant

Contributor

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 29, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@joshavant

Contributor

Completed 2

@jesse-merhi

Member

/clawsweeper re-review

@github-actions

github-actions Bot commented May 29, 2026

Contributor

Dependency graph guard cleared

This PR no longer has blocked dependency graph changes. A future dependency graph change requires a fresh /allow-dependencies-change comment after the guard blocks that new head SHA.

  • Current SHA: 49dcf436e94e696388ff2a3a7aa2868d12c7272f

@jesse-merhi

Member

@clawsweeper re-review

joshavant and others added 19 commits May 29, 2026 23:49
Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com>
Co-authored-by: jesse-merhi <79823012+jesse-merhi@users.noreply.github.com>
@jesse-merhi

Member

Merged via rebase onto main at 65f6e53e623d8c7f92e2fe0d7bcf967b67a12c69.

Verification before merge:

pnpm plugin-sdk:api:gen
node scripts/run-vitest.mjs test/scripts/package-acceptance-workflow.test.ts src/gateway/gateway-models.profiles.live.test.ts src/plugins/install.npm-spec.e2e.test.ts src/agents/exec-auto-reviewer.test.ts src/infra/exec-auto-review.test.ts src/infra/exec-approvals-policy.test.ts
git diff --check origin/main..HEAD

Observed result:

Vitest: 5 files passed, 124 tests passed
git diff --check: clean

GitHub checks on the merged PR head 49dcf436e94e696388ff2a3a7aa2868d12c7272f were green before merge, including Dependency Guard 26641414038, CI 26641414393, CodeQL 26641414363, CodeQL Critical Quality 26641414413, OpenGrep PR Diff 26641414323, Workflow Sanity 26641414362, Real behavior proof 26641414021, and TUI PTY 26641414324.

The landed PR range contains 19 commits, from 80227005a0 through 65f6e53e62; each has the Vincent Koc and jesse-merhi co-author trailers.

Labels

  • agents Agent runtime and tooling
  • app: web-ui App: web-ui
  • channel: discord Channel integration: discord
  • commands Command implementations
  • docker Docker and sandbox tooling
  • docs Improvements or additions to documentation
  • extensions: codex
  • gateway Gateway runtime
  • maintainer Maintainer-authored PR
  • merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials.
  • merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths.
  • merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data.
  • P2 Normal backlog priority with limited blast radius.
  • proof: sufficient ClawSweeper judged the real behavior proof convincing.
  • rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected.
  • scripts Repository scripts
  • size: XL
  • status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

3 participants