agents: add strict-agentic execution contract and revise update_plan semantics#64241
Conversation
|
PR1 is now ready for review and the non-draft CI path has started running on I wasn’t able to use GitHub’s reviewer-request mutation from this fork context, so I’m tagging likely runtime maintainers here instead: @steipete @vincentkoc. This slice is intentionally narrow and GPT-5-first:
|
Greptile SummaryThis PR adds an opt-in Confidence Score: 5/5Safe to merge; the only finding is a minor P2 inconsistency in the blocked-state return. All logic paths for the strict-agentic retry/block cycle are correct and well-tested. The GPT-5 guard, per-agent contract resolution, and update_plan auto-enable gating are consistent across all call sites. The single P2 comment (missing finalAssistantVisibleText in the blocked-state meta) is a minor cosmetic inconsistency that does not affect runtime correctness or user-facing behavior. src/agents/pi-embedded-runner/run.ts around the blocked-state early return (line ~1562) — finalAssistantVisibleText is omitted from meta unlike every other exit path. Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run.ts
Line: 1562-1581
Comment:
**Missing `finalAssistantVisibleText` in blocked-state meta**
Every other early-exit return in this function (the retry-limit path at ~L1488, the incomplete-turn path at ~L1611, and the success path at L1645) includes `finalAssistantVisibleText` in `meta`. The blocked-state return omits it, so the planning text that caused the agent to be blocked is not preserved for callers that use this field for logging or display. The variable is already in scope (computed at L1450), so adding it here is a one-liner.
```suggestion
return {
payloads: [
{
text: STRICT_AGENTIC_BLOCKED_TEXT,
isError: true,
},
],
meta: {
durationMs: Date.now() - started,
agentMeta,
aborted,
systemPromptReport: attempt.systemPromptReport,
finalAssistantVisibleText,
},
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "agents: catch structured plan-only turns" | Re-trigger Greptile |
9539b56 to
ee1c4b2
Compare
There was a problem hiding this comment.
Pull request overview
Adds an opt-in strict-agentic embedded-Pi execution contract (scoped to OpenAI/OpenAI-Codex GPT‑5 family) and reworks update_plan to behave as a structured progress signal rather than user-visible filler, including stricter handling of plan-only turns.
Changes:
- Introduces
agents.defaults.embeddedPi.executionContractplus per-agent overrides and plumbing to resolve the correct agent in no-session-key flows. - Revises
update_plantool output to be non-chatty (no text content) and tolerant of extra per-step fields; updates gating so it’s not auto-enabled outside strict-agentic unless explicitly configured. - Extends embedded-Pi runner “planning-only” detection (including structured bullet plans), treats
update_planas non-progress for retry logic, and fail-closes with an explicit blocked state after the retry cap in strict-agentic mode.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/config/zod-schema.agent-runtime.ts | Adds per-agent embeddedPi.executionContract config schema. |
| src/config/zod-schema.agent-defaults.ts | Adds default embeddedPi.executionContract config schema. |
| src/config/zod-schema.agent-defaults.test.ts | Tests parsing of embeddedPi.executionContract in defaults. |
| src/config/types.tools.ts | Updates experimental planTool docs/comments to reflect new gating intent. |
| src/config/types.agents.ts | Adds per-agent embeddedPi.executionContract override type surface. |
| src/config/types.agent-defaults.ts | Introduces EmbeddedPiExecutionContract type and documents semantics. |
| src/config/schema.labels.ts | Adds labels for new embeddedPi execution contract fields. |
| src/config/schema.help.ts | Adds help text for new execution contract fields and revised planTool help. |
| src/config/schema.base.generated.ts | Regenerates base schema to include new fields/help. |
| src/agents/tools/update-plan-tool.ts | Makes update_plan non-chatty and tolerant of extra step fields. |
| src/agents/tools/update-plan-tool.test.ts | Updates expectations and adds coverage for extra step fields. |
| src/agents/pi-tools.ts | Threads modelId into tool construction for model-aware gating. |
| src/agents/pi-embedded-runner/run/incomplete-turn.ts | Enhances planning-only detection and adds strict-agentic retry limits/blocked text. |
| src/agents/pi-embedded-runner/run.ts | Resolves execution contract per agent, applies retry cap, surfaces blocked state on exhaustion. |
| src/agents/pi-embedded-runner/run.incomplete-turn.test.ts | Adds coverage for strict-agentic retries/blocked state and bullet-plan detection. |
| src/agents/openclaw-tools.update-plan.test.ts | Updates/extends tests for new update_plan gating rules and agent override resolution. |
| src/agents/openclaw-tools.ts | Passes modelId and uses session-agent resolution that supports explicit agentId override. |
| src/agents/openclaw-tools.registration.ts | Changes update_plan gating to depend on strict-agentic activation (or explicit flag). |
| src/agents/agent-scope.ts | Adds execution contract resolution and strict-agentic activation helper. |
| docs/gateway/configuration-reference.md | Updates docs for new planTool default behavior under strict-agentic. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9539b56ce4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
Adds an opt-in embedded-Pi execution contract (strict-agentic) to prevent GPT‑5-family OpenAI/Codex runs from “finishing” on plan-only turns, and revises update_plan to behave like structured progress state rather than user-visible filler.
Changes:
- Introduces
agents.defaults.embeddedPi.executionContractand per-agent overrides, with schema/labels/help/docs updates. - Revises
update_planto return structureddetailswith no chattycontent, and tolerates extra per-step fields. - Updates embedded Pi planning-only retry logic to treat
update_planas non-progress, detect structured/bulleted plans, retry (2x in strict-agentic), and fail closed with an explicit blocked response.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/config/zod-schema.agent-runtime.ts | Adds per-agent embeddedPi.executionContract to runtime agent entry schema. |
| src/config/zod-schema.agent-defaults.ts | Adds defaults-level embeddedPi.executionContract schema. |
| src/config/zod-schema.agent-defaults.test.ts | Validates defaults schema accepts embeddedPi.executionContract. |
| src/config/types.tools.ts | Updates planTool docs/comments to remove OpenAI auto-enable implication. |
| src/config/types.agents.ts | Adds per-agent embeddedPi.executionContract override typing. |
| src/config/types.agent-defaults.ts | Defines EmbeddedPiExecutionContract type and documents semantics. |
| src/config/schema.labels.ts | Adds labels for new embeddedPi contract fields (defaults + per-agent). |
| src/config/schema.help.ts | Adds help text for embeddedPi contract fields and revises planTool help. |
| src/config/schema.base.generated.ts | Updates generated base schema output for new fields/help text. |
| src/agents/tools/update-plan-tool.ts | Makes update_plan non-chatty (empty content) and tolerant of extra step fields. |
| src/agents/tools/update-plan-tool.test.ts | Updates expectations and adds coverage for ignoring extra per-step fields. |
| src/agents/pi-tools.ts | Threads modelId into tool creation for model/provider-specific gating. |
| src/agents/pi-embedded-runner/run/incomplete-turn.ts | Adds structured plan detection, update_plan non-progress handling, retry-limit helper, blocked text. |
| src/agents/pi-embedded-runner/run.ts | Resolves execution contract per session/agentId, applies strict-agentic retry cap + blocked fail-closed response. |
| src/agents/pi-embedded-runner/run.incomplete-turn.test.ts | Adds strict-agentic blocked-state test (explicit agentId/no sessionKey) + structured plan detection tests. |
| src/agents/openclaw-tools.update-plan.test.ts | Updates gating tests: no default auto-enable; strict-agentic GPT‑5 auto-enable; planTool=false override; explicit agentId resolution. |
| src/agents/openclaw-tools.ts | Passes modelId and agentId/session context into update_plan gating; uses resolveSessionAgentIds. |
| src/agents/openclaw-tools.registration.ts | Reworks update_plan gating to depend on strict-agentic activation logic. |
| src/agents/agent-scope.ts | Adds execution-contract resolution + strict-agentic activation predicate (provider/model scoped). |
| docs/gateway/configuration-reference.md | Updates documentation for new planTool defaults/auto-enable behavior under strict-agentic GPT‑5. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 23d0a57e9a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 310c27ab64
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
CI is green on Scope is still intentionally narrow to #64228 only:
This PR is ready for maintainer review. |
|
Rebuilt this branch from a clean Current intended scope remains:
This should address the dirty-branch bot warning and make the diff reviewable again. |
There was a problem hiding this comment.
Pull request overview
This PR introduces an opt-in embedded Pi “strict-agentic” execution contract (configurable globally and per-agent) and updates the update_plan tool semantics + gating so GPT‑5 OpenAI/Codex runs can be kept moving past plan-only turns, while update_plan becomes structured progress state rather than user-visible filler.
Changes:
- Add
agents.defaults.embeddedPi.executionContractand per-agentagents.list[].embeddedPi.executionContractplumbing across config types, Zod schema, labels/help, and generated schema. - Revise
update_plantool behavior (no chatty text output; tolerate extra per-step fields) and adjust gating so it’s only auto-enabled for supported strict-agentic GPT‑5 OpenAI/Codex runs (unless explicitly enabled/disabled). - Harden embedded runner planning-only retry logic: treat
update_planas non-progress, detect structured “Plan:”/bullet plans, allow more retries under strict-agentic, then fail closed with an explicit blocked state.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/config/zod-schema.agent-runtime.ts | Adds per-agent embeddedPi.executionContract validation. |
| src/config/zod-schema.agent-defaults.ts | Adds default embeddedPi.executionContract validation. |
| src/config/zod-schema.agent-defaults.test.ts | Tests acceptance of embeddedPi.executionContract in defaults schema. |
| src/config/types.tools.ts | Updates experimental tool JSDoc to reflect strict-agentic auto-enable semantics. |
| src/config/types.agents.ts | Adds per-agent embedded Pi override type surface. |
| src/config/types.agent-defaults.ts | Introduces EmbeddedPiExecutionContract and documents contract semantics. |
| src/config/schema.labels.ts | Adds UI/schema labels for new embedded Pi execution contract fields. |
| src/config/schema.help.ts | Adds help text for new execution contract fields; updates planTool help text. |
| src/config/schema.base.generated.ts | Regenerates base schema to include executionContract + updated planTool help. |
| src/agents/tools/update-plan-tool.ts | Makes update_plan non-chatty (content: []) and tolerant of extra step fields. |
| src/agents/tools/update-plan-tool.test.ts | Updates expectations for new tool result shape + adds extra-field tolerance test. |
| src/agents/pi-tools.ts | Threads modelId into OpenClaw tool creation for provider/model-specific gating. |
| src/agents/pi-embedded-runner/run/incomplete-turn.ts | Adds structured plan detection, treats update_plan as non-progress, introduces strict-agentic retry limit + blocked text. |
| src/agents/pi-embedded-runner/run.ts | Resolves execution contract for session/explicit agentId; applies strict-agentic retry cap and blocked-state surfacing. |
| src/agents/pi-embedded-runner/run.incomplete-turn.test.ts | Adds coverage for bullet-plan detection, update_plan non-progress handling, strict-agentic blocked state behavior. |
| src/agents/openclaw-tools.update-plan.test.ts | Reworks gating tests: default-off, strict-agentic GPT‑5 auto-enable, explicit overrides, explicit agentId resolution. |
| src/agents/openclaw-tools.ts | Adds modelId to tool gating inputs; resolves agent ids via resolveSessionAgentIds. |
| src/agents/openclaw-tools.registration.ts | Replaces OpenAI-provider auto-enable with strict-agentic execution-contract-aware gating. |
| src/agents/agent-scope.ts | Adds execution contract resolution helpers + strict-agentic activation predicate (provider/model gated). |
| docs/gateway/configuration-reference.md | Updates docs to match new planTool auto-enable rules under strict-agentic. |
Comments suppressed due to low confidence (1)
src/agents/pi-embedded-runner/run/incomplete-turn.ts:144
- shouldApplyPlanningOnlyRetryGuard() normalizes provider casing but does not trim modelId. Because resolveHookModelSelection can override modelId without trimming, it’s possible for strict-agentic to become active (isStrictAgenticExecutionContractActive trims modelId) while planning-only retry/blocked-state logic never triggers (regex fails on leading/trailing whitespace), allowing plan-only completion in strict-agentic runs. Trim modelId (and ideally normalize similarly to isStrictAgenticExecutionContractActive) before applying the /^gpt-5/ gate so strict-agentic behavior can’t be bypassed by whitespace in hook overrides.
function shouldApplyPlanningOnlyRetryGuard(params: {
provider?: string;
modelId?: string;
}): boolean {
const provider = normalizeLowercaseStringOrEmpty(params.provider);
if (provider !== "openai" && provider !== "openai-codex") {
return false;
}
return /^gpt-5(?:[.-]|$)/i.test(params.modelId ?? "");
}
|
Addressed the latest review pass on this branch. Changes in this push:
Local validation:
One note on |
|
Rebased this branch onto current Why this mattered:
Validation on the rebased head:
Scope is still intentionally unchanged: this is the same strict-agentic GPT-5-first embedded-Pi slice, just refreshed onto current |
3a7a574 to
cb1453d
Compare
|
Latest state on this slice:
Scope stays intentionally narrow:
At this point this should be the first merge candidate for the program. |
|
Current head The earlier bot findings around blocked-state meta, resolved paths, and trimmed model-id behavior are addressed on this rebased head. Scope remains intentionally narrow: strict-agentic execution only, GPT-5-first and embedded-Pi-scoped. This should be the first merge candidate. If the current head looks good from a maintainer perspective, I’d love formal review on this slice. |
cb1453d to
cd12729
Compare
|
@steipete finished these now for your review Chat GPT major upgrade fixes (make GPT 5.4 = as good as Opus 4.6 in Openclaw). cut down to 4 PR's now. |
There was a problem hiding this comment.
Pull request overview
Adds an opt-in strict-agentic embedded Pi execution contract and redefines update_plan as a quiet, structured progress tool (vs. user-visible filler), with stricter handling of plan-only turns for supported GPT‑5 OpenAI/Codex runs.
Changes:
- Introduces
agents.defaults.embeddedPi.executionContractplus per-agent overrides, and resolves the active contract via explicit-agent-aware lookup (works for no-session-key / hook / cron flows). - Revises
update_plansemantics: no chatty text output, tolerates extra per-step fields, and is no longer auto-enabled except for strict-agentic GPT‑5 OpenAI/Codex runs (or explicittools.experimental.planTool). - Updates embedded Pi runner planning-only retry logic: treats
update_planas non-progress, retries more in strict-agentic mode, and fail-closes with an explicit blocked response when retries are exhausted.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/config/zod-schema.agent-runtime.ts | Adds per-agent embeddedPi.executionContract schema support. |
| src/config/zod-schema.agent-defaults.ts | Adds defaults schema for embeddedPi.executionContract. |
| src/config/zod-schema.agent-defaults.test.ts | Tests parsing of the new defaults execution contract. |
| src/config/types.tools.ts | Updates docs for experimental tool gating semantics. |
| src/config/types.agents.ts | Adds per-agent embedded Pi override typing. |
| src/config/types.agent-defaults.ts | Defines EmbeddedPiExecutionContract and documents semantics. |
| src/config/schema.labels.ts | Adds labels for new embedded Pi config fields. |
| src/config/schema.help.ts | Adds help text clarifying strict-agentic vs default behavior and planTool gating. |
| src/config/schema.base.generated.ts | Regenerates base config schema to include new fields/help/labels. |
| src/agents/tools/update-plan-tool.ts | Makes update_plan non-chatty; accepts extra per-step fields; returns structured details. |
| src/agents/tools/update-plan-tool.test.ts | Updates assertions for empty content + structured details.plan and extra-field tolerance. |
| src/agents/pi-tools.ts | Threads modelId into OpenClaw tool creation for provider/model-specific gating. |
| src/agents/pi-embedded-runner/run/incomplete-turn.ts | Enhances planning-only detection and retry semantics; adds strict-agentic retry limits and blocked text. |
| src/agents/pi-embedded-runner/run.ts | Resolves execution contract per session/agent, applies strict-agentic retry cap, and surfaces blocked state. |
| src/agents/pi-embedded-runner/run.incomplete-turn.test.ts | Adds coverage for strict-agentic blocked state, bullet-plan detection, and update_plan-as-non-progress behavior. |
| src/agents/openclaw-tools.update-plan.test.ts | Updates gating tests: default-off, strict-agentic GPT‑5 auto-enable, and planTool: false override. |
| src/agents/openclaw-tools.ts | Passes model provider/id and explicit agentId into update_plan gating and agent resolution. |
| src/agents/openclaw-tools.registration.ts | Changes update_plan gating API to depend on strict-agentic activation (and config override). |
| src/agents/agent-scope.ts | Adds execution contract resolution and strict-agentic activation predicate based on agent config + provider/model. |
| docs/gateway/configuration-reference.md | Documents updated tools.experimental.planTool defaults and precedence rules. |
| docs/.generated/plugin-sdk-api-baseline.sha256 | Updates generated baseline hashes. |
| docs/.generated/config-baseline.sha256 | Updates generated config baseline hashes. |
b2c0414 to
7c31b62
Compare
7c31b62 to
03c38a0
Compare
|
Landed via rebase merge. Thanks @100yenadmin. Commits on
Validation before merge:
|
|
Thanks @steipete @vincentkoc ! 🖤 let me know what you think of the rest of the GPT upgrades: Merge order PR A (#64241) - merged PR D is the proof layer. It should not block review of runtime-correctness slices. Completion gate |
|
gotta crash out but gave you edit permissions on all these @steipete |
Summary
This is PR 1 of the GPT-5.4 / Codex agentic runtime parity program tracked in #64227 and scoped by #64228.
It adds an opt-in
strict-agenticexecution contract for embedded Pi agents and revisesupdate_planso it behaves like structured progress state instead of user-visible filler. The runtime now stops treating plan-only turns as acceptable completion in strict-agentic mode and fails closed with an explicit blocked response after the retry cap.PR 1 is intentionally GPT-5-first: in this slice,
strict-agenticonly activates for embedded Piopenaiandopenai-codexGPT-5-family runs. Unsupported providers/models keep default behavior unlesstools.experimental.planToolis explicitly enabled.What changed
agents.defaults.embeddedPi.executionContractplus per-agent override supportupdate_planupdate_planonly for explicittools.experimental.planToolor supported strict-agentic GPT-5 runsupdate_plannon-chatty and tolerant of extra step fieldsplanTool: falseeven when strict-agentic is configuredupdate_planas non-progress in plan-only retry logicWhy
GPT-5.4 / Codex currently stalls too easily after planning or recap-style turns. This slice makes that behavior opt-in fixable at the runtime-contract level without changing the default execution mode for every agent.
Non-goals
Builds on prior groundwork
Validation
Focused checks run:
CI=1 pnpm vitest run src/agents/openclaw-tools.update-plan.test.tsCI=1 pnpm vitest run src/agents/tools/update-plan-tool.test.tsCI=1 pnpm vitest run src/agents/pi-embedded-runner/run.incomplete-turn.test.tsCI=1 pnpm vitest run src/config/zod-schema.agent-defaults.test.tsCI=1 pnpm vitest run src/agents/system-prompt.test.tsCI=1 pnpm vitest run src/agents/tool-catalog.test.tsCI=1 pnpm vitest run src/agents/openclaw-tools.sessions.test.tsCI=1 pnpm vitest run src/agents/openclaw-tools.nodes-workspace-guard.test.tsCI=1 pnpm vitest run src/agents/pi-embedded-runner/system-prompt.test.tsCI=1 pnpm vitest run extensions/openai/transport-policy.test.tsCI=1 pnpm vitest run src/plugin-sdk/provider-tools.test.tsLinked issues