Skip to content

[codex] Add outcome fallback runtime contracts#71038

Closed
100yenadmin wants to merge 5 commits intoopenclaw:mainfrom
electricsheephq:contract-first/outcome-fallback-runtime-contracts
Closed

[codex] Add outcome fallback runtime contracts#71038
100yenadmin wants to merge 5 commits intoopenclaw:mainfrom
electricsheephq:contract-first/outcome-fallback-runtime-contracts

Conversation

@100yenadmin
Copy link
Copy Markdown
Contributor

@100yenadmin 100yenadmin commented Apr 24, 2026

Summary

Adds the outcome/fallback contract rung from RFC #71004. This is test-only: it locks how GPT-5 terminal outcomes are classified for fallback and how the Codex app-server adapter preserves terminal state for OpenClaw-owned classification.

No production runtime behavior changes in this PR.

flowchart TD
  Attempt["Agent attempt result"] --> Classifier["OpenClaw outcome classifier"]
  Classifier --> Fallback["Model fallback chain"]
  Codex["Codex app-server projector"] --> Attempt
  Pi["Pi run result"] --> Attempt
  Classifier --> Silent["Intentional NO_REPLY stays terminal"]
  Classifier --> Effects["Tool/block side effects stay terminal"]
  Classifier --> Classified["empty / reasoning-only / planning-only advance fallback"]
Loading

Files Changed And Why

File Purpose
test/helpers/agents/outcome-fallback-runtime-contract.ts Shared GPT-5 outcome/fallback fixture data.
src/agents/outcome-fallback-runtime-contract.test.ts Shared/Pi classifier and model-fallback advancement rows.
extensions/codex/src/app-server/outcome-fallback-runtime-contract.test.ts Codex projector terminal-state preservation and fallback-readiness rows.

Contract Matrix

Path Covered behavior
Pi / shared fallback Harness-owned empty, reasoning-only, and planning-only classifications map to format fallback codes.
Pi / shared fallback Classified GPT-5 terminal results advance to the configured fallback model.
Pi / shared fallback Intentional NO_REPLY, visible replies, aborts, block replies, and tool side effects do not trigger fallback.
Codex app-server adapter Empty terminal turns remain empty results for OpenClaw-owned fallback classification.
Codex app-server adapter Reasoning-only terminal turns preserve reasoning mirror text without visible assistant output.
Codex app-server adapter Planning-only terminal turns preserve plan mirror text without visible assistant output.
Codex app-server adapter Codex-projected empty/reasoning/planning terminal results are fallback-ready once a harness classification exists.
Codex app-server adapter Exact NO_REPLY remains an intentional silent terminal reply.
Codex app-server adapter Tool side-effect telemetry keeps fallback disabled.

Important Boundary

This PR does not claim the current Codex harness already supplies classify(). It proves:

  • The shared OpenClaw fallback classifier advances correctly when an attempt result has a harness-owned terminal classification.
  • The Codex app-server projector preserves enough terminal state for that classification seam to be wired in the runtime-plan/harness-adapter phase.

How This Helps The RuntimePlan Work

The plan should own outcome classification and fallback semantics once. These tests prevent the GPT-5.4 stall class from reappearing when Pi/Codex behavior is migrated into that shared policy.

Reviewer Notes

  • Test-only, no Codex harness classify() implementation in this PR.
  • The synthetic Codex fallback-readiness rows are deliberately framed as readiness, not current harness behavior.
  • This captures the audit’s Bug 1 class plus NO_REPLY/side-effect non-fallback behavior.

Verification

  • node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/outcome-fallback-runtime-contract.test.ts — 16 passed
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/outcome-fallback-runtime-contract.test.ts — 10 passed
  • ./node_modules/.bin/oxlint --tsconfig tsconfig.oxlint.core.json test/helpers/agents/outcome-fallback-runtime-contract.ts src/agents/outcome-fallback-runtime-contract.test.ts extensions/codex/src/app-server/outcome-fallback-runtime-contract.test.ts
  • git diff --check -- test/helpers/agents/outcome-fallback-runtime-contract.ts src/agents/outcome-fallback-runtime-contract.test.ts extensions/codex/src/app-server/outcome-fallback-runtime-contract.test.ts

Refs #71004
Follows #71009
Follows #71029

@100yenadmin 100yenadmin marked this pull request as ready for review April 24, 2026 09:15
Copilot AI review requested due to automatic review settings April 24, 2026 09:15
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 24, 2026

Greptile Summary

This PR adds three test-only files that lock the outcome/fallback runtime contract: shared fixture helpers, Pi fallback classifier tests, and Codex app-server adapter tests. No production code is changed. The tests accurately reflect the classifier and projector behavior they target.

Confidence Score: 5/5

Safe to merge — no production code is changed and all findings are P2.

The single finding is a P2 style/maintainability issue in a test helper: the meta-merge ordering in createContractRunResult makes the durationMs default dead code. All current tests pass correctly because every call site explicitly provides durationMs. No P0/P1 issues were found.

test/helpers/agents/outcome-fallback-runtime-contract.ts — minor meta-merge ordering issue in the factory helper.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: test/helpers/agents/outcome-fallback-runtime-contract.ts
Line: 17-29

Comment:
The inner `...overrides.meta` spread is overwritten by the outer `...overrides` later in the same object literal, so the `durationMs: 1` default inside it is dead code when `meta` is passed. The current tests happen to work because every caller explicitly provides `durationMs: 1` in the override, but the intended deep-merge of meta defaults never actually fires. Destructuring the override avoids the ordering conflict and makes the default meaningful.

```suggestion
  const { meta: metaOverride, ...restOverrides } = overrides;
  return {
    payloads: [],
    didSendViaMessagingTool: false,
    messagingToolSentTexts: [],
    messagingToolSentMediaUrls: [],
    messagingToolSentTargets: [],
    successfulCronAdds: 0,
    ...restOverrides,
    meta: {
      durationMs: 1,
      ...metaOverride,
    },
  };
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "test: add outcome fallback runtime contr..." | Re-trigger Greptile

Comment thread test/helpers/agents/outcome-fallback-runtime-contract.ts
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds contract tests that lock down outcome classification and fallback behavior for GPT-5 terminal results across the Pi fallback classifier and the Codex app-server adapter (per RFC #71004 / #71004).

Changes:

  • Introduces a shared test fixture for constructing fallback configs and embedded run results for outcome/fallback contract coverage.
  • Adds Pi-side tests asserting harness classifications (empty, reasoning-only, planning-only) map to format fallback codes and advance the fallback chain when appropriate.
  • Adds Codex app-server projector tests asserting raw terminal state (empty turn, exact NO_REPLY, tool side-effect telemetry) is preserved for OpenClaw-owned classification.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
test/helpers/agents/outcome-fallback-runtime-contract.ts Adds shared constants + helpers for outcome/fallback contract tests.
src/agents/outcome-fallback-runtime-contract.test.ts Adds Pi classifier + runWithModelFallback contract assertions.
extensions/codex/src/app-server/outcome-fallback-runtime-contract.test.ts Adds Codex adapter contract assertions about preserved terminal state + telemetry.

Comment thread test/helpers/agents/outcome-fallback-runtime-contract.ts Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 09b180f27b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/outcome-fallback-runtime-contract.test.ts Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@steipete
Copy link
Copy Markdown
Contributor

Closing as superseded by #71096, which consolidates this contract shard into the single RuntimePlan / Pi-Codex parity review branch.

Codex review note: keeping the individual shard open would split the same contract surface across multiple stale branches now that #71096 carries the combined stack.

@steipete steipete closed this Apr 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling extensions: codex size: L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants