# RFC: Contract-first Pi/Codex agent runtime rewrite (#71004)

## Summary

This RFC tracks a contract-first path for stabilizing the Pi/Codex agent runtime boundary.

Important scope calibration: this issue is **not** a commitment to do a large rewrite immediately. It is an umbrella that explains why the Phase 1 contract-test suite exists, what those tests protect, and what the smallest safe next step would be if maintainers agree to continue.

The core finding is narrow:

- The `AgentHarness` registry/SPI is already useful and should not be redesigned casually.
- The missing boundary is runtime-policy ownership: tools, auth/profile resolution, prompt overlays, schema normalization, transcript repair, delivery, fallback classification, transport params, and observability are still split across Pi runner code, Codex app-server glue, transports, tools, auth, and channels.
- As Codex takes over more execution paths, it can bypass or reassemble policy that Pi previously owned implicitly.

The goal is risk control: lock behavior first, then decide whether to introduce a shared prepared-turn plan. No production runtime refactor should be required just because this issue exists.

## What This Issue Is / Is Not

This issue **is**:

- A maintainer-facing explanation for the Phase 1 contract PRs.
- A map of OpenClaw-owned runtime policy that Pi and Codex should preserve consistently.
- A place to record known red rows that should not be forgotten if/when runtime-plan work starts.
- A guardrail against more whack-a-mole point fixes in #70743/#70772-style prototype branches.

This issue **is not**:

- A mandate to implement every phase listed below.
- A request to rewrite the harness SPI.
- A reason to block the Phase 1 test-only PRs on future runtime refactors.
- A commitment to split/rename `pi-embedded-runner` now.
- A commitment to ship WS pooling, Harness V2, or any structural move before maintainers explicitly choose that path.

## Current Status

Phase 1 contract-test PRs are open, test-only, and intentionally avoid production behavior changes.

| PR | Contract domain | What it locks | Status |
| --- | --- | --- | --- |
| #71009 | Dynamic tools | `before_tool_call`, execution, result middleware, `after_tool_call`, blocks/errors, telemetry, no double wrapping | Ready/mergeable |
| #71029 | Auth/profile | `openai/*`, `openai-codex/*`, `codex-cli/*`, app-server startup/resume profile forwarding, no cross-provider leakage | Ready/mergeable, with explicit future TODOs for full real `codex/*` harness startup |
| #71038 | Outcome/fallback | GPT-5 empty/reasoning-only/planning-only fallback classification, `NO_REPLY`, side-effect and block suppression, Codex terminal signal preservation | Ready/mergeable |
| #71039 | Delivery/NO_REPLY | Silent reply suppression, media preservation, dispatcher fallback when origin routing is incomplete, Codex terminal text preservation | Ready/mergeable, with JSON envelope `NO_REPLY` TODO |
| #71042 | Transcript repair | Text, structured, media, and data-URI-style orphan user-turn preservation; Codex projection behavior | Ready/mergeable |
| #71044 | Prompt overlays | GPT-5 overlay provider scoping, OpenAI-family personality fallback, Codex provider contribution surface | Ready/mergeable |
| #71046 | Schema normalization | Provider-prepared executable schemas across HTTP Responses, WS, compaction, and Codex dynamic-tool boundaries | Ready/mergeable, with raw parameter-free strict parity TODO |
| #71048 | Transport params | GPT-5 OpenAI-family defaults, `parallel_tool_calls`, `openai-codex-responses`, WS warmup default, provider prep composition | Ready/mergeable |

## Why This Is Needed

Recent evidence:

- #70965 showed the failure mode clearly: Codex mode could execute OpenClaw dynamic tools without preserving the existing `before_tool_call` contract because dynamic-tool ownership was implicit. The fix was correct, but the root cause was architectural.
- #70743 and #70772 showed the broader GPT-5.4 pattern: empty/planning-only/reasoning-only terminal outcomes, tool params, schema normalization, auth profile aliases, orphan turn repair, and follow-up delivery each needed separate fixes because policy was scattered across runner, transport, channel, plugin, and auth layers.
- #70760 improved harness observability, but observability alone does not prevent Codex and Pi from diverging on OpenClaw-owned behavior.

The safer discipline is contract-first: prove the intended behavior before moving ownership around.
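
To make the #70965 failure mode concrete, here is a minimal sketch of the kind of behavior the dynamic-tools contract is meant to pin down: hooks run in a fixed order around execution, a blocked call never reaches the tool, and wrapping twice is a no-op. Only the hook names `before_tool_call` and `after_tool_call` come from this RFC. The `ToolCall`, `ToolResult`, `BeforeDecision`, and `withHooks` names are hypothetical, and skipping `after_tool_call` on a blocked call is an assumption rather than documented behavior.

```typescript
// Hypothetical shapes: only the hook names before_tool_call and
// after_tool_call come from this RFC.
type ToolCall = { name: string; args: Record<string, unknown> };
type ToolResult = { ok: boolean; output: string };
type BeforeDecision = { block: false } | { block: true; reason: string };

interface ToolHooks {
  before_tool_call(call: ToolCall): BeforeDecision;
  after_tool_call(call: ToolCall, result: ToolResult): void;
}

type ToolExecutor = (call: ToolCall) => ToolResult;
// Marker property so an already-hooked executor can be recognized.
type HookedExecutor = ToolExecutor & { hooked?: true };

function withHooks(execute: ToolExecutor, hooks: ToolHooks): ToolExecutor {
  // No double wrapping: an already-hooked executor is returned unchanged.
  if ((execute as HookedExecutor).hooked === true) return execute;

  const wrapped: HookedExecutor = (call: ToolCall): ToolResult => {
    const decision = hooks.before_tool_call(call);
    if (decision.block) {
      // Assumption: a blocked call skips both the tool and after_tool_call.
      return { ok: false, output: "blocked: " + decision.reason };
    }
    const result = execute(call);
    hooks.after_tool_call(call, result);
    return result;
  };
  wrapped.hooked = true;
  return wrapped;
}
```

If both Pi and Codex obtain their executors through one wrapper like this, neither path can run a tool without the hooks, which is the property the contract rows are meant to assert. Note that in this sketch a second `withHooks` call with different hooks is ignored; a real implementation would need to decide whether that is acceptable.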

## Architecture Framing

```mermaid
flowchart TD
  UserTurn["User turn"] --> RuntimePlan["Optional shared AgentRuntimePlan"]
  RuntimePlan --> Tools["Tool catalog + hooks"]
  RuntimePlan --> Auth["Auth/profile resolution"]
  RuntimePlan --> Prompt["Prompt + overlays"]
  RuntimePlan --> Transcript["Transcript repair policy"]
  RuntimePlan --> Delivery["Channel delivery policy"]
  RuntimePlan --> Fallback["Outcome classification + fallback"]
  RuntimePlan --> Transport["Transport params + schema normalization"]
  RuntimePlan --> Observability["Resolved backend/model/auth/transport events"]

  RuntimePlan --> Pi["Pi adapter"]
  RuntimePlan --> Codex["Codex app-server adapter"]

  Pi --> Outcome["AgentTurnOutcome"]
  Codex --> Outcome
  Outcome --> Delivery
  Outcome --> Fallback
```

Desired ownership boundary:

- OpenClaw owns tool catalog/hook behavior, auth/profile resolution, prompt overlays, transcript repair, channel delivery, fallback classification, transport params, schema normalization, and observability.
- Pi owns the Pi model/session implementation.
- Codex owns app-server startup, thread lifecycle, and model-loop mechanics.
- Harness selection chooses an adapter; it should not duplicate OpenClaw policy.
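
The ownership boundary above can be sketched as a small set of types. This is an illustration under stated assumptions, not a proposed API: the RFC only names the `AgentRuntimePlan` and `AgentTurnOutcome` concepts, so every field name, the `RuntimeAdapter` interface, and the `makeStubAdapter` and `selectAdapter` helpers are hypothetical.

```typescript
type Backend = "pi" | "codex";

// Field names are illustrative assumptions; the RFC only names the
// AgentRuntimePlan and AgentTurnOutcome concepts.
interface AgentRuntimePlan {
  readonly toolNames: readonly string[];
  readonly authProfile: string;
  readonly promptOverlays: readonly string[];
  readonly transportParams: Readonly<Record<string, unknown>>;
}

interface AgentTurnOutcome {
  readonly backend: Backend;
  readonly text: string;
  readonly authProfile: string;
  readonly toolNames: readonly string[];
}

interface RuntimeAdapter {
  readonly backend: Backend;
  runTurn(plan: AgentRuntimePlan, userText: string): AgentTurnOutcome;
}

// Stub adapter: stands in for the Pi session or the Codex app-server loop.
// It reads policy from the plan and never recomputes it.
function makeStubAdapter(backend: Backend): RuntimeAdapter {
  return {
    backend,
    runTurn(plan, userText) {
      return {
        backend,
        text: plan.promptOverlays.join(" ") + " | " + userText,
        authProfile: plan.authProfile,
        toolNames: plan.toolNames,
      };
    },
  };
}

// Harness selection only picks an adapter; it does not rebuild policy.
function selectAdapter(
  backend: Backend,
  adapters: readonly RuntimeAdapter[],
): RuntimeAdapter {
  for (const adapter of adapters) {
    if (adapter.backend === backend) return adapter;
  }
  throw new Error("no adapter registered for " + backend);
}
```

The design point is that the plan is produced once by OpenClaw-owned code and passed read-only to whichever adapter is selected, so the two backends cannot disagree on auth profile, tool catalog, or overlays for the same turn.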

## What Phase 1 Gives Maintainers

Phase 1 gives maintainers a low-risk review baseline before any semantic migration:

- Each PR is small, test-only, and scoped to one policy domain.
- Each PR states what is covered and what is deliberately deferred.
- Known gaps are explicit `todo` rows instead of hidden assumptions.
- Later runtime changes can be judged by whether they keep these contracts green.
- If a future Codex/Pi change breaks OpenClaw-owned behavior, the failure should happen in tests before users see it.
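
As a rough illustration of what a parity baseline with explicit `todo` rows could look like, the sketch below runs each active row against both backends and records deferred rows separately instead of dropping them. It does not mirror the actual test files in the Phase 1 PRs: `ContractRow`, `runContract`, `shouldDeliver`, and `terminalTextFor` are hypothetical names, and the exact-match `NO_REPLY` rule is an assumption made for the example.

```typescript
type Backend = "pi" | "codex";

type ContractRow =
  | { id: string; status: "active"; check: (backend: Backend) => boolean }
  | { id: string; status: "todo"; reason: string };

interface ContractReport {
  passed: string[];
  failed: string[];
  todo: string[];
}

// Run every active row against every backend; keep todo rows visible.
function runContract(
  rows: readonly ContractRow[],
  backends: readonly Backend[],
): ContractReport {
  const report: ContractReport = { passed: [], failed: [], todo: [] };
  for (const row of rows) {
    if (row.status === "todo") {
      report.todo.push(row.id + ": " + row.reason);
      continue;
    }
    for (const backend of backends) {
      const bucket = row.check(backend) ? report.passed : report.failed;
      bucket.push(row.id + "@" + backend);
    }
  }
  return report;
}

// Hypothetical policy under test: an exact NO_REPLY terminal text is silent.
function shouldDeliver(terminalText: string): boolean {
  return terminalText.trim() !== "NO_REPLY";
}

// Hypothetical stand-in for the terminal text each backend produces.
function terminalTextFor(backend: Backend): string {
  return backend === "pi" ? "NO_REPLY" : " NO_REPLY\n";
}

const deliveryRows: ContractRow[] = [
  {
    id: "silent NO_REPLY is suppressed",
    status: "active",
    check: (backend: Backend) => !shouldDeliver(terminalTextFor(backend)),
  },
  {
    id: "JSON/enveloped NO_REPLY is suppressed",
    status: "todo",
    reason: "needs a production delivery-policy change",
  },
];
```

Calling `runContract(deliveryRows, ["pi", "codex"])` reports the active row once per backend and lists the enveloped case under `todo`, so a later runtime change can be judged by whether `failed` stays empty and whether a `todo` row can be promoted to `active`.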

## Known Rows To Carry Forward

These rows should not block Phase 1. They are reminders for whichever follow-up path maintainers choose.

| Domain | Deferred row | Why deferred |
| --- | --- | --- |
| Auth/profile | Real `codex/*` harness startup preserving `openai-codex:*` auth profiles | Requires full embedded runner + harness selection path, not just app-server unit surface |
| Auth/profile | `openai/*` forced through Codex harness using OpenAI-Codex OAuth | Crosses model selection, forced harness policy, and auth-provider validation |
| Delivery | JSON/enveloped `NO_REPLY` suppression | Requires production delivery-policy behavior change, not a test-only contract |
| Schema normalization | Raw parameter-free HTTP/WS strict-compatible parity | Requires schema normalization boundary migration |
| Transcript repair | Codex shared transcript repair strategy | Current PR covers projection only; shared repair strategy belongs in runtime-plan consumption |
| Transport params | Codex app-server startup/turn effort config parity | Adapter lifecycle concern for any Codex runtime-plan consumption |

## Minimal Next Decision

After the Phase 1 test-only PRs merge, maintainers can choose one of these paths:

1. Stop there for now.

   The contracts still add value by preventing accidental Pi/Codex divergence.

2. Add a small `AgentRuntimePlan` shape/prototype PR.

   This should be additive, internal, and mostly inert: define the prepared-turn object and producer tests, but do not force Pi or Codex to consume it yet.

3. Migrate one domain through the plan as a proof point.

   Pick the highest-confidence domain, likely tools or auth/profile, and require the existing contract rows to stay green. Do not combine this with file moves or renames.

Anything beyond that should be a separate maintainer decision, not assumed by this RFC.
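
For option 2, an additive and mostly inert producer might look like the sketch below: a pure function that builds a frozen plan and can be covered by producer tests without any Pi or Codex code consuming it. The shapes are deliberately reduced and hypothetical (`PlanInput`, `prepareAgentRuntimePlan`, and all field names). It assumes model refs look like `provider/model` and auth profile ids look like `provider:name`, based only on the `openai-codex/*` and `openai-codex:*` patterns mentioned in this RFC. It also ignores the deferred forced-harness case where `openai/*` runs through Codex with OpenAI-Codex OAuth.

```typescript
// Hypothetical input and plan shapes; nothing here is wired into Pi or Codex.
interface PlanInput {
  readonly modelRef: string; // assumed form: "provider/model"
  readonly authProfiles: readonly string[]; // assumed form: "provider:name"
  readonly toolNames: readonly string[];
}

interface AgentRuntimePlan {
  readonly provider: string;
  readonly model: string;
  readonly authProfile: string | null;
  readonly toolNames: readonly string[];
}

function prepareAgentRuntimePlan(input: PlanInput): AgentRuntimePlan {
  const slash = input.modelRef.indexOf("/");
  if (slash <= 0 || slash === input.modelRef.length - 1) {
    throw new Error("model ref must look like provider/model: " + input.modelRef);
  }
  const provider = input.modelRef.slice(0, slash);
  const model = input.modelRef.slice(slash + 1);

  // Pick the first profile whose provider prefix matches exactly, so an
  // "openai:" profile is never handed to an "openai-codex/" model.
  let authProfile: string | null = null;
  for (const profile of input.authProfiles) {
    const colon = profile.indexOf(":");
    if (colon > 0 && profile.slice(0, colon) === provider) {
      authProfile = profile;
      break;
    }
  }

  // Shallow freeze: consumers can read the plan but not reassign its fields.
  return Object.freeze({
    provider,
    model,
    authProfile,
    toolNames: input.toolNames.slice(),
  });
}
```

Because the function has no side effects and no consumers, a PR that adds it changes no runtime behavior; a later single-domain migration would then replace one adapter's ad hoc resolution with a read from the plan while the existing contract rows stay green.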

## Candidate Follow-Ups, Not Current Commitments

These are possible later steps if the contract-first approach proves useful:

- Shared `AgentRuntimePlan` consumption by Pi.
- Shared `AgentRuntimePlan` consumption by Codex for OpenClaw-owned policy.
- Optional internal Harness V2 adapter layer.
- Runner split by ownership boundary.
- Naming/observability cleanup for `pi-embedded-runner` / `runEmbeddedPiAgent`.
- Optional WS session pooling/latency work.

They should land only as small, reversible PRs, and only after the relevant contract tests are in place.

## Safety Rules

- Do not ship hook surfaces before contract tests prove default behavior.
- Do not split files before parity behavior is locked.
- Do not let Codex own OpenClaw runtime policy.
- Do not remove user-visible functionality during any migration.
- Keep structural and behavioral changes in separate PRs.
- Keep #70743/#70772 frozen as evidence/prototypes unless maintainers request targeted fixes.

## Acceptance Criteria For This RFC

This RFC succeeds if:

- Maintainers can see why the Phase 1 contract suite exists.
- Each OpenClaw-owned runtime policy domain has an executable parity baseline or an explicit deferred row.
- Future Pi/Codex changes have a clear test surface before touching production runtime behavior.
- Follow-up refactors are optional, explicit, and separately reviewable.

