RFC: Contract-first Pi/Codex agent runtime rewrite #71004

@100yenadmin

Summary

This RFC tracks a contract-first path for stabilizing the Pi/Codex agent runtime boundary.

Important scope calibration: this issue is not a commitment to do a large rewrite immediately. It is an umbrella issue that explains why the Phase 1 contract-test suite exists, what those tests protect, and what the smallest safe next step would be if maintainers agree to continue.

The core finding is narrow:

  • The AgentHarness registry/SPI is already useful and should not be redesigned casually.
  • The missing boundary is runtime-policy ownership: tools, auth/profile resolution, prompt overlays, schema normalization, transcript repair, delivery, fallback classification, transport params, and observability are still split across Pi runner code, Codex app-server glue, transports, tools, auth, and channels.
  • As Codex takes over more execution paths, it can bypass or reassemble policy that Pi previously owned implicitly.

The goal is risk control: lock behavior first, then decide whether to introduce a shared prepared-turn plan. No production runtime refactor should be required just because this issue exists.

What This Issue Is / Is Not

This issue is:

This issue is not:

  • A mandate to implement every phase listed below.
  • A request to rewrite the harness SPI.
  • A reason to block the Phase 1 test-only PRs on future runtime refactors.
  • A commitment to split/rename pi-embedded-runner now.
  • A commitment to ship WS pooling, Harness V2, or any structural move before maintainers explicitly choose that path.

Current Status

Phase 1 contract-test PRs are open, test-only, and intentionally avoid production behavior changes.

| PR | Contract domain | What it locks | Status |
| --- | --- | --- | --- |
| #71009 | Dynamic tools | before_tool_call, execution, result middleware, after_tool_call, blocks/errors, telemetry, no double wrapping | Ready/mergeable |
| #71029 | Auth/profile | openai/*, openai-codex/*, codex-cli/*, app-server startup/resume profile forwarding, no cross-provider leakage | Ready/mergeable, with explicit future TODOs for full real codex/* harness startup |
| #71038 | Outcome/fallback | GPT-5 empty/reasoning-only/planning-only fallback classification, NO_REPLY, side-effect and block suppression, Codex terminal signal preservation | Ready/mergeable |
| #71039 | Delivery/NO_REPLY | Silent reply suppression, media preservation, dispatcher fallback when origin routing is incomplete, Codex terminal text preservation | Ready/mergeable, with JSON envelope NO_REPLY TODO |
| #71042 | Transcript repair | Text, structured, media, and data-URI-style orphan user-turn preservation; Codex projection behavior | Ready/mergeable |
| #71044 | Prompt overlays | GPT-5 overlay provider scoping, OpenAI-family personality fallback, Codex provider contribution surface | Ready/mergeable |
| #71046 | Schema normalization | Provider-prepared executable schemas across HTTP Responses, WS, compaction, and Codex dynamic-tool boundaries | Ready/mergeable, with raw parameter-free strict parity TODO |
| #71048 | Transport params | GPT-5 OpenAI-family defaults, parallel_tool_calls, openai-codex-responses, WS warmup default, provider prep composition | Ready/mergeable |

Why This Is Needed

Recent evidence:

The safer discipline is contract-first: prove the intended behavior before moving ownership around.

Architecture Framing

flowchart TD
  UserTurn["User turn"] --> RuntimePlan["Optional shared AgentRuntimePlan"]
  RuntimePlan --> Tools["Tool catalog + hooks"]
  RuntimePlan --> Auth["Auth/profile resolution"]
  RuntimePlan --> Prompt["Prompt + overlays"]
  RuntimePlan --> Transcript["Transcript repair policy"]
  RuntimePlan --> Delivery["Channel delivery policy"]
  RuntimePlan --> Fallback["Outcome classification + fallback"]
  RuntimePlan --> Transport["Transport params + schema normalization"]
  RuntimePlan --> Observability["Resolved backend/model/auth/transport events"]

  RuntimePlan --> Pi["Pi adapter"]
  RuntimePlan --> Codex["Codex app-server adapter"]

  Pi --> Outcome["AgentTurnOutcome"]
  Codex --> Outcome
  Outcome --> Delivery
  Outcome --> Fallback

Desired ownership boundary:

  • OpenClaw owns tool catalog/hook behavior, auth/profile resolution, prompt overlays, transcript repair, channel delivery, fallback classification, transport params, schema normalization, and observability.
  • Pi owns the Pi model/session implementation.
  • Codex owns app-server startup, thread lifecycle, and model-loop mechanics.
  • Harness selection chooses an adapter; it should not duplicate OpenClaw policy.
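One way to picture this boundary is a prepared plan that OpenClaw resolves once and hands to whichever adapter was selected. The sketch below is illustrative only: `AgentRuntimePlan` and `AgentTurnOutcome` are named in the diagram above, but their fields, the `HarnessAdapter` interface, `makeStubAdapter`, and `selectAdapter` are assumptions made for this example and do not describe the existing harness SPI.

```typescript
// Sketch only: every field and helper below is hypothetical.
type HarnessId = "pi" | "codex";

// Policy that OpenClaw resolves once per turn, before any adapter runs.
interface AgentRuntimePlan {
  readonly toolNames: readonly string[];
  readonly authProfileId: string;
  readonly promptOverlays: readonly string[];
  readonly parallelToolCalls: boolean;
}

// What an adapter reports back; delivery and fallback policy would read this.
interface AgentTurnOutcome {
  readonly harness: HarnessId;
  readonly replyText: string;
  readonly toolNamesOffered: readonly string[];
  readonly authProfileIdUsed: string;
}

// An adapter owns execution mechanics only: it receives policy, it does not rebuild it.
interface HarnessAdapter {
  readonly id: HarnessId;
  runTurn(plan: AgentRuntimePlan, userText: string): AgentTurnOutcome;
}

// Stand-in adapters; real ones would drive the Pi session or the Codex app-server.
function makeStubAdapter(id: HarnessId): HarnessAdapter {
  return {
    id,
    runTurn(plan, userText) {
      return {
        harness: id,
        replyText: `[${id}] echo: ${userText}`,
        toolNamesOffered: plan.toolNames,
        authProfileIdUsed: plan.authProfileId,
      };
    },
  };
}

const adapters: Record<HarnessId, HarnessAdapter> = {
  pi: makeStubAdapter("pi"),
  codex: makeStubAdapter("codex"),
};

// Harness selection picks an adapter and nothing else.
function selectAdapter(id: HarnessId): HarnessAdapter {
  return adapters[id];
}

const examplePlan: AgentRuntimePlan = {
  toolNames: ["read_file", "search"],
  authProfileId: "example-profile",
  promptOverlays: ["example-overlay"],
  parallelToolCalls: true,
};

const piOutcome = selectAdapter("pi").runTurn(examplePlan, "hello");
const codexOutcome = selectAdapter("codex").runTurn(examplePlan, "hello");
```

Because both adapters read the same plan, a parity check reduces to comparing the policy-shaped fields of the two outcomes, which is the kind of assertion the Phase 1 contracts are meant to make possible.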

What Phase 1 Gives Maintainers

Phase 1 gives maintainers a low-risk review baseline before any semantic migration:

  • Each PR is small, test-only, and scoped to one policy domain.
  • Each PR states what is covered and what is deliberately deferred.
  • Known gaps are explicit todo rows instead of hidden assumptions.
  • Later runtime changes can be judged by whether they keep these contracts green.
  • If a future Codex/Pi change breaks OpenClaw-owned behavior, the failure should happen in tests before users see it.
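The "covered or explicitly deferred" bookkeeping described above can be made mechanical. The following is a minimal sketch, assuming a hypothetical `ContractRow` shape and hypothetical domain identifiers; it is not how the Phase 1 PRs track their rows, only an illustration of the idea that a gap should be either a passing row or a todo row with a stated reason.

```typescript
// Sketch only: the row shape, domain identifiers, and helpers are hypothetical.
type RowStatus = "covered" | "todo";

interface ContractRow {
  readonly domain: string;
  readonly description: string;
  readonly status: RowStatus;
  readonly deferredBecause?: string; // expected on every "todo" row
}

// Domains with no row at all: neither a baseline nor an explicit deferral.
function domainsWithoutRows(
  domains: readonly string[],
  rows: readonly ContractRow[],
): string[] {
  const seen = new Set(rows.map((row) => row.domain));
  return domains.filter((domain) => !seen.has(domain));
}

// "todo" rows that do not say why they are deferred.
function unexplainedTodos(rows: readonly ContractRow[]): ContractRow[] {
  return rows.filter((row) => row.status === "todo" && !row.deferredBecause);
}

const exampleRows: ContractRow[] = [
  {
    domain: "delivery",
    description: "silent reply suppression",
    status: "covered",
  },
  {
    domain: "delivery",
    description: "JSON/enveloped NO_REPLY suppression",
    status: "todo",
    deferredBecause: "requires a production delivery-policy behavior change",
  },
  {
    domain: "schema-normalization",
    description: "provider-prepared executable schemas across HTTP Responses and WS",
    status: "covered",
  },
];

const missingDomains = domainsWithoutRows(
  ["delivery", "schema-normalization", "transcript-repair"],
  exampleRows,
);
```

In this example `missingDomains` flags `transcript-repair`, since no row mentions it, while the deferred delivery row passes because it carries a reason.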

Known Rows To Carry Forward

These rows should not block Phase 1. They are reminders for whichever follow-up path maintainers choose.

| Domain | Deferred row | Why deferred |
| --- | --- | --- |
| Auth/profile | Real codex/* harness startup preserving openai-codex:* auth profiles | Requires full embedded runner + harness selection path, not just app-server unit surface |
| Auth/profile | openai/* forced through Codex harness using OpenAI-Codex OAuth | Crosses model selection, forced harness policy, and auth-provider validation |
| Delivery | JSON/enveloped NO_REPLY suppression | Requires production delivery-policy behavior change, not a test-only contract |
| Schema normalization | Raw parameter-free HTTP/WS strict-compatible parity | Requires schema normalization boundary migration |
| Transcript repair | Codex shared transcript repair strategy | Current PR covers projection only; shared repair strategy belongs in runtime-plan consumption |
| Transport params | Codex app-server startup/turn effort config parity | Adapter lifecycle concern for any Codex runtime-plan consumption |

Minimal Next Decision

After the Phase 1 test-only PRs merge, maintainers can choose one of these paths:

  1. Stop there for now.

    The contracts still add value by preventing accidental Pi/Codex divergence.

  2. Add a small AgentRuntimePlan shape/prototype PR.

    This should be additive, internal, and mostly inert: define the prepared-turn object and producer tests, but do not force Pi or Codex to consume it yet.

  3. Migrate one domain through the plan as a proof point.

    Pick the highest-confidence domain, likely tools or auth/profile, and require the existing contract rows to stay green. Do not combine this with file moves or renames.
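For option 2, "additive, internal, and mostly inert" could mean a pure producer function plus tests, with no adapter wired to it yet. The sketch below assumes a hypothetical `TurnInput`, a hypothetical `PreparedTurnPlan`, and a hypothetical `prepareTurnPlan` helper; the fields and the tool-filtering rule are placeholders chosen for illustration, not proposed policy.

```typescript
// Sketch only: input fields, plan fields, and function names are hypothetical.
interface TurnInput {
  readonly provider: string;
  readonly model: string;
  readonly requestedTools: readonly string[];
  readonly blockedTools: readonly string[];
}

interface PreparedTurnPlan {
  readonly provider: string;
  readonly model: string;
  readonly toolNames: readonly string[];
}

// Pure producer: no I/O and no adapter calls, so the same input always
// yields an equal plan. Nothing consumes the result yet.
function prepareTurnPlan(input: TurnInput): PreparedTurnPlan {
  const blocked = new Set(input.blockedTools);
  const toolNames = input.requestedTools
    .filter((name) => !blocked.has(name))
    .sort(); // stable ordering keeps plans comparable across runs
  return Object.freeze({
    provider: input.provider,
    model: input.model,
    toolNames: Object.freeze(toolNames),
  });
}

const exampleInput: TurnInput = {
  provider: "example-provider",
  model: "example-model",
  requestedTools: ["search", "exec", "read_file"],
  blockedTools: ["exec"],
};

const firstPlan = prepareTurnPlan(exampleInput);
const secondPlan = prepareTurnPlan(exampleInput);
```

Freezing the plan and keeping the producer pure means a producer test can compare `firstPlan` and `secondPlan` structurally, and a later PR can let one adapter read the plan without the producer itself changing.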

Anything beyond that should be a separate maintainer decision, not assumed by this RFC.

Candidate Follow-Ups, Not Current Commitments

These are possible later steps if the contract-first approach proves useful:

  • Shared AgentRuntimePlan consumption by Pi.
  • Shared AgentRuntimePlan consumption by Codex for OpenClaw-owned policy.
  • Optional internal Harness V2 adapter layer.
  • Runner split by ownership boundary.
  • Naming/observability cleanup for pi-embedded-runner / runEmbeddedPiAgent.
  • Optional WS session pooling/latency work.

They should land only as small reversible PRs with contract tests already in place.

Safety Rules

Acceptance Criteria For This RFC

This RFC succeeds if:

  • Maintainers can see why the Phase 1 contract suite exists.
  • Each OpenClaw-owned runtime policy domain has an executable parity baseline or an explicit deferred row.
  • Future Pi/Codex changes have a clear test surface before touching production runtime behavior.
  • Follow-up refactors are optional, explicit, and separately reviewable.
