[codex] Add Pi/Codex runtime contract tests by 100yenadmin · Pull Request #71009 · openclaw/openclaw

100yenadmin · 2026-04-24T07:57:06Z

Summary

This is the first deliberately test-only rung from RFC #71004. It adds a shared Pi/Codex runtime contract fixture for OpenClaw-owned dynamic tools and locks the invariant that both harness paths must preserve the same before_tool_call, tool execution, tool-result middleware, after_tool_call, error, block, and telemetry semantics.

No production runtime behavior changes in this PR.

flowchart TD
  ToolCatalog["OpenClaw tool catalog"] --> Contract["OpenClaw-owned tool contract"]
  Contract --> Before["before_tool_call may mutate or block"]
  Before --> Pi["Pi adapter"]
  Before --> Codex["Codex app-server adapter"]
  Pi --> Execute["tool.execute receives final merged params"]
  Codex --> Execute
  Execute --> Middleware["Codex tool_result middleware"]
  Middleware --> After["after_tool_call sees final params/result/error"]
  Execute --> Telemetry["messaging/media telemetry"]

Why This Exists

#70965 showed the failure mode we want to prevent: Codex mode could accidentally bypass an OpenClaw-owned dynamic-tool hook contract because tool ownership was implicit. #70743 and #70772 showed the larger pattern: many GPT-5.4 Pi/Codex fixes are really runtime-contract problems, not isolated transport bugs.

This PR starts the safer path from #71004: capture parity as tests first, then refactor toward shared runtime contracts only after maintainers can see intended behavior in executable fixtures.

Files Changed And Why

File	Purpose
`test/helpers/agents/openclaw-owned-tool-runtime-contract.ts`	Shared fixture for installing OpenClaw-owned tool hooks and Codex result middleware in adapter tests.
`src/agents/openclaw-owned-tool-runtime-contract.test.ts`	Pi-side contract rows for mutation, blocking, errors, and messaging telemetry.
`extensions/codex/src/app-server/openclaw-owned-tool-runtime-contract.test.ts`	Codex app-server rows for wrapper ordering, middleware, no double wrapping, blocking, errors, and telemetry.

Contract Matrix

Contract	Pi adapter	Codex app-server adapter
`before_tool_call` runs for OpenClaw-owned dynamic tools	Yes	Yes
Partial hook param patches preserve original args and reach `tool.execute`	Yes	Yes
`after_tool_call` observes final merged params and successful result	Yes	Yes
Blocked tool fails closed and does not execute	Yes	Yes
Blocked/error tool reports through `after_tool_call`	Yes	Yes
Codex `tool_result` middleware runs before `after_tool_call` observes result	N/A	Yes
Already-wrapped tools are not double-wrapped	Existing Pi primitive	Yes
Successful messaging text/media/target telemetry is committed	Yes	Yes
Successful generated-media artifact telemetry is committed	Existing Pi media path	Yes
Messaging telemetry is not marked sent for blocked Codex message tool	N/A	Yes

How This Helps The RuntimePlan Work

This PR defines tool ownership before any shared AgentRuntimePlan exists. Later, the plan can produce the tool catalog/wrapping state once, and both Pi and Codex can consume it without reimplementing hook order or telemetry rules.

Reviewer Notes

This is intentionally one domain only: tools.
It does not change runtime behavior or add hook surfaces.
It should be reviewed as a contract baseline, not as a production bug fix.

Validation

node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/openclaw-owned-tool-runtime-contract.test.ts — 4 tests
node scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/openclaw-owned-tool-runtime-contract.test.ts — 7 tests
./node_modules/.bin/oxlint --tsconfig tsconfig.oxlint.core.json test/helpers/agents/openclaw-owned-tool-runtime-contract.ts src/agents/openclaw-owned-tool-runtime-contract.test.ts extensions/codex/src/app-server/openclaw-owned-tool-runtime-contract.test.ts
git diff --check -- test/helpers/agents/openclaw-owned-tool-runtime-contract.ts src/agents/openclaw-owned-tool-runtime-contract.test.ts extensions/codex/src/app-server/openclaw-owned-tool-runtime-contract.test.ts

Refs #71004

Copilot

Pull request overview

Adds contract-style tests to lock parity between Pi and Codex harness paths for OpenClaw-owned dynamic tool execution (before/after hooks, param mutation, block semantics, and Codex tool_result middleware ordering), without changing production runtime behavior.

Changes:

Added a shared test helper fixture to install OpenClaw-owned tool hooks and Codex tool_result middleware.
Added Pi adapter contract tests covering param mutation propagation and fail-closed blocking behavior.
Added Codex app-server adapter contract tests covering wrapping, blocking, middleware ordering, error reporting, telemetry invariants, and no double-wrapping.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
test/helpers/agents/openclaw-owned-tool-runtime-contract.ts	Shared fixture for installing global tool hooks and Codex tool_result middleware in tests.
src/agents/openclaw-owned-tool-runtime-contract.test.ts	Pi adapter contract tests validating hook ordering and semantics for OpenClaw-owned dynamic tools.
extensions/codex/src/app-server/openclaw-owned-tool-runtime-contract.test.ts	Codex app-server adapter contract tests validating wrapping, middleware ordering, blocking, error reporting, and telemetry invariants.

greptile-apps · 2026-04-24T09:21:12Z

Greptile Summary

This PR adds a shared Pi/Codex runtime contract fixture and test suites to lock in the before_tool_call → execute → tool_result middleware → after_tool_call ownership invariant for OpenClaw-owned dynamic tools. No production changes are included. The overall logic is sound — the two-store design (global hook runner for before/after hooks, active plugin registry for Codex extension factories) works correctly, and the adjusted-params tracking via consumeAdjustedParamsForToolCall correctly surfaces mutated params to after_tool_call in both adapters. All findings are P2.

Confidence Score: 5/5

Safe to merge — test-only PR with no production behavior changes.

All findings are P2: one test-isolation gap (adjustedParamsByToolCallId not cleared by the reset helper), one silent-footgun risk for future test authors using installCodexToolResultMiddleware alone, and one type-safety erosion via as never casts. None of these affect the correctness of the contract tests in this PR or any production path.

test/helpers/agents/openclaw-owned-tool-runtime-contract.ts — resetOpenClawOwnedToolHooks cleanup and installCodexToolResultMiddleware usage guard.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: test/helpers/agents/openclaw-owned-tool-runtime-contract.ts
Line: 69-72

Comment:
**Incomplete test isolation — `adjustedParamsByToolCallId` not cleared**

`resetOpenClawOwnedToolHooks` resets the global hook runner and active plugin registry, but leaves the module-level `adjustedParamsByToolCallId` map in `pi-tools.before-tool-call.ts` untouched. This map is populated by `wrapToolWithBeforeToolCallHook` and consumed by `runAgentHarnessAfterToolCallHook`. In the "does not double-wrap" Codex test, `runAgentHarnessAfterToolCallHook` is fire-and-forget (`void`) with no `vi.waitFor` assertion, so the entry for `run-wrapped:call-wrapped` may remain in the map after `afterEach` clears everything else. Future tests that reuse the same `runId:callId` pair would silently consume a stale adjusted-params entry, producing wrong `after_tool_call` `params` without any obvious failure signal. The fix is to call `__testing.adjustedParamsByToolCallId.clear()` (already exported under `__testing`) or export a dedicated reset from `pi-tools.before-tool-call.ts`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: test/helpers/agents/openclaw-owned-tool-runtime-contract.ts
Line: 48-66

Comment:
**Silent hook gap when `installCodexToolResultMiddleware` is used alone**

`installCodexToolResultMiddleware` calls `setActivePluginRegistry` — a completely separate global state store from the one `installOpenClawOwnedToolHooks` sets via `initializeGlobalHookRunner`. A test that calls only `installCodexToolResultMiddleware` will have no `before_tool_call` or `after_tool_call` hooks registered, with no error or assertion to catch it. This is a footgun for future contract-test authors extending this fixture: they could write a middleware-focused test, forget the companion `installOpenClawOwnedToolHooks` call, and the hook assertions would simply never fire. Consider adding a guard in `installCodexToolResultMiddleware` that warns when the global hook runner is null, or documenting the required pairing clearly in JSDoc.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/openclaw-owned-tool-runtime-contract.test.ts
Line: 92-100

Comment:
**`as never` casts suppress type-safety on the test event boundary**

Both `handleToolExecutionStart` and `handleToolExecutionEnd` are called with `ctx as never` and the event object `as never`. If either function's signature changes (e.g., a required field is added to the context or the event discriminant expands), TypeScript will not catch the mismatch here, silently making the test an out-of-date fixture rather than a failing contract guard. The `createToolHandlerCtx` helper could be typed to the actual context type these handlers accept, even if that requires adding any missing optional fields with sensible test defaults.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "test: expand pi codex tool runtime contr..." | Re-trigger Greptile}

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

steipete · 2026-04-24T16:48:04Z

Closing as superseded by #71096, which consolidates this contract shard into the single RuntimePlan / Pi-Codex parity review branch.

Codex review note: keeping the individual shard open would split the same contract surface across multiple stale branches now that #71096 carries the combined stack.

test: add pi codex runtime contract coverage

3cef8ee

100yenadmin mentioned this pull request Apr 24, 2026

RFC: Contract-first Pi/Codex agent runtime rewrite #71004

Closed

openclaw-barnacle Bot added agents Agent runtime and tooling extensions: codex size: M labels Apr 24, 2026

100yenadmin mentioned this pull request Apr 24, 2026

Preserve dynamic tool hooks in Codex mode #70965

Merged

test: expand pi codex tool runtime contracts

a9dab9d

openclaw-barnacle Bot added size: L and removed size: M labels Apr 24, 2026

This was referenced Apr 24, 2026

[codex] Add auth profile runtime contracts #71029

Closed

[codex] Add outcome fallback runtime contracts #71038

Closed

[codex] Add delivery NO_REPLY runtime contracts #71039

Closed

100yenadmin marked this pull request as ready for review April 24, 2026 09:15

Copilot AI review requested due to automatic review settings April 24, 2026 09:15

Copilot started reviewing on behalf of 100yenadmin April 24, 2026 09:15 View session

100yenadmin mentioned this pull request Apr 24, 2026

[codex] Add transcript repair runtime contracts #71042

Closed

Copilot AI reviewed Apr 24, 2026

View reviewed changes

greptile-apps Bot reviewed Apr 24, 2026

View reviewed changes

Comment thread test/helpers/agents/openclaw-owned-tool-runtime-contract.ts

Comment thread test/helpers/agents/openclaw-owned-tool-runtime-contract.ts

Comment thread src/agents/openclaw-owned-tool-runtime-contract.test.ts Outdated

This was referenced Apr 24, 2026

[codex] Add prompt overlay runtime contracts #71044

Closed

[codex] Add schema normalization runtime contracts #71046

Closed

Eva added 4 commits April 24, 2026 16:52

test: tighten tool runtime contracts

e9b7dc7

test: reset tool contract param cache

33a77e8

test: document codex tool middleware fixture

08e1b10

test: type pi tool contract events

c6c9250

100yenadmin requested a review from Copilot April 24, 2026 10:15

Copilot started reviewing on behalf of 100yenadmin April 24, 2026 10:16 View session

test: satisfy pi tool contract test types

b54e3fe

Copilot AI reviewed Apr 24, 2026

View reviewed changes

Comment thread test/helpers/agents/openclaw-owned-tool-runtime-contract.ts

test: cover tool media telemetry contracts

e27fe5c

test: reset plugin runtime after tool contracts

a2bfc83

100yenadmin requested a review from Copilot April 24, 2026 10:56

Copilot started reviewing on behalf of 100yenadmin April 24, 2026 10:56 View session

Copilot AI reviewed Apr 24, 2026

View reviewed changes

This was referenced Apr 24, 2026

[codex] Add transport params runtime contracts #71048

Closed

[codex] Add contract-first Pi/Codex runtime plan suite #71096

Merged

steipete closed this Apr 24, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[codex] Add Pi/Codex runtime contract tests#71009

[codex] Add Pi/Codex runtime contract tests#71009
100yenadmin wants to merge 9 commits intoopenclaw:mainfrom
electricsheephq:test/pi-codex-runtime-contracts

100yenadmin commented Apr 24, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

greptile-apps Bot commented Apr 24, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

steipete commented Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

100yenadmin commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why This Exists

Files Changed And Why

Contract Matrix

How This Helps The RuntimePlan Work

Reviewer Notes

Validation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

greptile-apps Bot commented Apr 24, 2026

Greptile Summary

Confidence Score: 5/5

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

steipete commented Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

100yenadmin commented Apr 24, 2026 •

edited

Loading