Skip to content

[codex] Add Pi/Codex runtime contract tests#71009

Closed
100yenadmin wants to merge 9 commits intoopenclaw:mainfrom
electricsheephq:test/pi-codex-runtime-contracts
Closed

[codex] Add Pi/Codex runtime contract tests#71009
100yenadmin wants to merge 9 commits intoopenclaw:mainfrom
electricsheephq:test/pi-codex-runtime-contracts

Conversation

@100yenadmin
Copy link
Copy Markdown
Contributor

@100yenadmin 100yenadmin commented Apr 24, 2026

Summary

This is the first deliberately test-only rung from RFC #71004. It adds a shared Pi/Codex runtime contract fixture for OpenClaw-owned dynamic tools and locks the invariant that both harness paths must preserve the same before_tool_call, tool execution, tool-result middleware, after_tool_call, error, block, and telemetry semantics.

No production runtime behavior changes in this PR.

flowchart TD
  ToolCatalog["OpenClaw tool catalog"] --> Contract["OpenClaw-owned tool contract"]
  Contract --> Before["before_tool_call may mutate or block"]
  Before --> Pi["Pi adapter"]
  Before --> Codex["Codex app-server adapter"]
  Pi --> Execute["tool.execute receives final merged params"]
  Codex --> Execute
  Execute --> Middleware["Codex tool_result middleware"]
  Middleware --> After["after_tool_call sees final params/result/error"]
  Execute --> Telemetry["messaging/media telemetry"]
Loading

Why This Exists

#70965 showed the failure mode we want to prevent: Codex mode could accidentally bypass an OpenClaw-owned dynamic-tool hook contract because tool ownership was implicit. #70743 and #70772 showed the larger pattern: many GPT-5.4 Pi/Codex fixes are really runtime-contract problems, not isolated transport bugs.

This PR starts the safer path from #71004: capture parity as tests first, then refactor toward shared runtime contracts only after maintainers can see intended behavior in executable fixtures.

Files Changed And Why

File Purpose
test/helpers/agents/openclaw-owned-tool-runtime-contract.ts Shared fixture for installing OpenClaw-owned tool hooks and Codex result middleware in adapter tests.
src/agents/openclaw-owned-tool-runtime-contract.test.ts Pi-side contract rows for mutation, blocking, errors, and messaging telemetry.
extensions/codex/src/app-server/openclaw-owned-tool-runtime-contract.test.ts Codex app-server rows for wrapper ordering, middleware, no double wrapping, blocking, errors, and telemetry.

Contract Matrix

Contract Pi adapter Codex app-server adapter
before_tool_call runs for OpenClaw-owned dynamic tools Yes Yes
Partial hook param patches preserve original args and reach tool.execute Yes Yes
after_tool_call observes final merged params and successful result Yes Yes
Blocked tool fails closed and does not execute Yes Yes
Blocked/error tool reports through after_tool_call Yes Yes
Codex tool_result middleware runs before after_tool_call observes result N/A Yes
Already-wrapped tools are not double-wrapped Existing Pi primitive Yes
Successful messaging text/media/target telemetry is committed Yes Yes
Successful generated-media artifact telemetry is committed Existing Pi media path Yes
Messaging telemetry is not marked sent for blocked Codex message tool N/A Yes

How This Helps The RuntimePlan Work

This PR defines tool ownership before any shared AgentRuntimePlan exists. Later, the plan can produce the tool catalog/wrapping state once, and both Pi and Codex can consume it without reimplementing hook order or telemetry rules.

Reviewer Notes

  • This is intentionally one domain only: tools.
  • It does not change runtime behavior or add hook surfaces.
  • It should be reviewed as a contract baseline, not as a production bug fix.

Validation

  • node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/openclaw-owned-tool-runtime-contract.test.ts — 4 tests
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/openclaw-owned-tool-runtime-contract.test.ts — 7 tests
  • ./node_modules/.bin/oxlint --tsconfig tsconfig.oxlint.core.json test/helpers/agents/openclaw-owned-tool-runtime-contract.ts src/agents/openclaw-owned-tool-runtime-contract.test.ts extensions/codex/src/app-server/openclaw-owned-tool-runtime-contract.test.ts
  • git diff --check -- test/helpers/agents/openclaw-owned-tool-runtime-contract.ts src/agents/openclaw-owned-tool-runtime-contract.test.ts extensions/codex/src/app-server/openclaw-owned-tool-runtime-contract.test.ts

Refs #71004

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds contract-style tests to lock parity between Pi and Codex harness paths for OpenClaw-owned dynamic tool execution (before/after hooks, param mutation, block semantics, and Codex tool_result middleware ordering), without changing production runtime behavior.

Changes:

  • Added a shared test helper fixture to install OpenClaw-owned tool hooks and Codex tool_result middleware.
  • Added Pi adapter contract tests covering param mutation propagation and fail-closed blocking behavior.
  • Added Codex app-server adapter contract tests covering wrapping, blocking, middleware ordering, error reporting, telemetry invariants, and no double-wrapping.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
test/helpers/agents/openclaw-owned-tool-runtime-contract.ts Shared fixture for installing global tool hooks and Codex tool_result middleware in tests.
src/agents/openclaw-owned-tool-runtime-contract.test.ts Pi adapter contract tests validating hook ordering and semantics for OpenClaw-owned dynamic tools.
extensions/codex/src/app-server/openclaw-owned-tool-runtime-contract.test.ts Codex app-server adapter contract tests validating wrapping, middleware ordering, blocking, error reporting, and telemetry invariants.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 24, 2026

Greptile Summary

This PR adds a shared Pi/Codex runtime contract fixture and test suites to lock in the before_tool_call → execute → tool_result middleware → after_tool_call ownership invariant for OpenClaw-owned dynamic tools. No production changes are included. The overall logic is sound — the two-store design (global hook runner for before/after hooks, active plugin registry for Codex extension factories) works correctly, and the adjusted-params tracking via consumeAdjustedParamsForToolCall correctly surfaces mutated params to after_tool_call in both adapters. All findings are P2.

Confidence Score: 5/5

Safe to merge — test-only PR with no production behavior changes.

All findings are P2: one test-isolation gap (adjustedParamsByToolCallId not cleared by the reset helper), one silent-footgun risk for future test authors using installCodexToolResultMiddleware alone, and one type-safety erosion via as never casts. None of these affect the correctness of the contract tests in this PR or any production path.

test/helpers/agents/openclaw-owned-tool-runtime-contract.ts — resetOpenClawOwnedToolHooks cleanup and installCodexToolResultMiddleware usage guard.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: test/helpers/agents/openclaw-owned-tool-runtime-contract.ts
Line: 69-72

Comment:
**Incomplete test isolation — `adjustedParamsByToolCallId` not cleared**

`resetOpenClawOwnedToolHooks` resets the global hook runner and active plugin registry, but leaves the module-level `adjustedParamsByToolCallId` map in `pi-tools.before-tool-call.ts` untouched. This map is populated by `wrapToolWithBeforeToolCallHook` and consumed by `runAgentHarnessAfterToolCallHook`. In the "does not double-wrap" Codex test, `runAgentHarnessAfterToolCallHook` is fire-and-forget (`void`) with no `vi.waitFor` assertion, so the entry for `run-wrapped:call-wrapped` may remain in the map after `afterEach` clears everything else. Future tests that reuse the same `runId:callId` pair would silently consume a stale adjusted-params entry, producing wrong `after_tool_call` `params` without any obvious failure signal. The fix is to call `__testing.adjustedParamsByToolCallId.clear()` (already exported under `__testing`) or export a dedicated reset from `pi-tools.before-tool-call.ts`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: test/helpers/agents/openclaw-owned-tool-runtime-contract.ts
Line: 48-66

Comment:
**Silent hook gap when `installCodexToolResultMiddleware` is used alone**

`installCodexToolResultMiddleware` calls `setActivePluginRegistry` — a completely separate global state store from the one `installOpenClawOwnedToolHooks` sets via `initializeGlobalHookRunner`. A test that calls only `installCodexToolResultMiddleware` will have no `before_tool_call` or `after_tool_call` hooks registered, with no error or assertion to catch it. This is a footgun for future contract-test authors extending this fixture: they could write a middleware-focused test, forget the companion `installOpenClawOwnedToolHooks` call, and the hook assertions would simply never fire. Consider adding a guard in `installCodexToolResultMiddleware` that warns when the global hook runner is null, or documenting the required pairing clearly in JSDoc.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/openclaw-owned-tool-runtime-contract.test.ts
Line: 92-100

Comment:
**`as never` casts suppress type-safety on the test event boundary**

Both `handleToolExecutionStart` and `handleToolExecutionEnd` are called with `ctx as never` and the event object `as never`. If either function's signature changes (e.g., a required field is added to the context or the event discriminant expands), TypeScript will not catch the mismatch here, silently making the test an out-of-date fixture rather than a failing contract guard. The `createToolHandlerCtx` helper could be typed to the actual context type these handlers accept, even if that requires adding any missing optional fields with sensible test defaults.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "test: expand pi codex tool runtime contr..." | Re-trigger Greptile

Comment thread test/helpers/agents/openclaw-owned-tool-runtime-contract.ts
Comment thread test/helpers/agents/openclaw-owned-tool-runtime-contract.ts
Comment thread src/agents/openclaw-owned-tool-runtime-contract.test.ts Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread test/helpers/agents/openclaw-owned-tool-runtime-contract.ts
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@steipete
Copy link
Copy Markdown
Contributor

Closing as superseded by #71096, which consolidates this contract shard into the single RuntimePlan / Pi-Codex parity review branch.

Codex review note: keeping the individual shard open would split the same contract surface across multiple stale branches now that #71096 carries the combined stack.

@steipete steipete closed this Apr 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling extensions: codex size: L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants