[codex] Add Pi/Codex runtime contract tests#71009
[codex] Add Pi/Codex runtime contract tests#71009100yenadmin wants to merge 9 commits intoopenclaw:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds contract-style tests to lock parity between Pi and Codex harness paths for OpenClaw-owned dynamic tool execution (before/after hooks, param mutation, block semantics, and Codex tool_result middleware ordering), without changing production runtime behavior.
Changes:
- Added a shared test helper fixture to install OpenClaw-owned tool hooks and Codex tool_result middleware.
- Added Pi adapter contract tests covering param mutation propagation and fail-closed blocking behavior.
- Added Codex app-server adapter contract tests covering wrapping, blocking, middleware ordering, error reporting, telemetry invariants, and no double-wrapping.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| test/helpers/agents/openclaw-owned-tool-runtime-contract.ts | Shared fixture for installing global tool hooks and Codex tool_result middleware in tests. |
| src/agents/openclaw-owned-tool-runtime-contract.test.ts | Pi adapter contract tests validating hook ordering and semantics for OpenClaw-owned dynamic tools. |
| extensions/codex/src/app-server/openclaw-owned-tool-runtime-contract.test.ts | Codex app-server adapter contract tests validating wrapping, middleware ordering, blocking, error reporting, and telemetry invariants. |
Greptile SummaryThis PR adds a shared Pi/Codex runtime contract fixture and test suites to lock in the Confidence Score: 5/5Safe to merge — test-only PR with no production behavior changes. All findings are P2: one test-isolation gap (adjustedParamsByToolCallId not cleared by the reset helper), one silent-footgun risk for future test authors using installCodexToolResultMiddleware alone, and one type-safety erosion via as never casts. None of these affect the correctness of the contract tests in this PR or any production path. test/helpers/agents/openclaw-owned-tool-runtime-contract.ts — resetOpenClawOwnedToolHooks cleanup and installCodexToolResultMiddleware usage guard. Prompt To Fix All With AIThis is a comment left during a code review.
Path: test/helpers/agents/openclaw-owned-tool-runtime-contract.ts
Line: 69-72
Comment:
**Incomplete test isolation — `adjustedParamsByToolCallId` not cleared**
`resetOpenClawOwnedToolHooks` resets the global hook runner and active plugin registry, but leaves the module-level `adjustedParamsByToolCallId` map in `pi-tools.before-tool-call.ts` untouched. This map is populated by `wrapToolWithBeforeToolCallHook` and consumed by `runAgentHarnessAfterToolCallHook`. In the "does not double-wrap" Codex test, `runAgentHarnessAfterToolCallHook` is fire-and-forget (`void`) with no `vi.waitFor` assertion, so the entry for `run-wrapped:call-wrapped` may remain in the map after `afterEach` clears everything else. Future tests that reuse the same `runId:callId` pair would silently consume a stale adjusted-params entry, producing wrong `after_tool_call` `params` without any obvious failure signal. The fix is to call `__testing.adjustedParamsByToolCallId.clear()` (already exported under `__testing`) or export a dedicated reset from `pi-tools.before-tool-call.ts`.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: test/helpers/agents/openclaw-owned-tool-runtime-contract.ts
Line: 48-66
Comment:
**Silent hook gap when `installCodexToolResultMiddleware` is used alone**
`installCodexToolResultMiddleware` calls `setActivePluginRegistry` — a completely separate global state store from the one `installOpenClawOwnedToolHooks` sets via `initializeGlobalHookRunner`. A test that calls only `installCodexToolResultMiddleware` will have no `before_tool_call` or `after_tool_call` hooks registered, with no error or assertion to catch it. This is a footgun for future contract-test authors extending this fixture: they could write a middleware-focused test, forget the companion `installOpenClawOwnedToolHooks` call, and the hook assertions would simply never fire. Consider adding a guard in `installCodexToolResultMiddleware` that warns when the global hook runner is null, or documenting the required pairing clearly in JSDoc.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/agents/openclaw-owned-tool-runtime-contract.test.ts
Line: 92-100
Comment:
**`as never` casts suppress type-safety on the test event boundary**
Both `handleToolExecutionStart` and `handleToolExecutionEnd` are called with `ctx as never` and the event object `as never`. If either function's signature changes (e.g., a required field is added to the context or the event discriminant expands), TypeScript will not catch the mismatch here, silently making the test an out-of-date fixture rather than a failing contract guard. The `createToolHandlerCtx` helper could be typed to the actual context type these handlers accept, even if that requires adding any missing optional fields with sensible test defaults.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "test: expand pi codex tool runtime contr..." | Re-trigger Greptile |
Summary
This is the first deliberately test-only rung from RFC #71004. It adds a shared Pi/Codex runtime contract fixture for OpenClaw-owned dynamic tools and locks the invariant that both harness paths must preserve the same
before_tool_call, tool execution, tool-result middleware,after_tool_call, error, block, and telemetry semantics.No production runtime behavior changes in this PR.
Why This Exists
#70965 showed the failure mode we want to prevent: Codex mode could accidentally bypass an OpenClaw-owned dynamic-tool hook contract because tool ownership was implicit. #70743 and #70772 showed the larger pattern: many GPT-5.4 Pi/Codex fixes are really runtime-contract problems, not isolated transport bugs.
This PR starts the safer path from #71004: capture parity as tests first, then refactor toward shared runtime contracts only after maintainers can see intended behavior in executable fixtures.
Files Changed And Why
test/helpers/agents/openclaw-owned-tool-runtime-contract.tssrc/agents/openclaw-owned-tool-runtime-contract.test.tsextensions/codex/src/app-server/openclaw-owned-tool-runtime-contract.test.tsContract Matrix
before_tool_callruns for OpenClaw-owned dynamic toolstool.executeafter_tool_callobserves final merged params and successful resultafter_tool_calltool_resultmiddleware runs beforeafter_tool_callobserves resultHow This Helps The RuntimePlan Work
This PR defines tool ownership before any shared
AgentRuntimePlanexists. Later, the plan can produce the tool catalog/wrapping state once, and both Pi and Codex can consume it without reimplementing hook order or telemetry rules.Reviewer Notes
Validation
node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/openclaw-owned-tool-runtime-contract.test.ts— 4 testsnode scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/openclaw-owned-tool-runtime-contract.test.ts— 7 tests./node_modules/.bin/oxlint --tsconfig tsconfig.oxlint.core.json test/helpers/agents/openclaw-owned-tool-runtime-contract.ts src/agents/openclaw-owned-tool-runtime-contract.test.ts extensions/codex/src/app-server/openclaw-owned-tool-runtime-contract.test.tsgit diff --check -- test/helpers/agents/openclaw-owned-tool-runtime-contract.ts src/agents/openclaw-owned-tool-runtime-contract.test.ts extensions/codex/src/app-server/openclaw-owned-tool-runtime-contract.test.tsRefs #71004