Add Codex happy path prompt snapshots #75807

Merged
pashpashpash merged 17 commits into main from codex/prompt-snapshot-happy-path
May 2, 2026

Conversation

@pashpashpash
Contributor

After the message-tool and heartbeat changes, the default Codex path is important enough that we should be able to inspect it without reconstructing a live run by hand. This adds committed prompt snapshots for that happy path: an OpenAI model running through the Codex harness/runtime, message-tool-only visible replies, Telegram direct chat, Discord group chat, and a heartbeat turn with the structured heartbeat tool available.

The snapshots are generated from the same OpenClaw prompt composition pieces the Codex app-server path uses. They include the OpenClaw-owned developer instructions, selected thread start/resume params, turn input, and dynamic tool specs. They intentionally do not try to render Codex's hidden base prompt or turn-scoped collaboration-mode instructions, since those belong to the Codex runtime rather than OpenClaw.
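
As a rough illustration of what one such snapshot could carry, the sketch below models the four pieces listed above and renders them to Markdown. The interface, field names, scenario identifiers, and `renderSnapshotMarkdown` are hypothetical and are not taken from the PR's generator; the point is only that a deterministic rendering keeps committed diffs reviewable.

```typescript
// Hypothetical shape of one prompt snapshot. Field names are illustrative
// assumptions, not the generator's real types.
interface DynamicToolSpec {
  name: string;
  description: string;
}

interface PromptSnapshot {
  scenario: "telegram-direct" | "discord-group" | "heartbeat";
  developerInstructions: string; // OpenClaw-owned instructions only
  threadParams: Record<string, unknown>; // selected thread start/resume params
  turnInput: string;
  dynamicTools: DynamicToolSpec[];
}

// Render a snapshot as Markdown. Tools are sorted by name so the output does
// not depend on the order in which they were registered.
function renderSnapshotMarkdown(snapshot: PromptSnapshot): string {
  const toolLines = [...snapshot.dynamicTools]
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .map((tool) => `- ${tool.name}: ${tool.description}`);
  return [
    `# ${snapshot.scenario}`,
    "",
    "## Developer instructions",
    snapshot.developerInstructions,
    "",
    "## Thread params",
    JSON.stringify(snapshot.threadParams, null, 2),
    "",
    "## Turn input",
    snapshot.turnInput,
    "",
    "## Dynamic tools",
    ...toolLines,
    "",
  ].join("\n");
}
```

Codex's hidden base prompt has no field here on purpose, mirroring the scope described above.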

This also adds pnpm prompt:snapshots:gen and pnpm prompt:snapshots:check, plus a test that keeps the committed artifacts current. While generating the snapshots, one remaining hardcoded NO_REPLY instruction surfaced in the TTS tool description; that is removed here so message-tool mode keeps a single silence convention: do not call the visible message tool when no visible reply is needed.
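
The single silence convention can be pictured as follows. This is a minimal sketch under stated assumptions: the tool name `message`, the `text` argument, and both function names are hypothetical stand-ins, not the OpenClaw tool schema. What it shows is that silence is the absence of a visible message-tool call rather than a sentinel string such as NO_REPLY.

```typescript
// Hypothetical tool-call record; the real schema may differ.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Assumed name of the visible message tool, for illustration only.
const VISIBLE_MESSAGE_TOOL = "message";

// Collect the text of every visible reply in a turn. Other tools (TTS,
// heartbeat, and so on) never produce visible text on their own.
function collectVisibleReplies(toolCalls: ToolCall[]): string[] {
  const replies: string[] = [];
  for (const call of toolCalls) {
    if (call.name !== VISIBLE_MESSAGE_TOOL) continue;
    const text = call.args["text"];
    if (typeof text === "string" && text.length > 0) replies.push(text);
  }
  return replies;
}

// A turn is silent when it produced no visible message-tool call at all.
function isSilentTurn(toolCalls: ToolCall[]): boolean {
  return collectVisibleReplies(toolCalls).length === 0;
}
```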

@openclaw-barnacle openclaw-barnacle Bot added the docs (Improvements or additions to documentation), scripts (Repository scripts), agents (Agent runtime and tooling), extensions: codex, size: XL, and maintainer (Maintainer-authored PR) labels May 1, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 1, 2026

Codex review: needs maintainer review before merge.

Summary
The PR adds generator-backed Codex/message-tool happy-path prompt snapshots and scripts, documents them, removes a hardcoded TTS NO_REPLY instruction, and includes supporting tests plus CI-unblocking plugin/type metadata fixes.

Reproducibility: not applicable for the requested feature; the snapshot drift behavior is covered by pnpm prompt:snapshots:check and test/scripts/prompt-snapshots.test.ts, but this read-only review did not run tests.

Next step before merge
No automated repair is appropriate because the latest branch has no discrete fixable finding; the remaining action is maintainer review and exact-head CI completion.

Security
Cleared: No concrete security or supply-chain concern found in the diff.

Review details

Best possible solution:

Keep this PR on the normal maintainer review path and land it only after exact-head CI completes and a maintainer accepts the broader adjunct fixes.

Do we have a high-confidence way to reproduce the issue?

Not applicable for the requested feature; the snapshot drift behavior is covered by pnpm prompt:snapshots:check and test/scripts/prompt-snapshots.test.ts, but this read-only review did not run tests.

Is this the best way to solve the issue?

Yes. Generator-backed snapshots using the Codex app-server composition helpers are the narrow, maintainable direction, and with the latest stale-file cleanup the documented regeneration command can repair the drift that the check reports.

What I checked:

  • Current main gap: Current main has no prompt:snapshots package scripts or happy-path prompt snapshot fixture directory, and the TTS tool description still hardcodes the NO_REPLY silence token. (src/agents/tools/tts-tool.ts:69, 3f2c3a69d76d)
  • Stale snapshot repair: The latest generator deletes stale committed .md/.json snapshot artifacts before writing the freshly generated set, which addresses the earlier ClawSweeper finding. (scripts/generate-prompt-snapshots.ts:96, 641f2deef120)
  • Drift coverage: The new Vitest file compares generated prompt snapshots with committed artifacts and verifies stale generated snapshot deletion. (test/scripts/prompt-snapshots.test.ts:12, 641f2deef120)
  • Prompt composition source: The snapshot helper builds Telegram direct, Discord group, and heartbeat scenarios from the same reply/context and Codex app-server helper functions exposed through the Codex test API. (test/helpers/agents/happy-path-prompt-snapshots.ts:333, 641f2deef120)
  • Live PR state: The GitHub API reports this PR open at head 641f2de with the protected maintainer label, maintainer_can_modify=false, mergeable=true, and mergeable_state=unstable. (641f2deef120)
  • Exact-head checks: At the latest check-runs query, 24 checks were successful, CodeQL was neutral, four were skipped, and checks-node-core was still in progress for the exact head SHA. (641f2deef120)

Likely related people:

  • steipete: Recent mainline commits own the Codex app-server lifecycle/runtime-loading area and the current TTS tool surface that this PR snapshots and adjusts. (role: recent maintainer and adjacent owner; confidence: high; commits: 81e1deade2b1, e5dc3f712e5c, f7ed29e11812; files: extensions/codex/src/app-server/thread-lifecycle.ts, extensions/codex/src/app-server/run-attempt.ts, src/agents/tools/tts-tool.ts)
  • pashpashpash: Prior merged main history added structured heartbeat responses, Codex tool replies, and Codex native hook behavior that this snapshot suite is intended to lock down; this is domain history beyond opening this PR. (role: introduced related behavior; confidence: medium; commits: 439d8edf68e2, 7a958d920c88; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/thread-lifecycle.ts)

Remaining risk / open question:

  • One exact-head CI check, checks-node-core, was still in progress at the latest API check.
  • The branch includes several CI-unblocking plugin/control-plane fixes beyond the prompt snapshot surface, so maintainer acceptance of that broader bundled diff is still needed.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 3f2c3a69d76d.

@pashpashpash
Contributor Author

Addressed the ClawSweeper finding in d4f912b859cbbd5a563524819ad37d15bb9dad95.

The Discord snapshot now builds its dynamic tool catalog from the Discord group context instead of reusing Telegram-derived tools, and the full dynamic tool JSON is split per scenario: Telegram direct, Discord group, and heartbeat. I also fixed the CI failure from the built-artifact shard by resolving the repo-local node_modules/.bin/oxfmt binary instead of assuming oxfmt is on PATH.
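
The binary-resolution part of that fix might look like the sketch below. `resolveLocalBin` and its parameters are hypothetical names, and the `.cmd` suffix on Windows is an assumption about how the package manager writes shims; the actual script may resolve the binary differently.

```typescript
import { join } from "node:path";

// Resolve a tool from the repository's own node_modules/.bin directory
// instead of relying on whatever happens to be on PATH.
// Assumption: Windows shims carry a .cmd suffix.
function resolveLocalBin(repoRoot: string, binName: string, platform: string): string {
  const fileName = platform === "win32" ? `${binName}.cmd` : binName;
  return join(repoRoot, "node_modules", ".bin", fileName);
}
```

A caller would typically pass `process.platform` as the third argument and hand the resulting path to its process-spawning helper, so CI shards without a global oxfmt install behave the same as a developer machine.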

Checked locally with pnpm prompt:snapshots:check, pnpm test test/scripts/prompt-snapshots.test.ts src/agents/tools/tts-tool.test.ts, targeted oxfmt --check, git diff --check, and pnpm lint:scripts. Testbox tbx_01kqjnaz9qkhe5rspjpck5247s ran OPENCLAW_TESTBOX=1 pnpm check:changed successfully on the updated diff.

@pashpashpash pashpashpash force-pushed the codex/prompt-snapshot-happy-path branch from d4f912b to 4787266 Compare May 1, 2026 21:08
@pashpashpash
Contributor Author

Addressed the follow-up ClawSweeper P3 in d4d153c.

pnpm prompt:snapshots:check now also enumerates committed .md and .json files in the happy-path snapshot directory and fails on stale files that are no longer generated, matching the stale-artifact coverage already present in the Vitest snapshot test.
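
The drift check described here reduces to comparing two file sets. The following is a sketch only: `diffSnapshots` and `DriftReport` are hypothetical names, and the real script reads the committed directory from disk rather than taking maps as input.

```typescript
interface DriftReport {
  missing: string[]; // generated but not committed
  changed: string[]; // committed content differs from generated content
  stale: string[]; // committed .md/.json file that is no longer generated
}

// Compare freshly generated snapshots with the committed ones. Both maps go
// from file name to file content.
function diffSnapshots(
  generated: Map<string, string>,
  committed: Map<string, string>,
): DriftReport {
  const report: DriftReport = { missing: [], changed: [], stale: [] };
  generated.forEach((content, name) => {
    const existing = committed.get(name);
    if (existing === undefined) report.missing.push(name);
    else if (existing !== content) report.changed.push(name);
  });
  committed.forEach((_content, name) => {
    // Only snapshot artifacts count as stale; other files are left alone.
    const isArtifact = name.endsWith(".md") || name.endsWith(".json");
    if (isArtifact && !generated.has(name)) report.stale.push(name);
  });
  return report;
}
```

A check command built on this would exit non-zero when any of the three lists is non-empty.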

Checked locally with pnpm exec oxfmt --check --threads=1 scripts/generate-prompt-snapshots.ts, pnpm prompt:snapshots:check, pnpm test test/scripts/prompt-snapshots.test.ts, pnpm lint:scripts, and git diff --check. Testbox tbx_01kqjp2pppv8bza65gz5mhhfxv ran OPENCLAW_TESTBOX=1 pnpm check:changed successfully on the final diff.

@pashpashpash pashpashpash force-pushed the codex/prompt-snapshot-happy-path branch from d4d153c to 9bffe0e Compare May 1, 2026 21:25
@pashpashpash
Contributor Author

Pushed 9bffe0ed12f45340b49fb41b76c79b0dada6cd03 to handle the latest exact-head build failure.

The failed GitHub build path was not in the prompt snapshot code. scripts/write-cli-compat.ts imports LEGACY_DAEMON_CLI_EXPORTS and LegacyDaemonCliAccessors from src/cli/daemon-cli-compat.ts, but current main had stopped exporting those symbols. The branch now exports the existing const/type without changing their runtime behavior.

I reproduced the build path in Testbox. The first ad hoc Testbox shell exposed runner setup issues (pnpm not on PATH, then Node 20), so I reran the same build-artifact command with the hosted Node 22 toolcache on PATH: Testbox tbx_01kqjpq3pgp0fy9evkxv304644 passed PATH=/opt/hostedtoolcache/node/22.22.0/x64/bin:$PATH corepack pnpm build:ci-artifacts, including the previously failing write-cli-compat step. Local oxfmt --check for the touched file and git diff --check also passed.

@openclaw-barnacle openclaw-barnacle Bot added the cli CLI command changes label May 1, 2026
@pashpashpash pashpashpash force-pushed the codex/prompt-snapshot-happy-path branch from 9bffe0e to f806ab0 Compare May 1, 2026 21:37
@pashpashpash
Contributor Author

Pushed f806ab0a633b659d55ab0efd6c87d6b2465bdb0d for the next exact-head CI failure.

checks-fast-contracts-plugins-c failed in src/plugins/contracts/config-footprint-guardrails.test.ts because current generated bundled-channel metadata no longer includes Mattermost after 0640db72b0 refreshed the release metadata baselines, but the guardrail still hardcoded Mattermost in the generated-metadata expectation. The PR does not touch Mattermost, but the branch now aligns that test with the generated metadata so the shard can pass on the current base.

Checked with pnpm test src/plugins/contracts/config-footprint-guardrails.test.ts, targeted oxfmt --check, and git diff --check. I also tried rerunning the targeted file in the warm Testbox, but the box had lost node_modules after full sync and then hit a Testbox-local frozen-lockfile override mismatch during reinstall, so I am treating GitHub exact-head CI as the useful remote signal here.

@pashpashpash pashpashpash force-pushed the codex/prompt-snapshot-happy-path branch 2 times, most recently from 9642285 to 95a9b7f Compare May 1, 2026 21:48
@openclaw-barnacle openclaw-barnacle Bot added the app: macos App: macos label May 1, 2026
@pashpashpash
Contributor Author

Pushed 95a9b7f after rebasing onto current main.

The latest exact-head failure was checks-fast-protocol. It reproduced locally with pnpm protocol:check after the rebase: the JSON schema was current, but the two generated Swift protocol model files were stale for the current base. The branch now includes only the generated Swift updates for WizardStep.format and ChannelsStopParams.

Checked locally with pnpm protocol:check, pnpm prompt:snapshots:check, pnpm test test/scripts/prompt-snapshots.test.ts src/agents/tools/tts-tool.test.ts, and git diff --check origin/main...HEAD. New exact-head CI is running now.

@pashpashpash
Contributor Author

The red check state was from a duplicate cancelled auto-response run on the current head SHA, not from the CI workflow. I reran that cancelled workflow run and it completed successfully.

Current evidence: head 95a9b7f21e5f25c31a0a789e22f2a7c8728078cb, GitHub status check rollup is now SUCCESS, CI run 25234641014 is green, and the PR is clean/mergeable.

@pashpashpash pashpashpash force-pushed the codex/prompt-snapshot-happy-path branch from 95a9b7f to b3d4b17 Compare May 2, 2026 02:07
@openclaw-barnacle openclaw-barnacle Bot removed the app: macos App: macos label May 2, 2026
@pashpashpash pashpashpash force-pushed the codex/prompt-snapshot-happy-path branch 2 times, most recently from 384bdf8 to b4f4d28 Compare May 2, 2026 03:47
@pashpashpash pashpashpash requested a review from a team as a code owner May 2, 2026 03:47
@pashpashpash pashpashpash force-pushed the codex/prompt-snapshot-happy-path branch 5 times, most recently from 22aa02b to 899cecb Compare May 2, 2026 05:18
@pashpashpash pashpashpash force-pushed the codex/prompt-snapshot-happy-path branch from 2c4c591 to 2ef6d46 Compare May 2, 2026 06:19
@openclaw-barnacle openclaw-barnacle Bot removed the commands Command implementations label May 2, 2026
@pashpashpash pashpashpash force-pushed the codex/prompt-snapshot-happy-path branch from 2ef6d46 to 2febdff Compare May 2, 2026 06:26
@pashpashpash pashpashpash force-pushed the codex/prompt-snapshot-happy-path branch from 2febdff to 151670d Compare May 2, 2026 14:53
@pashpashpash
Contributor Author

Addressed the latest ClawSweeper P3 in 641f2de.

pnpm prompt:snapshots:gen now deletes stale generated .md and .json prompt snapshot artifacts before writing the current set, so the documented repair command can fix the same stale-file drift that pnpm prompt:snapshots:check and the Vitest test report. I also added a focused test for stale snapshot cleanup.
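
A minimal sketch of that delete-then-write order, assuming a flat snapshot directory, is shown below. `regenerateSnapshots` is a hypothetical helper and not the code in scripts/generate-prompt-snapshots.ts.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Remove stale .md/.json artifacts from `dir`, then write the current set.
// Returns the names of the files that were deleted, sorted for stable output.
function regenerateSnapshots(dir: string, generated: Map<string, string>): string[] {
  const removed: string[] = [];
  for (const name of fs.readdirSync(dir)) {
    const isArtifact = name.endsWith(".md") || name.endsWith(".json");
    if (isArtifact && !generated.has(name)) {
      fs.rmSync(path.join(dir, name));
      removed.push(name);
    }
  }
  generated.forEach((content, name) => {
    fs.writeFileSync(path.join(dir, name), content, "utf8");
  });
  return removed.sort();
}
```

Files with other extensions are left untouched, so the cleanup cannot remove anything the generator would not have produced itself.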

Checked locally with pnpm exec oxfmt --check --threads=1 scripts/generate-prompt-snapshots.ts test/scripts/prompt-snapshots.test.ts, pnpm prompt:snapshots:gen, pnpm prompt:snapshots:check, pnpm test test/scripts/prompt-snapshots.test.ts, and git diff --check.

@pashpashpash
Contributor Author

The current red CI state on head 641f2deef120f195ee3496bab0907a9d2e822d1b is a runner cancellation, not a code/test failure.

checks-node-core-ui was cancelled while restoring the pnpm cache: the log says “The runner has received a shutdown signal” and then “The operation was canceled.” The aggregate checks-node-core failed only because that shard result was cancelled; the other node/core shards and the rest of the CI jobs had succeeded.
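
The relationship between a shard and its aggregate can be modelled roughly as below. This is an assumption about how the aggregate job behaves, inferred only from the description above: the conclusion values follow GitHub's check-run vocabulary, while `aggregateShards` and the rule that `success` and `skipped` are the only passing outcomes are hypothetical.

```typescript
type Conclusion = "success" | "failure" | "cancelled" | "skipped" | "neutral";

interface AggregateResult {
  conclusion: "success" | "failure";
  blocking: string[]; // shard names that prevented a green aggregate
}

// Assumed rule: the aggregate is green only when every shard either
// succeeded or was skipped. A cancelled shard therefore blocks it even
// though no test actually failed.
function aggregateShards(shards: Record<string, Conclusion>): AggregateResult {
  const blocking = Object.keys(shards)
    .filter((name) => shards[name] !== "success" && shards[name] !== "skipped")
    .sort();
  return { conclusion: blocking.length === 0 ? "success" : "failure", blocking };
}
```

Under this model, rerunning only the cancelled shard is enough to turn the aggregate green, which matches the rerun outcome reported later in the thread.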

I reran the failed jobs for CI run 25254828103 and am watching the exact-head result now.

@pashpashpash
Contributor Author

The rerun cleared the current red CI state.

CI run 25254828103 is now green on attempt 2 for exact head 641f2deef120f195ee3496bab0907a9d2e822d1b. The cancelled checks-node-core-ui shard passed on rerun, and the aggregate checks-node-core passed after that. The PR rollup now has no non-green required checks, mergeStateStatus is CLEAN, and mergeable is MERGEABLE.

@pashpashpash pashpashpash merged commit 563dca8 into main May 2, 2026
169 of 171 checks passed
@pashpashpash pashpashpash deleted the codex/prompt-snapshot-happy-path branch May 2, 2026 16:00
lxe pushed a commit to lxe/openclaw that referenced this pull request May 6, 2026
* Add Codex prompt snapshots

* Fix prompt snapshot scenario catalogs

* Harden prompt snapshot drift check

* Fix CLI compat build export

* fix: keep codex snapshots out of core plugin surface

* fix: harden prompt snapshot ci checks

* fix: accept readonly web search onboarding scopes

* fix: repair plugin sdk package boundary types

* fix: clear prompt snapshot ci regressions

* fix: clear latest main ci checks

* fix: resolve latest main discord helper overlap

* fix: refresh codex dynamic tool snapshots

* fix: align prompt snapshot branch with latest ci

* fix: isolate plugin auto enable tests

* test: refresh prompt dynamic tool snapshots

* fix: stabilize bundled channel auto enable

* fix: clean stale prompt snapshots
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026

Labels

agents (Agent runtime and tooling), cli (CLI command changes), docs (Improvements or additions to documentation), extensions: codex, maintainer (Maintainer-authored PR), scripts (Repository scripts), size: XL


1 participant