
fix(codex): trace app-server thread lifecycle timing #88101

Merged
steipete merged 1 commit into openclaw:main from ai-hpc:fix/codex-thread-lifecycle-timing
Jun 2, 2026

Conversation

@ai-hpc ai-hpc commented May 29, 2026

Member

Summary

Fixes #84640.

Adds Codex app-server thread lifecycle timing around startOrResumeThread so the hidden gap between embedded attempt-dispatch and Codex session.started is visible in trace logs. The summary records start/resume action, total time, session key, and per-stage timings for binding reads, thread start/resume requests, binding writes, and thread-ready.

The timing extends the existing profiler-gated lifecycle tracker: normal runs log at trace level when trace logging is enabled, and slow lifecycle spans still emit warning-level summaries when the configured thresholds are crossed.
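
The per-stage format in the summary (`name:<duration>ms@<offset>ms`) and the trace/warn split can be illustrated with a minimal sketch. Everything below is hypothetical: `LifecycleTimingTracker`, `pickLogLevel`, and the single `warnTotalMs` threshold are illustrative names and simplifications, not the code in thread-lifecycle.ts, which may use different names and more than one threshold.

```typescript
// Hypothetical sketch; names and the single threshold are assumptions.
type StageTiming = { name: string; durationMs: number; atMs: number };

class LifecycleTimingTracker {
  private readonly stages: StageTiming[] = [];
  private readonly now: () => number;
  private readonly startedAt: number;
  private lastMark: number;

  // The clock is injectable so the tracker can be driven by a fake clock.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
    this.startedAt = now();
    this.lastMark = this.startedAt;
  }

  // Record a stage as the time elapsed since the previous mark.
  mark(name: string): void {
    const t = this.now();
    this.stages.push({
      name,
      durationMs: t - this.lastMark,
      atMs: t - this.startedAt,
    });
    this.lastMark = t;
  }

  // Total time from construction to the most recent mark.
  totalMs(): number {
    return this.lastMark - this.startedAt;
  }

  // Render stages as comma-separated `name:<duration>ms@<offset>ms` entries.
  formatStages(): string {
    return this.stages
      .map((s) => `${s.name}:${s.durationMs}ms@${s.atMs}ms`)
      .join(",");
  }
}

type LogLevel = "trace" | "warn" | "none";

// Slow spans warn regardless of trace logging; normal spans log only at trace.
function pickLogLevel(
  totalMs: number,
  opts: { traceEnabled: boolean; warnTotalMs: number },
): LogLevel {
  if (totalMs >= opts.warnTotalMs) return "warn";
  return opts.traceEnabled ? "trace" : "none";
}
```

In this sketch a caller would invoke `mark` after each stage (binding read, start or resume request, binding write, thread-ready) and pass `totalMs()` to `pickLogLevel` once the thread is ready.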

Real behavior proof

content: behavior

environment: Local OpenClaw source checkout rebased on f499841be6, host Node 22.22.2 and pnpm 11.1.0, OpenClaw CLI local agent run with Codex app-server auth profile.

steps:

  1. Run a real local openclaw agent --local turn with trace logging against openai/gpt-5.5.
  2. Confirm the Codex app-server lifecycle summary is emitted before the turn completes.
  3. Run the focused Codex app-server lifecycle test with the Codex extension Vitest config.
  4. Run extension production typecheck against the finalized branch.
  5. Run git diff --check against upstream/main.

evidence:

OPENCLAW_LOG_LEVEL=trace OPENCLAW_TEST_CONSOLE=1 pnpm openclaw agent --local --agent main --session-key codex-lifecycle-proof-84640-final --model openai/gpt-5.5 --message "Reply exactly: pong" --json

[agent/embedded] [trace:embedded-run] startup stages: runId=8436a207-b562-4392-a9aa-e46fa30954be sessionId=3b64f9d8-0d75-402e-a3af-f304a3c1e8e7 phase=attempt-dispatch totalMs=20947 stages=workspace:2ms@2ms,runtime-plugins:20237ms@20239ms,hooks:3ms@20242ms,model-resolution:416ms@20658ms,auth:224ms@20882ms,context-engine:5ms@20887ms,attempt-workspace:50ms@20937ms,attempt-prompt:0ms@20937ms,attempt-runtime-plan:10ms@20947ms,attempt-dispatch:0ms@20947ms
[agent/embedded] [trace:codex-app-server] thread lifecycle: runId=8436a207-b562-4392-a9aa-e46fa30954be sessionId=3b64f9d8-0d75-402e-a3af-f304a3c1e8e7 sessionKey=agent:main:codex-lifecycle-proof-84640-final action=started totalMs=3547 stages=dynamic-tools-fingerprint:8ms@8ms,context-engine-binding:0ms@8ms,read-binding:1ms@10ms,merge-thread-config:0ms@50ms,thread-start-params:6ms@56ms,thread-start-request:3404ms@3460ms,thread-start-write-binding:21ms@3546ms,thread-ready:0ms@3546ms
"agentHarnessId": "codex"
"finalAssistantVisibleText": "pong"

node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-codex.config.ts extensions/codex/src/app-server/thread-lifecycle.test.ts --reporter=verbose
RUN  v4.1.7
Test Files  1 passed (1)
Tests  48 passed (48)
Duration  87.17s

node scripts/run-tsgo.mjs -p tsconfig.extensions.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions.tsbuildinfo
exit code: 0

git diff --check upstream/main...HEAD
exit code: 0

observedResult: The finalized branch emits the embedded attempt-dispatch summary and the new Codex app-server thread lifecycle summary in the same real local agent turn, then completes successfully through the Codex harness. Focused tests also cover start, resume, formatting, and warning-threshold behavior.
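
When reading these summaries, it can help to pull the `stages=` field apart and find the dominant stage. The sketch below is a hypothetical log-reading helper, not part of the PR; `parseStages` and `slowestStage` are illustrative names, and the sample line is shortened from the thread lifecycle evidence above.

```typescript
type ParsedStage = { name: string; durationMs: number; atMs: number };

// Parse the `stages=` field of a lifecycle summary line into structured entries.
function parseStages(line: string): ParsedStage[] {
  const field = /stages=(\S+)/.exec(line);
  if (!field) return [];
  const stages: ParsedStage[] = [];
  for (const entry of (field[1] ?? "").split(",")) {
    const m = /^(.+):(\d+)ms@(\d+)ms$/.exec(entry);
    if (!m) continue; // skip anything that does not match name:Nms@Nms
    const [, name = "", duration = "0", offset = "0"] = m;
    stages.push({ name, durationMs: Number(duration), atMs: Number(offset) });
  }
  return stages;
}

// Return the stage with the largest duration, or undefined for an empty list.
function slowestStage(stages: ParsedStage[]): ParsedStage | undefined {
  let slowest: ParsedStage | undefined;
  for (const s of stages) {
    if (slowest === undefined || s.durationMs > slowest.durationMs) slowest = s;
  }
  return slowest;
}

// Stage list shortened from the thread lifecycle line in the evidence.
const sample =
  "action=started totalMs=3547 stages=read-binding:1ms@10ms," +
  "thread-start-request:3404ms@3460ms,thread-start-write-binding:21ms@3546ms";
const slowest = slowestStage(parseStages(sample));
console.log(slowest?.name, slowest?.durationMs); // thread-start-request 3404
```

For the run shown in the evidence, the thread-start-request stage accounts for 3404 ms of the 3547 ms total.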

notTested: Full Docker gateway live Codex harness was not rerun after resolving the conflict with upstream profiler changes. An earlier pre-rebase Docker harness run from this branch passed the focused start/resume lane, but the final post-rebase proof above uses the real local CLI runner. The full extension-test typecheck lane was also not completed locally; the broader check-changed run reached extension production typecheck successfully, then the extension-test typecheck was stopped after it remained CPU-bound for several minutes. CI should cover the full extension-test typecheck.

Verification

node scripts/changed-lanes.mjs --base upstream/main --head HEAD --json
OPENCLAW_LOG_LEVEL=trace OPENCLAW_TEST_CONSOLE=1 pnpm openclaw agent --local --agent main --session-key codex-lifecycle-proof-84640-final --model openai/gpt-5.5 --message "Reply exactly: pong" --json
node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-codex.config.ts extensions/codex/src/app-server/thread-lifecycle.test.ts --reporter=verbose
node scripts/run-tsgo.mjs -p tsconfig.extensions.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions.tsbuildinfo
git diff --check upstream/main...HEAD

clawsweeper Bot commented May 29, 2026

Contributor

Codex review: needs changes before merge. Reviewed May 29, 2026, 2:47 PM ET / 18:47 UTC.

Summary
The PR adds Codex app-server thread lifecycle timing summaries and focused tests for trace logging, warning thresholds, start, and resume paths.

PR surface: Source +60, Tests +275. Total +335 across 2 files.

Reproducibility: yes. Current main source shows the lifecycle work happens before session.started without a trace-level summary, and the review finding is source-reproducible from the typed helper omitting required codeModeOnly while CI reports check-test-types failure.

Review metrics: 1 noteworthy metric.

  • Failing validation lane: 1 check-test-types failure. The failed lane covers test TypeScript, and the new test helper has a visible type-shape mismatch that should be fixed before merge.

Merge readiness
Overall: 🦐 gold shrimp
Proof: 🦞 diamond lobster
Patch quality: 🦐 gold shrimp
Result: needs maintainer review before merge.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
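
The "weaker of proof and patch quality" rule amounts to taking a minimum over an ordered rank list. A minimal sketch, assuming the ranks are ordered as in the legend later in this comment and leaving out the off-meta tidepool case; `RANKS` and `overallRank` are hypothetical names, not ClawSweeper's implementation:

```typescript
// Assumed order, weakest to strongest, taken from the rank legend.
const RANKS = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;

type Rank = (typeof RANKS)[number];

// Overall readiness is the weaker of the two inputs.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

console.log(overallRank("diamond lobster", "gold shrimp")); // gold shrimp
```

This matches the verdict above: diamond lobster proof with gold shrimp patch quality yields gold shrimp overall.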

Rank-up moves:

  • [P2] Add the missing codeModeOnly value to the typed Codex app-server test helper or reuse an existing helper.
  • Rerun node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.extensions.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions-test.tsbuildinfo and the focused lifecycle Vitest.

Risk before merge

  • [P1] The PR head should not merge until the check-test-types failure is fixed; the likely source is the newly typed Codex app-server test helper missing codeModeOnly.

Maintainer options:

  1. Decide the mitigation before merge
    Keep the Codex-plugin instrumentation approach, add the missing required test app-server option, and rerun the focused lifecycle test plus test-type proof before merge.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P2] A narrow automated repair is appropriate because the only concrete blocker found is a mechanical test helper type mismatch tied to the failing test-type lane.

Security
Cleared: The diff only changes Codex plugin logging/timing code and tests; I found no dependency, workflow, credential, permission, or supply-chain change.

Review findings

  • [P2] Add the missing codeModeOnly test option — extensions/codex/src/app-server/thread-lifecycle.test.ts:100-112
Review details

Best possible solution:

Keep the Codex-plugin instrumentation approach, add the missing required test app-server option, and rerun the focused lifecycle test plus test-type proof before merge.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main source shows the lifecycle work happens before session.started without a trace-level summary, and the review finding is source-reproducible from the typed helper omitting required codeModeOnly while CI reports check-test-types failure.

Is this the best way to solve the issue?

Mostly yes. Adding timing inside the existing Codex app-server lifecycle tracker is the right narrow solution, but this head needs the test helper type mismatch fixed before merge.

Full review comments:

  • [P2] Add the missing codeModeOnly test option — extensions/codex/src/app-server/thread-lifecycle.test.ts:100-112
    createThreadLifecycleAppServerOptions promises to return the real startOrResumeThread app-server option type, but the object omits the required codeModeOnly field from CodexAppServerRuntimeOptions. That matches the failing check-test-types lane on this head; add codeModeOnly (likely false) or reuse an existing typed helper so the test typecheck passes.
    Confidence: 0.9
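
The shape of this finding can be sketched with a simplified stand-in type. `RuntimeOptionsSketch` and `createOptionsSketch` are hypothetical; only the required `codeModeOnly` field comes from this review, and the real CodexAppServerRuntimeOptions has other fields not shown in this thread.

```typescript
// Simplified stand-in for the real options type; only `codeModeOnly` is taken
// from the review finding, the optional field is purely illustrative.
type RuntimeOptionsSketch = {
  codeModeOnly: boolean; // required
  requestTimeoutMs?: number; // hypothetical optional field
};

// A helper that declares the full options type as its return type must supply
// every required field. Returning only `{ ...overrides }` would fail to
// typecheck, because `codeModeOnly` is optional in Partial<...> but required
// in the declared return type.
function createOptionsSketch(
  overrides: Partial<RuntimeOptionsSketch> = {},
): RuntimeOptionsSketch {
  return { codeModeOnly: false, ...overrides };
}

console.log(createOptionsSketch().codeModeOnly); // false
```

Placing the default before the spread keeps the helper typed while still letting individual tests override `codeModeOnly`.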

Overall correctness: patch is incorrect
Overall confidence: 0.9

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 040eba1cdc6b.

Label changes

Label changes:

  • add P2: This is a normal-priority Codex observability fix with a narrow pre-merge typecheck blocker.
  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal output from a real local Codex agent turn showing the new lifecycle trace line and a successful response, with focused test and typecheck output as supplemental evidence.
  • add rating: 🦐 gold shrimp: Overall readiness is 🦐 gold shrimp; proof is 🦞 diamond lobster and patch quality is 🦐 gold shrimp.
  • add status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action.

Label justifications:

  • P2: This is a normal-priority Codex observability fix with a narrow pre-merge typecheck blocker.
  • rating: 🦐 gold shrimp: Overall readiness is 🦐 gold shrimp; proof is 🦞 diamond lobster and patch quality is 🦐 gold shrimp.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal output from a real local Codex agent turn showing the new lifecycle trace line and a successful response, with focused test and typecheck output as supplemental evidence.
Evidence reviewed

PR surface:

Source +60, Tests +275. Total +335 across 2 files.

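| Area      | Files | Added | Removed | Net  |
|-----------|-------|-------|---------|------|
| Source    | 1     | 115   | 55      | +60  |
| Tests     | 1     | 276   | 1       | +275 |
| Docs      | 0     | 0     | 0       | 0    |
| Config    | 0     | 0     | 0       | 0    |
| Generated | 0     | 0     | 0       | 0    |
| Other     | 0     | 0     | 0       | 0    |
| Total     | 2     | 391   | 56      | +335 |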

Acceptance criteria:

  • [P1] node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.extensions.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions-test.tsbuildinfo.
  • [P1] node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-codex.config.ts extensions/codex/src/app-server/thread-lifecycle.test.ts --reporter=verbose.
  • [P1] git diff --check upstream/main...HEAD.

What I checked:

  • Root policy read: Read the full root AGENTS.md and applied the OpenClaw PR review rules for scoped policy, proof, CI, and member-authored keep-open handling. (AGENTS.md:1, 040eba1cdc6b)
  • Scoped policy read: Read the extensions scoped AGENTS.md; the PR stays within the Codex plugin boundary and does not add core/plugin API surface. (extensions/AGENTS.md:1, 040eba1cdc6b)
  • Current main still has only slow warning timing: Current main has an internal lifecycle timing tracker and warning log gated by profiler thresholds, but no trace-level lifecycle summary for normal trace logging around startOrResumeThread. (extensions/codex/src/app-server/thread-lifecycle.ts:127, 040eba1cdc6b)
  • PR diff inspected: The patch adds a trace/warn timing summary path in thread-lifecycle.ts and new tests in thread-lifecycle.test.ts; the central change is not already implemented on current main. (extensions/codex/src/app-server/thread-lifecycle.ts:157, 72140c097449)
  • Linked report supports the observable gap: The linked issue reports a production trace gap between embedded attempt-dispatch and Codex session.started and asks for Codex app-server thread lifecycle stage timing.
  • Real behavior proof supplied: The PR body includes terminal output from a real local openclaw agent --local Codex turn showing the new [trace:codex-app-server] thread lifecycle summary before a successful pong response, plus focused Vitest, production tsgo, and diff-check output. (72140c097449)

Likely related people:

  • Peter Steinberger: Authored the app-server module split and several early Codex app-server lifecycle/hardening commits that established the current start/resume seam. (role: feature-history contributor; confidence: high; commits: 3b65e2302a55, 9ac7a0398213, 545490c5920d; files: extensions/codex/src/app-server/thread-lifecycle.ts, extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/attempt-startup.ts)
  • Shakker: Current blame on the lifecycle timing helper region points to a recent commit that touched the Codex app-server lifecycle file in current main. (role: recent area contributor; confidence: medium; commits: 40a9c38736f8; files: extensions/codex/src/app-server/thread-lifecycle.ts)
  • Vincent Koc: Recent history on the same Codex app-server lifecycle/startup files includes auth-profile and startup reuse fixes adjacent to startOrResumeThread behavior. (role: adjacent auth/runtime contributor; confidence: medium; commits: f1cc8f0cfc7c, 4a4f52b09730, 859eb0666282; files: extensions/codex/src/app-server/thread-lifecycle.ts, extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/attempt-startup.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added the proof: sufficient, rating: 🦐 gold shrimp, status: ⏳ waiting on author, and P2 labels May 29, 2026
@steipete steipete self-assigned this Jun 2, 2026
@steipete steipete force-pushed the fix/codex-thread-lifecycle-timing branch from 72140c0 to b9fe82e Compare June 2, 2026 10:08
steipete commented Jun 2, 2026

Contributor

Maintainer fix applied and pushed to the PR branch at b9fe82e65e0.

Summary:

  • Rebased the branch onto current origin/main and resolved the thread-lifecycle.ts conflict.
  • Preserved current main abort-signal handling while keeping the new Codex app-server lifecycle timing trace/warn instrumentation.
  • Fixed the test helper type failure by adding the required codeModeOnly: false runtime option.

Local verification:

  • git diff --check origin/main...HEAD
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-codex.config.ts extensions/codex/src/app-server/thread-lifecycle.test.ts --reporter=verbose - 52 passed
  • pnpm install --frozen-lockfile
  • node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.extensions.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions-test.tsbuildinfo
  • /Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode branch --base origin/main - clean, no accepted/actionable findings

Known proof gap:

  • Full PR CI is running on b9fe82e65e0; I am watching it before merge.

@steipete steipete merged commit a02a7aa into openclaw:main Jun 2, 2026
151 checks passed

Labels

  • extensions: codex
  • P2: Normal backlog priority with limited blast radius.
  • proof: sufficient: ClawSweeper judged the real behavior proof convincing.
  • rating: 🦐 gold shrimp: Decent PR readiness signal, but merge confidence is limited.
  • size: M
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action.

Development

Successfully merging this pull request may close these issues.

Codex app-server thread lifecycle latency is hidden between attempt-dispatch and session.started

2 participants