fix(codex): trace app-server thread lifecycle timing#88101
Conversation
|
Codex review: needs changes before merge. Reviewed May 29, 2026, 2:47 PM ET / 18:47 UTC. Summary PR surface: Source +60, Tests +275. Total +335 across 2 files. Reproducibility: yes. Current main source shows the lifecycle work happens before Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review findings
Review detailsBest possible solution: Keep the Codex-plugin instrumentation approach, add the missing required test app-server option, and rerun the focused lifecycle test plus test-type proof before merge. Do we have a high-confidence way to reproduce the issue? Yes. Current main source shows the lifecycle work happens before Is this the best way to solve the issue? Mostly yes. Adding timing inside the existing Codex app-server lifecycle tracker is the right narrow solution, but this head needs the test helper type mismatch fixed before merge. Full review comments:
Overall correctness: patch is incorrect AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 040eba1cdc6b. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +60, Tests +275. Total +335 across 2 files. View PR surface stats
Acceptance criteria:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
72140c0 to
b9fe82e
Compare
|
Maintainer fix applied and pushed to the PR branch at Summary:
Local verification:
Known proof gap:
|
Summary
Fixes #84640.
Adds Codex app-server thread lifecycle timing around
startOrResumeThreadso the hidden gap between embeddedattempt-dispatchand Codexsession.startedis visible in trace logs. The summary records start/resume action, total time, session key, and per-stage timings for binding reads, thread start/resume requests, binding writes, and thread-ready.The timing extends the existing profiler-gated lifecycle tracker: normal runs log at trace level when trace logging is enabled, and slow lifecycle spans still emit warning-level summaries when the configured thresholds are crossed.
Real behavior proof
content: behavior
environment: Local OpenClaw source checkout rebased on
f499841be6, host Node 22.22.2 and pnpm 11.1.0, OpenClaw CLI local agent run with Codex app-server auth profile.steps:
openclaw agent --localturn with trace logging againstopenai/gpt-5.5.git diff --checkagainstupstream/main.evidence:
observedResult: The finalized branch emits the embedded
attempt-dispatchsummary and the new Codex app-serverthread lifecyclesummary in the same real local agent turn, then completes successfully through the Codex harness. Focused tests also cover start, resume, formatting, and warning-threshold behavior.notTested: Full Docker gateway live Codex harness was not rerun after resolving the conflict with upstream profiler changes. An earlier pre-rebase Docker harness run from this branch passed the focused start/resume lane, but the final post-rebase proof above uses the real local CLI runner. The full extension-test typecheck lane was also not completed locally; the broader
check-changedrun reached extension production typecheck successfully, then the extension-test typecheck was stopped after it remained CPU-bound for several minutes. CI should cover the full extension-test typecheck.Verification