fix(codex): handle v0.136+ TUI footer and skip MCP tool-call markers …#274
Merged
Conversation
…in extraction
Two bugs in CodexProvider, both surfaced by the e2e suite on codex 0.136+:
1. TUI footer detection — codex 0.136 dropped the "N% left" segment; the
footer is now just "model · path". TUI_FOOTER_PATTERN never matched, so
get_status() couldn't distinguish the suggestion hint ("› Run /review on
my current changes") from a real user message and pinned terminals at
IDLE forever. Pattern extended to anchor on "·\s+[~/]".
2. extract_last_message_from_script() anchored on the first "•" after the
user prompt, which can be a "• Called <mcp_server>.<tool>(...)" tool-call
marker. The lines that follow are tool output, not the model's reply, so
when codex called load_skill before responding the extracted message
leaked the cao-worker-protocols skill body (which mentions
"[CAO Handoff]") into the test output. Now iterates "•" matches and
skips MCP_TOOL_CALL_PATTERN. Also tightened ASSISTANT_PREFIX_PATTERN
from "\s*•" to "[^\S\n]*•" so the match anchors on the bullet line
itself, not a preceding blank line.
Verified end-to-end: 11/12 codex e2e pass (1 xfailed expected), 81/81
unit tests including 3 new regressions for the v0.136 footer and tool-call
marker filtering.
Run black on codex.py and the unit test file to satisfy the Code Quality CI check on PR #274.
Contributor
There was a problem hiding this comment.
Pull request overview
Updates the Codex provider’s parsing logic to handle Codex v0.136+ TUI output changes and to avoid treating MCP tool-call markers as assistant replies during last-message extraction, with accompanying unit test coverage and changelog updates.
Changes:
- Extend Codex TUI footer detection to recognize v0.136+
model · pathformat so status detection doesn’t misclassify footer hint text as user input. - Update message extraction to skip
• Called <mcp_server>.<tool>(...)tool-call markers so tool output (e.g., skill bodies) doesn’t leak into extracted assistant responses. - Add unit tests covering v0.136 footer behavior and tool-call marker filtering; update changelog entry date and fixed items.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
test/providers/test_codex_provider_unit.py |
Adds regression tests for v0.136 footer parsing and for skipping MCP tool-call markers during extraction. |
src/cli_agent_orchestrator/providers/codex.py |
Updates regex patterns and extraction logic to handle v0.136 footer changes and avoid anchoring on MCP tool-call markers. |
CHANGELOG.md |
Updates the 2.2.0 date and records fixes related to Codex footer detection and tool-call marker filtering. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…tion Address Copilot review on PR #274 — two related concerns: 1. MCP_TOOL_CALL_PATTERN was too loose: "^[^\S\n]*•\s+Called\s+\S+" matched any "• Called <word>" so a real model bullet like "• Called attention to the bug" would be filtered out as a tool call. Tightened to require the documented "<server>.<tool>(" shape: "^[^\S\n]*•\s+Called\s+[\w-]+\.[\w-]+\(". 2. get_status()'s COMPLETED check searched ASSISTANT_PREFIX_PATTERN without filtering tool-call markers. A "• Called <server>.<tool>(...)" bullet emitted before the model has actually replied could trip COMPLETED on its own. Factored the per-line tool-call filter into a _find_assistant_marker() helper and applied it in get_status() (both COMPLETED detection and assistant_after_last_user gating) and in extract_last_message_from_script(). Tests: - test_extract_does_not_filter_called_as_english_word — guards against over-broad filtering of legitimate model bullets. - test_get_status_idle_when_only_tool_call_after_user — IDLE when the only "•" after the user prompt is a tool-call marker. - test_get_status_completed_when_real_reply_after_tool_call — COMPLETED when a real "•" reply follows a tool-call marker. 84 unit tests pass.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
call-me-ram
added a commit
to call-me-ram/cli-agent-orchestrator
that referenced
this pull request
Jun 4, 2026
Resolve the codex test conflict introduced by awslabs#274 (codex v0.136 TUI footer + MCP tool-call marker filtering). awslabs#274 added its get_status regressions against the old tmux-reading contract (mock_tmux.get_history + get_status()); adapt them to the event-driven get_status(buffer) contract this branch introduces. codex.py auto-merged cleanly — the awslabs#274 regex/extraction fixes layer onto buffer-based get_status unchanged. Full unit suite green: 1948 passed, 1 skipped.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
…in extraction
Two bugs in CodexProvider, both surfaced by the e2e suite on codex 0.136+:
TUI footer detection — codex 0.136 dropped the "N% left" segment; the footer is now just "model · path". TUI_FOOTER_PATTERN never matched, so get_status() couldn't distinguish the suggestion hint ("› Run /review on my current changes") from a real user message and pinned terminals at IDLE forever. Pattern extended to anchor on "·\s+[~/]".
extract_last_message_from_script() anchored on the first "•" after the user prompt, which can be a "• Called <mcp_server>.(...)" tool-call marker. The lines that follow are tool output, not the model's reply, so when codex called load_skill before responding the extracted message leaked the cao-worker-protocols skill body (which mentions "[CAO Handoff]") into the test output. Now iterates "•" matches and skips MCP_TOOL_CALL_PATTERN. Also tightened ASSISTANT_PREFIX_PATTERN from "\s*•" to "[^\S\n]*•" so the match anchors on the bullet line itself, not a preceding blank line.
Verified end-to-end: 11/12 codex e2e pass (1 xfailed expected), 81/81 unit tests including 3 new regressions for the v0.136 footer and tool-call marker filtering.
Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.