-
Notifications
You must be signed in to change notification settings - Fork 328
[cli-tools-test] audit: Codex run audit missing token/turn metrics and firewall failure root cause #23838
Description
Problem Description
The audit tool produces an incomplete report for failed Codex engine runs. Two specific gaps were identified during exploratory testing:
-
Missing execution metrics: Codex runs report
turns: 0,token_usage: undefined, andtool_types: 0even when the agent executed for several minutes. Themetricssection only contains{"error_count":1,"warning_count":0}. -
Missing firewall root cause: When the agent job fails due to a firewall-blocked domain, the
audittool does not surface this as a failure cause. Thefirewall_analysisfield is absent from the audit output, and the error cause is buried in a 10.7 MBagent-stdio.log.
Example Run
- Run: §23833156240 (Smoke Codex, failure)
- Duration: 6.0m,
agentjob failed
Actual Audit Output (truncated)
{
"metrics": { "error_count": 1, "warning_count": 0 },
"session_analysis": { "wall_time": "6.0m", "timeout_detected": false },
"observability_insights": [
{
"title": "Directed execution path",
"evidence": "turns=0 tool_types=0"
}
]
}No firewall_analysis field was present despite clear firewall events in the logs.
Actual Failure Root Cause (from agent-stdio.log)
The agent attempted to access chatgpt.com, which is not in the workflow's allowed domains. The firewall blocked the request. This caused the Codex agent to exit with code 1. The warning in the logs provides the blocked domain in a --allow-domains suggestion, but this is not surfaced in the audit report.
Steps to Reproduce
- Run
auditon run23833156240:Use agentic-workflows audit tool with run_id_or_url: 23833156240 - Observe:
metricslackstoken_usageandturns,firewall_analysisis absent - Compare with a Claude run audit (e.g.,
23832622309) which has full metrics
Expected Behavior
- Codex metrics: The audit should report available execution metrics for Codex runs, even if the token format differs from Claude/Copilot. At minimum, show
wall_time,action_minutes, andexit_code. - Firewall root cause: When an agent job fails and the logs contain firewall-block indicators,
firewall_analysisshould be populated withblocked_domains. Akey_findingsentry should explain: "Agent attempted to access blocked domain: chatgpt.com" with a recommendation to add it tonetwork.allowed.
Environment
- Repository: github/gh-aw
- Run ID: 23833406848
- Date: 2026-04-01
- Tool:
agentic-workflowsMCP server -auditcommand
Impact
- Severity: Medium
- Frequency: Every Codex failure with firewall blocks
- Workaround: Manually inspect
agent-stdio.logfor[WARN]lines referencing blocked domains
Contrast: Claude Run Audit (Working Correctly)
For comparison, auditing Claude run 23832622309 correctly produces:
token_usage: 1169969turns: 30firewall_analysis(when blocked requests exist)- Full
key_findingswith actionable recommendations
References:
- §23833156240 — Codex failure (missing metrics)
- §23832622309 — Claude success (full metrics for comparison)
- §23833406848 — This test run
Generated by Daily CLI Tools Exploratory Tester · ◷
- expires on Apr 8, 2026, 5:36 AM UTC