Skip to content

[cli-tools-test] audit: Codex run audit missing token/turn metrics and firewall failure root cause #23838

@github-actions

Description

@github-actions

Problem Description

The audit tool produces an incomplete report for failed Codex engine runs. Two specific gaps were identified during exploratory testing:

  1. Missing execution metrics: Codex runs report turns: 0, token_usage: undefined, and tool_types: 0 even when the agent executed for several minutes. The metrics section only contains {"error_count":1,"warning_count":0}.

  2. Missing firewall root cause: When the agent job fails due to a firewall-blocked domain, the audit tool does not surface this as a failure cause. The firewall_analysis field is absent from the audit output, and the error cause is buried in a 10.7 MB agent-stdio.log.

Example Run

  • Run: §23833156240 (Smoke Codex, failure)
  • Duration: 6.0m, agent job failed

Actual Audit Output (truncated)

{
  "metrics": { "error_count": 1, "warning_count": 0 },
  "session_analysis": { "wall_time": "6.0m", "timeout_detected": false },
  "observability_insights": [
    {
      "title": "Directed execution path",
      "evidence": "turns=0 tool_types=0"
    }
  ]
}

No firewall_analysis field was present despite clear firewall events in the logs.

Actual Failure Root Cause (from agent-stdio.log)

The agent attempted to access chatgpt.com, which is not in the workflow's allowed domains. The firewall blocked the request. This caused the Codex agent to exit with code 1. The warning in the logs provides the blocked domain in a --allow-domains suggestion, but this is not surfaced in the audit report.

Steps to Reproduce

  1. Run audit on run 23833156240:
    Use agentic-workflows audit tool with run_id_or_url: 23833156240
    
  2. Observe: metrics lacks token_usage and turns, firewall_analysis is absent
  3. Compare with a Claude run audit (e.g., 23832622309) which has full metrics

Expected Behavior

  1. Codex metrics: The audit should report available execution metrics for Codex runs, even if the token format differs from Claude/Copilot. At minimum, show wall_time, action_minutes, and exit_code.
  2. Firewall root cause: When an agent job fails and the logs contain firewall-block indicators, firewall_analysis should be populated with blocked_domains. A key_findings entry should explain: "Agent attempted to access blocked domain: chatgpt.com" with a recommendation to add it to network.allowed.

Environment

  • Repository: github/gh-aw
  • Run ID: 23833406848
  • Date: 2026-04-01
  • Tool: agentic-workflows MCP server - audit command

Impact

  • Severity: Medium
  • Frequency: Every Codex failure with firewall blocks
  • Workaround: Manually inspect agent-stdio.log for [WARN] lines referencing blocked domains

Contrast: Claude Run Audit (Working Correctly)

For comparison, auditing Claude run 23832622309 correctly produces:

  • token_usage: 1169969
  • turns: 30
  • firewall_analysis (when blocked requests exist)
  • Full key_findings with actionable recommendations

References:

Generated by Daily CLI Tools Exploratory Tester ·

  • expires on Apr 8, 2026, 5:36 AM UTC

Metadata

Metadata

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions