[cli-tools-test] Daily CLI Tools Testing: 10 workflow compilation failures + MCP metrics gap [2026-03-22] #22336

@github-actions

Summary

Daily exploratory testing of the audit, logs, and compile MCP tools on 2026-03-22. Two significant issues were found: 10 workflows fail to compile, and MCP tool call metrics are not captured (always 0).


✅ What Worked Correctly

  • logs tool: Basic download, engine filter (claude, copilot), workflow-name filter, date range, count limit — all functional
  • logs edge case: Non-existent workflow returns a helpful error with suggestions to check the status tool
  • logs old date: Returns empty results (not an error) for queries with no data ✅
  • audit successful run: Full report with jobs, tool usage, firewall analysis, created items — all populated correctly (tested: Issue Monster §23415308371, Sergo §23413720096)
  • audit with URL: Supports full GitHub Actions run URLs as input ✅
  • compile individual workflow: Compiles issue-monster.md successfully ✅
  • compile strict=false: Correctly compiles workflows with internal fields when strict mode disabled ✅
  • Log files: 163 directories, 415 MB of logs downloaded — structure intact; agent-stdio.log, aw_info.json, detection.log, run_summary.json all present

🔴 Issue 1: 10 Workflows Fail to Compile (Critical)

Running compile (default strict: true) against all 177 workflows reveals 10 compilation failures:

<details>
<summary><b>4 smoke-* workflows: `sandbox.mcp.container` blocked in strict mode</b></summary>

```
smoke-copilot.md: strict mode: 'sandbox.mcp.container' is not allowed because it is an
  internal implementation detail. Remove 'sandbox.mcp.container' or set 'strict: false'
```

Affected files:
- `smoke-copilot.md`
- `smoke-codex.md`
- `smoke-copilot-arm.md`
- `smoke-claude.md`

These workflows use `sandbox.mcp.container: "ghcr.io/github/gh-aw-mcpg"` but are missing `strict: false` in their frontmatter. Compiling each individually with `strict: false` succeeds. **Fix**: add `strict: false` to these four smoke workflow frontmatters.

</details>
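
To find every workflow affected by the strict-mode failure before recompiling, a small pre-check can help. The sketch below is a heuristic, not the compiler's own logic: it assumes the dotted name `sandbox.mcp.container` in the error corresponds to nested block-style YAML keys in the frontmatter, and it does not use a real YAML parser. The function names and the sample frontmatter are hypothetical.

```python
import re

# Matches "key: value" lines in block-style YAML; list items ("- x") do not match.
KEY_LINE = re.compile(r"^(\s*)([A-Za-z0-9_.-]+):\s*(.*)$")

def frontmatter_values(text):
    """Map dotted key paths to raw scalar strings (heuristic, block-style YAML only)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    values, stack = {}, []  # stack holds (indent, key) for the current nesting
    for line in lines[1:]:
        if line.strip() == "---":  # closing frontmatter delimiter
            break
        m = KEY_LINE.match(line)
        if not m:
            continue
        indent, key, value = len(m.group(1)), m.group(2), m.group(3).strip()
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, key))
        values[".".join(k for _, k in stack)] = value
    return values

def needs_strict_false(text):
    """True if the frontmatter sets sandbox.mcp.container without top-level strict: false."""
    values = frontmatter_values(text)
    return "sandbox.mcp.container" in values and values.get("strict", "").lower() != "false"

# Illustrative frontmatter; the exact layout of real smoke workflows is an assumption.
sample = (
    "---\n"
    "engine: copilot\n"
    "sandbox:\n"
    "  mcp:\n"
    '    container: "ghcr.io/github/gh-aw-mcpg"\n'
    "---\n"
    "# Smoke test body\n"
)
print(needs_strict_false(sample))  # True
```

Passing the text of each workflow file to `needs_strict_false` would list the candidates for the `strict: false` fix; the compile tool remains the source of truth.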

<details>
<summary><b>6 workflows: Missing `vulnerability-alerts: read` permission for `dependabot` toolset</b></summary>

```
Missing required permissions for GitHub toolsets:
  - vulnerability-alerts: read (required by dependabot)
```

Affected files:
- `daily-firewall-report.md`
- `deep-report.md`
- `dependabot-go-checker.md`
- `github-mcp-structural-analysis.md`
- `github-mcp-tools-report.md`
- `security-review.md`

These workflows use the `dependabot` GitHub toolset but don't declare `vulnerability-alerts: read` in their `permissions` block. This likely became a required permission in a recent update. **Fix**: add `vulnerability-alerts: read` to each workflow's `permissions` block, or remove the `dependabot` toolset if not needed.

</details>
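
The missing-permission case can be pre-screened the same way. This is a rough sketch under stated assumptions: it assumes toolsets are declared on a `toolsets:` key in the frontmatter (inline or block list) and that the permission appears as a `vulnerability-alerts: read` line. Both layouts are guesses based on the error text, and the helper names and sample frontmatter are hypothetical.

```python
import re

def read_frontmatter(text):
    """Return the lines between the leading '---' delimiters, or '' if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    body = []
    for line in lines[1:]:
        if line.strip() == "---":
            return "\n".join(body)
        body.append(line)
    return ""  # no closing delimiter found

def declared_toolsets(fm):
    """Collect names from 'toolsets:' keys, inline ([a, b]) or block ('- a') style."""
    names, lines = [], fm.splitlines()
    for i, line in enumerate(lines):
        m = re.match(r"^\s*toolsets:\s*(.*)$", line)
        if not m:
            continue
        rest = m.group(1).strip()
        if rest.startswith("["):
            names += [t.strip(" '\"") for t in rest.strip("[]").split(",") if t.strip()]
        else:
            for nxt in lines[i + 1:]:
                item = re.match(r"^\s*-\s*(\S+)\s*$", nxt)
                if not item:
                    break
                names.append(item.group(1).strip("'\""))
    return names

def missing_vulnerability_alerts(text):
    """True if the dependabot toolset is declared without vulnerability-alerts: read."""
    fm = read_frontmatter(text)
    has_permission = re.search(r"^\s*vulnerability-alerts:\s*read\b", fm, re.MULTILINE)
    return "dependabot" in declared_toolsets(fm) and has_permission is None

# Illustrative frontmatter only; real workflows may nest these keys differently.
sample = (
    "---\n"
    "permissions:\n"
    "  contents: read\n"
    "tools:\n"
    "  github:\n"
    "    toolsets: [default, dependabot]\n"
    "---\n"
    "body\n"
)
print(missing_vulnerability_alerts(sample))  # True
```

Matching on the `toolsets:` key rather than on the bare word avoids false positives from workflow names such as `dependabot-go-checker`.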


🟡 Issue 2: MCP Tool Call Sizes and Durations Always Zero (Medium)

In both logs and audit output, all MCP server tool calls show input_size: 0, output_size: 0, and avg_duration: "0ns" — regardless of the server (github, safeoutputs, serena) or tool called.

Example from audit of run 23413720096:

```json
{
  "server_name": "serena",
  "tool_name": "onboarding",
  "call_count": 1,
  "total_input_size": 0,
  "total_output_size": 0,
  "max_input_size": 0,
  "max_output_size": 0
}
```

By contrast, native tool calls (bash, Read, etc.) do show real input/output sizes. This is a tracking gap: users cannot tell how much data was transferred to/from MCP servers, making it harder to diagnose performance issues or unexpected MCP behavior.
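
A regression check for this gap could flag any tool entry that records calls but no bytes. The sketch below uses only the field names from the example above. It assumes the audit output exposes these entries as a list, which this report does not show, and the function name is hypothetical.

```python
import json

SIZE_FIELDS = ("total_input_size", "total_output_size", "max_input_size", "max_output_size")

def zero_metric_calls(tool_usage):
    """Return (server, tool) pairs that were called but report no input or output size."""
    return [
        (entry["server_name"], entry["tool_name"])
        for entry in tool_usage
        if entry.get("call_count", 0) > 0
        and all(entry.get(field, 0) == 0 for field in SIZE_FIELDS)
    ]

# The entry from the audit of run 23413720096, wrapped in a list (the wrapper is assumed).
sample = json.loads("""
[
  {
    "server_name": "serena",
    "tool_name": "onboarding",
    "call_count": 1,
    "total_input_size": 0,
    "total_output_size": 0,
    "max_input_size": 0,
    "max_output_size": 0
  }
]
""")
print(zero_metric_calls(sample))  # [('serena', 'onboarding')]
```

An entry with one or more calls and all-zero sizes is suspicious on its own, since a tool call carries arguments and a response.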

🟡 Issue 3: Audit of Invalid Run ID Returns Opaque Error (Low)

```
McpError: MCP error -32603: calling "tools/call": failed to fetch run metadata
```

When auditing a non-existent run ID (e.g., 99999999999), the error code -32603 (generic internal error) and the message "failed to fetch run metadata" do not clearly indicate that the run ID was not found. A user-facing message like "Run 99999999999 not found: verify the run ID exists in this repository" would be more helpful.
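
One possible shape for the friendlier message is sketched below. It assumes the audit tool can see the HTTP status of the failed metadata request, and that a missing run comes back as 404. Neither is confirmed by this test run, and `describe_fetch_failure` is a hypothetical helper, not part of the tool.

```python
from typing import Optional

def describe_fetch_failure(run_id: int, http_status: Optional[int]) -> str:
    """Turn a failed run-metadata fetch into a message that names the run ID."""
    if http_status == 404:
        # Assumed: a missing run surfaces as HTTP 404 from the underlying request.
        return f"Run {run_id} not found: verify the run ID exists in this repository"
    # Any other failure keeps the generic wording but still names the run.
    return f"Failed to fetch metadata for run {run_id}"

print(describe_fetch_failure(99999999999, 404))
```

Keeping the generic fallback for other statuses avoids reporting a network or authentication problem as a missing run.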


📊 Test Metrics

| Phase | Tests Run | Pass | Fail |
|---|---|---|---|
| Phase 1: Discovery | 2 | 2 | 0 |
| Phase 2: Logs | 7 | 7 | 0 |
| Phase 3: Audit | 4 | 4 | 0 |
| Phase 4: Compile | 4 | 3 | 1 |
| Phase 5: Edge Cases | 4 | 3 | 1 |
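
The per-phase counts can be totalled with a few lines of Python. The totals below are derived from the table, not reported separately by the test run.

```python
# (tests run, pass, fail) per phase, copied from the table above.
results = {
    "Phase 1: Discovery": (2, 2, 0),
    "Phase 2: Logs": (7, 7, 0),
    "Phase 3: Audit": (4, 4, 0),
    "Phase 4: Compile": (4, 3, 1),
    "Phase 5: Edge Cases": (4, 3, 1),
}

# Sanity check: every row must satisfy run == pass + fail.
for phase, (run, passed, failed) in results.items():
    assert run == passed + failed, phase

total_run = sum(run for run, _, _ in results.values())
total_passed = sum(passed for _, passed, _ in results.values())
total_failed = sum(failed for _, _, failed in results.values())

print(f"{total_passed}/{total_run} passed ({total_passed / total_run:.1%}), {total_failed} failed")
# 19/21 passed (90.5%), 2 failed
```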

Resources:

  • 177 workflows discovered, 167 compile successfully
  • 163 log directories, 415 MB of log data
  • Logs download speed: ~10 runs in < 10s ✅
  • Audit duration: ~5s per run ✅
  • Compile (all 177): ~5s ✅

References:

Generated by Daily CLI Tools Exploratory Tester · expires on Mar 29, 2026, 11:58 PM UTC
