[cli-tools-test] Daily CLI Tools Testing: 10 workflow compilation failures + MCP metrics gap [2026-03-22] #22336
Description
### Summary
Daily exploratory testing of the `audit`, `logs`, and `compile` MCP tools on 2026-03-22. Two significant issues were found: 10 workflows fail to compile, and MCP tool call metrics are never captured (always 0).
### ✅ What Worked Correctly
- `logs` tool: Basic download, engine filter (claude, copilot), workflow-name filter, date range, count limit; all functional
- `logs` edge case: Non-existent workflow returns a helpful error with suggestions to check the `status` tool
- `logs` old date: Returns empty results (not an error) for queries with no data ✅
- `audit` successful run: Full report with jobs, tool usage, firewall analysis, created items; all populated correctly (tested: Issue Monster §23415308371, Sergo §23413720096)
- `audit` with URL: Supports full GitHub Actions run URLs as input ✅
- `compile` individual workflow: Compiles `issue-monster.md` successfully ✅
- `compile` strict=false: Correctly compiles workflows with internal fields when strict mode is disabled ✅
- Log files: 163 directories, 415 MB of logs downloaded; structure intact; `agent-stdio.log`, `aw_info.json`, `detection.log`, `run_summary.json` all present
### 🔴 Issue 1: 10 Workflows Fail to Compile (Critical)
Running `compile` (default `strict: true`) against all 177 workflows reveals 10 compilation failures:
<details>
<summary><b>4 smoke-* workflows: `sandbox.mcp.container` blocked in strict mode</b></summary>
```
smoke-copilot.md: strict mode: 'sandbox.mcp.container' is not allowed because it is an
internal implementation detail. Remove 'sandbox.mcp.container' or set 'strict: false'
```
Affected files:
- `smoke-copilot.md`
- `smoke-codex.md`
- `smoke-copilot-arm.md`
- `smoke-claude.md`
These workflows use `sandbox.mcp.container: "ghcr.io/github/gh-aw-mcpg"` but are missing `strict: false` in their frontmatter. Compiling each individually with `strict: false` succeeds. **Fix**: add `strict: false` to these four smoke workflow frontmatters.
</details>
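The `strict: false` fix for the four smoke workflows could be applied mechanically. The following is a minimal sketch, not the project's tooling: it assumes the frontmatter is a YAML block delimited by `---` lines at the top of each workflow file, and the helper name `ensure_strict_false` and the sample `engine: copilot` key are hypothetical.

```python
def ensure_strict_false(markdown: str) -> str:
    """Add `strict: false` to the frontmatter unless a top-level `strict:` key exists.

    Assumption: frontmatter is a block delimited by `---` lines at the top of the file.
    """
    lines = markdown.split("\n")
    if not lines or lines[0].strip() != "---":
        raise ValueError("no frontmatter block found")
    try:
        # Index of the line that closes the frontmatter block.
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("unterminated frontmatter block") from None
    # Top-level keys start at column 0; indented (nested) keys are ignored.
    if any(line.startswith("strict:") for line in lines[1:end]):
        return markdown  # an existing setting is left untouched
    lines.insert(end, "strict: false")
    return "\n".join(lines)


# Hypothetical workflow content, for illustration only.
sample = "---\nengine: copilot\n---\n# Smoke test\n"
print(ensure_strict_false(sample))
```

The function is idempotent, so running it twice over the same file changes nothing the second time. It deliberately does not overwrite an explicit `strict: true`; those files would still need a manual decision.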
<details>
<summary><b>6 workflows: Missing `vulnerability-alerts: read` permission for `dependabot` toolset</b></summary>
```
Missing required permissions for GitHub toolsets:
- vulnerability-alerts: read (required by dependabot)
```
Affected files:
- `daily-firewall-report.md`
- `deep-report.md`
- `dependabot-go-checker.md`
- `github-mcp-structural-analysis.md`
- `github-mcp-tools-report.md`
- `security-review.md`
These workflows use the `dependabot` GitHub toolset but don't declare `vulnerability-alerts: read` in their `permissions` block. This likely became a required permission in a recent update. **Fix**: add `vulnerability-alerts: read` to each workflow's `permissions` block, or remove the `dependabot` toolset if not needed.
</details>
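The permission check behind this failure can be illustrated with a small lookup. This is a sketch under stated assumptions, not the compiler's actual logic: only the `dependabot` to `vulnerability-alerts: read` mapping comes from the error above, while the function name, the plain-dict inputs, and the `contents: read` sample entry are hypothetical.

```python
# Toolset -> permissions it requires. Only the dependabot entry is taken from the report.
REQUIRED_PERMISSIONS = {"dependabot": {"vulnerability-alerts": "read"}}


def missing_permissions(toolsets, permissions):
    """Return {scope: level} for each required permission that is not declared as-is."""
    missing = {}
    for toolset in toolsets:
        for scope, level in REQUIRED_PERMISSIONS.get(toolset, {}).items():
            if permissions.get(scope) != level:
                missing[scope] = level
    return missing


# Hypothetical workflow that enables dependabot without the alert permission.
print(missing_permissions(["dependabot"], {"contents": "read"}))
# {'vulnerability-alerts': 'read'}

# After applying the suggested fix, nothing is missing.
print(missing_permissions(["dependabot"], {"contents": "read", "vulnerability-alerts": "read"}))
# {}
```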
### 🟡 Issue 2: MCP Tool Call Sizes and Durations Always Zero (Medium)
In both `logs` and `audit` output, all MCP server tool calls show `input_size: 0`, `output_size: 0`, and `avg_duration: "0ns"`, regardless of the server (`github`, `safeoutputs`, `serena`) or tool called.
Example from `audit` of run 23413720096:
```json
{
  "server_name": "serena",
  "tool_name": "onboarding",
  "call_count": 1,
  "total_input_size": 0,
  "total_output_size": 0,
  "max_input_size": 0,
  "max_output_size": 0
}
```
By contrast, native tool calls (bash, Read, etc.) do show real input/output sizes. This is a tracking gap: users cannot tell how much data was transferred to/from MCP servers, making it harder to diagnose performance issues or unexpected MCP behavior.
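A check like the one below could flag this gap automatically in future test runs. It is a sketch only: the field names match the audit example in this report, but the assumption that the entries arrive as a flat list of objects, the function name, and the second sample entry are hypothetical.

```python
def zero_size_mcp_calls(tool_usage):
    """Return (server, tool) pairs that were called but report no data transferred."""
    suspicious = []
    for entry in tool_usage:
        called = entry.get("call_count", 0) > 0
        no_bytes = (
            entry.get("total_input_size", 0) == 0
            and entry.get("total_output_size", 0) == 0
        )
        if called and no_bytes:
            suspicious.append((entry.get("server_name"), entry.get("tool_name")))
    return suspicious


sample = [
    # Entry from the audit example in this report.
    {"server_name": "serena", "tool_name": "onboarding", "call_count": 1,
     "total_input_size": 0, "total_output_size": 0},
    # Hypothetical entry with non-zero sizes, for contrast.
    {"server_name": "example", "tool_name": "echo", "call_count": 2,
     "total_input_size": 120, "total_output_size": 340},
]
print(zero_size_mcp_calls(sample))  # [('serena', 'onboarding')]
```

A call that was made at least once but moved zero bytes in both directions is implausible, which is why the sketch treats that combination as a tracking failure rather than a genuinely empty call.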
---
### 🟡 Issue 3: Audit of Invalid Run ID Returns Opaque Error (Low)
```
McpError: MCP error -32603: calling "tools/call": failed to fetch run metadata
```
When auditing a non-existent run ID (e.g., 99999999999), the error code -32603 (generic internal error) and the message "failed to fetch run metadata" do not clearly indicate that the run ID was not found. A user-facing message like "Run 99999999999 not found: verify the run ID exists in this repository" would be more helpful.
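One way to produce the clearer message is to distinguish a "not found" response from other fetch failures before reporting the error. The sketch below assumes the underlying metadata fetch exposes an HTTP status code and that a missing run yields 404; both are assumptions about the tool's internals, and the function name is hypothetical.

```python
def run_fetch_error_message(run_id: int, status_code: int) -> str:
    """Build a user-facing message for a failed run metadata fetch."""
    if status_code == 404:
        # Assumed: a missing run surfaces as HTTP 404 from the backing API.
        return f"Run {run_id} not found: verify the run ID exists in this repository"
    # Anything else stays generic but still names the run and the status.
    return f"Failed to fetch metadata for run {run_id} (HTTP {status_code})"


print(run_fetch_error_message(99999999999, 404))
# Run 99999999999 not found: verify the run ID exists in this repository
```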
### 📊 Test Metrics
| Phase | Tests Run | Pass | Fail |
|---|---|---|---|
| Phase 1: Discovery | 2 | 2 | 0 |
| Phase 2: Logs | 7 | 7 | 0 |
| Phase 3: Audit | 4 | 4 | 0 |
| Phase 4: Compile | 4 | 3 | 1 |
| Phase 5: Edge Cases | 4 | 3 | 1 |
Resources:
- 177 workflows discovered, 167 compile successfully
- 163 log directories, 415 MB of log data
- Logs download speed: ~10 runs in < 10s ✅
- Audit duration: ~5s per run ✅
- Compile (all 177): ~5s ✅
References:
- §23415460310 — this test run
- §23415308371 — Issue Monster (tested successful audit)
- §23413720096 — Sergo Claude (tested complex audit)
Generated by Daily CLI Tools Exploratory Tester · expires on Mar 29, 2026, 11:58 PM UTC