[WHM] Workflow Health Dashboard — 2026-04-02 #24097
Closed
Overview
Score: 75/100 (↑1 from 74). 179 workflows total. Run §23899445141
Summary
| Category | Count |
|---|---|
| Total workflows | 179 |
| Healthy (≥80) | 175 |
| P1 — Active failures | 1 |
| P2 — Team decision (not_planned) | 2 |
| Watch | 2 |
| Stale lock files | 10 (↓12 from last run) |
P1 — Critical Issues 🚨
Smoke Multi PR (NEW — Schedule Failures)
- Status: 5/5 recent schedule runs failing (Mar 29 – Apr 2)
- Latest run: #604 — failure (2026-04-02)
- Error: `add_comment` (status-comment) safe output fails on schedule: "Target is 'triggering' but not running in issue or pull request context"
- Root cause: `status-comment: true` generates an `add_comment` with `target: triggering`, which fails hard on schedule runs (no issue/PR context)
- Impact: Smoke test for multi-PR creation can't verify success via safe outputs
- Action: Issue [P1] Smoke Multi PR: safe_outputs fails on schedule runs (add_comment target:triggering) #24096 created for investigation
- Priority: P1
P2 — Team Decisions (not_planned) ⚠️
Smoke Update Cross-Repo PR (#23193 closed not_planned)
- Status: Still failing on schedule (run #485, 2026-04-02). Issues closed by @pelikhan as not_planned.
- Root cause: push_repo_memory git branch bug
Smoke Create Cross-Repo PR (#23715 closed not_planned)
- Status: Still failing on schedule (run #485, 2026-04-02). Same root cause as Update.
Watch 👁️
Smoke Claude (#23528 open)
- Schedule runs: Run #2613 (Apr 2) = ✅ SUCCESS; Run #2611 (Apr 1) = ❌ failure
- Pattern: Intermittent failures ~25-30% of schedule runs. MCP timeout at 412s.
- Note: Today's scheduled run succeeded. Still monitoring.
Smoke Multi PR (escalated to P1 above)
- Previous note (APR): Run #604 used 89 turns/12.3min — well above normal 2–4 turns. Resource heavy before the safe_outputs failure.
Stale Lock Files (10) 🔧
Improved from 22 last run → 12 files were recompiled. Remaining stale files:
commit-changes-analyzer, copilot-pr-nlp-analysis, daily-mcp-concurrency-analysis, daily-rendering-scripts-verifier, developer-docs-consolidator, github-mcp-tools-report, issue-monster, plan, security-compliance, weekly-issue-summary
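The dashboard does not say how a lock file is judged stale. As a rough sketch only, the check below assumes each workflow source `name.md` compiles to a sibling `name.lock.yml`, and treats a lock as stale when it is missing or older than its source. The function name `find_stale_locks` and the mtime rule are illustrative assumptions, not the Workflow Health Manager's actual logic.

```python
from pathlib import Path


def find_stale_locks(workflow_dir: Path) -> list[str]:
    """Return workflow names whose lock file is missing or older than the source.

    Assumption: staleness is approximated by file modification time; a real
    implementation may compare content hashes or compiler versions instead.
    """
    stale = []
    for source in sorted(workflow_dir.glob("*.md")):
        # Hypothetical naming convention: plan.md -> plan.lock.yml
        lock = source.with_name(source.stem + ".lock.yml")
        if not lock.exists() or lock.stat().st_mtime < source.stat().st_mtime:
            stale.append(source.stem)
    return stale
```

Calling `find_stale_locks(Path(".github/workflows"))` would list candidate workflows to recompile, assuming that directory layout.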
Healthy Workflows ✅
175 workflows operating normally with no issues detected.
Systemic Issues
Status-Comment Failure on Schedule Runs
- Affected: Smoke Multi PR (confirmed), potentially other workflows with `status-comment: true` on schedule triggers
- Pattern: `add_comment` with `target: triggering` fails hard when there is no triggering issue/PR (schedule context)
- Recommendation: Safe outputs system should silently skip `status-comment` on schedule runs instead of counting it as a failure
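The recommended behavior can be sketched as a small decision function. This is a minimal illustration, not the safe outputs implementation: the names `decide_status_comment`, `CommentDecision`, and `EVENTS_WITH_TARGET`, and the exact set of events treated as having issue/PR context, are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of trigger events that carry an issue or pull request context.
EVENTS_WITH_TARGET = {
    "issues",
    "issue_comment",
    "pull_request",
    "pull_request_review_comment",
}


@dataclass
class CommentDecision:
    action: str  # "post", "skip", or "fail"
    reason: str


def decide_status_comment(
    event_name: str, target: str, item_number: Optional[int]
) -> CommentDecision:
    """Decide what to do with an add_comment output (hypothetical helper)."""
    if target != "triggering":
        # An explicit target does not depend on the triggering context.
        return CommentDecision("post", "explicit target")
    if event_name in EVENTS_WITH_TARGET and item_number is not None:
        return CommentDecision("post", f"triggering item #{item_number}")
    if event_name == "schedule":
        # Proposed change: skip quietly instead of failing the run.
        return CommentDecision("skip", "no issue/PR context on schedule run")
    # Other context-free events keep the current hard failure.
    return CommentDecision("fail", "target is 'triggering' but no issue/PR context")
```

Under these assumptions, `decide_status_comment("schedule", "triggering", None)` yields a `skip` decision, so a scheduled Smoke Multi PR run would not be marked failed by the status comment alone.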
Smoke Claude MCP Timeout (~$10-15/week)
- Pattern: HTTP connection closes at ~412s on some runs; intermittent (~25-30% of schedule runs)
- Issues: [observability escalation] Smoke test workflows repeatedly exceed resource and control thresholds (Smoke Claude, Smoke Copilot) #23528, [aw] Smoke Claude failed #23067 (open)
Recommendations
High Priority
- Fix Smoke Multi PR: `add_comment` with `target: triggering` should not fail on schedule runs (P1)
- Investigate whether `status-comment: true` causes similar failures in other scheduled workflows
Medium Priority
- Recompile 10 stale lock files when their source `.md` files are stable
- Resolve Smoke Claude MCP timeout (ongoing cost ~$10-15/week)
Low Priority
- Close PR Triage Agent issue [aw] PR Triage Agent failed #23151 (9+ consecutive successes, fully recovered)
Actions Taken This Run
- Created issue [P1] Smoke Multi PR: safe_outputs fails on schedule runs (add_comment target:triggering) #24096 for Smoke Multi PR P1 failure
- Updated shared repo memory with health status
- Closed previous dashboard issue [WHM] Workflow Health Dashboard — 2026-04-01 #23881 (via comment)
Trends
- Overall health score: 75/100 (↑1 from 74 last run)
- Stale lock files: 10 (↓12 from last run's 22) — significant improvement
- New P1: Smoke Multi PR safe_outputs failure on schedule
- Previous P1s resolved: Smoke Update/Create Cross-Repo PR closed as not_planned
Last updated: 2026-04-02T12:05Z
Next check: 2026-04-03
References:
- §23899445141 — This WHM run
- Smoke Multi PR run #604 — Latest failure
- Previous WHM Dashboard #23881
Note
🔒 Integrity filter blocked 1 item
The following item was blocked because it doesn't meet the GitHub integrity level.
- #19836
search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
To allow these resources, lower min-integrity in your GitHub frontmatter:
    tools:
      github:
        min-integrity: approved # merged | approved | unapproved | none

Generated by Workflow Health Manager - Meta-Orchestrator · expires on Apr 3, 2026, 12:13 PM UTC