Daily Issues Report Generator #168
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| --- | |
| description: Daily report analyzing repository issues with clustering, metrics, and trend charts | |
| on: daily | |
| permissions: | |
| contents: read | |
| actions: read | |
| issues: read | |
| pull-requests: read | |
| discussions: read | |
| engine: codex | |
| runs-on: aw-gpu-runner-T4 | |
| strict: true | |
| tracker-id: daily-issues-report | |
| tools: | |
| github: | |
| min-integrity: approved | |
| toolsets: [default, discussions] | |
| timeout-minutes: 30 | |
| imports: | |
| - shared/github-guard-policy.md | |
| - uses: shared/daily-audit-discussion.md | |
| with: | |
| title-prefix: "[daily issues] " | |
| - shared/jqschema.md | |
| - shared/issues-data-fetch.md | |
| - shared/python-dataviz.md | |
| - shared/python-nlp.md | |
| - shared/trends.md | |
| - shared/reporting.md | |
| - shared/observability-otlp.md | |
| --- | |
| {{#runtime-import? .github/shared-instructions.md}} | |
| # Daily Issues Report Generator | |
| You are an expert analyst that generates comprehensive daily reports about repository issues, using Python for clustering and visualization. | |
| ## Mission | |
| Generate a daily report analyzing up to 1000 issues from the repository (see `issues_analyzed` in scratchpad/metrics-glossary.md): | |
| 1. Cluster issues by topic/theme using natural language analysis | |
| 2. Calculate various metrics (open/closed rates, response times, label distribution) | |
| 3. Generate trend charts showing issue activity over time | |
| 4. Create a new discussion with the report | |
| 5. Close previous daily issues discussions to avoid clutter | |
| ## Current Context | |
| - **Repository**: ${{ github.repository }} | |
| - **Run ID**: ${{ github.run_id }} | |
| - **Date**: Generated daily at 6 AM UTC | |
| ## Phase 1: Load and Prepare Data | |
| The issues data has been pre-fetched and is available at `/tmp/gh-aw/issues-data/issues.json`. | |
| 1. **Load the issues data**: | |
| ```bash | |
| jq 'length' /tmp/gh-aw/issues-data/issues.json | |
| ``` | |
| 2. **Prepare data for Python analysis**: | |
| - Copy issues.json to `/tmp/gh-aw/python/data/issues.json` | |
| - Validate the data is properly formatted | |
| ## Phase 2: Python Analysis with Clustering | |
| Create a Python script to analyze and cluster the issues. Use scikit-learn for clustering if available, or implement simple keyword-based clustering. | |
| ### Required Analysis | |
| **Clustering Requirements**: | |
| - Use TF-IDF vectorization on issue titles and bodies | |
| - Apply K-means or hierarchical clustering | |
| - Identify 5-10 major issue clusters/themes | |
| - Label each cluster based on common keywords | |
| **Metrics to Calculate** (see scratchpad/metrics-glossary.md for definitions): | |
| - Total issues (open vs closed) - `total_issues`, `open_issues`, `closed_issues` | |
| - Issues opened in last 7, 30 days - `issues_opened_7d`, `issues_opened_30d` | |
| - Average time to close (for closed issues) | |
| - Most active labels (by issue count) | |
| - Most active authors | |
| - Issues without labels (need triage) - `issues_without_labels` | |
| - Issues without assignees - `issues_without_assignees` | |
| - Stale issues (no activity in 30+ days) - `stale_issues` | |
| ### Python Script Structure | |
| ```python | |
| #!/usr/bin/env python3 | |
| """ | |
| Daily Issues Analysis Script | |
| Clusters issues and generates metrics and visualizations | |
| """ | |
| import pandas as pd | |
| import numpy as np | |
| import matplotlib.pyplot as plt | |
| import seaborn as sns | |
| from datetime import datetime, timedelta | |
| import json | |
| from collections import Counter | |
| import re | |
| # Load issues data | |
| with open('/tmp/gh-aw/python/data/issues.json', 'r') as f: | |
| issues = json.load(f) | |
| df = pd.DataFrame(issues) | |
| # Convert dates | |
| df['createdAt'] = pd.to_datetime(df['createdAt']) | |
| df['updatedAt'] = pd.to_datetime(df['updatedAt']) | |
| df['closedAt'] = pd.to_datetime(df['closedAt']) | |
| # Calculate basic metrics (see scratchpad/metrics-glossary.md for definitions) | |
| # Scope: All issues in repository, no filters | |
| total_issues = len(df) | |
| # Scope: Issues where state = "open" at report time | |
| open_issues = len(df[df['state'] == 'OPEN']) | |
| # Scope: Issues where state = "closed" at report time | |
| closed_issues = len(df[df['state'] == 'CLOSED']) | |
| # Time-based metrics | |
| now = datetime.now(df['createdAt'].iloc[0].tzinfo if len(df) > 0 else None) | |
| # Scope: Issues created in last 7 days | |
| issues_opened_7d = len(df[df['createdAt'] > now - timedelta(days=7)]) | |
| # Scope: Issues created in last 30 days | |
| issues_opened_30d = len(df[df['createdAt'] > now - timedelta(days=30)]) | |
| # Average time to close | |
| # Scope: Closed issues with valid timestamps | |
| closed_df = df[df['closedAt'].notna()] | |
| if len(closed_df) > 0: | |
| closed_df['time_to_close'] = closed_df['closedAt'] - closed_df['createdAt'] | |
| avg_close_time = closed_df['time_to_close'].mean() | |
| # Extract labels for clustering | |
| def extract_labels(labels_list): | |
| if labels_list: | |
| return [l['name'] for l in labels_list] | |
| return [] | |
| df['label_names'] = df['labels'].apply(extract_labels) | |
| # Simple keyword-based clustering from titles | |
| def cluster_by_keywords(title): | |
| title_lower = title.lower() if title else '' | |
| if 'bug' in title_lower or 'fix' in title_lower or 'error' in title_lower: | |
| return 'Bug Reports' | |
| elif 'feature' in title_lower or 'enhancement' in title_lower or 'request' in title_lower: | |
| return 'Feature Requests' | |
| elif 'doc' in title_lower or 'readme' in title_lower: | |
| return 'Documentation' | |
| elif 'test' in title_lower: | |
| return 'Testing' | |
| elif 'refactor' in title_lower or 'cleanup' in title_lower: | |
| return 'Refactoring' | |
| elif 'security' in title_lower or 'vulnerability' in title_lower: | |
| return 'Security' | |
| elif 'performance' in title_lower or 'slow' in title_lower: | |
| return 'Performance' | |
| else: | |
| return 'Other' | |
| df['cluster'] = df['title'].apply(cluster_by_keywords) | |
| # Save metrics to JSON for report | |
| # Note: Using standardized metric names from scratchpad/metrics-glossary.md | |
| metrics = { | |
| 'total_issues': total_issues, | |
| 'open_issues': open_issues, | |
| 'closed_issues': closed_issues, | |
| 'issues_opened_7d': issues_opened_7d, # Standardized name | |
| 'issues_opened_30d': issues_opened_30d, # Standardized name | |
| 'cluster_counts': df['cluster'].value_counts().to_dict() | |
| } | |
| with open('/tmp/gh-aw/python/data/metrics.json', 'w') as f: | |
| json.dump(metrics, f, indent=2, default=str) | |
| ``` | |
| ### Install Additional Libraries | |
| scikit-learn, NLTK, TextBlob, and WordCloud are pre-installed via the shared NLP environment. | |
| ## Phase 3: Generate Trend Charts | |
| Generate exactly **2 high-quality charts**: | |
| ### Chart 1: Issue Activity Trends | |
| - **Title**: "Issue Activity - Last 30 Days" | |
| - **Content**: | |
| - Line showing issues opened per day | |
| - Line showing issues closed per day | |
| - 7-day moving average overlay | |
| - **Save to**: `/tmp/gh-aw/python/charts/issue_activity_trends.png` | |
| ### Chart 2: Issue Distribution by Cluster | |
| - **Title**: "Issue Clusters by Theme" | |
| - **Chart Type**: Horizontal bar chart | |
| - **Content**: | |
| - Horizontal bars showing count per cluster | |
| - Include cluster labels based on keywords | |
| - Sort by count descending | |
| - **Save to**: `/tmp/gh-aw/python/charts/issue_clusters.png` | |
| ### Chart Quality Requirements | |
| - DPI: 300 minimum | |
| - Figure size: 12x7 inches | |
| - Use seaborn styling with professional colors | |
| - Clear labels and legend | |
| - Grid lines for readability | |
| ## Phase 4: Upload Charts | |
| Use the `upload asset` tool to upload both charts: | |
| 1. Upload `/tmp/gh-aw/python/charts/issue_activity_trends.png` | |
| 2. Upload `/tmp/gh-aw/python/charts/issue_clusters.png` | |
| 3. Collect the returned URLs for embedding in the discussion | |
| ## Phase 5: Close Previous Discussions | |
| Before creating the new discussion, find and close previous daily issues discussions: | |
| 1. Search for discussions with title prefix "[daily issues]" | |
| 2. Close each found discussion with reason "OUTDATED" | |
| 3. Add a closing comment: "This discussion has been superseded by a newer daily issues report." | |
| Use the `close_discussion` safe output for each discussion found. | |
| ## Phase 6: Create Discussion Report | |
| Create a new discussion with the comprehensive report. | |
| **Formatting Guideline**: Use h3 (###) or lower for all headers in your report to maintain proper document hierarchy. The discussion title serves as h1, so all content headers should start at h3. | |
| ### Discussion Format | |
| **Title**: `[daily issues] Daily Issues Report - YYYY-MM-DD` | |
| **Body**: | |
| ```markdown | |
| Brief 2-3 paragraph summary of key findings: total issues analyzed, main clusters identified, notable trends, and any concerns that need attention. | |
| <details> | |
| <summary>π Full Report Details</summary> | |
| ### π Issue Activity Trends | |
|  | |
| [2-3 sentence analysis of activity trends - peaks, patterns, recent changes] | |
| ### π·οΈ Issue Clusters by Theme | |
|  | |
| [Analysis of the major clusters found and their characteristics] | |
| ### Cluster Details | |
| | Cluster | Theme | Issue Count | Sample Issues | | |
| |---------|-------|-------------|---------------| | |
| | 1 | [Theme] | [Count] | #123, #456 | | |
| | 2 | [Theme] | [Count] | #789, #101 | | |
| | ... | ... | ... | ... | | |
| ### π Key Metrics | |
| ### Volume Metrics | |
| - **Total Issues Analyzed** (`issues_analyzed`): [NUMBER] (Scope: Last 1000 issues) | |
| - **Open Issues** (`open_issues`): [NUMBER] ([PERCENT]%) | |
| - **Closed Issues** (`closed_issues`): [NUMBER] ([PERCENT]%) | |
| ### Time-Based Metrics | |
| - **Issues Opened (Last 7 Days)** (`issues_opened_7d`): [NUMBER] | |
| - **Issues Opened (Last 30 Days)** (`issues_opened_30d`): [NUMBER] | |
| - **Average Time to Close**: [DURATION] | |
| ### Triage Metrics | |
| - **Issues Without Labels** (`issues_without_labels`): [NUMBER] | |
| - **Issues Without Assignees** (`issues_without_assignees`): [NUMBER] | |
| - **Stale Issues (30+ days)** (`stale_issues`): [NUMBER] | |
| ### π Top Labels | |
| | Label | Issue Count | | |
| |-------|-------------| | |
| | [label] | [count] | | |
| | ... | ... | | |
| ### π₯ Most Active Authors | |
| | Author | Issues Created | | |
| |--------|----------------| | |
| | @[author] | [count] | | |
| | ... | ... | | |
| ### β οΈ Issues Needing Attention | |
| ### Stale Issues (No Activity 30+ Days) | |
| - #[number]: [title] | |
| - #[number]: [title] | |
| ### Unlabeled Issues | |
| - #[number]: [title] | |
| - #[number]: [title] | |
| ### π Recommendations | |
| 1. [Specific actionable recommendation based on findings] | |
| 2. [Another recommendation] | |
| 3. [...] | |
| </details> | |
| --- | |
| *Report generated automatically by the Daily Issues Report workflow* | |
| *Data source: Last 1000 issues from ${{ github.repository }}* | |
| ``` | |
| ## Important Guidelines | |
| ### Data Quality | |
| - Handle missing fields gracefully (null checks) | |
| - Validate date formats before processing | |
| - Skip malformed issues rather than failing | |
| ### Clustering Tips | |
| - If scikit-learn is not available, use keyword-based clustering | |
| - Focus on meaningful themes, not just statistical clusters | |
| - Aim for 5-10 clusters maximum for readability | |
| ### Chart Quality | |
| - Use consistent color schemes | |
| - Make charts readable when embedded in markdown | |
| - Include proper axis labels and titles | |
| ### Report Quality | |
| - Be specific with numbers and percentages | |
| - Highlight actionable insights | |
| - Keep the summary brief but informative | |
| ## Success Criteria | |
| A successful run will: | |
| - β Load and analyze all available issues data | |
| - β Cluster issues into meaningful themes | |
| - β Generate 2 high-quality trend charts | |
| - β Upload charts as assets | |
| - β Close previous daily issues discussions | |
| - β Create a new discussion with comprehensive report | |
| - β Include all required metrics and visualizations | |
| Begin your analysis now. Load the data, run the Python analysis, generate charts, and create the discussion report. | |
| **Important**: If no action is needed after completing your analysis, you **MUST** call the `noop` safe-output tool with a brief explanation. Failing to call any safe-output tool is the most common cause of safe-output workflow failures. | |
| ```json | |
| {"noop": {"message": "No action needed: [brief explanation of what was analyzed and why]"}} | |
| ``` |