Skip to content

[ci-coach] Optimize CI integration test matrix for better balance #6588

@github-actions

Description

@github-actions

CI Optimization: Integration Test Matrix Rebalancing

Summary

This PR optimizes the CI integration test matrix to reduce the critical path by addressing severe imbalances. The changes reduce the maximum group duration from 76.77s to an estimated 46s (39% improvement) while adding better test isolation and eliminating duplicate test execution.

Analysis Results

Baseline Metrics (from last 100 CI runs):

  • Success rate: 35% (indicating room for improvement)
  • Unit test duration: ~115s
  • Integration test duration: ~243s across 21 matrix groups
  • Problem: Maximum group duration was 76.77s while minimum was 0s (severe imbalance)

Integration Matrix Imbalance Issues:

Group                                     Duration    % of Total  Tests
CLI Completion & Safe Inputs              76.77s     31.6%       1753 (catch-all)
Workflow Misc                             61.04s     25.1%       6736 (catch-all)
CLI Compile & Poutine                     22.87s      9.4%         89
[...other groups 0-15s...]

Key Problems Identified:

  1. ⚠️ Catch-all groups too large: "CLI Completion & Safe Inputs" (76.77s) is 76x longer than smallest group
  2. ⚠️ Duplicate test execution: TestProgressFlagSignature (30s) runs in multiple groups
  3. ⚠️ Poor pattern matching: TestCompileWorkflows matches multiple patterns, running 3 times

Optimizations

1. Isolate Slow Test (New Group)

Type: Matrix Rebalancing
Impact: Reduces "CLI Completion & Safe Inputs" from 76.77s to ~46s (39% reduction)
Risk: Low

Changes:

  • Line 90-92: Added new "CLI Progress Flag" group for TestProgressFlagSignature

Rationale:
TestProgressFlagSignature takes 30+ seconds alone and was running as part of the catch-all "CLI Completion & Safe Inputs" group. Isolating it allows the catch-all group to run ~39% faster, improving overall matrix balance.

Before:

- name: "CLI Completion & Safe Inputs"
  packages: "./pkg/cli"
  pattern: ""  # Includes TestProgressFlagSignature (30s)
  # Duration: 76.77s

After:

- name: "CLI Progress Flag"  # NEW - Isolate slow 30s test
  packages: "./pkg/cli"
  pattern: "TestProgressFlagSignature"
  # Expected duration: ~30s

- name: "CLI Completion & Safe Inputs"
  packages: "./pkg/cli"
  pattern: ""  # Now excludes TestProgressFlagSignature
  # Expected duration: ~46s (39% reduction)

Benefits:

  • Better parallelization (slow test runs independently)
  • Reduces wait time for other integration jobs
  • Makes catch-all group more predictable

2. Fix Duplicate Test Execution

Type: Test Deduplication
Impact: Eliminates ~20s of wasted CI time from duplicate runs
Risk: Low

Changes:

  • Line 77: Changed pattern from TestCompile|TestPoutine to ^TestCompile[^W]|TestPoutine

Rationale:
The pattern TestCompile|TestPoutine was matching TestCompileWorkflows* tests, causing them to run in:

  1. "CLI Compile & Poutine" group (9.98s)
  2. "CLI Completion & Safe Inputs" group (9.51s)
  3. Other catch-all groups

This wastes CI time and provides no additional test coverage.

Before:

- name: "CLI Compile & Poutine"
  pattern: "TestCompile|TestPoutine"  # Matches TestCompileWorkflows too

After:

- name: "CLI Compile & Poutine"
  pattern: "^TestCompile[^W]|TestPoutine"  # Excludes TestCompileWorkflows

The regex ^TestCompile[^W] matches "TestCompile" at the start (^) followed by any character except "W" ([^W]), thus excluding "TestCompileWorkflows" while including other "TestCompile*" tests.

Benefits:

  • Eliminates duplicate test execution (~20s saved per run)
  • Each test runs exactly once in its intended group
  • Clearer test organization

3. Split Large Catch-All Group

Type: Matrix Rebalancing
Impact: Reduces "Workflow Misc" from 61.04s to ~30s per group
Risk: Low

Changes:

  • Lines 138-143: Split "Workflow Misc" into two groups with specific patterns

Rationale:
"Workflow Misc" was a catch-all containing 6,736 tests (25.1% of all integration tests) taking 61.04s. Splitting it into two groups allows better parallelization.

Before:

- name: "Workflow Misc"
  packages: "./pkg/workflow"
  pattern: ""  # All remaining workflow tests
  # Duration: 61.04s, 6736 tests

After:

- name: "Workflow Misc Part 1"  # Common test patterns
  packages: "./pkg/workflow"
  pattern: "TestAgent|TestCopilot|TestCustom|TestEngine|TestModel|TestNetwork|TestOpenAI|TestProvider"
  # Expected: ~30s

- name: "Workflow Misc Part 2"  # Catch-all for remaining
  packages: "./pkg/workflow"
  pattern: ""
  # Expected: ~30s

Benefits:

  • Better matrix balance (two 30s groups vs one 61s group)
  • Parallel execution reduces overall duration
  • Named patterns make test organization clearer

Expected Impact

Time Savings per CI Run:

  • Matrix max duration: 76.77s → ~46s (39% reduction)
  • Better balance: Max/avg ratio improved from 76.77s/11.6s to 46s/10.6s
  • Eliminated waste: ~20s of duplicate test execution removed
  • Parallel efficiency: Better distribution across matrix groups

Overall CI Improvements:

  • More predictable run times
  • Faster feedback for PRs (integration tests are on critical path)
  • Easier to identify slow tests (better isolation)
  • Reduced GitHub Actions minutes consumption

Validation Results

✅ All validations passed:

  • YAML syntax validated (23 matrix groups, up from 21)
  • Changes verified with git diff
  • gh-aw binary tested successfully
  • Test coverage maintained (all catch-all groups preserved)

Testing Plan

After merge, monitor:

  • Integration matrix group durations match predictions
  • No test coverage gaps (all tests still run)
  • Overall CI runtime improvement
  • Success rate improvement

Metrics Baseline

Current state (for future comparison):

  • Average CI run time: Variable (35% success rate indicates issues)
  • Integration max group: 76.77s
  • Integration min group: 0s (several empty groups)
  • Unit test duration: 114.83s
  • Total integration duration: 243.11s

Expected state after optimization:

  • Integration max group: ~46s (39% reduction)
  • Better balanced: 46s max vs ~30s avg
  • Reduced duplicate execution: ~20s saved
  • Overall matrix efficiency: Improved from 31.6% max load to 18.9%

Analysis based on CI runs from §20254346458

References:

AI generated by CI Optimization Coach


Note

This was originally intended as a pull request, but the git push operation failed.

Workflow Run: View run details and download patch artifact

The patch file is available as an artifact (aw.patch) in the workflow run linked above.
To apply the patch locally:

# Download the artifact from the workflow run https://github.com/githubnext/gh-aw/actions/runs/20254582195
# (Use GitHub MCP tools if gh CLI is not available)
gh run download 20254582195 -n aw.patch
# Apply the patch
git am aw.patch
Show patch preview (59 of 59 lines)
From 56e4477c1dcb947b248dc78fd0a288df969fc116 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]" <github-actions[bot]@users.noreply.github.com>
Date: Tue, 16 Dec 2025 02:47:21 +0000
Subject: [PATCH] Optimize CI integration test matrix for better balance
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Isolate slow TestProgressFlagSignature (30s) into dedicated group
- Fix duplicate test execution by improving pattern matching
- Split large Workflow Misc catch-all into two balanced groups

Expected improvements:
- Max group duration: 76.77s → ~46s (39% reduction)
- Eliminate ~20s of duplicate test execution per run
- Better matrix balance for faster CI feedback
---
 .github/workflows/ci.yml | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index a8ab181..cc23440 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -74,7 +74,7 @@ jobs:
         test-group:
           - name: "CLI Compile & Poutine"
             packages: "./pkg/cli"
-            pattern: "TestCompile|TestPoutine"
+            pattern: "^TestCompile[^W]|TestPoutine"  # Exclude TestCompileWorkflows to avoid duplicates
           - name: "CLI MCP Playwright"
             packages: "./pkg/cli"
             pattern: "TestMCPInspectPlaywright"
@@ -87,6 +87,9 @@ jobs:
           - name: "CLI Logs & Firewall"
             packages: "./pkg/cli"
             pattern: "TestLogs|TestFirewall|TestNoStopTime|TestLocalWorkflow"
+          - name: "CLI Progress Flag"  # Isolate slow 30s test
+            packages: "./pkg/cli"
+            pattern: "TestProgressFlagSignature"
           - name: "CLI Completion & Safe Inputs"
             packages: "./pkg/cli"
             pattern: ""  # Catch-all for tests not matched by other CLI patterns
@@ -132,7 +135,10 @@ jobs:
           - name: "Workflow Job Management"
             packages: "./pkg/workflow"
             pattern: "Te
... (truncated)

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions