
[refactor] 🔧 Semantic Function Clustering Analysis: Refactoring Opportunities #3525

@github-actions

🔧 Semantic Function Clustering Analysis

This analysis examines the Go codebase to identify refactoring opportunities through semantic function clustering and duplicate detection.

Executive Summary

Repository: githubnext/gh-aw
Total Files Analyzed: 195 non-test Go files (130 in pkg/workflow/, 59 in pkg/cli/, 6 in other packages)
Total Functions Cataloged: ~1,500+ functions and methods
Major Issues Identified:

  • 10 files with highly mixed purposes requiring decomposition
  • 🔄 50+ duplicate or near-duplicate functions across files
  • 📦 100+ helper functions scattered without centralization
  • 🎯 Multiple high-impact refactoring opportunities with clear patterns

Full Analysis Report

Function Inventory by Package

pkg/workflow/ (130 files, ~1,200 functions)

Primary workflow compilation and execution logic with multiple AI engine implementations.

pkg/cli/ (59 files, ~300 functions)

CLI command implementations, workflow management, and developer tooling.

Other packages (6 files)

  • pkg/parser/ (7 files) - Parsing utilities
  • pkg/console/ (4 files) - Console formatting
  • pkg/constants/ (1 file) - Constants
  • pkg/logger/ (1 file) - Logging
  • pkg/timeutil/ (1 file) - Time utilities

Top 10 Function Naming Patterns

pkg/workflow/

  1. generate* (115+ occurrences) - YAML generation, prompts, scripts, configurations
  2. parse* (70+ occurrences) - Config parsing, input parsing, data structure parsing
  3. extract* (65+ occurrences) - Data extraction from various sources
  4. build* (60+ occurrences) - Job building, step building, configuration building
  5. validate* (45+ occurrences) - Validation logic scattered across files
  6. render* (40+ occurrences) - MCP config rendering (heavily duplicated)
  7. Get* (35+ occurrences) - Getter methods and functions
  8. New* (30+ occurrences) - Constructor functions
  9. has* (25+ occurrences) - Boolean check functions
  10. collect* (20+ occurrences) - Dependency and data collection

pkg/cli/

  1. parse* (24 occurrences) - URL, spec, and log parsing
  2. extract* (19 occurrences) - Data extraction from logs and configs
  3. get* (18 occurrences) - Getter functions
  4. New* (13 occurrences) - Command constructors
  5. build* (13 occurrences) - Report and summary building
  6. ensure* (11 occurrences) - Resource creation helpers
  7. display* (10 occurrences) - Display and rendering functions
  8. render* (8 occurrences) - JSON and console rendering
  9. validate* (6 occurrences) - Validation functions
  10. analyze* (5 occurrences) - Log analysis functions

Identified Issues

1. Files with Highly Mixed Purposes

pkg/workflow/compiler.go ⚠️ CRITICAL

Issue: God object with 30+ methods spanning multiple responsibilities
Contains:

  • Compiler initialization and setup
  • Workflow compilation orchestration
  • Job generation logic
  • Validation functions
  • Warning tracking
  • Expression extraction
  • Permission handling

Impact: Hard to maintain, test, and understand
Recommendation: Split into:

  • compiler.go - Core compilation orchestration
  • compiler_validation.go - Validation logic
  • compiler_jobs.go - Job generation (already exists, consolidate)
  • compiler_expressions.go - Expression handling

pkg/workflow/compiler_yaml.go ⚠️ CRITICAL

Issue: Mixes YAML generation with prompt generation, step conversion, error handling, and log parsing
Functions: generate*, convert*, split* across unrelated concerns
Recommendation: Split into:

  • yaml_generation.go - YAML generation only
  • prompt_generation.go - Prompt building
  • step_conversion.go - Step transformation logic

pkg/workflow/expressions.go ⚠️ CRITICAL

Issue: Contains expression parsing, AST node implementations, condition building, and tree traversal
Recommendation: Split into:

  • expressions.go - Core expression types
  • expression_parser.go - Parsing logic
  • expression_builder.go - Condition tree building
  • expression_ast.go - AST node implementations

pkg/workflow/permissions.go

Issue: Mixes permission parsing, creation helpers, YAML rendering, and validation
Recommendation: Extract rendering to separate file, group validation logic

pkg/workflow/validation.go

Issue: Contains schema validation, expression validation, container validation, and repository features
Recommendation: Split by validation concern type

pkg/cli/logs.go ⚠️ MASSIVE (58 functions)

Issue: Mixes downloading, parsing, extraction, aggregation, display, file operations, workflow analysis, missing tools detection, MCP failure detection, and firewall log parsing
Recommendation: Split into:

  • logs_download.go - Download orchestration
  • logs_parse.go - Log parsing
  • logs_analyze.go - Analysis functions
  • logs_display.go - Display logic
  • logs_metrics.go - Metrics extraction

pkg/cli/trial_command.go ⚠️ LARGE (32 functions)

Issue: Mixes trial orchestration, repository operations, workflow installation, secret management, and artifact handling
Recommendation: Extract utility functions to shared modules

pkg/cli/audit_report.go

Issue: Mixes data building with rendering (JSON + console) and metrics calculation
Recommendation: Separate rendering from data building

pkg/cli/update_command.go

Issue: Mixes update orchestration, git operations, workflow resolution, content downloading, and PR creation
Recommendation: Extract git and workflow operations to shared modules

pkg/cli/add_command.go

Issue: Mixes workflow addition, PR creation, git operations, compilation, and copilot setup
Recommendation: Extract git operations and compilation to shared modules

2. Duplicate and Near-Duplicate Functions

A. MCP Configuration Rendering (HIGHEST PRIORITY) 🔥

Pattern: Identical MCP config rendering functions replicated across 4 engine files

Duplicates:

  • renderGitHubMCPConfig in custom_engine.go and in the 3 other engine files listed under Files Affected
  • renderPlaywrightMCPConfig in mcp-config.go (referenced by multiple engines)
  • renderSafeOutputsMCPConfig across engine implementations
  • renderAgenticWorkflowsMCPConfig across engine implementations

Files Affected:

  • pkg/workflow/claude_mcp.go
  • pkg/workflow/codex_engine.go
  • pkg/workflow/copilot_engine.go
  • pkg/workflow/custom_engine.go

Similarity: 95%+ identical implementations
Impact: 4x code duplication; every fix or change must be repeated in each engine file
Recommendation: Create pkg/workflow/mcp/ subpackage with shared rendering functions. Each engine imports and calls shared functions.

Estimated Savings: ~500-800 lines of duplicate code

B. Parse Config Functions (HIGH PRIORITY) 🔥

Pattern: Every output type has its own parse*Config function with nearly identical structure

Duplicates:

  • parseCommentsConfig (add_comment.go)
  • parseIssuesConfig (create_issue.go)
  • parseDiscussionsConfig (create_discussion.go)
  • parsePullRequestsConfig (create_pull_request.go)
  • parseAgentTaskConfig (create_agent_task.go)
  • parseCodeScanningAlertsConfig (create_code_scanning_alert.go)
  • parseMissingToolConfig (missing_tool.go)
  • parseUploadAssetConfig (publish_assets.go)

Similarity: ~80% similar structure
Example Pattern:

// Each function follows this pattern (Xxx is a placeholder for the output type):
func parseXxxConfig(rawConfig any) (*XxxConfig, error) {
    // 1. Type-assert rawConfig to a map
    // 2. Extract fields with similar logic
    // 3. Apply defaults
    // 4. Return the config struct
}

Impact: 8+ duplicate implementations
Recommendation:

  • Create generic parseConfigMap helper with field extractors
  • Use Go generics (1.18+) to create type-safe config parser
  • Centralize in pkg/workflow/config/ subpackage

Estimated Savings: ~300-400 lines of duplicate code

C. Build Create Output Job Functions (HIGH PRIORITY) 🔥

Pattern: Nearly identical job building functions for different output types

Duplicates:

  • buildCreateOutputIssueJob (create_issue.go)
  • buildCreateOutputDiscussionJob (create_discussion.go)
  • buildCreateOutputPullRequestJob (create_pull_request.go)
  • buildCreateOutputAgentTaskJob (create_agent_task.go)
  • buildCreateOutputCodeScanningAlertJob (create_code_scanning_alert.go)
  • buildCreateOutputAddCommentJob (add_comment.go)
  • Plus others (10+ total)

Similarity: ~85% identical structure
Impact: 10+ duplicate implementations
Recommendation:

  • Create generic buildCreateOutputJob function with strategy pattern
  • Use interfaces for output-type-specific behavior
  • Consolidate in pkg/workflow/jobs/ subpackage

Estimated Savings: ~600-800 lines of duplicate code

D. File Operation Functions (pkg/cli/) 🔥

Exact Duplicates:

  1. fileExists (logs.go) and file checking in other files
// logs.go:1985
func fileExists(path string) bool {
    info, err := os.Stat(path)
    if err != nil {
        return false
    }
    return !info.IsDir()
}
  2. copyFile (trial_command.go:1538) vs copyFileSimple (logs.go:1994)
// Both implement identical file copying logic
// trial_command.go:1538-1555
func copyFile(src, dst string) error {
    sourceFile, err := os.Open(src)
    // ... identical implementation
}

// logs.go:1994-2011
func copyFileSimple(src, dst string) error {
    in, err := os.Open(src)
    // ... identical implementation
}

Impact: Multiple duplicate file operation functions
Recommendation: Create pkg/cli/fileutil/ package with:

  • FileExists(path string) bool
  • CopyFile(src, dst string) error
  • DirExists(path string) bool
  • CalculateDirectorySize(path string) (int64, error)

Estimated Savings: ~50-80 lines of duplicate code

E. Repository Slug Functions (pkg/cli/)

Duplicates:

  • getCurrentRepo (pr_command.go:100) - Returns owner, repo separately
  • GetCurrentRepoSlug (repo.go:95) - Returns "owner/repo" string (with caching)
  • getCurrentRepoSlugUncached (repo.go:31) - Returns "owner/repo" string
  • getCurrentRepositoryInfo (trial_command.go:461) - Wrapper around GetCurrentRepoSlug

Similarity: All use gh repo view but with different output formats
Impact: 4 overlapping functions for the same functionality
Recommendation:

  • Standardize on GetCurrentRepoSlug() from repo.go (has caching)
  • Remove getCurrentRepo and replace calls with parsing GetCurrentRepoSlug output
  • Remove getCurrentRepositoryInfo wrapper

Estimated Savings: ~40-60 lines of duplicate code

F. Extract Package Functions (pkg/workflow/)

Duplicates:

  • extractNpxPackages (npm.go)
  • extractPipPackages (pip.go)
  • extractGoPackages (dependabot.go)

Similarity: ~70% similar structure
Recommendation: Create generic package extractor with package-type-specific parsers

G. Validation Functions (pkg/cli/)

Duplicates:

  • validateServerSecrets (mcp_validation.go)
  • checkAndSuggestSecrets (mcp_secrets.go)
  • checkSecretsAvailability (secrets.go)

Similarity: ~60-70% overlap in secret validation logic
Recommendation: Consolidate into single pkg/cli/validation/secrets.go module

H. Generate Upload Functions (pkg/workflow/)

Pattern: Multiple generateUpload functions with similar structure

Duplicates:

  • generateUploadAgentLogs
  • generateUploadAssets
  • generateUploadAwInfo
  • generateUploadPrompt
  • generateUploadAccessLogs
  • generateUploadMCPLogs

Similarity: ~75% similar patterns
Recommendation: Create generic upload step generator with artifact type parameter
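A minimal sketch of such a generic upload step generator, under stated assumptions: the UploadSpec type, the renderUploadStep name, the indentation, the `if: always()` condition, and the artifact names and paths are all hypothetical and not taken from the codebase. Only actions/upload-artifact and its `name`, `path`, and `if-no-files-found` inputs are real.

```go
package main

import (
	"fmt"
	"strings"
)

// UploadSpec is a hypothetical description of one artifact upload step.
type UploadSpec struct {
	StepName     string // human-readable step name
	ArtifactName string // name of the uploaded artifact
	Path         string // file or directory to upload
}

// renderUploadStep emits one workflow step for the given artifact.
// The indentation and the "if: always()" condition are assumptions.
func renderUploadStep(spec UploadSpec) string {
	var yaml strings.Builder
	fmt.Fprintf(&yaml, "      - name: %s\n", spec.StepName)
	yaml.WriteString("        if: always()\n")
	yaml.WriteString("        uses: actions/upload-artifact@v4\n")
	yaml.WriteString("        with:\n")
	fmt.Fprintf(&yaml, "          name: %s\n", spec.ArtifactName)
	fmt.Fprintf(&yaml, "          path: %s\n", spec.Path)
	yaml.WriteString("          if-no-files-found: warn\n")
	return yaml.String()
}

func main() {
	// Each former generateUpload* function reduces to one spec value.
	specs := []UploadSpec{
		{StepName: "Upload agent logs", ArtifactName: "agent-logs", Path: "/tmp/agent-logs/"},
		{StepName: "Upload prompt", ArtifactName: "prompt", Path: "/tmp/prompt.txt"},
	}
	for _, spec := range specs {
		fmt.Print(renderUploadStep(spec))
	}
}
```

With this shape, the per-artifact differences live in data rather than in six near-identical functions.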

3. Scattered Helper Functions

A. String/Name Sanitization (4 files in pkg/workflow/)

  • SanitizeName (strings.go)
  • SanitizeWorkflowName (strings.go)
  • SanitizeIdentifier (workflow_name.go)
  • normalizeWorkflowName (resolve.go)
  • normalizeSafeOutputIdentifier (safe_outputs.go)

Recommendation: Consolidate into strings.go with clear documentation

B. Logging Variables (40+ files in pkg/workflow/)

  • Every file declares its own log variable: actionCacheLog, actionPinsLog, agenticEngineLog, etc.
  • Pattern: var xyzLog = logger.New("workflow.xyz")

Recommendation: Not a refactoring priority - this is acceptable Go practice for package-level logging

C. YAML Writing Utilities (5 files in pkg/workflow/)

  • writeArgsToYAML / writeArgsToYAMLInline (args.go)
  • writeHeadersToYAML (env.go)
  • WriteShellScriptToYAML (sh.go)
  • WritePromptTextToYAML (sh.go)
  • WriteJavaScriptToYAML (js.go)
  • MarshalWithFieldOrder (yaml.go)

Recommendation: Consolidate into yaml_helpers.go or enhance existing yaml.go

D. Token/GitHub Token Helpers (2 files in pkg/workflow/)

  • getEffectiveGitHubToken (github_token.go)
  • getEffectiveCopilotGitHubToken (github_token.go)
  • getGitHubToken (mcp_servers.go)

Recommendation: Consolidate all GitHub token logic in github_token.go

E. Expression Extraction (4 files in pkg/workflow/)

  • extractStringValue (frontmatter_extraction.go)
  • ExtractFirstMatch (metrics.go)
  • extractErrorMessage (metrics.go)
  • extractWordBefore (js.go)
  • stripExpressionWrapper (expressions.go)

Recommendation: Create expression_helpers.go for all expression manipulation utilities

F. File System Operations (pkg/cli/)

Scattered across multiple files:

  • fileExists (logs.go)
  • dirExists (logs.go)
  • isDirEmpty (logs.go)
  • copyFile (trial_command.go)
  • copyFileSimple (logs.go)
  • calculateDirectorySize (audit_report.go)

Recommendation: Create pkg/cli/fileutil/ package (detailed in section 2.D)

G. Validation Helpers (pkg/cli/)

Scattered across multiple files:

  • isValidWorkflowFile (packages.go)
  • isValidGitHubIdentifier (spec.go)
  • isCommitSHA (spec.go)
  • isPermissionError (audit.go)
  • is403PermissionError (codespace.go)

Recommendation: Create pkg/cli/validation/ package

Refactoring Recommendations

Priority 1: High Impact, Clear Wins (Estimated: 1-2 weeks)

1.1 Consolidate MCP Configuration Rendering 🔥

Impact: HIGHEST - Eliminates ~500-800 lines of duplicate code
Effort: Medium (2-3 days)
Benefits:

  • Single source of truth for MCP config rendering
  • Easier to maintain and test
  • Consistent behavior across all engines

Steps:

  1. Create pkg/workflow/mcp/ subpackage
  2. Move all render*MCPConfig functions to mcp/config.go
  3. Update all engine files to import and use shared functions
  4. Add comprehensive tests for MCP rendering
  5. Update documentation

Files to Create:

  • pkg/workflow/mcp/config.go - Shared MCP config rendering
  • pkg/workflow/mcp/github.go - GitHub MCP specific
  • pkg/workflow/mcp/playwright.go - Playwright MCP specific
  • pkg/workflow/mcp/config_test.go - Tests

Files to Modify:

  • pkg/workflow/claude_mcp.go - Remove duplicates, import mcp package
  • pkg/workflow/codex_engine.go - Remove duplicates, import mcp package
  • pkg/workflow/copilot_engine.go - Remove duplicates, import mcp package
  • pkg/workflow/custom_engine.go - Remove duplicates, import mcp package
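One possible shape for the shared function, sketched under the assumption that the engine copies differ only in a few formatting options. The Server and RenderOptions types, the RenderServerEntry name, and the sample server values are hypothetical; the real render*MCPConfig functions may take different inputs and emit a different format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Server is a hypothetical engine-neutral description of one MCP server.
type Server struct {
	Command string   `json:"command"`
	Args    []string `json:"args"`
}

// RenderOptions holds the values assumed to differ between engines.
type RenderOptions struct {
	Indent string // leading whitespace applied to every emitted line
	IsLast bool   // when false, a trailing comma is appended
}

// RenderServerEntry renders one `"name": {...}` entry. Every engine would
// call this instead of keeping its own copy of the rendering logic.
func RenderServerEntry(name string, s Server, opts RenderOptions) (string, error) {
	key, err := json.Marshal(name)
	if err != nil {
		return "", err
	}
	body, err := json.MarshalIndent(s, opts.Indent, "  ")
	if err != nil {
		return "", err
	}
	entry := opts.Indent + string(key) + ": " + string(body)
	if !opts.IsLast {
		entry += ","
	}
	return entry + "\n", nil
}

func main() {
	server := Server{Command: "docker", Args: []string{"run", "-i", "example/mcp-server"}}

	// Two engines share the function and differ only in their options.
	first, err := RenderServerEntry("github", server, RenderOptions{Indent: "  "})
	if err != nil {
		panic(err)
	}
	second, err := RenderServerEntry("github", server, RenderOptions{Indent: "      ", IsLast: true})
	if err != nil {
		panic(err)
	}
	fmt.Print(first)
	fmt.Print(second)
}
```

If the engines turn out to need different output formats, the same idea still applies with the format-specific part passed in as a function or interface.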

1.2 Consolidate Parse Config Functions 🔥

Impact: HIGH - Eliminates ~300-400 lines of duplicate code
Effort: Medium (2-3 days)
Benefits:

  • Consistent config parsing across all output types
  • Type-safe with Go generics
  • Single place to fix bugs

Steps:

  1. Create pkg/workflow/config/ subpackage
  2. Implement generic ParseConfigMap[T any](rawConfig any, parser ConfigParser[T]) (*T, error) function
  3. Define ConfigParser interface for output-type-specific parsing
  4. Refactor all parse*Config functions to use generic parser
  5. Add tests

Files to Create:

  • pkg/workflow/config/parser.go - Generic config parser
  • pkg/workflow/config/types.go - Common config types
  • pkg/workflow/config/parser_test.go - Tests
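A rough sketch of how the generic parser from the steps above might look. The ParseConfigMap signature is the one proposed in step 2; the two ConfigParser methods, the IssuesConfig type, and the "title-prefix" and "max" keys are assumptions made for illustration only.

```go
package main

import "fmt"

// ConfigParser holds the output-type-specific part of parsing.
// Both methods are an assumed design, not an existing interface.
type ConfigParser[T any] interface {
	Defaults() T                            // config pre-filled with defaults
	Apply(raw map[string]any, cfg *T) error // copy recognized fields into cfg
}

// ParseConfigMap performs the steps every parse*Config function repeats:
// assert the map type, start from defaults, apply fields, return the struct.
func ParseConfigMap[T any](rawConfig any, parser ConfigParser[T]) (*T, error) {
	raw, ok := rawConfig.(map[string]any)
	if !ok {
		return nil, fmt.Errorf("config must be a map, got %T", rawConfig)
	}
	cfg := parser.Defaults()
	if err := parser.Apply(raw, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

// IssuesConfig and its fields are hypothetical.
type IssuesConfig struct {
	TitlePrefix string
	Max         int
}

type issuesParser struct{}

func (issuesParser) Defaults() IssuesConfig { return IssuesConfig{Max: 1} }

func (issuesParser) Apply(raw map[string]any, cfg *IssuesConfig) error {
	if v, ok := raw["title-prefix"]; ok {
		s, isString := v.(string)
		if !isString {
			return fmt.Errorf("title-prefix must be a string, got %T", v)
		}
		cfg.TitlePrefix = s
	}
	if v, ok := raw["max"]; ok {
		// A YAML decoder may yield other numeric types; only int is handled here.
		n, isInt := v.(int)
		if !isInt {
			return fmt.Errorf("max must be an integer, got %T", v)
		}
		cfg.Max = n
	}
	return nil
}

func main() {
	raw := map[string]any{"title-prefix": "[bot] ", "max": 3}
	cfg, err := ParseConfigMap[IssuesConfig](raw, issuesParser{})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", *cfg)
}
```

The type argument is written explicitly at the call site so the sketch does not depend on the type inference rules of a particular Go release.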

1.3 Create pkg/cli/fileutil Package 🔥

Impact: HIGH - Centralized file operations
Effort: Low (1 day)
Benefits:

  • Single place for all file operations
  • Easier to test and mock
  • Consistent error handling

Steps:

  1. Create pkg/cli/fileutil/ package
  2. Move file operations from logs.go, trial_command.go, audit_report.go
  3. Add comprehensive tests
  4. Update all references

Functions to Include:

// pkg/cli/fileutil/fileutil.go
func FileExists(path string) bool
func DirExists(path string) bool
func IsDirEmpty(path string) bool
func CopyFile(src, dst string) error
func CalculateDirectorySize(path string) (int64, error)
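For illustration, the signatures above could be implemented roughly as follows. This is a sketch, not the existing code: FileExists mirrors the fileExists shown in section 2.D, while the behavior of the other functions (for example, treating an unreadable directory as empty) is an assumption. It is written as package main only so that it runs standalone.

```go
package main

import (
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// FileExists reports whether path exists and is not a directory.
func FileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && !info.IsDir()
}

// DirExists reports whether path exists and is a directory.
func DirExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.IsDir()
}

// IsDirEmpty reports whether the directory has no entries.
// Assumption: a missing or unreadable directory counts as empty.
func IsDirEmpty(path string) bool {
	entries, err := os.ReadDir(path)
	return err != nil || len(entries) == 0
}

// CopyFile copies src to dst, creating or truncating dst.
func CopyFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return fmt.Errorf("open source: %w", err)
	}
	defer in.Close()

	out, err := os.Create(dst)
	if err != nil {
		return fmt.Errorf("create destination: %w", err)
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return fmt.Errorf("copy contents: %w", err)
	}
	// Return the Close error, because some write errors only surface there.
	return out.Close()
}

// CalculateDirectorySize sums the sizes of all regular files under path.
func CalculateDirectorySize(path string) (int64, error) {
	var total int64
	err := filepath.WalkDir(path, func(_ string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if d.Type().IsRegular() {
			info, err := d.Info()
			if err != nil {
				return err
			}
			total += info.Size()
		}
		return nil
	})
	return total, err
}

func main() {
	dir, err := os.MkdirTemp("", "fileutil-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "a.txt")
	if err := os.WriteFile(src, []byte("hello"), 0o644); err != nil {
		panic(err)
	}
	if err := CopyFile(src, filepath.Join(dir, "b.txt")); err != nil {
		panic(err)
	}
	size, err := CalculateDirectorySize(dir)
	fmt.Println(FileExists(src), DirExists(dir), IsDirEmpty(dir), size, err)
}
```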

1.4 Standardize Repository Slug Functions

Impact: MEDIUM - Eliminates confusion
Effort: Low (1 day)
Benefits:

  • Single canonical function
  • Consistent caching
  • Clear API

Steps:

  1. Keep GetCurrentRepoSlug() from repo.go as canonical implementation
  2. Remove getCurrentRepo() from pr_command.go, replace calls with string splitting
  3. Remove getCurrentRepositoryInfo() wrapper from trial_command.go
  4. Update all references
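Step 2 relies on splitting the "owner/repo" slug at call sites that need the two parts separately. A small helper along these lines could cover that; the name splitRepoSlug and its error message are hypothetical, and the slug is passed in directly because the exact signature of GetCurrentRepoSlug is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRepoSlug splits an "owner/repo" slug into its two parts and rejects
// anything that does not have exactly one slash with text on both sides.
func splitRepoSlug(slug string) (owner, repo string, err error) {
	owner, repo, found := strings.Cut(slug, "/")
	if !found || owner == "" || repo == "" || strings.Contains(repo, "/") {
		return "", "", fmt.Errorf("invalid repository slug %q, expected owner/repo", slug)
	}
	return owner, repo, nil
}

func main() {
	owner, repo, err := splitRepoSlug("githubnext/gh-aw")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(owner, repo)
}
```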

Priority 2: Medium Impact (Estimated: 2-3 weeks)

2.1 Split Large Mixed-Purpose Files

2.1.1 Split pkg/workflow/compiler.go

  • Create compiler_validation.go - Move validation functions
  • Create compiler_expressions.go - Move expression handling
  • Keep core compilation orchestration in compiler.go

2.1.2 Split pkg/workflow/compiler_yaml.go

  • Create yaml_generation.go - YAML generation only
  • Create prompt_generation.go - Prompt building
  • Create step_conversion.go - Step transformation

2.1.3 Split pkg/cli/logs.go

  • Create logs_download.go - Download orchestration
  • Create logs_parse.go - Parsing logic
  • Create logs_analyze.go - Analysis functions
  • Create logs_display.go - Display logic
  • Create logs_metrics.go - Metrics extraction

2.2 Consolidate Build Create Output Job Functions

Impact: MEDIUM-HIGH - Eliminates ~600-800 lines of duplicate code
Effort: High (4-5 days)
Benefits:

  • Single job building framework
  • Consistent behavior
  • Easier to add new output types

Steps:

  1. Create pkg/workflow/jobs/ subpackage
  2. Define OutputJobBuilder interface
  3. Implement generic BuildCreateOutputJob function
  4. Create output-type-specific implementations
  5. Refactor all buildCreateOutput*Job functions
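The interface-plus-generic-function approach in these steps could look roughly like the sketch below. The OutputJobBuilder and BuildCreateOutputJob names come from the steps; the Job struct, the three interface methods, and the example job contents are assumptions and do not reflect the real job model.

```go
package main

import (
	"errors"
	"fmt"
)

// Job is a simplified, hypothetical stand-in for the real job type.
type Job struct {
	Name        string
	Permissions map[string]string
	Steps       []string
}

// OutputJobBuilder captures what is assumed to vary per output type.
type OutputJobBuilder interface {
	JobName() string
	Permissions() map[string]string
	Steps() []string
}

// BuildCreateOutputJob holds the logic shared by all output job builders.
func BuildCreateOutputJob(b OutputJobBuilder) (*Job, error) {
	name := b.JobName()
	if name == "" {
		return nil, errors.New("output job name must not be empty")
	}
	steps := b.Steps()
	if len(steps) == 0 {
		return nil, fmt.Errorf("output job %q has no steps", name)
	}
	return &Job{Name: name, Permissions: b.Permissions(), Steps: steps}, nil
}

// createIssueJob is one output-type-specific implementation.
type createIssueJob struct{ titlePrefix string }

func (createIssueJob) JobName() string { return "create_issue" }
func (createIssueJob) Permissions() map[string]string {
	return map[string]string{"contents": "read", "issues": "write"}
}
func (j createIssueJob) Steps() []string {
	return []string{"Create issue with title prefix " + j.titlePrefix}
}

// addCommentJob is a second implementation reusing the same shared builder.
type addCommentJob struct{}

func (addCommentJob) JobName() string { return "add_comment" }
func (addCommentJob) Permissions() map[string]string {
	return map[string]string{"contents": "read", "issues": "write", "pull-requests": "write"}
}
func (addCommentJob) Steps() []string { return []string{"Add comment"} }

func main() {
	builders := []OutputJobBuilder{createIssueJob{titlePrefix: "[bot]"}, addCommentJob{}}
	for _, b := range builders {
		job, err := BuildCreateOutputJob(b)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %d step(s), permissions %v\n", job.Name, len(job.Steps), job.Permissions)
	}
}
```

Adding a new output type then means adding one small type that implements the interface, rather than another copy of the builder.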

2.3 Create pkg/cli/validation Package

Contents:

  • secrets.go - Secret validation (consolidate 3 functions)
  • workflow.go - Workflow validation
  • identifier.go - GitHub identifier validation
  • permission.go - Permission error checking

2.4 Centralize Helper Functions

  • Create pkg/workflow/helpers/ package for scattered utilities
  • Move sanitization, extraction, parsing helpers
  • Group by concern (strings, validation, extraction)

Priority 3: Long-term Improvements (Estimated: 1-2 months)

3.1 Create Validation Subpackage

  • Create pkg/workflow/validation/ for all validation logic
  • Group validators by type (schema, expression, docker, npm, pip, etc.)
  • Implement consistent validation interface

3.2 Refactor Expression Handling

  • Split expressions.go into logical modules
  • Separate parsing, building, traversal concerns
  • Improve type safety

3.3 Consider Generics for Extract Functions

  • Use generics for package extraction functions
  • Reduce duplication in npm/pip/go package handling

3.4 Improve Error Handling Patterns

  • Standardize error creation and formatting
  • Create error helpers package
  • Consistent error wrapping
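As one illustration of consistent wrapping, the sketch below uses only standard library facilities (fmt.Errorf with %w, errors.Is, errors.As). The FieldError type, the ErrOutOfRange sentinel, and the validateMax and compileWorkflow functions are hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOutOfRange is a hypothetical sentinel that callers can test for.
var ErrOutOfRange = errors.New("value out of range")

// FieldError is a hypothetical helper type that attaches a field name.
type FieldError struct {
	Field string
	Err   error
}

func (e *FieldError) Error() string { return fmt.Sprintf("field %q: %v", e.Field, e.Err) }

// Unwrap lets errors.Is and errors.As see the underlying error.
func (e *FieldError) Unwrap() error { return e.Err }

// validateMax reports a FieldError when max is outside an assumed 1..100 range.
func validateMax(max int) error {
	if max < 1 || max > 100 {
		return &FieldError{Field: "max", Err: ErrOutOfRange}
	}
	return nil
}

// compileWorkflow adds context with %w so the cause stays inspectable.
func compileWorkflow(name string, max int) error {
	if err := validateMax(max); err != nil {
		return fmt.Errorf("compile workflow %s: %w", name, err)
	}
	return nil
}

func main() {
	err := compileWorkflow("ci", 0)
	fmt.Println(err)
	fmt.Println("out of range:", errors.Is(err, ErrOutOfRange))

	var fieldErr *FieldError
	if errors.As(err, &fieldErr) {
		fmt.Println("field:", fieldErr.Field)
	}
}
```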

Implementation Checklist

Phase 1: Quick Wins (Week 1-2)

  • Create pkg/workflow/mcp/ package and consolidate MCP rendering
  • Create pkg/cli/fileutil/ package and move file operations
  • Standardize repository slug functions
  • Run tests and verify no regressions
  • Update documentation

Phase 2: Medium Refactoring (Week 3-5)

  • Consolidate parse config functions with generics
  • Create pkg/cli/validation/ package
  • Split compiler.go into focused files
  • Split logs.go into focused files
  • Run full test suite
  • Update documentation

Phase 3: Large Refactoring (Week 6-8)

  • Consolidate build create output job functions
  • Split compiler_yaml.go into focused files
  • Create validation subpackage
  • Refactor expression handling
  • Full integration testing
  • Update all documentation

Phase 4: Verification (Week 9-10)

  • Comprehensive testing across all changes
  • Performance testing
  • Documentation review
  • Code review
  • Release planning

Analysis Metadata

  • Analysis Date: 2025-11-09
  • Total Go Files Analyzed: 195 non-test files
  • Total Functions Cataloged: ~1,500+
  • Function Clusters Identified: 20+ major patterns
  • Outliers Found: 10 files with highly mixed purposes
  • Duplicates Detected: 50+ duplicate or near-duplicate functions
  • Detection Method: Serena semantic code analysis + manual verification
  • Estimated Total Duplicate Code: 2,000-3,000 lines
  • Estimated Refactoring Effort: 6-10 weeks for full implementation
  • Estimated Impact: 15-20% reduction in codebase size, significantly improved maintainability

Key Observations

  1. Good File Organization: Most files follow feature-based naming (compiler.go, parser.go, validator.go)
  2. Consistent Naming Patterns: Functions follow clear naming conventions (generate*, parse*, extract*)
  3. ⚠️ Engine Duplication: Massive duplication in MCP config rendering across 4 engine implementations
  4. ⚠️ Config Parsing Duplication: Every output type reimplements similar config parsing logic
  5. ⚠️ Job Building Duplication: 10+ nearly identical job building functions
  6. ⚠️ Mixed Concerns: Several large files (compiler.go, logs.go) with multiple responsibilities
  7. ⚠️ Scattered Helpers: File operations, validation, and string utilities scattered across many files
  8. 🎯 Clear Refactoring Path: Well-defined patterns make refactoring straightforward
  9. 🎯 High ROI Opportunities: MCP consolidation alone would eliminate 500-800 lines of duplicate code
  10. 🎯 Testable Changes: Each refactoring can be done incrementally with test coverage

Success Criteria

This refactoring initiative will be successful when:

  1. ✅ No MCP config rendering code is duplicated across engine implementations
  2. ✅ All file operations use centralized fileutil package
  3. ✅ Config parsing uses generic type-safe parser
  4. ✅ All files have single, clear responsibilities
  5. ✅ Helper functions are organized in dedicated packages
  6. ✅ All tests pass with no regressions
  7. ✅ Code coverage maintained or improved
  8. ✅ Build time not significantly affected
  9. ✅ Documentation updated to reflect new structure
  10. ✅ Team consensus on new organization patterns

Analysis Tool: Serena semantic code analysis
Analyzer: Claude Code Agent
Repository: https://github.com/githubnext/gh-aw
Branch: main
Commit: 0d0ab2f

AI generated by Semantic Function Refactoring
