[refactor] 🔧 Semantic Function Clustering Analysis: Refactoring Opportunities

# 🔧 Semantic Function Clustering Analysis

**Repository analyzed:** `githubnext/gh-aw`  
**Analysis date:** 2025-11-12  
**Total non-test Go files analyzed:** 206  
**Total functions cataloged:** 1,269  
**Total lines of code:** ~186,000

## Executive Summary

This analysis examined the codebase structure to identify opportunities for improved code organization through semantic function clustering. The repository follows strong naming conventions and feature-based organization patterns, particularly in the `pkg/cli` and `pkg/workflow` packages. 

**Key findings:**
- ✅ Strong file organization with clear naming patterns (`*_command.go`, `mcp_*`, `create_*`, `*_validation.go`)
- ⚠️ 5 high-priority outliers identified (functions in wrong files)
- ⚠️ 3 significant duplicate patterns detected (similar logic across files)
- ⚠️ Validation logic concentration issue in `pkg/workflow/validation.go`
- ✅ Minimal problematic duplication overall (most duplication is acceptable engine-specific customization)

<details>
<summary><b>Detailed Analysis Report</b></summary>

---

## Package Organization Overview

### pkg/cli (69 files)

**Organization patterns:**
- **Command pattern** (`*_command.go`): 10 files with `New*Command()` entry points
- **Feature prefixes** (`mcp_*`, `logs_*`): 16 MCP files, 5 logs files
- **Domain files**: GitHub (`github.go`, `git.go`, `repo.go`), Actions (`actions.go`, `workflows.go`)
- **Core infrastructure**: `commands.go`, `init.go`, `resolver.go`

**Strengths:**
- Clear command organization with consistent structure
- Feature clustering (MCP files together, logs files together)
- Well-defined entry points for CLI commands

**Issues identified:**
- `logs.go` is overloaded (35+ symbols, mixed concerns)
- Template strings stored in `commands.go` instead of dedicated file
- GitHub API operations scattered across multiple files

### pkg/workflow (123 files)

**Organization patterns:**
- **Operation-based** (`create_*.go`): 6 creation files with consistent structure
- **Validation suite** (`*_validation.go`): 13 validation files
- **Engine architecture** (`*_engine.go`): 10 engine-related files
- **Compiler core** (`compiler*.go`): 3 compiler files
- **Package managers**: 6 package manager files with paired validation
- **Prompt generation** (`*_prompt.go`): 6 specialized prompt files
- **MCP configuration**: 3 MCP-related files

**Strengths:**
- Excellent naming consistency
- Clear separation of concerns (creation, validation, compilation)
- Engine infrastructure properly separated from implementations
- Package managers follow consistent pairing pattern

**Issues identified:**
- `validation.go` is a catch-all with 33+ functions (should be split)
- `safe_outputs_env_test_helpers.go` is misleadingly named (it looks like test code but is not a `_test.go` file)
- `config.go` is an empty placeholder (4 lines)
- `frontmatter_extraction.go` is very large (24 methods; could be split)

### Other packages (5 packages, 13 files)

- **pkg/console**: 4 files (console, render, format, spinner) - well organized
- **pkg/constants**: 1 file with all constants - appropriate
- **pkg/logger**: 1 file - simple logger implementation
- **pkg/parser**: 6 files (frontmatter, github, mcp, schema, yaml_error, json_path_locator) - well structured
- **pkg/timeutil**: 1 file - utility functions

---

## Function Clustering Results

### Cluster 1: Creation Functions (CRUD Operations)

**Pattern:** `create_*` functions for GitHub operations  
**Files:** 6 files in `pkg/workflow/`

```
create_issue.go              → CreateIssueConfig, parseCreateIssueConfig, buildCreateIssueJob
create_pull_request.go       → CreatePullRequestConfig, parse*, build*
create_discussion.go         → CreateDiscussionConfig, parse*, build*
create_agent_task.go         → CreateAgentTaskConfig, parse*, build*
create_pr_review_comment.go  → CreatePRReviewCommentConfig, parse*, build*
create_code_scanning_alert.go → CreateCodeScanningAlertConfig, parse*, build*
```

**Assessment:** ✅ **Well-organized** - Each operation has its own file with consistent structure

### Cluster 2: Validation Functions

**Pattern:** `validate*` and `check*` functions  
**Distribution:** Across 13+ files

**Primary locations:**
- `pkg/workflow/validation.go` (33+ functions) ⚠️ **OVERLOADED**
- Specialized validators (properly split):
  - `bundler_validation.go` (1 function)
  - `docker_validation.go` (1 function)
  - `npm_validation.go` (1 function)
  - `pip_validation.go` (4 functions)
  - `template_validation.go` (1 function)
  - `expression_validation.go` (2 functions)
  - `step_order_validation.go` (full tracker type)
  - `strict_mode_validation.go` (4 functions)
  - `mcp_config_validation.go` (6 functions)
  - `engine_validation.go` (2 functions)
  - `permissions_validator.go` (13 functions)

**Also in CLI:**
- `pkg/cli/mcp_validation.go` (2 functions)
- `pkg/cli/run_command.go` (`validateRemoteWorkflow`)
- `pkg/cli/add_command.go` (workflow validation)

**Issue:** validation.go contains unrelated validations:
- Expression sizes
- Container images
- Runtime packages
- GitHub Actions schema
- Secret references
- Repository features (6 helper functions)
- HTTP transport support
- Max turns support
- Web search support
- Agent file validation

### Cluster 3: Engine System

**Pattern:** `*Engine` implementations and infrastructure  
**Files:** 10 files in `pkg/workflow/`

```
Core infrastructure:
├── engine.go (base types, registry)
├── agentic_engine.go (BaseEngine, interfaces)
├── engine_helpers.go (15 shared utilities)
├── engine_validation.go (validation)
├── engine_output.go (output collection)
├── engine_firewall_support.go (firewall)
└── engine_network_hooks.go (network hooks)

Implementations:
├── claude_engine.go + claude_mcp.go + claude_settings.go + claude_tools.go + claude_logs.go
├── copilot_engine.go
├── codex_engine.go
└── custom_engine.go
```

**Assessment:** ✅ **Well-organized** - Clear separation between infrastructure and implementations

### Cluster 4: Package Extraction Functions

**Pattern:** `extract*FromCommands` functions  
**Significant similarity detected** ⚠️

**npm.go:**
```go
func extractNpxFromCommands(commands string) []string {
    var packages []string
    lines := strings.Split(commands, "\n")
    for _, line := range lines {
        words := strings.Fields(line)
        for i, word := range words {
            if word == "npx" && i+1 < len(words) {
                // Skip flags and find first package
                for j := i + 1; j < len(words); j++ {
                    pkg := words[j]
                    pkg = strings.TrimRight(pkg, "&|;")
                    if !strings.HasPrefix(pkg, "-") {
                        packages = append(packages, pkg)
                        break
                    }
                }
            }
        }
    }
    return packages
}
```

**pip.go:**
```go
func extractPipFromCommands(commands string) []string {
    var packages []string
    lines := strings.Split(commands, "\n")
    for _, line := range lines {
        words := strings.Fields(line)
        for i, word := range words {
            if (word == "pip" || word == "pip3") && i+1 < len(words) {
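                for j := i + 1; j < len(words); j++ {
                    if words[j] == "install" {
                        // Same flag-skipping logic as npx, applied after "install"
                        for k := j + 1; k < len(words); k++ {
                            pkg := words[k]
                            pkg = strings.TrimRight(pkg, "&|;")
                            if !strings.HasPrefix(pkg, "-") {
                                packages = append(packages, pkg)
                                break
                            }
                        }
                        break
                    }
                }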
            }
        }
    }
    return packages
}
```

**Similarity:** ~75% - Same structure, flag-skipping logic, and string processing

**Also similar:** `extractUvFromCommands`, `extractGoFromCommands`

### Cluster 5: Parsing Functions

**Pattern:** `parse*` functions  
**Locations:** Across multiple packages

**In pkg/parser:**
- `ParseImportDirective` (frontmatter.go)
- `ParseMCPConfig` (mcp.go)
- `parseJSONPath` (json_path_locator.go)

**In pkg/workflow:**
- `parseTimeDelta` family (time_delta.go): 5 related functions
- `parse*Tool` functions (tools_types.go): 12 tool-specific parsers
- `parse*Package` (dependabot.go): 3 package parsers

**In pkg/cli:**
- `parseRepoSpec`, `parseGitHubURL`, `parseWorkflowSpec`, `parseLocalWorkflowSpec`, `parseSourceSpec` (spec.go)
- `parsePRURL` (pr_command.go)
- `parseIssueSpec` (trial_command.go)
- `parseVersion` (semver.go)
- Multiple log parsing functions (logs_parsing.go, firewall_log.go, access_log.go)

**Assessment:** Generally well-organized; each parser handles a specific domain

### Cluster 6: Extraction Functions

**Pattern:** `extract*` functions  
**High concentration** - 50+ extraction functions

**Common patterns:**
- **From frontmatter:** `extractToolsFromFrontmatter`, `extractMCPServersFromFrontmatter`, `extractRuntimesFromFrontmatter`
- **From content:** `extractToolsFromContent`, `extractStepsFromContent`, `extractEngineFromContent`
- **From logs:** `extractLogMetrics`, `extractMissingToolsFromRun`, `extractMCPFailuresFromRun`
- **From strings:** `extractSecretName`, `extractRepoSlug`, `extractDomainFromURL`
- **From configs:** `extractCustomArgs`, `extractSecretsFromValue`, `extractSecretsFromHeaders`

**Assessment:** Appropriate distribution; each extraction function serves a specific purpose

### Cluster 7: Rendering/Generation Functions

**Pattern:** `render*`, `generate*`, `build*` functions  
**Locations:** Primarily in pkg/workflow compiler and MCP config

**In pkg/workflow/mcp-config.go:**
- `renderPlaywrightMCPConfig` (+ variants)
- `renderSafeOutputsMCPConfig` (+ variants)
- `renderAgenticWorkflowsMCPConfig` (+ variants)
- `renderCustomMCPConfigWrapper`
- `renderBuiltinMCPServerBlock`

**In pkg/workflow/compiler_yaml.go:**
- Multiple YAML generation methods

**In pkg/workflow (various):**
- `generateCacheSteps`, `generateCacheMemorySteps`
- `generateSetupStep`, `generateCleanupStep`
- `buildArtifactDownloadSteps`, `buildCopilotParticipantSteps`
- `buildConditionTree`, `buildOr`, `buildAnd`

**In pkg/console:**
- `renderValue`, `renderStruct`, `renderSlice`, `renderMap`
- `renderContext`, `renderTableRow`

**Assessment:** Well-organized by domain (MCP config, compiler YAML, console output)

---

## Identified Issues

### 1. Outlier Functions (High Priority)

#### Issue #1: Setup Functions in Wrong File

**File:** `pkg/cli/add_command.go`  
**Problem:** Contains multiple setup functions unrelated to adding workflows

**Outlier functions:**
```go
func ensureCopilotInstructions(...)           // Line 819
func ensureAgenticWorkflowPrompt(...)         // Line 869  
func ensureAgenticWorkflowAgent(...)          // Line 897
func ensureSharedAgenticWorkflowAgent(...)    // Line 902
func ensureSetupAgenticWorkflowsAgent(...)    // Line 907
```

**Recommendation:** Move to `copilot_setup.go` or new `agent_setup.go` file  
**Impact:** Improved file cohesion, clearer separation of concerns

#### Issue #2: Git/PR Operations in Command File

**File:** `pkg/cli/add_command.go`  
**Problem:** Contains Git and PR operations that belong elsewhere

**Outlier functions:**
```go
func checkCleanWorkingDirectory(...)  // Line 912 → Should be in git.go
func createPR(...)                    // Line 934 → Should be in pr_command.go
```

**Recommendation:** Move to appropriate domain files  
**Impact:** Better organization, reusability across commands

#### Issue #3: Compilation Logic in Add Command

**File:** `pkg/cli/add_command.go`  
**Problem:** Contains compilation logic that overlaps with compile_command.go

**Outlier functions:**
```go
func compileWorkflow(...)              // Should use compile_command.go
func compileWorkflowWithTracking(...)  // Duplicates compilation logic
```

**Recommendation:** Refactor to use shared compilation utilities  
**Impact:** Reduced duplication, single source of truth for compilation

#### Issue #4: GitHub API Operations Scattered

**Problem:** GitHub API calls spread across multiple files

**Locations:**
- `pkg/cli/logs.go`: `fetchJobStatuses()`, `fetchJobDetails()`
- `pkg/cli/github.go`: `getGitHubHost()`
- `pkg/cli/actions.go`: `convertToGitHubActionsEnv()`
- `pkg/cli/workflows.go`: `fetchGitHubWorkflows()`

**Recommendation:** Consolidate into dedicated GitHub API client or enhance existing `github.go`  
**Impact:** Centralized API access, easier maintenance, consistent error handling

#### Issue #5: Test Helpers File Misnaming

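**File:** `pkg/workflow/safe_outputs_env_test_helpers.go`  
**Problem:** Named like a test file, but because it does not end in `_test.go`, Go compiles it into the regular package build instead of only into test builds

**Recommendation:** Rename to `safe_outputs_env_helpers.go` (if production code uses it) or `safe_outputs_env_helpers_test.go` (if only tests use it)  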
**Impact:** Correct naming convention, clarity about file purpose

### 2. Duplicate or Near-Duplicate Functions

#### Duplicate #1: Package Extraction Pattern (High Priority)

**Similarity:** ~75% code similarity  
**Pattern:** Command-line package extraction across different package managers

**Files affected:**
- `pkg/workflow/npm.go`: `extractNpxFromCommands`
- `pkg/workflow/pip.go`: `extractPipFromCommands`, `extractUvFromCommands`
- `pkg/workflow/dependabot.go`: `extractGoFromCommands`

**Common logic:**
1. Split commands by newlines
2. Split each line into words
3. Find package manager command
4. Skip flags (starting with `-`)
5. Extract package names
6. Trim trailing shell operators (`&|;`)

**Code comparison:**

```go
// npm.go - extractNpxFromCommands
var packages []string
lines := strings.Split(commands, "\n")
for _, line := range lines {
    words := strings.Fields(line)
    for i, word := range words {
        if word == "npx" && i+1 < len(words) {
            for j := i + 1; j < len(words); j++ {
                pkg := words[j]
                pkg = strings.TrimRight(pkg, "&|;")
                if !strings.HasPrefix(pkg, "-") {
                    packages = append(packages, pkg)
                    break
                }
            }
        }
    }
}

// pip.go - extractPipFromCommands
var packages []string
lines := strings.Split(commands, "\n")
for _, line := range lines {
    words := strings.Fields(line)
    for i, word := range words {
        if (word == "pip" || word == "pip3") && i+1 < len(words) {
            for j := i + 1; j < len(words); j++ {
                if words[j] == "install" {
                    for k := j + 1; k < len(words); k++ {
                        pkg := words[k]
                        pkg = strings.TrimRight(pkg, "&|;")
                        if !strings.HasPrefix(pkg, "-") {
                            packages = append(packages, pkg)
                            break
                        }
                    }
                    break
                }
            }
        }
    }
}
```

**Recommendation:**  
Create `pkg/workflow/package_extraction.go` with generic extraction framework:

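```go
import (
	"slices"
	"strings"
)

type PackageExtractor struct {
	CommandNames       []string // e.g., ["pip", "pip3"]
	RequiredSubcommand string   // e.g., "install" (optional)
	TrimSuffixes       string   // cutset for strings.TrimRight, e.g., "&|;"
}

func (pe *PackageExtractor) ExtractPackages(commands string) []string {
	var packages []string
	for _, line := range strings.Split(commands, "\n") {
		words := strings.Fields(line)
		for i, word := range words {
			if !slices.Contains(pe.CommandNames, word) {
				continue
			}
			rest := words[i+1:]
			if pe.RequiredSubcommand != "" { // e.g., only "pip install ..."
				idx := slices.Index(rest, pe.RequiredSubcommand)
				if idx < 0 {
					continue
				}
				rest = rest[idx+1:]
			}
			for _, w := range rest { // skip flags, keep the first package
				if pkg := strings.TrimRight(w, pe.TrimSuffixes); !strings.HasPrefix(pkg, "-") {
					packages = append(packages, pkg)
					break
				}
			}
		}
	}
	return packages
}
// Usage in npm.go:
var npxExtractor = PackageExtractor{CommandNames: []string{"npx"}, TrimSuffixes: "&|;"}
func extractNpxFromCommands(commands string) []string { return npxExtractor.ExtractPackages(commands) }
```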

**Estimated effort:** 3-4 hours  
**Benefits:** 
- Reduced code duplication (~150 lines → ~50 lines)
- Single source of truth for extraction logic
- Easier to fix bugs and add features
- Consistent behavior across package managers

#### Duplicate #2: Secret Extraction Functions

**Similarity:** ~60% code similarity  
**Pattern:** Extracting secrets from various sources

**Files affected:**
- `pkg/workflow/mcp-config.go`: `extractSecretsFromValue`, `extractSecretsFromHeaders`
- `pkg/cli/secrets.go`: `extractSecretsFromConfig`

**Common logic:**
- Pattern matching for `${{ secrets.NAME }}`
- Map building for secret names
- Similar regex/string parsing approaches

**Recommendation:** Consolidate into `pkg/workflow/secret_extraction.go` with shared utilities  
**Estimated effort:** 2-3 hours  
**Benefits:** Centralized secret detection logic, easier maintenance
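
A minimal sketch of the shared helper, assuming secrets are referenced with the GitHub Actions expression syntax `${{ secrets.NAME }}` and that names consist of letters, digits, and underscores. The function names `ExtractSecretNames` and `ExtractSecretNamesFromHeaders` and the set-style `map[string]struct{}` return type are hypothetical and may not match what `extractSecretsFromValue` or `extractSecretsFromConfig` return today.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// secretRefPattern matches ${{ secrets.NAME }} with optional inner whitespace.
// Assumes names are letters, digits and underscores, not starting with a digit.
var secretRefPattern = regexp.MustCompile(`\$\{\{\s*secrets\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}`)

// ExtractSecretNames returns the set of secret names referenced in value.
func ExtractSecretNames(value string) map[string]struct{} {
	names := make(map[string]struct{})
	for _, m := range secretRefPattern.FindAllStringSubmatch(value, -1) {
		names[m[1]] = struct{}{}
	}
	return names
}

// ExtractSecretNamesFromHeaders applies the same matcher to every header value.
func ExtractSecretNamesFromHeaders(headers map[string]string) map[string]struct{} {
	names := make(map[string]struct{})
	for _, v := range headers {
		for name := range ExtractSecretNames(v) {
			names[name] = struct{}{}
		}
	}
	return names
}

func main() {
	headers := map[string]string{
		"Authorization": "Bearer ${{ secrets.API_TOKEN }}",
		"X-Extra":       "${{secrets.EXTRA_KEY}} and ${{ secrets.API_TOKEN }}",
	}
	var sorted []string
	for name := range ExtractSecretNamesFromHeaders(headers) {
		sorted = append(sorted, name)
	}
	sort.Strings(sorted)
	fmt.Println(sorted) // prints [API_TOKEN EXTRA_KEY]
}
```

Keeping the pattern in one package-level variable is the "single place to update patterns" benefit: supporting another syntax (for example bracket access) would touch only `secretRefPattern`.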

#### Duplicate #3: Log Parsing Functions

**Similarity:** ~50-60% similarity  
**Pattern:** Line-by-line log parsing with similar structure

**Files affected:**
- `pkg/cli/firewall_log.go`: `parseFirewallLogLine`, `parseFirewallLog`
- `pkg/cli/access_log.go`: `parseSquidLogLine`, `parseSquidAccessLog`
- `pkg/cli/logs_parsing.go`: `parseLogFileWithEngine`, `parseAgentLog`

**Common patterns:**
- Open file
- Read line by line
- Parse line with regex or field splitting
- Accumulate results
- Error handling

**Recommendation:** Consider shared log parsing utilities in `pkg/cli/log_parser.go`  
**Estimated effort:** 4-5 hours  
**Benefits:** Reduced duplication, consistent error handling, reusable parsing framework

### 3. Validation Logic Concentration Issue

**File:** `pkg/workflow/validation.go`  
**Problem:** Catch-all file with 33+ unrelated validation functions (450+ lines)

**Current contents (mixed concerns):**
- Expression validation (`validateExpressionSizes`)
- Container validation (`validateContainerImages`)
- Runtime validation (`validateRuntimePackages`)
- Schema validation (`validateGitHubActionsSchema`)
- Secret validation (`validateSecretReferences`)
- Repository features (6 functions: `validateRepositoryFeatures`, `checkRepositoryHasDiscussions*`, `checkRepositoryHasIssues*`)
- Agent validation (`validateAgentFile`, `validateMaxTurnsSupport`, `validateWebSearchSupport`)
- HTTP transport validation

**Recommendation:** Split into focused files:

```
validation.go (keep only high-level orchestration)
├── repository_features_validation.go (repository feature checking)
├── schema_validation.go (GitHub Actions schema)
├── runtime_validation.go (packages, containers, expressions)
└── agent_validation.go (agent file, feature support)
```

**Estimated effort:** 3-4 hours  
**Benefits:**
- Clearer separation of concerns
- Easier to find and maintain validation logic
- Follows existing pattern of specialized validators
- Better testability

### 4. Scattered Helper Functions

**Issue:** Helper functions are distributed across several files, and some could benefit from consolidation

**Current distribution:**
- `engine_helpers.go`: 15 functions ✅ Good
- `config_helpers.go`: 4 functions ✅ Good
- `frontmatter_helpers.go`: 2 functions ✅ Good
- `prompt_step_helper.go`: 1 function ⚠️ Could be consolidated

**Recommendation:**  
- Consider `compiler_helpers.go` for internal Compiler helpers currently embedded in `compiler.go`
- Potentially consolidate single-function helper files

**Priority:** Low (current organization is acceptable)

### 5. Empty Placeholder File

**File:** `pkg/workflow/config.go`  
**Content:** 4 lines (only a comment noting that its contents moved to `config_helpers.go`)

**Recommendation:** Remove file or repurpose for actual config types  
**Estimated effort:** 5 minutes  
**Impact:** Cleaner codebase

---

## Refactoring Recommendations

### Priority 1: High Impact (Recommended)

#### 1. Split validation.go

**Goal:** Break up overloaded validation file into focused modules

**Tasks:**
- Create `repository_features_validation.go` (6 functions)
- Create `schema_validation.go` (schema validation)
- Create `runtime_validation.go` (packages, containers, expressions)
- Create `agent_validation.go` (agent features)
- Keep orchestration in `validation.go`

**Estimated effort:** 3-4 hours  
**Benefits:**
- ✅ Improved code organization
- ✅ Easier to find specific validators
- ✅ Better testability
- ✅ Follows existing specialized validator pattern

#### 2. Create Package Extraction Framework

**Goal:** Eliminate duplication in package extraction logic

**Tasks:**
- Create `pkg/workflow/package_extraction.go`
- Implement generic `PackageExtractor` type
- Refactor npm.go, pip.go, dependabot.go to use framework
- Update tests

**Estimated effort:** 3-4 hours  
**Benefits:**
- ✅ ~150 lines of duplicated code → ~50 lines
- ✅ Single source of truth
- ✅ Easier to add new package managers
- ✅ Consistent bug fixes across all extractors

#### 3. Move Outlier Functions to Correct Files

**Goal:** Improve file cohesion by relocating misplaced functions

**Tasks:**
- Move setup functions from `add_command.go` to appropriate setup files
- Move `checkCleanWorkingDirectory` to `git.go`
- Move `createPR` to `pr_command.go` or extract shared PR utilities
- Refactor compilation logic to use shared utilities

**Estimated effort:** 2-3 hours  
**Benefits:**
- ✅ Better separation of concerns
- ✅ Improved code reusability
- ✅ Clearer file purposes

#### 4. Fix Naming Issues

**Goal:** Correct file naming inconsistencies

**Tasks:**
- Rename `safe_outputs_env_test_helpers.go` to `safe_outputs_env_helpers.go`
- Remove or repurpose empty `config.go`

**Estimated effort:** 15 minutes  
**Benefits:**
- ✅ Correct naming conventions
- ✅ Cleaner codebase

### Priority 2: Medium Impact (Consider)

#### 5. Consolidate GitHub API Operations

**Goal:** Centralize GitHub API interactions

**Tasks:**
- Audit all GitHub API calls across CLI package
- Create or enhance GitHub client abstraction
- Move scattered API operations to centralized location
- Add consistent error handling and retry logic

**Estimated effort:** 4-5 hours  
**Benefits:**
- ✅ Centralized API access
- ✅ Consistent error handling
- ✅ Easier to add caching/rate limiting
- ✅ Better testability

#### 6. Consolidate Secret Extraction

**Goal:** Unify secret detection logic

**Tasks:**
- Create `pkg/workflow/secret_extraction.go`
- Extract common secret pattern matching
- Refactor existing extraction functions to use shared utilities

**Estimated effort:** 2-3 hours  
**Benefits:**
- ✅ Consistent secret detection
- ✅ Single place to update patterns
- ✅ Reduced duplication

### Priority 3: Long-term Improvements (Optional)

#### 7. Extract Template Strings

**Goal:** Move template strings from code to dedicated location

**Tasks:**
- Create `templates.go` or move to `templates/` directory
- Extract templates from `commands.go`
- Update references

**Estimated effort:** 2-3 hours  
**Benefits:**
- ✅ Easier template maintenance
- ✅ Better separation of code and content

#### 8. Consider Log Parsing Framework

**Goal:** Create reusable log parsing utilities

**Tasks:**
- Identify common log parsing patterns
- Create `pkg/cli/log_parser.go` with generic utilities
- Refactor firewall_log.go, access_log.go, logs_parsing.go

**Estimated effort:** 5-6 hours  
**Benefits:**
- ✅ Consistent log parsing
- ✅ Reusable utilities
- ✅ Reduced duplication

#### 9. Split Large Frontmatter Extraction File

**Goal:** Break up `frontmatter_extraction.go` (24 methods)

**Consideration:**
- File contains 24 Compiler methods for frontmatter extraction
- Could split by extraction domain:
  - `frontmatter_tools_extraction.go` (tools, MCP, runtimes)
  - `frontmatter_config_extraction.go` (permissions, if, features)
  - `frontmatter_security_extraction.go` (firewall, network)

**Estimated effort:** 4-5 hours  
**Priority:** Low (current organization functional but could be improved)

---

## Implementation Checklist

### Phase 1: Quick Wins (1-2 days)

- [ ] Fix file naming: Rename `safe_outputs_env_test_helpers.go`
- [ ] Remove empty `config.go` placeholder
- [ ] Move `checkCleanWorkingDirectory` to `git.go`
- [ ] Move `createPR` function to appropriate location

### Phase 2: High-Impact Refactoring (3-5 days)

- [ ] Split `validation.go` into focused files
  - [ ] Create `repository_features_validation.go`
  - [ ] Create `schema_validation.go`
  - [ ] Create `runtime_validation.go`
  - [ ] Create `agent_validation.go`
  - [ ] Update imports and tests
- [ ] Create package extraction framework
  - [ ] Design `PackageExtractor` type
  - [ ] Implement generic extraction logic
  - [ ] Refactor npm.go to use framework
  - [ ] Refactor pip.go to use framework
  - [ ] Refactor dependabot.go to use framework
  - [ ] Update tests
- [ ] Move setup functions from `add_command.go`
  - [ ] Identify appropriate destination files
  - [ ] Move functions with proper documentation
  - [ ] Update references
  - [ ] Verify tests pass

### Phase 3: Medium-Impact Improvements (5-7 days)

- [ ] Consolidate GitHub API operations
  - [ ] Audit API calls across codebase
  - [ ] Design GitHub client abstraction
  - [ ] Implement centralized client
  - [ ] Migrate existing calls
  - [ ] Add error handling and retry logic
- [ ] Consolidate secret extraction
  - [ ] Create `secret_extraction.go`
  - [ ] Extract shared utilities
  - [ ] Refactor existing functions
  - [ ] Update tests

### Phase 4: Long-term Considerations (As needed)

- [ ] Extract template strings to dedicated location
- [ ] Create log parsing framework
- [ ] Consider splitting large frontmatter extraction file
- [ ] Review and consolidate prompt generation files

---

## Analysis Metadata

**Analysis method:** Serena semantic code analysis + naming pattern analysis + manual code inspection  
**Files analyzed:** 206 non-test Go files  
**Functions cataloged:** 1,269 functions  
**Lines of code:** ~186,000  
**Packages analyzed:**
- `pkg/cli`: 69 files
- `pkg/workflow`: 123 files  
- `pkg/console`: 4 files
- `pkg/constants`: 1 file
- `pkg/logger`: 1 file
- `pkg/parser`: 6 files
- `pkg/timeutil`: 1 file

**Detection methods:**
- Semantic symbol analysis using Serena MCP server
- Regex pattern matching for function naming patterns
- Manual code inspection of similar functions
- Symbol overview analysis for file organization assessment

**Code similarity assessment:**
- Package extraction functions: 75% similarity
- Secret extraction functions: 60% similarity
- Log parsing functions: 50-60% similarity

---

## Conclusion

The `gh-aw` codebase demonstrates strong organizational principles with clear naming conventions and feature-based file clustering. The analysis identified 5 high-priority outliers, 3 significant duplicate patterns, and several opportunities for improved code organization.

**Overall Assessment: 8/10**

**Strengths:**
- ✅ Excellent naming conventions (`create_*`, `*_validation`, `*_engine`, `mcp_*`)
- ✅ Consistent file patterns and clear separation of concerns
- ✅ Well-organized engine architecture
- ✅ Minimal problematic duplication (most is acceptable customization)
- ✅ Clear feature clustering (MCP files, logs files, validation files)

**Areas for Improvement:**
- ⚠️ validation.go is overloaded with mixed concerns
- ⚠️ Package extraction logic duplicated across 3-4 files
- ⚠️ Some functions in wrong files (setup in add_command.go)
- ⚠️ Minor naming inconsistencies

**Recommended Next Steps:**
1. Start with the Phase 1 quick wins (file renaming, removing `config.go`, relocating `checkCleanWorkingDirectory` and `createPR`)
2. Implement package extraction framework (high-value refactoring)
3. Split validation.go into focused modules
4. Consider Priority 2 improvements based on development velocity

The proposed refactorings maintain the codebase's strong organizational foundation while addressing specific pain points and duplication patterns. All recommendations preserve existing functionality and aim to improve maintainability, testability, and code reuse.

</details>

---

**Labels:** refactoring, code-quality, technical-debt, good-first-issue

**Priority:** Medium  
**Estimated Total Effort:** 15-20 hours for Priority 1 + Priority 2 items




> AI generated by [Semantic Function Refactoring](https://github.com/githubnext/gh-aw/actions/runs/19290608224)

[refactor] 🔧 Semantic Function Clustering Analysis: Refactoring Opportunities #3713
