[file-diet] Refactor `pkg/parser/import_processor.go` (1106 lines) into focused modules

### Overview

The file `pkg/parser/import_processor.go` has grown to **1106 lines**, significantly exceeding the 800-line healthy threshold. This monolithic file handles too many concerns in a single place: queue-based BFS traversal, field extraction for 15+ frontmatter keys, cycle detection, topological sorting, and remote origin resolution. Splitting it into focused modules will improve readability, testability, and maintainability.

### Current State

- **File**: `pkg/parser/import_processor.go`
- **Size**: 1106 lines
- **Test Coverage**: Related tests exist across multiple files (~789 lines in `import_topological_test.go` + `import_cycle_test.go`), but no dedicated `import_processor_test.go`
- **Complexity**: One mega-function `processImportsFromFrontmatterWithManifestAndSource` spans lines 182–849 (~667 lines) and handles BFS traversal, field extraction for 15+ frontmatter keys, deduplication, and result assembly inline

<details>
<summary>Full File Analysis</summary>

#### Functions

| Function | Lines (approx) | Responsibility |
|---|---|---|
| `ProcessImportsFromFrontmatter` | 82–89 | Public thin wrapper |
| `parseRemoteOrigin` | 117–167 | Parse workflowspec into remote origin struct |
| `ProcessImportsFromFrontmatterWithManifest` | 171–179 | Public wrapper |
| `ProcessImportsFromFrontmatterWithSource` | 177–180 | Public wrapper |
| `processImportsFromFrontmatterWithManifestAndSource` | 182–849 | **667-line mega-function** — BFS traversal + extraction of 15+ fields |
| `findCyclePath` | 853–874 | DFS cycle path construction |
| `dfsForCycle` | 879–914 | DFS helper for cycle detection |
| `topologicalSortImports` | 920–1072 | Kahn's algorithm for topological sort |
| `extractImportPaths` | 1074–1106 | Extract import paths from frontmatter |

#### Complexity Hotspots

- **Lines 182–360**: Import spec parsing and BFS queue seeding (complex spec parsing, remote origin detection, lock file validation, duplicate checking)
- **Lines 360–590**: BFS traversal loop — agent files, YAML workflow files, nested import discovery, remote path resolution
- **Lines 590–807**: Field extraction block — 15+ sequential blocks each extracting a different frontmatter key (tools, engines, mcp-servers, safe-outputs, safe-inputs, steps, runtimes, services, network, permissions, secret-masking, bots, skip-roles, skip-bots, plugins, post-steps, labels, cache, features)
- **Lines 920–1072**: Topological sort — already self-contained but lives in the same file

#### Duplicate Patterns

The field extraction block (lines 590–807) repeats the same pattern 15+ times:
```go
fooContent, err := extractXFromContent(string(content))
if err == nil && fooContent != "" && fooContent != "{}" {
 fooBuilder.WriteString(fooContent + "\n")
}
```
This repetitive code is a strong signal for extraction.

</details>

### Refactoring Strategy

#### Proposed File Splits

1. **`import_bfs.go`** (~300 lines)
 - Functions: `processImportsFromFrontmatterWithManifestAndSource` (the BFS traversal core — queue seeding, BFS loop, queue item dispatch, result assembly)
 - Responsibility: Orchestration of the BFS import traversal
 - Estimated LOC: ~300

2. **`import_field_extractor.go`** (~200 lines)
 - Functions: new `extractAllImportFields(content string, item importQueueItem, result *importAccumulator)` that consolidates the 15+ field extraction blocks from lines 590–807
 - Responsibility: Extracting all frontmatter fields from a single imported file and accumulating results
 - Estimated LOC: ~200

3. **`import_cycle.go`** (~120 lines)
 - Functions: `findCyclePath`, `dfsForCycle` (already cohesive — lines 851–914)
 - Responsibility: Cycle detection in the import dependency graph
 - Estimated LOC: ~120

4. **`import_topological.go`** (~160 lines)
 - Functions: `topologicalSortImports`, `extractImportPaths` (already cohesive — lines 916–1106)
 - Responsibility: Topological ordering of imports using Kahn's algorithm
 - Estimated LOC: ~160

5. **`import_remote.go`** (~60 lines)
 - Functions: `parseRemoteOrigin`, `remoteImportOrigin` type, `importQueueItem` type
 - Responsibility: Remote origin parsing and queue item types
 - Estimated LOC: ~60

6. **`import_processor.go`** (keep, reduced to ~200 lines)
 - Remaining: `ImportsResult`, `ImportSpec`, `ImportInputDefinition` types, public API functions (`ProcessImportsFromFrontmatter`, `ProcessImportsFromFrontmatterWithManifest`, `ProcessImportsFromFrontmatterWithSource`)
 - Responsibility: Public API surface and core types

#### Shared Utilities

An `importAccumulator` struct could centralize the ~25 builder/slice/set variables currently declared inline in the mega-function, passed to `extractAllImportFields` as a pointer receiver.

#### Interface Abstractions

- Consider an `importFieldExtractor` interface if the extraction logic needs to be tested independently or swapped

<details>
<summary>Test Coverage Plan</summary>

Add or augment tests for each new file:

1. **`import_cycle_test.go`** (already partially exists)
 - Test cases: simple cycle (A→B→A), longer cycle (A→B→C→A), no cycle, self-import
 - Target coverage: >80%

2. **`import_topological_test.go`** (already partially exists)
 - Test cases: linear chain, diamond dependency, independent files, circular (should error)
 - Target coverage: >80%

3. **`import_field_extractor_test.go`** (new)
 - Test cases: extraction of each field type, empty file, missing fields, deduplication of bots/labels/plugins
 - Target coverage: >80%

4. **`import_remote_test.go`** (new)
 - Test cases: valid workflowspec parsing, missing ref defaults to "main", invalid specs return nil
 - Target coverage: >80%

</details>

### Implementation Guidelines

1. **Preserve Behavior**: The public API (`ProcessImportsFromFrontmatter*`) must remain unchanged
2. **Maintain Exports**: All exported types and functions stay exported
3. **Introduce `importAccumulator`**: Move the 25+ local variables into a struct to reduce function signature complexity
4. **Incremental Changes**: Extract `import_cycle.go` first (already cohesive), then `import_topological.go`, then `import_remote.go`, then `import_field_extractor.go`
5. **Run Tests Frequently**: Verify `make test-unit` passes after each extraction
6. **Document Changes**: Add package-level comment to each new file explaining its role

### Acceptance Criteria

- [ ] `import_processor.go` reduced to ≤300 lines (types + public API only)
- [ ] `import_cycle.go` created with cycle detection functions (≤150 lines)
- [ ] `import_topological.go` created with topological sort functions (≤200 lines)
- [ ] `import_remote.go` created with remote origin types/parsing (≤100 lines)
- [ ] `import_field_extractor.go` created with field extraction logic (≤250 lines)
- [ ] `import_bfs.go` created with BFS traversal core (≤350 lines)
- [ ] All tests pass (`make test-unit`)
- [ ] Code passes linting (`make lint`)
- [ ] Build succeeds (`make build`)

<details>
<summary>Additional Context</summary>

- **Repository Guidelines**: Follow patterns in `AGENTS.md` — prefer many smaller files grouped by functionality
- **Validation Complexity**: Target 100–200 lines per file, hard limit 300 lines
- **Existing Test Files**: `import_topological_test.go`, `import_cycle_test.go`, `yaml_import_test.go`, `import_remote_nested_test.go` already provide good coverage scaffolding

</details>

---

**Priority**: Medium 
**Effort**: Medium (well-contained package, clear split boundaries, strong existing test coverage) 
**Expected Impact**: Improved maintainability, easier onboarding, reduced complexity per file

**References:**
- [§22488102624](https://github.com/github/gh-aw/actions/runs/22488102624)







> Generated by [Daily File Diet](https://github.com/github/gh-aw/actions/runs/22488102624)
> - [x] expires  on Mar 1, 2026, 1:32 PM UTC

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[file-diet] Refactor `pkg/parser/import_processor.go` (1106 lines) into focused modules #18691

Overview

Current State

Functions

Complexity Hotspots

Duplicate Patterns

Refactoring Strategy

Proposed File Splits

Shared Utilities

Interface Abstractions

Implementation Guidelines

Acceptance Criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Function	Lines (approx)	Responsibility
`ProcessImportsFromFrontmatter`	82–89	Public thin wrapper
`parseRemoteOrigin`	117–167	Parse workflowspec into remote origin struct
`ProcessImportsFromFrontmatterWithManifest`	171–179	Public wrapper
`ProcessImportsFromFrontmatterWithSource`	177–180	Public wrapper
`processImportsFromFrontmatterWithManifestAndSource`	182–849	667-line mega-function — BFS traversal + extraction of 15+ fields
`findCyclePath`	853–874	DFS cycle path construction
`dfsForCycle`	879–914	DFS helper for cycle detection
`topologicalSortImports`	920–1072	Kahn's algorithm for topological sort
`extractImportPaths`	1074–1106	Extract import paths from frontmatter

[file-diet] Refactor pkg/parser/import_processor.go (1106 lines) into focused modules #18691

Description

Overview

Current State

Functions

Complexity Hotspots

Duplicate Patterns

Refactoring Strategy

Proposed File Splits

Shared Utilities

Interface Abstractions

Implementation Guidelines

Acceptance Criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

[file-diet] Refactor `pkg/parser/import_processor.go` (1106 lines) into focused modules #18691