Skip to content

[file-diet] Refactor pkg/parser/import_processor.go (1106 lines) into focused modules #18691

@github-actions

Description

@github-actions

Overview

The file pkg/parser/import_processor.go has grown to 1106 lines, significantly exceeding the 800-line healthy threshold. This monolithic file handles too many concerns in a single place: queue-based BFS traversal, field extraction for 15+ frontmatter keys, cycle detection, topological sorting, and remote origin resolution. Splitting it into focused modules will improve readability, testability, and maintainability.

Current State

  • File: pkg/parser/import_processor.go
  • Size: 1106 lines
  • Test Coverage: Related tests exist across multiple files (~789 lines in import_topological_test.go + import_cycle_test.go), but no dedicated import_processor_test.go
  • Complexity: One mega-function processImportsFromFrontmatterWithManifestAndSource spans lines 182–849 (~667 lines) and handles BFS traversal, field extraction for 15+ frontmatter keys, deduplication, and result assembly inline
Full File Analysis

Functions

Function Lines (approx) Responsibility
ProcessImportsFromFrontmatter 82–89 Public thin wrapper
parseRemoteOrigin 117–167 Parse workflowspec into remote origin struct
ProcessImportsFromFrontmatterWithManifest 171–179 Public wrapper
ProcessImportsFromFrontmatterWithSource 177–180 Public wrapper
processImportsFromFrontmatterWithManifestAndSource 182–849 667-line mega-function — BFS traversal + extraction of 15+ fields
findCyclePath 853–874 DFS cycle path construction
dfsForCycle 879–914 DFS helper for cycle detection
topologicalSortImports 920–1072 Kahn's algorithm for topological sort
extractImportPaths 1074–1106 Extract import paths from frontmatter

Complexity Hotspots

  • Lines 182–360: Import spec parsing and BFS queue seeding (complex spec parsing, remote origin detection, lock file validation, duplicate checking)
  • Lines 360–590: BFS traversal loop — agent files, YAML workflow files, nested import discovery, remote path resolution
  • Lines 590–807: Field extraction block — 15+ sequential blocks each extracting a different frontmatter key (tools, engines, mcp-servers, safe-outputs, safe-inputs, steps, runtimes, services, network, permissions, secret-masking, bots, skip-roles, skip-bots, plugins, post-steps, labels, cache, features)
  • Lines 920–1072: Topological sort — already self-contained but lives in the same file

Duplicate Patterns

The field extraction block (lines 590–807) repeats the same pattern 15+ times:

fooContent, err := extractXFromContent(string(content))
if err == nil && fooContent != "" && fooContent != "{}" {
    fooBuilder.WriteString(fooContent + "\n")
}

This repetitive code is a strong signal for extraction.

Refactoring Strategy

Proposed File Splits

  1. import_bfs.go (~300 lines)

    • Functions: processImportsFromFrontmatterWithManifestAndSource (the BFS traversal core — queue seeding, BFS loop, queue item dispatch, result assembly)
    • Responsibility: Orchestration of the BFS import traversal
    • Estimated LOC: ~300
  2. import_field_extractor.go (~200 lines)

    • Functions: new extractAllImportFields(content string, item importQueueItem, result *importAccumulator) that consolidates the 15+ field extraction blocks from lines 590–807
    • Responsibility: Extracting all frontmatter fields from a single imported file and accumulating results
    • Estimated LOC: ~200
  3. import_cycle.go (~120 lines)

    • Functions: findCyclePath, dfsForCycle (already cohesive — lines 851–914)
    • Responsibility: Cycle detection in the import dependency graph
    • Estimated LOC: ~120
  4. import_topological.go (~160 lines)

    • Functions: topologicalSortImports, extractImportPaths (already cohesive — lines 916–1106)
    • Responsibility: Topological ordering of imports using Kahn's algorithm
    • Estimated LOC: ~160
  5. import_remote.go (~60 lines)

    • Functions: parseRemoteOrigin, remoteImportOrigin type, importQueueItem type
    • Responsibility: Remote origin parsing and queue item types
    • Estimated LOC: ~60
  6. import_processor.go (keep, reduced to ~200 lines)

    • Remaining: ImportsResult, ImportSpec, ImportInputDefinition types, public API functions (ProcessImportsFromFrontmatter, ProcessImportsFromFrontmatterWithManifest, ProcessImportsFromFrontmatterWithSource)
    • Responsibility: Public API surface and core types

Shared Utilities

An importAccumulator struct could centralize the ~25 builder/slice/set variables currently declared inline in the mega-function, passed to extractAllImportFields as a pointer receiver.

Interface Abstractions

  • Consider an importFieldExtractor interface if the extraction logic needs to be tested independently or swapped
Test Coverage Plan

Add or augment tests for each new file:

  1. import_cycle_test.go (already partially exists)

    • Test cases: simple cycle (A→B→A), longer cycle (A→B→C→A), no cycle, self-import
    • Target coverage: >80%
  2. import_topological_test.go (already partially exists)

    • Test cases: linear chain, diamond dependency, independent files, circular (should error)
    • Target coverage: >80%
  3. import_field_extractor_test.go (new)

    • Test cases: extraction of each field type, empty file, missing fields, deduplication of bots/labels/plugins
    • Target coverage: >80%
  4. import_remote_test.go (new)

    • Test cases: valid workflowspec parsing, missing ref defaults to "main", invalid specs return nil
    • Target coverage: >80%

Implementation Guidelines

  1. Preserve Behavior: The public API (ProcessImportsFromFrontmatter*) must remain unchanged
  2. Maintain Exports: All exported types and functions stay exported
  3. Introduce importAccumulator: Move the 25+ local variables into a struct to reduce function signature complexity
  4. Incremental Changes: Extract import_cycle.go first (already cohesive), then import_topological.go, then import_remote.go, then import_field_extractor.go
  5. Run Tests Frequently: Verify make test-unit passes after each extraction
  6. Document Changes: Add package-level comment to each new file explaining its role

Acceptance Criteria

  • import_processor.go reduced to ≤300 lines (types + public API only)
  • import_cycle.go created with cycle detection functions (≤150 lines)
  • import_topological.go created with topological sort functions (≤200 lines)
  • import_remote.go created with remote origin types/parsing (≤100 lines)
  • import_field_extractor.go created with field extraction logic (≤250 lines)
  • import_bfs.go created with BFS traversal core (≤350 lines)
  • All tests pass (make test-unit)
  • Code passes linting (make lint)
  • Build succeeds (make build)
Additional Context
  • Repository Guidelines: Follow patterns in AGENTS.md — prefer many smaller files grouped by functionality
  • Validation Complexity: Target 100–200 lines per file, hard limit 300 lines
  • Existing Test Files: import_topological_test.go, import_cycle_test.go, yaml_import_test.go, import_remote_nested_test.go already provide good coverage scaffolding

Priority: Medium
Effort: Medium (well-contained package, clear split boundaries, strong existing test coverage)
Expected Impact: Improved maintainability, easier onboarding, reduced complexity per file

References:

Generated by Daily File Diet

  • expires on Mar 1, 2026, 1:32 PM UTC

Metadata

Metadata

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions