Skip to content

Refactor Validate-MarkdownFrontmatter.ps1 for improved modularity and testability #266

@WilliamBerryiii

Description

@WilliamBerryiii

Issue Type

enhancement

Description

The Validate-MarkdownFrontmatter.ps1 script (~1,600 lines) has grown to become one of the largest modules in the codebase. While it has comprehensive functionality, the Test-FrontmatterValidation function (~370 lines with 0% test coverage) has accumulated mixed concerns that hinder testability and maintainability.

Current State Analysis

Coverage metrics (from JaCoCo XML):

Method Lines Missed Lines Covered Coverage
Test-FrontmatterValidation 289 0 0%
ValidationResult class 4 0 0%
All other functions 79 135 63%
Script total 368 135 ~27%

Root cause: Test-FrontmatterValidation mixes orchestration, I/O, business rules, and output formatting in a single monolithic function, making unit testing impractical.

Identified Concerns in Test-FrontmatterValidation

  1. Repository root detection (lines 909-920)

    • Git command execution mixed with validation logic
  2. Input sanitization (lines 927-980)

    • Duplicate array sanitization for $Files and $Paths
    • Should be extracted to reusable helper
  3. File discovery (lines 982-1085)

    • Two branches: specific files vs. directory scanning
    • Gitignore pattern matching
    • ExcludePaths filtering
    • Mixed with validation flow
  4. Content-type-specific validation rules (lines 1110-1309+)

    • Root community files validation
    • DevContainer docs validation
    • VSCode README validation
    • GitHub resources (agents, chatmodes, instructions, prompts)
    • Docs directory validation
    • Each has different required/suggested fields
  5. GitHub Actions annotations (throughout)

    • Write-GitHubAnnotation calls interspersed with validation
    • Output formatting coupled to validation logic
  6. Results aggregation (throughout)

    • Error/warning collection mixed with validation
    • HashSet management for files with issues

Proposed Refactoring

Extract the following functions to improve single-responsibility adherence:

# Repository detection
function Get-RepositoryRoot { }

# Input sanitization (generic helper)
function Remove-EmptyArrayEntries {
    param([string[]]$Array)
}

# File discovery consolidation
function Get-MarkdownFilesToValidate {
    param(
        [string[]]$Files,
        [string[]]$Paths,
        [string[]]$ExcludePaths,
        [switch]$ChangedFilesOnly,
        [string]$BaseBranch
    )
}

# Content-type-specific validators
function Test-RootCommunityFileFields { }
function Test-DevContainerFileFields { }
function Test-VSCodeReadmeFields { }
function Test-GitHubResourceFields { }
function Test-DocsFileFields { }
function Test-CommonFields { }  # keywords, reading_time

# Date format validation
function Test-DateFormat {
    param([string]$Date, [string]$FilePath)
}

# Footer requirements determination
function Get-FooterRequirement {
    param([FileTypeInfo]$FileTypeInfo, [System.IO.FileInfo]$File)
}

# Results handling
function Add-ValidationIssue {
    param(
        [string]$Message,
        [ValidateSet('Error','Warning')]$Severity,
        [string]$FilePath,
        [ref]$Errors,
        [ref]$Warnings,
        [ref]$FilesWithErrors,
        [ref]$FilesWithWarnings
    )
}

Benefits

  1. Unit testability - Each extracted function can be tested in isolation without filesystem/git dependencies
  2. Coverage improvement - Currently untested 289 lines become accessible to unit tests
  3. Maintainability - Content-type rules isolated in dedicated functions
  4. Reusability - Remove-EmptyArrayEntries, Get-RepositoryRoot useful elsewhere
  5. Readability - Main orchestration function becomes a clear workflow

Coordination Notes

Acceptance Criteria

  • Test-FrontmatterValidation reduced to <100 lines of orchestration
  • Content-type validators extracted to separate functions
  • Input sanitization extracted to reusable helper
  • File discovery logic consolidated
  • Unit tests added for each extracted function
  • Line coverage for this file reaches >80%
  • All existing integration tests pass
  • PSScriptAnalyzer clean

Additional Context

The current architecture evolved organically as content types were added. This refactoring aligns with the pattern established in other linting scripts where pure functions are separated from I/O operations.

Related coverage improvement work:

Metadata

Metadata

Labels

enhancementNew feature or requesttestingTest infrastructure and test files

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions