feat: Add conditional workspace checkout to detection job for patch context #23191

@davidslater

Background

The Detection Job runs as a separate GitHub Actions job after the Agent job completes. Its purpose is to analyze what the agent was asked to do, what it declared it did, and what it actually changed in code. Currently, it operates only on three artifact files downloaded from the agent job:

| File | What it represents |
|---|---|
| prompt.txt | What the agent was asked to do |
| agent_output.json | What the agent declared it did (structured actions) |
| aw-*.patch | What the agent actually changed in code (present if changes were made) |

The detection job runs on a fresh runner with no repository checkout. This means when a patch is present, the detection engine sees code changes entirely without context — it cannot see the surrounding codebase, existing patterns, imported modules, project structure, or how the changed code fits into the larger system.

Problem

When a patch is provided (has_patch == 'true'), the detection engine must judge whether changes are malicious — e.g. suspicious web service calls, backdoors, encoded strings, or newly introduced dependencies — but it has no access to:

  • The surrounding source files that the patch modifies
  • Existing code patterns (to distinguish novel vs. already-established patterns)
  • Project dependency manifests (go.mod, package.json, requirements.txt, etc.) that would clarify whether a new import is a known trusted package
  • The repository directory structure that would reveal project type and conventions

This creates a significant gap: a dependency addition might look suspicious in isolation but be completely legitimate given the project's established patterns. Conversely, a patch that appears benign without codebase context could be obviously malicious when the surrounding code is visible.

Current State (ground truth from main)

Detection job generation: pkg/workflow/threat_detection.go

buildDetectionJob() (lines 559–636) constructs the detection job. It:

  1. Optionally adds generateCheckoutActionsFolder() (dev/script mode only — this checks out only actions/ from github/gh-aw, not the target repository)
  2. Downloads the agent output artifact via buildAgentOutputDownloadSteps()
  3. Builds all threat detection steps via buildDetectionJobSteps()

There is no step that checks out the target repository workspace.

Detection job YAML (from archie.lock.yml)

detection:
  needs: agent
  if: always() && needs.agent.result != 'skipped'
  runs-on: ubuntu-latest
  permissions:
    contents: read
  steps:
    - name: Checkout actions folder        # ← only in dev/script mode; checks out gh-aw's actions/, NOT the target repo
      uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
      with:
        repository: github/gh-aw
        sparse-checkout: |
          actions
        persist-credentials: false
    - name: Setup Scripts
      uses: ./actions/setup
    - name: Download agent output artifact  # ← artifact contains prompt.txt, agent_output.json, aw-*.patch
      id: download-agent-output
      continue-on-error: true
      uses: actions/download-artifact@...
      with:
        name: agent
        path: /tmp/gh-aw/
    # ... then threat detection steps
    - name: Check if detection needed
      id: detection_guard
      if: always()
      env:
        OUTPUT_TYPES: ${{ needs.agent.outputs.output_types }}
        HAS_PATCH: ${{ needs.agent.outputs.has_patch }}        # ← has_patch is already available!
      run: |
        if [[ -n "$OUTPUT_TYPES" || "$HAS_PATCH" == "true" ]]; then
          echo "run_detection=true" >> "$GITHUB_OUTPUT"
        fi
    - name: Prepare threat detection files
      if: always() && steps.detection_guard.outputs.run_detection == 'true'
      run: |
        cp /tmp/gh-aw/aw-prompts/prompt.txt /tmp/gh-aw/threat-detection/aw-prompts/prompt.txt
        cp /tmp/gh-aw/agent_output.json /tmp/gh-aw/threat-detection/agent_output.json
        for f in /tmp/gh-aw/aw-*.patch; do
          [ -f "$f" ] && cp "$f" /tmp/gh-aw/threat-detection/
        done

has_patch output

Already defined in pkg/workflow/compiler_main_job.go:

outputs["has_patch"] = "${{ steps.collect_output.outputs.has_patch }}"

And already referenced in the detection guard step env:

"          HAS_PATCH: ${{ needs.agent.outputs.has_patch }}\n",

So needs.agent.outputs.has_patch is already wired through; we just need to act on it.

Workspace checkout infrastructure

pkg/workflow/checkout_manager.go already contains GenerateDefaultCheckoutStep() (line 403), which generates a full actions/checkout step with persist-credentials: false, supports sparse checkout, ref overrides, token overrides, etc. The detection job can reuse this infrastructure.

Threat detection prompt template (actions/setup/md/threat_detection.md)

The current template tells the detection engine:

  • Where prompt.txt is located
  • Where agent_output.json is located
  • Where aw-*.patch file(s) are located (by filename and byte size)

It does not reference or direct the engine to look at $GITHUB_WORKSPACE source files at all. This will need updating too.

Permissions

The detection job already has contents: read in dev/script mode (lines 617–622 of threat_detection.go). In production mode, this permission needs to be added when a patch is present.

Proposed Solution

Add a conditional actions/checkout step to the detection job that runs only when a patch is present (needs.agent.outputs.has_patch == 'true').

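A sketch of the step (fetch-depth: 1 is already the actions/checkout default and is shown only for emphasis):

  - name: Checkout repository for patch context
    if: needs.agent.outputs.has_patch == 'true'
    uses: actions/checkout@...
    with:
      persist-credentials: false
      fetch-depth: 1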

Step placement

Insert the workspace checkout step after the artifact download (download-agent-output) and before the detection guard step. This ensures the workspace is available before the detection files are prepared, so the detection engine sees the codebase as context when analyzing the patch.
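The ordering described above can be sketched as a plain slice concatenation. This is a minimal, self-contained illustration and not the actual body of buildDetectionJob(): assembleDetectionSteps and checkoutIsBetween are hypothetical helpers, and the step lines in main are shortened stand-ins for what the real builders return.

```go
package main

import (
	"fmt"
	"strings"
)

// assembleDetectionSteps joins the three step groups in the proposed order:
// artifact download, workspace checkout, then the detection steps (which
// begin with the detection guard). The parameters are hypothetical stand-ins
// for the slices returned by the real builder functions.
func assembleDetectionSteps(download, checkout, detection []string) []string {
	steps := make([]string, 0, len(download)+len(checkout)+len(detection))
	steps = append(steps, download...)
	steps = append(steps, checkout...)
	steps = append(steps, detection...)
	return steps
}

// checkoutIsBetween reports whether the checkout step appears after the
// download step and before the guard step in the rendered YAML.
func checkoutIsBetween(steps []string) bool {
	rendered := strings.Join(steps, "")
	download := strings.Index(rendered, "id: download-agent-output")
	checkout := strings.Index(rendered, "name: Checkout repository for patch context")
	guard := strings.Index(rendered, "id: detection_guard")
	return download >= 0 && download < checkout && checkout < guard
}

func main() {
	download := []string{
		"      - name: Download agent output artifact\n",
		"        id: download-agent-output\n",
	}
	checkout := []string{
		"      - name: Checkout repository for patch context\n",
		"        if: needs.agent.outputs.has_patch == 'true'\n",
	}
	detection := []string{
		"      - name: Check if detection needed\n",
		"        id: detection_guard\n",
	}
	steps := assembleDetectionSteps(download, checkout, detection)
	fmt.Print(strings.Join(steps, ""))
	fmt.Println("ordered correctly:", checkoutIsBetween(steps))
}
```

A check like checkoutIsBetween could also back a unit test that guards against the step being moved later in the job by accident.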

Step condition

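Use an if: expression on the step itself (the ${{ }} wrapper is optional in if: conditions, so the bare form used elsewhere in this issue is equivalent):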

if: ${{ needs.agent.outputs.has_patch == 'true' }}

This means the step is skipped entirely when there's no patch, keeping the detection job fast for output-only runs.
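To make the difference between the existing detection guard and the proposed checkout condition concrete, here is a small sketch that models both as Go predicates. The function names are hypothetical and the output type values in main are illustrative only. The two conditions themselves come from the guard script and the if: expression in this issue.

```go
package main

import "fmt"

// shouldRunDetection mirrors the existing shell guard:
//
//	[[ -n "$OUTPUT_TYPES" || "$HAS_PATCH" == "true" ]]
func shouldRunDetection(outputTypes, hasPatch string) bool {
	return outputTypes != "" || hasPatch == "true"
}

// shouldCheckoutWorkspace mirrors the proposed step condition:
//
//	needs.agent.outputs.has_patch == 'true'
func shouldCheckoutWorkspace(hasPatch string) bool {
	return hasPatch == "true"
}

func main() {
	// Illustrative inputs; the output type names are placeholders.
	cases := []struct {
		outputTypes, hasPatch string
	}{
		{"", ""},                        // agent produced nothing
		{"add_comment", ""},             // output-only run
		{"", "true"},                    // patch-only run
		{"create_pull_request", "true"}, // outputs and a patch
	}
	for _, c := range cases {
		fmt.Printf("output_types=%q has_patch=%q -> detect=%t checkout=%t\n",
			c.outputTypes, c.hasPatch,
			shouldRunDetection(c.outputTypes, c.hasPatch),
			shouldCheckoutWorkspace(c.hasPatch))
	}
}
```

The output-only case is the one that matters for speed: detection still runs, but the checkout is skipped.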

Permissions

When has_patch is true, the detection job needs contents: read. Since the flag is only known at runtime (it's a job output expression), the permission should always be granted on the detection job. This is already granted in dev/script mode; it should also be granted in production mode, unconditionally, because:

  1. contents: read is a minimal read-only permission
  2. The detection job already uses persist-credentials: false in the actions folder checkout
  3. We cannot conditionally grant permissions based on job outputs in GitHub Actions

Update buildDetectionJob() to always emit contents: read permission (not just in dev/script mode when generateCheckoutActionsFolder is non-empty).
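The permission change can be summarised as follows. This is only a sketch: currentPermissions and proposedPermissions are hypothetical names, and returning an empty string for production mode is an assumption made for illustration, not a statement about what threat_detection.go emits today.

```go
package main

import "fmt"

// currentPermissions models the behaviour described in this issue:
// contents: read is granted only when the actions-folder checkout exists
// (dev/script mode). The empty string for production mode is an assumption.
func currentPermissions(hasActionsFolderCheckout bool) string {
	if hasActionsFolderCheckout {
		return "contents: read"
	}
	return ""
}

// proposedPermissions always grants contents: read, because has_patch is a
// runtime job output and permissions cannot depend on it.
func proposedPermissions(_ bool) string {
	return "contents: read"
}

func main() {
	for _, devMode := range []bool{true, false} {
		fmt.Printf("dev/script mode=%t current=%q proposed=%q\n",
			devMode, currentPermissions(devMode), proposedPermissions(devMode))
	}
}
```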

New step to add in buildDetectionJob()

// buildWorkspaceCheckoutForDetectionStep creates a checkout step for the detection job.
// It runs only when the agent job produced a patch, so the detection engine can
// see code changes in the context of the surrounding codebase.
func (c *Compiler) buildWorkspaceCheckoutForDetectionStep(data *WorkflowData) []string {
    steps := []string{
        "      - name: Checkout repository for patch context\n",
        fmt.Sprintf("        if: needs.%s.outputs.has_patch == 'true'\n", constants.AgentJobName),
        fmt.Sprintf("        uses: %s\n", GetActionPin("actions/checkout")),
        "        with:\n",
        "          persist-credentials: false\n",
    }
    return steps
}

This step uses actions/checkout without a ref: or token: override, so it checks out the ref or SHA of the event that triggered the workflow, which is the pre-agent-changes state of the repository. This is correct: the patch represents what changed from that base, so the workspace should be at that base.

Threat detection prompt template update

Update actions/setup/md/threat_detection.md (and its copy at pkg/workflow/prompts/threat_detection.md) to include a new section directing the engine to use the workspace when a patch is available:

## Codebase Context (when patch is present)

When a patch file is provided above, the full repository is available at `$GITHUB_WORKSPACE`.
Use it to understand the broader context of the changes:
- Review the files modified by the patch in their surrounding context
- Check existing dependency manifests (e.g. `go.mod`, `package.json`, `requirements.txt`) for
  whether newly introduced packages are already trusted in the project
- Inspect calling code and module structure to distinguish legitimate patterns from novel ones
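Because the template exists in two copies, one way to catch drift is a check that every copy contains the new heading. The sketch below is an assumption about how such a check might look: missingContextSection is a hypothetical helper, and the template bodies in main are placeholders, not the real file contents.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// contextHeading is the section heading proposed for the prompt template.
const contextHeading = "## Codebase Context (when patch is present)"

// missingContextSection returns the sorted names of templates whose body
// does not contain the codebase context heading.
func missingContextSection(templates map[string]string) []string {
	var missing []string
	for name, body := range templates {
		if !strings.Contains(body, contextHeading) {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Placeholder bodies; a real check would read both files from disk.
	templates := map[string]string{
		"actions/setup/md/threat_detection.md":     "# Threat Detection\n\n" + contextHeading + "\n",
		"pkg/workflow/prompts/threat_detection.md": "# Threat Detection\n",
	}
	fmt.Println("templates missing the section:", missingContextSection(templates))
}
```

A stricter variant could compare the two files byte for byte, since they are meant to be exact copies.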

Files to modify

  1. pkg/workflow/threat_detection.go

    • Add buildWorkspaceCheckoutForDetectionStep() function
    • Call it from buildDetectionJob() after buildAgentOutputDownloadSteps()
    • Update the permissions logic: always grant contents: read (not only in dev/script mode)
  2. actions/setup/md/threat_detection.md and pkg/workflow/prompts/threat_detection.md

    • Add codebase context section (keep both files in sync — they are copies)
  3. pkg/workflow/threat_detection_test.go (and/or a new threat_detection_workspace_test.go)

    • Verify the workspace checkout step is present in the compiled detection job
    • Verify the step has the correct if: condition
    • Verify the step uses persist-credentials: false
    • Verify contents: read permission is present on the detection job in production mode
    • Verify the checkout step is NOT present when the detection engine is disabled. When has_patch is false/absent the step is still emitted and is skipped at runtime by its if: condition, so unit tests can only check the condition text, not the skip itself

Tests to verify

Unit tests (in pkg/workflow/)

Add table-driven tests to threat_detection_test.go (or a new threat_detection_workspace_test.go) covering:

| Test case | Expected outcome |
|---|---|
| Workflow with safe-outputs.threat-detection enabled (standard case) | Detection job YAML includes workspace checkout step with if: needs.agent.outputs.has_patch == 'true' |
| Detection job YAML | Workspace checkout step has persist-credentials: false |
| Detection job YAML | permissions.contents: read is present in production mode |
| Detection job YAML | Checkout step uses actions/checkout (pinned) |
| Detection job YAML | Checkout step is named "Checkout repository for patch context" (or similar) |
| Workflow with threat-detection: engine: false (engine disabled, custom steps only) | No workspace checkout step (no engine, no patch analysis needed) |

The existing test helper pattern to use:

data := buildTestWorkflowData(t)
data.SafeOutputs = &SafeOutputsConfig{
    ThreatDetection: &ThreatDetectionConfig{},
}
compiler := newTestCompiler()
job, err := compiler.buildDetectionJob(data)
require.NoError(t, err)
require.NotNil(t, job)
stepsString := strings.Join(job.Steps, "")
assert.Contains(t, stepsString,
    fmt.Sprintf("if: needs.%s.outputs.has_patch == 'true'", constants.AgentJobName),
    "detection job should include workspace checkout step conditional on has_patch")
assert.Contains(t, stepsString, "persist-credentials: false",
    "workspace checkout step should disable credential persistence")
assert.Contains(t, job.Permissions, "contents: read",
    "detection job should have contents: read permission in production mode")

Recompile

After code changes:

make recompile

Verify that affected lock files (e.g. .github/workflows/archie.lock.yml) now contain a Checkout repository for patch context step in their detection: job section with the correct if: condition and persist-credentials: false.
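One caveat when verifying persist-credentials: false, either in lock files or in unit tests: in dev/script mode the existing "Checkout actions folder" step already contains that line, so a search over the whole detection job can pass even if the new step lacks it. A possible way to scope the check to a single step is sketched below. extractStep is a hypothetical helper, and it assumes every step starts with a "- name:" line at the six-space indentation shown in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// extractStep returns the YAML lines of the step with the given name, up to
// (but not including) the next step. It returns "" when no such step exists.
// Assumption: every step begins with "      - name: " at this indentation.
func extractStep(steps, name string) string {
	const prefix = "      - name: "
	start := strings.Index(steps, prefix+name+"\n")
	if start < 0 {
		return ""
	}
	// Search for the next step header after the current one.
	rest := steps[start+len(prefix):]
	if next := strings.Index(rest, prefix); next >= 0 {
		return steps[start : start+len(prefix)+next]
	}
	return steps[start:]
}

func main() {
	// Shortened stand-in for a rendered detection job in dev/script mode,
	// with the new step deliberately missing persist-credentials.
	steps := "      - name: Checkout actions folder\n" +
		"        with:\n" +
		"          persist-credentials: false\n" +
		"      - name: Checkout repository for patch context\n" +
		"        if: needs.agent.outputs.has_patch == 'true'\n" +
		"      - name: Check if detection needed\n"

	step := extractStep(steps, "Checkout repository for patch context")
	fmt.Print(step)
	fmt.Println("scoped check:", strings.Contains(step, "persist-credentials: false"))
	fmt.Println("global check:", strings.Contains(steps, "persist-credentials: false"))
}
```

In this example the global check succeeds while the scoped check fails, which is the false positive the scoped assertion is meant to expose.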

Validation

make agent-finish

Acceptance Criteria

  • Detection job YAML contains a workspace checkout step (actions/checkout with persist-credentials: false)
  • The step has an if: condition: needs.agent.outputs.has_patch == 'true'
  • The detection job has contents: read permission in all modes (not just dev/script)
  • Threat detection prompt template instructs the engine to use $GITHUB_WORKSPACE for context when a patch is present
  • Existing tests continue to pass (make test-unit)
  • New unit tests cover the above cases
  • All compiled lock files are regenerated (make recompile) and reflect the new step
