feat: Add conditional workspace checkout to detection job for patch context #23191
Description
Background
The Detection Job runs as a separate GitHub Actions job after the Agent job completes. Its purpose is to analyze what the agent was asked to do, what it declared it did, and what it actually changed in code. Currently, it operates only on three artifact files downloaded from the agent job:
| File | What it represents |
|---|---|
| prompt.txt | What the agent was asked to do |
| agent_output.json | What the agent declared it did (structured actions) |
| aw-*.patch | What the agent actually changed in code (present if changes were made) |
The detection job runs on a fresh runner with no repository checkout. This means when a patch is present, the detection engine sees code changes entirely without context — it cannot see the surrounding codebase, existing patterns, imported modules, project structure, or how the changed code fits into the larger system.
Problem
When a patch is provided (has_patch == 'true'), the detection engine must judge whether changes are malicious — e.g. suspicious web service calls, backdoors, encoded strings, or newly introduced dependencies — but it has no access to:
- The surrounding source files that the patch modifies
- Existing code patterns (to distinguish novel vs. already-established patterns)
- Project dependency manifests (go.mod, package.json, requirements.txt, etc.) that would clarify whether a new import is a known trusted package
- The repository directory structure that would reveal project type and conventions
This creates a significant gap: a dependency addition might look suspicious in isolation but be completely legitimate given the project's established patterns. Conversely, a patch that appears benign without codebase context could be obviously malicious when the surrounding code is visible.
Current State (ground truth from main)
Detection job generation: pkg/workflow/threat_detection.go
buildDetectionJob() (lines 559–636) constructs the detection job. It:
- Optionally adds generateCheckoutActionsFolder() (dev/script mode only; this checks out only actions/ from github/gh-aw, not the target repository)
- Downloads the agent output artifact via buildAgentOutputDownloadSteps()
- Builds all threat detection steps via buildDetectionJobSteps()
There is no step that checks out the target repository workspace.
Detection job YAML (from archie.lock.yml)

    detection:
      needs: agent
      if: always() && needs.agent.result != 'skipped'
      runs-on: ubuntu-latest
      permissions:
        contents: read
      steps:
        - name: Checkout actions folder # ← only in dev/script mode; checks out gh-aw's actions/, NOT the target repo
          uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
          with:
            repository: github/gh-aw
            sparse-checkout: |
              actions
            persist-credentials: false
        - name: Setup Scripts
          uses: ./actions/setup
        - name: Download agent output artifact # ← artifact contains prompt.txt, agent_output.json, aw-*.patch
          id: download-agent-output
          continue-on-error: true
          uses: actions/download-artifact@...
          with:
            name: agent
            path: /tmp/gh-aw/
        # ... then threat detection steps
        - name: Check if detection needed
          id: detection_guard
          if: always()
          env:
            OUTPUT_TYPES: ${{ needs.agent.outputs.output_types }}
            HAS_PATCH: ${{ needs.agent.outputs.has_patch }} # ← has_patch is already available!
          run: |
            if [[ -n "$OUTPUT_TYPES" || "$HAS_PATCH" == "true" ]]; then
              echo "run_detection=true" >> "$GITHUB_OUTPUT"
            fi
        - name: Prepare threat detection files
          if: always() && steps.detection_guard.outputs.run_detection == 'true'
          run: |
            cp /tmp/gh-aw/aw-prompts/prompt.txt /tmp/gh-aw/threat-detection/aw-prompts/prompt.txt
            cp /tmp/gh-aw/agent_output.json /tmp/gh-aw/threat-detection/agent_output.json
            for f in /tmp/gh-aw/aw-*.patch; do
              [ -f "$f" ] && cp "$f" /tmp/gh-aw/threat-detection/
            done

has_patch output
Already defined in pkg/workflow/compiler_main_job.go:

    outputs["has_patch"] = "${{ steps.collect_output.outputs.has_patch }}"

And already referenced in the detection guard step env:

    "          HAS_PATCH: ${{ needs.agent.outputs.has_patch }}\n",

So needs.agent.outputs.has_patch is already wired through; we just need to act on it.
Workspace checkout infrastructure
pkg/workflow/checkout_manager.go already contains GenerateDefaultCheckoutStep() (line 403), which generates a full actions/checkout step with persist-credentials: false, supports sparse checkout, ref overrides, token overrides, etc. The detection job can reuse this infrastructure.
Threat detection prompt template (actions/setup/md/threat_detection.md)
The current template tells the detection engine:
- Where prompt.txt is located
- Where agent_output.json is located
- Where aw-*.patch file(s) are located (by filename and byte size)
It does not reference or direct the engine to look at $GITHUB_WORKSPACE source files at all. This will need updating too.
Permissions
The detection job already has contents: read in dev/script mode (lines 617–622 of threat_detection.go). In production mode, this permission needs to be added when a patch is present.
Proposed Solution
Add a conditional actions/checkout step to the detection job that runs only when a patch is present (needs.agent.outputs.has_patch == 'true').
Something like:

    if: needs.agent.outputs.has_patch == 'true'
    uses: actions/checkout@...
    with:
      persist-credentials: false
      fetch-depth: 1

Step placement
Insert the workspace checkout step after the artifact download (download-agent-output) and before the detection guard step. This ensures the workspace is available before the detection files are prepared, so the detection engine sees the codebase as context when analyzing the patch.
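The ordering described above can be sketched as a small, self-contained Go program. This is only an illustration under stated assumptions: `agentJobName` stands in for `constants.AgentJobName`, `assembleDetectionSteps` and `checkoutSteps` are hypothetical helpers rather than functions from the repository, and `<pinned-sha>` is a placeholder for whatever the action pin resolves to.

```go
package main

import (
	"fmt"
	"strings"
)

// agentJobName is a hypothetical stand-in for constants.AgentJobName.
const agentJobName = "agent"

// checkoutSteps renders the conditional workspace checkout as YAML lines.
// The action reference is a placeholder, not a real pin.
func checkoutSteps() []string {
	return []string{
		"      - name: Checkout repository for patch context\n",
		fmt.Sprintf("        if: needs.%s.outputs.has_patch == 'true'\n", agentJobName),
		"        uses: actions/checkout@<pinned-sha>\n",
		"        with:\n",
		"          persist-credentials: false\n",
	}
}

// assembleDetectionSteps concatenates the step groups in the proposed order:
// artifact download first, then the workspace checkout, then the detection
// steps (which begin with the detection guard).
func assembleDetectionSteps(download, detection []string) []string {
	checkout := checkoutSteps()
	steps := make([]string, 0, len(download)+len(checkout)+len(detection))
	steps = append(steps, download...)
	steps = append(steps, checkout...)
	steps = append(steps, detection...)
	return steps
}

func main() {
	download := []string{"      - name: Download agent output artifact\n"}
	detection := []string{"      - name: Check if detection needed\n"}
	fmt.Print(strings.Join(assembleDetectionSteps(download, detection), ""))
}
```

Keeping the checkout in its own slice makes the placement a matter of append order, which is easy to assert on in a unit test.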
Step condition
Use an if: expression on the step itself:

    if: ${{ needs.agent.outputs.has_patch == 'true' }}

This means the step is skipped entirely when there's no patch, keeping the detection job fast for output-only runs.
Permissions
When has_patch is true, the detection job needs contents: read. Since the flag is only known at runtime (it's a job output expression), the permission should always be granted on the detection job. This is already granted in dev/script mode; it should also be granted in production mode, unconditionally, because:
- contents: read is a minimal read-only permission
- The detection job already uses persist-credentials: false in the actions folder checkout
- We cannot conditionally grant permissions based on job outputs in GitHub Actions
Update buildDetectionJob() to always emit contents: read permission (not just in dev/script mode when generateCheckoutActionsFolder is non-empty).
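A minimal sketch of the behavioural change, assuming the current logic keys the permission off whether actions-folder checkout steps were generated. Both function names below are hypothetical and exist only to contrast the two rules; the real permission rendering in `buildDetectionJob()` may be structured differently.

```go
package main

import "fmt"

// legacyPermissions models the behaviour described in this issue: the
// permission is emitted only when actions-folder checkout steps exist
// (dev/script mode). Hypothetical helper, not repository code.
func legacyPermissions(actionsFolderCheckout []string) string {
	if len(actionsFolderCheckout) > 0 {
		return "contents: read"
	}
	return ""
}

// proposedPermissions models the proposed behaviour: the permission is
// always emitted, because has_patch is only known at runtime.
func proposedPermissions() string {
	return "contents: read"
}

func main() {
	fmt.Printf("legacy, production mode: %q\n", legacyPermissions(nil))
	fmt.Printf("legacy, dev/script mode: %q\n", legacyPermissions([]string{"checkout"}))
	fmt.Printf("proposed, any mode:      %q\n", proposedPermissions())
}
```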
New step to add in buildDetectionJob()

    // buildWorkspaceCheckoutForDetectionStep creates a checkout step for the detection job.
    // It runs only when the agent job produced a patch, so the detection engine can
    // see code changes in the context of the surrounding codebase.
    func (c *Compiler) buildWorkspaceCheckoutForDetectionStep(data *WorkflowData) []string {
        steps := []string{
            "      - name: Checkout repository for patch context\n",
            fmt.Sprintf("        if: needs.%s.outputs.has_patch == 'true'\n", constants.AgentJobName),
            fmt.Sprintf("        uses: %s\n", GetActionPin("actions/checkout")),
            "        with:\n",
            "          persist-credentials: false\n",
        }
        return steps
    }

This step uses actions/checkout without a specific ref: or token: override, so it defaults to checking out the SHA that triggered the workflow, which is the pre-agent-changes state of the repository. This is correct: the patch represents what changed from that base, so the workspace should be at that base.
Threat detection prompt template update
Update actions/setup/md/threat_detection.md (and its copy at pkg/workflow/prompts/threat_detection.md) to include a new section directing the engine to use the workspace when a patch is available:

    ## Codebase Context (when patch is present)
    When a patch file is provided above, the full repository is available at `$GITHUB_WORKSPACE`.
    Use it to understand the broader context of the changes:
    - Review the files modified by the patch in their surrounding context
    - Check existing dependency manifests (e.g. `go.mod`, `package.json`, `requirements.txt`) for
      whether newly introduced packages are already trusted in the project
    - Inspect calling code and module structure to distinguish legitimate patterns from novel ones

Files to modify
- pkg/workflow/threat_detection.go
  - Add buildWorkspaceCheckoutForDetectionStep() function
  - Call it from buildDetectionJob() after buildAgentOutputDownloadSteps()
  - Update the permissions logic: always grant contents: read (not only in dev/script mode)
- actions/setup/md/threat_detection.md and pkg/workflow/prompts/threat_detection.md
  - Add codebase context section (keep both files in sync; they are copies)
- pkg/workflow/threat_detection_test.go (and/or a new threat_detection_workspace_test.go)
  - Verify the workspace checkout step is present in the detection job when has_patch is set
  - Verify the step has the correct if: condition
  - Verify the step uses persist-credentials: false
  - Verify contents: read permission is present on the detection job in production mode
  - Verify the checkout step is NOT present (or does not run) when has_patch is false/absent
Tests to verify
Unit tests (in pkg/workflow/)
Add table-driven tests to threat_detection_test.go (or a new threat_detection_workspace_test.go) covering:
| Test case | Expected outcome |
|---|---|
| Workflow with safe-outputs.threat-detection enabled (standard case) | Detection job YAML includes workspace checkout step with if: needs.agent.outputs.has_patch == 'true' |
| Detection job YAML | Workspace checkout step has persist-credentials: false |
| Detection job YAML | permissions.contents: read is present in production mode |
| Detection job YAML | Checkout step uses actions/checkout (pinned) |
| Detection job YAML | Checkout step is named "Checkout repository for patch context" (or similar) |
| Workflow with threat-detection: engine: false (engine disabled, custom steps only) | No workspace checkout step (no engine, no patch analysis needed) |
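The table above maps naturally onto a table-driven layout. The following is a self-contained sketch of that shape, not the project's test code: `renderDetectionSteps` is a hypothetical stand-in for the YAML that `buildDetectionJob()` would produce, and the `testCase` fields are illustrative names. In the real test file the loop body would use `t.Run` and testify assertions instead of collecting failure strings.

```go
package main

import (
	"fmt"
	"strings"
)

// renderDetectionSteps is a hypothetical stand-in for the detection job
// steps. It emits the checkout step only when the engine is enabled.
func renderDetectionSteps(engineEnabled bool) string {
	var b strings.Builder
	b.WriteString("      - name: Download agent output artifact\n")
	if engineEnabled {
		b.WriteString("      - name: Checkout repository for patch context\n")
		b.WriteString("        if: needs.agent.outputs.has_patch == 'true'\n")
		b.WriteString("        uses: actions/checkout@<pinned-sha>\n")
		b.WriteString("        with:\n")
		b.WriteString("          persist-credentials: false\n")
	}
	return b.String()
}

// testCase describes one row of the table: snippets that must or must not
// appear in the rendered steps.
type testCase struct {
	name          string
	engineEnabled bool
	wantPresent   []string
	wantAbsent    []string
}

func defaultCases() []testCase {
	return []testCase{
		{
			name:          "threat detection enabled",
			engineEnabled: true,
			wantPresent: []string{
				"- name: Checkout repository for patch context",
				"if: needs.agent.outputs.has_patch == 'true'",
				"persist-credentials: false",
				"uses: actions/checkout@",
			},
		},
		{
			name:          "engine disabled",
			engineEnabled: false,
			wantAbsent:    []string{"Checkout repository for patch context"},
		},
	}
}

// runCases returns one message per failed expectation.
func runCases(tcs []testCase) []string {
	var failures []string
	for _, tc := range tcs {
		steps := renderDetectionSteps(tc.engineEnabled)
		for _, want := range tc.wantPresent {
			if !strings.Contains(steps, want) {
				failures = append(failures, fmt.Sprintf("%s: missing %q", tc.name, want))
			}
		}
		for _, unwanted := range tc.wantAbsent {
			if strings.Contains(steps, unwanted) {
				failures = append(failures, fmt.Sprintf("%s: unexpected %q", tc.name, unwanted))
			}
		}
	}
	return failures
}

func main() {
	failures := runCases(defaultCases())
	fmt.Printf("%d failed expectation(s)\n", len(failures))
	for _, f := range failures {
		fmt.Println(f)
	}
}
```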
The existing test helper pattern to use:

    data := buildTestWorkflowData(t)
    data.SafeOutputs = &SafeOutputsConfig{
        ThreatDetection: &ThreatDetectionConfig{},
    }
    compiler := newTestCompiler()
    job, err := compiler.buildDetectionJob(data)
    require.NoError(t, err)
    require.NotNil(t, job)
    stepsString := strings.Join(job.Steps, "")
    assert.Contains(t, stepsString,
        fmt.Sprintf("if: needs.%s.outputs.has_patch == 'true'", constants.AgentJobName),
        "detection job should include workspace checkout step conditional on has_patch")
    assert.Contains(t, stepsString, "persist-credentials: false",
        "workspace checkout step should disable credential persistence")
    assert.Contains(t, job.Permissions, "contents: read",
        "detection job should have contents: read permission in production mode")

Recompile
After code changes:

    make recompile

Verify that affected lock files (e.g. .github/workflows/archie.lock.yml) now contain a Checkout repository for patch context step in their detection: job section with the correct if: condition and persist-credentials: false.
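That manual check could also be scripted. The sketch below is one possible approach and not part of the repository: `detectionSection` and `missingCheckoutMarkers` are hypothetical helpers, and the section extraction assumes job keys in the lock file are indented by exactly two spaces under `jobs:`. It uses plain string matching rather than a YAML parser to stay dependency-free.

```go
package main

import (
	"fmt"
	"strings"
)

// detectionSection returns the lines belonging to the "detection:" job,
// assuming job keys are indented by exactly two spaces.
func detectionSection(lockYAML string) string {
	var b strings.Builder
	inDetection := false
	for _, line := range strings.Split(lockYAML, "\n") {
		isJobKey := strings.HasPrefix(line, "  ") && len(line) > 2 &&
			line[2] != ' ' && line[2] != '#'
		if isJobKey {
			inDetection = strings.TrimSpace(line) == "detection:"
		}
		if inDetection {
			b.WriteString(line)
			b.WriteString("\n")
		}
	}
	return b.String()
}

// missingCheckoutMarkers lists the expected snippets that are absent from
// the detection job section of a compiled lock file.
func missingCheckoutMarkers(lockYAML string) []string {
	section := detectionSection(lockYAML)
	markers := []string{
		"- name: Checkout repository for patch context",
		"if: needs.agent.outputs.has_patch == 'true'",
		"persist-credentials: false",
	}
	var missing []string
	for _, m := range markers {
		if !strings.Contains(section, m) {
			missing = append(missing, m)
		}
	}
	return missing
}

func main() {
	sample := "jobs:\n  detection:\n    steps:\n" +
		"      - name: Checkout repository for patch context\n" +
		"        if: needs.agent.outputs.has_patch == 'true'\n" +
		"        with:\n          persist-credentials: false\n"
	fmt.Printf("missing markers: %d\n", len(missingCheckoutMarkers(sample)))
}
```

Note that in dev/script mode the actions-folder checkout also sets persist-credentials: false, so that marker alone does not prove the new step is present; the step name and if: condition are the stronger signals.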
Validation

    make agent-finish

Acceptance Criteria
- Detection job YAML contains a workspace checkout step (actions/checkout with persist-credentials: false)
- The step has an if: condition: needs.agent.outputs.has_patch == 'true'
- The detection job has contents: read permission in all modes (not just dev/script)
- Threat detection prompt template instructs the engine to use $GITHUB_WORKSPACE for context when a patch is present
- Existing tests continue to pass (make test-unit)
- New unit tests cover the above cases
- All compiled lock files are regenerated (make recompile) and reflect the new step