Skip to content

feat(policy): check workflow, process, and network surfaces (#207, rollout PR 8/12)#220

Merged
EffortlessSteven merged 1 commit into
mainfrom
feat/xtask-workflow-process-network-checks-20260511
May 12, 2026
Merged

feat(policy): check workflow, process, and network surfaces (#207, rollout PR 8/12)#220
EffortlessSteven merged 1 commit into
mainfrom
feat/xtask-workflow-process-network-checks-20260511

Conversation

@EffortlessSteven

Copy link
Copy Markdown
Member

Summary

Eighth PR in the 12-PR file-policy rollout. High-risk half. Three new xtask subcommands cover which workflows are authorized, what commands they may invoke, and what endpoints they may contact.

Issue

Closes #207. Depends on #203 (workflow/process/network ledgers, merged), #212 (xtask scaffold), #206 (gen/exec/dep checks for the model). Refines #180. Tracks #109.

Commands

Command Universe Ledger Result on main
cargo xtask check-workflow-surfaces tracked .github/workflows/*.yml policy/workflow-allowlist.toml 9 workflows, 10 entries, 0 findings
cargo xtask check-process-policy receipted workflow contents policy/process-allowlist.toml profiles 9 workflows, 10 unknown tokens (all false positives — see below)
cargo xtask check-network-policy receipted workflow contents policy/network-allowlist.toml profiles 9 workflows, 0 unknown endpoints

All three accept --mode advisory|blocking-allowlist|blocking-strict. Reports land at target/policy/<name>-report.{md,json}.

Decisions

  • Grep heuristics, not YAML/AST parsing. The PR spec was explicit: "start simple." The recognition vocabulary for check-process-policy is a fixed list of well-known shell commands; tokens outside that list are silently ignored. Refining is a follow-up.
  • kind = "dependabot_config" skip in all three workflow-content checks. The .github/dependabot.yml entry rides along in the workflow allowlist for catalog purposes (see chore(policy): receipt high-risk non-Rust surfaces (#203, rollout PR 3/12) #215 PR body) but it is not a workflow file and has no shell commands or URLs to scan. Its receipt is still validated (missing fields, expired, stale).
  • Subdomain endpoint matching. allowed_endpoints = ["github.com"] covers api.github.com, objects.githubusercontent.com, etc. Tighter posture remains available via explicit-only lists.
  • New direct dep: regex = "1" — used only by check-network-policy for host extraction.

On the 10 "unknown commands"

All ten are false positives at the recognition layer. Examples:

Workflow Profile Flagged Real meaning
ci.yml ci install, shipper, sudo "shipper" / "install" appear as build-target / cargo-install args, not as command invocations. sudo appears in a comment or echoed string.
coverage.yml ci mkdir, python3, shipper Tokens appear inside cargo-invoked subcommands or output, not as direct commands.
release.yml release mkdir The release profile lists the right commands; mkdir appears as a build-step name.

This is exactly the OPEN question the user flagged for PR 8:

"false-positive handling for command words inside heredocs/comments"

Advisory mode is the right home for these. A follow-up can refine the detector (require token to appear in a run: step's content, not in name: / description: / comment lines).

Acceptance

  • cargo check --workspace --locked passes.
  • cargo clippy -p xtask --all-targets --locked -- -D warnings clean.
  • cargo fmt --all -- --check clean.
  • All three commands run cleanly in advisory mode.
  • Each produces target/policy/<name>-report.{md,json}.

Follow-ups

Eighth PR in the 12-PR file-policy rollout. Three new subcommands that
together cover the high-risk half of file policy: which workflows are
authorized, what commands they may invoke, and what endpoints they may
contact.

## Commands

- `cargo xtask check-workflow-surfaces`
    Universe: tracked `.github/workflows/*.yml`. Each must have a
    `[[workflow]]` entry in `policy/workflow-allowlist.toml`. Each entry
    must name a `process_policy` and `network_policy` that exist in
    their respective ledgers. Entries with `kind = "dependabot_config"`
    are catalog-only and exempt from the workflow-file reconciliation;
    their receipts are still validated (missing fields, expired, stale).

- `cargo xtask check-process-policy`
    For each receipted workflow (excluding `kind = "dependabot_config"`),
    scan its file content for known shell-command tokens; flag tokens
    not present in the workflow's declared `process` profile's
    `allowed_processes`.

- `cargo xtask check-network-policy`
    For each receipted workflow, regex-scan for `https?://<host>` URLs
    and reconcile hostnames against the declared `network` profile's
    `allowed_endpoints`. Subdomain coverage is recognized (e.g.,
    `raw.githubusercontent.com` covers `objects.githubusercontent.com`).

All three accept `--mode advisory|blocking-allowlist|blocking-strict`.

## Decisions

- **Grep heuristics, not YAML/AST parsing.** The PR spec explicitly
  said "start simple". The recognition vocabulary for `check-process-
  policy` is a fixed list of well-known shell commands; tokens outside
  that list are silently ignored. This produces false positives where
  a command name appears as a build target or argument (e.g., the word
  "shipper" in `cargo build -p shipper`). Advisory mode reports these;
  CI does not block on them. Refining the detector (e.g., requiring
  the token to appear inside a `run:` block) is a follow-up.

- **`dependabot_config` skip** in workflow-surfaces / process / network
  checks. The `.github/dependabot.yml` entry rides along in the
  workflow allowlist for catalog purposes (see #215 PR body), but it
  is not a workflow file and has no shell commands or URLs to scan.

- **Subdomain endpoint matching.** `allowed_endpoints = ["github.com"]`
  covers `api.github.com`, `objects.githubusercontent.com`, etc.,
  rather than requiring every host be enumerated. Tighter modes
  remain available via the explicit-only `allowed_endpoints`.

- **New direct dep: `regex = "1"`.** Used only by `check-network-policy`
  for the host extraction pattern. Could be replaced by hand-rolled
  parsing if dep weight becomes an issue.

## Acceptance

- `cargo check --workspace --locked` passes.
- `cargo clippy -p xtask --all-targets --locked -- -D warnings` clean.
- `cargo fmt --all -- --check` clean.
- Live results on current main:
    check-workflow-surfaces: workflows=9 entries=10 unreceipted=0 missing=0 expired=0 stale=0 unused=0 invalid_refs=0
    check-process-policy:    workflows=9 unknown_total=10  (all false positives — see PR body)
    check-network-policy:    workflows=9 unknown_total=0

## Out of scope

- Unified policy-report (PR 9 / #208)
- CI advisory wiring (PR 10 / #209)
- Promotion to blocking-allowlist (PR 11 / #210, PR 12 / #211)
- Refining the process-detector to suppress build-target false positives.
- DRY-up of the shared `workspace_root`, `today_iso`, `date_is_past`,
  `tracked_workflow_files` helpers across xtask modules.

Closes #207.
@gemini-code-assist

Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

@coderabbitai

coderabbitai Bot commented May 12, 2026

Copy link
Copy Markdown

Review Change Stack

Summary by CodeRabbit

  • Chores
    • Enhanced internal development tooling with new validation and policy-checking capabilities.

Walkthrough

This PR introduces workflow policy checking to the xtask CLI with three validations: reconciling tracked workflows against an allowlist, scanning for undeclared process commands, and scanning for undeclared network endpoints. Each check loads YAML workflows and TOML policies, generates reports, and conditionally fails in blocking modes.

Changes

Workflow Policy Checking Feature

Layer / File(s) Summary
CLI Command Wiring & Dependencies
xtask/Cargo.toml, xtask/src/main.rs
Added regex dependency (version "1"). Declared mod workflow_checks, extended Command enum with three subcommands, introduced WorkflowModeArgs struct with --mode flag (defaults to Advisory), and routed subcommands through main() to workflow_checks functions.
Policy Check Interface & Data Models
xtask/src/workflow_checks.rs
Defined public Mode enum (Advisory, BlockingAllowlist, BlockingStrict). Added TOML deserialization structs for workflow allowlist entries, process profiles, and network profiles. Defined report structures for workflow-surface checks (unreceipted workflows, unused entries, missing fields, expired/stale entries, invalid policy references).
Workflow Surface Policy Check
xtask/src/workflow_checks.rs
Implemented check_workflow_surfaces(): loads tracked workflows and allowlist, separates dependabot catalog from surface reconciliation, validates field completeness and policy name references, checks entry expiry and staleness dates, aggregates findings, writes JSON/Markdown reports, and fails in blocking modes if violation count exceeds threshold.
Process Policy Check
xtask/src/workflow_checks.rs
Defined report types and static "known command tokens" vocabulary for heuristic shell-command detection. Implemented check_process_policy(): loads workflows and process profiles, scans non-dependabot workflows for known tokens via word-boundary matching, flags tokens absent from declared allowed processes, writes reports, and fails in non-advisory modes on unknown commands.
Network Policy Check
xtask/src/workflow_checks.rs
Implemented check_network_policy(): loads workflows and network profiles, extracts http/https URL hostnames using regex, marks hosts unknown when endpoints are not covered by the declared profile's allowed endpoints (including subdomain/superset matching), writes JSON/Markdown reports, and fails in non-advisory modes on unknown endpoints.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Three checks in workflows, a guardian's quest,
Surfaces aligned, processes blessed,
Networks now watched with regex so keen,
Policy compliance, a pristine scene!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main addition of three workflow/process/network policy-checking subcommands to the xtask CLI as part of a multi-PR rollout.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing the three new commands, their functionality, design decisions, and acceptance criteria.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/xtask-workflow-process-network-checks-20260511

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@factory-droid

factory-droid Bot commented May 12, 2026

Copy link
Copy Markdown

Droid finished @EffortlessSteven's task —— View job


Droid review complete — PR 220 validated.

No actionable findings emitted.

Inspected surfaces:

  • xtask/src/workflow_checks.rs (new file, 823 lines) — check_workflow_surfaces, check_process_policy, check_network_policy
  • xtask/src/main.rs — new subcommand registration and argument parsing
  • xtask/Cargo.toml — regex = "1" dependency added

Checks performed:

  • Word-boundary token detection (word_present / is_word_char) — sound
  • Date comparison (date_is_past / NaiveDate parsing) — handles bad input gracefully
  • Subdomain endpoint matching (endpoint_covered) — correct logic for github.com vs api.github.com
  • Regex compilation with context error — correct
  • fs::read_to_string with with_context — no silent failures
  • BTreeSet usage throughout — determinism preserved
  • dependabot_config skip logic in process/network checks — consistent with spec

Why no comments:
Pure additive xtask feature; no publish-state, registry-truth, token-redaction, or state-contract surfaces touched. The three commands are independent policy checkers with no effect on shipper-core publish/reconcile behavior.

Residual risk:
The process-token detector produces known false positives (e.g., "cargo" appearing in cargo invocation args, "mkdir" as a build-step name). These are acknowledged in the PR description as OPEN for follow-up. Advisory mode is the appropriate home.

Validation signal:
Observed: diff is additive only; no modifications to shipper-core engine, state, or CLI entry points
Reported: PR description states all three commands run cleanly in advisory mode
Not verified: actual runtime behavior of the new xtask commands against the real policy ledger files

Review summary: approved — no issues found.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@xtask/src/workflow_checks.rs`:
- Around line 546-572: Detected hostnames are compared case-sensitively
currently; normalize to lowercase at the boundary so matching works (e.g., when
host_re captures in the loop and when building the allowed set). Change the
insertion in the capture loop (where detected.insert(host.as_str().to_string())
is called) to insert host.as_str().to_lowercase(), and when constructing the
allowed BTreeSet from profiles_by_name (allowed_endpoints), map each endpoint to
.to_lowercase() so endpoint_covered and subsequent comparisons operate on
lowercased strings; the same normalization should be applied in the other
capture loop referenced around lines 617-625.
- Around line 156-160: The Vec dependabot_entries is allocated from all_entries
via filter+cloned but never used; either remove its creation entirely or
actually consume it—e.g., compute dependabot_entries.len() and store that count
in WorkflowSummary (update the WorkflowSummary struct and where it’s
constructed) or emit the count somewhere relevant; to fix, delete the
dependabot_entries binding and any let _ = dependabot_entries; lines if you
choose removal, or replace the unused Vec with a usize count (use
is_dependabot_config on all_entries to compute the count) and propagate that
value into WorkflowSummary construction and types where necessary.
- Around line 5-9: The module doc promises `.github/dependabot.yml` is included
but the collection predicate in tracked_workflow_files only matches paths
starting with ".github/workflows/" and ending with ".yml", so dependabot is
never added to the workflows set; update the predicate used where
tracked_workflow_files is built (the p.starts_with(...) && p.ends_with(...)
check) to also include the literal ".github/dependabot.yml" (e.g. OR p ==
".github/dependabot.yml" or equivalent), and apply the same change to the other
occurrence of the same filter so the unreceipted reconciliation will flag
missing dependabot receipts; alternatively, if you prefer behavior change
instead of code, edit the module doc string to remove the claim about dependabot
instead of changing tracked_workflow_files.
- Around line 617-625: The function endpoint_covered currently allows an allowed
entry to be a subdomain of the detected host (via the condition
a.ends_with(&format!(".{}", host))), which reverses the intended policy; change
endpoint_covered to only return true when allowed contains host or when host is
a subdomain of an allowed entry (i.e., keep allowed.contains(host) and the
host.ends_with(&format!(".{}", a)) check) and remove the a.ends_with(...) clause
so allowed child subdomains cannot incorrectly cover parent domains.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2dd5e4eb-857e-45a4-a525-24576bbc12fb

📥 Commits

Reviewing files that changed from the base of the PR and between 8e45af0 and a7d56a6.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • xtask/Cargo.toml
  • xtask/src/main.rs
  • xtask/src/workflow_checks.rs

Comment on lines +5 to +9
//! - `cargo xtask check-workflow-surfaces` — every `.github/workflows/*.yml`
//! (and `.github/dependabot.yml`) must be receipted in
//! `policy/workflow-allowlist.toml`. Each entry must name a
//! `process_policy` and `network_policy` that exist in their respective
//! ledgers.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Module doc claims .github/dependabot.yml is part of receipted workflows, but it is never collected.

The header doc says every .github/workflows/*.yml and .github/dependabot.yml must be receipted, but tracked_workflow_files filters to p.starts_with(".github/workflows/") && p.ends_with(".yml"), so .github/dependabot.yml is never in the workflows set and the unreceipted reconciliation cannot flag a missing dependabot receipt. Either tighten the filter to also include .github/dependabot.yml, or amend the doc so it describes the actual behavior (catalog-only validation via the dependabot_config entry).

Also applies to: 629-654

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@xtask/src/workflow_checks.rs` around lines 5 - 9, The module doc promises
`.github/dependabot.yml` is included but the collection predicate in
tracked_workflow_files only matches paths starting with ".github/workflows/" and
ending with ".yml", so dependabot is never added to the workflows set; update
the predicate used where tracked_workflow_files is built (the p.starts_with(...)
&& p.ends_with(...) check) to also include the literal ".github/dependabot.yml"
(e.g. OR p == ".github/dependabot.yml" or equivalent), and apply the same change
to the other occurrence of the same filter so the unreceipted reconciliation
will flag missing dependabot receipts; alternatively, if you prefer behavior
change instead of code, edit the module doc string to remove the claim about
dependabot instead of changing tracked_workflow_files.

Comment on lines +156 to +160
let dependabot_entries: Vec<RawWorkflowEntry> = all_entries
.iter()
.filter(|e| is_dependabot_config(e))
.cloned()
.collect();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Drop dependabot_entries or actually consume it; today it is allocated only to be discarded.

dependabot_entries is built by a full filter+clone over all_entries and is then explicitly silenced via let _ = dependabot_entries;. The comment says it is preserved "for future per-kind audits", but until that audit exists this is dead allocation and obscures intent — every reader has to figure out why a populated Vec is being thrown away. Either remove it now and reintroduce when the audit lands, or use it (e.g., emit a dependabot_entries count in WorkflowSummary).

Also applies to: 271-271

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@xtask/src/workflow_checks.rs` around lines 156 - 160, The Vec
dependabot_entries is allocated from all_entries via filter+cloned but never
used; either remove its creation entirely or actually consume it—e.g., compute
dependabot_entries.len() and store that count in WorkflowSummary (update the
WorkflowSummary struct and where it’s constructed) or emit the count somewhere
relevant; to fix, delete the dependabot_entries binding and any let _ =
dependabot_entries; lines if you choose removal, or replace the unused Vec with
a usize count (use is_dependabot_config on all_entries to compute the count) and
propagate that value into WorkflowSummary construction and types where
necessary.

Comment on lines +546 to +572
let host_re =
Regex::new(r"https?://([A-Za-z0-9.\-]+)").context("compiling network hostname regex")?;

let mut per_workflow = Vec::new();
let mut unknown_total = 0usize;
for e in &entries {
if is_dependabot_config(e) {
// dependabot.yml is configuration, not a script — no URLs to scan.
continue;
}
let path = match &e.path {
Some(p) => p,
None => continue,
};
let profile = e.network_policy.clone().unwrap_or_default();
let allowed: BTreeSet<String> = profiles_by_name
.get(&profile)
.map(|p| p.allowed_endpoints.iter().cloned().collect())
.unwrap_or_default();

let content = read_workflow_content(&workspace_root, path).unwrap_or_default();
let mut detected: BTreeSet<String> = BTreeSet::new();
for caps in host_re.captures_iter(&content) {
if let Some(host) = caps.get(1) {
detected.insert(host.as_str().to_string());
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hostname comparison is case-sensitive; normalize to lowercase.

DNS hostnames are case-insensitive, but detected.insert(host.as_str().to_string()) and the comparisons inside endpoint_covered are byte-exact. A workflow that contains https://GitHub.com/... will be flagged as an unknown endpoint even if github.com is in the profile. Normalize both sides at the boundary so the check matches operator expectations.

🛡️ Proposed fix
         let allowed: BTreeSet<String> = profiles_by_name
             .get(&profile)
-            .map(|p| p.allowed_endpoints.iter().cloned().collect())
+            .map(|p| {
+                p.allowed_endpoints
+                    .iter()
+                    .map(|s| s.to_ascii_lowercase())
+                    .collect()
+            })
             .unwrap_or_default();

         let content = read_workflow_content(&workspace_root, path).unwrap_or_default();
         let mut detected: BTreeSet<String> = BTreeSet::new();
         for caps in host_re.captures_iter(&content) {
             if let Some(host) = caps.get(1) {
-                detected.insert(host.as_str().to_string());
+                detected.insert(host.as_str().to_ascii_lowercase());
             }
         }

Also applies to: 617-625

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@xtask/src/workflow_checks.rs` around lines 546 - 572, Detected hostnames are
compared case-sensitively currently; normalize to lowercase at the boundary so
matching works (e.g., when host_re captures in the loop and when building the
allowed set). Change the insertion in the capture loop (where
detected.insert(host.as_str().to_string()) is called) to insert
host.as_str().to_lowercase(), and when constructing the allowed BTreeSet from
profiles_by_name (allowed_endpoints), map each endpoint to .to_lowercase() so
endpoint_covered and subsequent comparisons operate on lowercased strings; the
same normalization should be applied in the other capture loop referenced around
lines 617-625.

Comment on lines +617 to +625
fn endpoint_covered(host: &str, allowed: &BTreeSet<String>) -> bool {
// Exact match, or `host` is a subdomain of an allowed entry.
if allowed.contains(host) {
return true;
}
allowed.iter().any(|a| {
host == a || host.ends_with(&format!(".{}", a)) || a.ends_with(&format!(".{}", host))
})
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

endpoint_covered lets allowed child subdomains cover detected parent domains, weakening the policy.

The third condition — a.ends_with(&format!(".{}", host)) — treats an allowed entry as covering a detected host whenever the allowed entry is a subdomain of the detected host. So if the network profile lists api.github.com, a workflow that talks to bare github.com is silently accepted, even though authorization for api.github.com clearly does not authorize the parent. This contradicts the documented intent ("github.com covers api.github.com") and is asymmetric in the wrong direction for an allowlist.

🔒 Proposed fix
 fn endpoint_covered(host: &str, allowed: &BTreeSet<String>) -> bool {
-    // Exact match, or `host` is a subdomain of an allowed entry.
+    // Exact match, or `host` is a subdomain of an allowed entry.
+    // Note: an allowed *child* must NOT cover a detected *parent*; coverage
+    // only flows from broader (allowed) to narrower (detected).
     if allowed.contains(host) {
         return true;
     }
     allowed
         .iter()
-        .any(|a| {
-            host == a || host.ends_with(&format!(".{}", a)) || a.ends_with(&format!(".{}", host))
-        })
+        .any(|a| host == a || host.ends_with(&format!(".{}", a)))
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@xtask/src/workflow_checks.rs` around lines 617 - 625, The function
endpoint_covered currently allows an allowed entry to be a subdomain of the
detected host (via the condition a.ends_with(&format!(".{}", host))), which
reverses the intended policy; change endpoint_covered to only return true when
allowed contains host or when host is a subdomain of an allowed entry (i.e.,
keep allowed.contains(host) and the host.ends_with(&format!(".{}", a)) check)
and remove the a.ends_with(...) clause so allowed child subdomains cannot
incorrectly cover parent domains.

@EffortlessSteven EffortlessSteven merged commit fa74193 into main May 12, 2026
26 checks passed
@EffortlessSteven EffortlessSteven deleted the feat/xtask-workflow-process-network-checks-20260511 branch May 12, 2026 00:39
EffortlessSteven added a commit that referenced this pull request May 12, 2026
Ninth PR in the 12-PR file-policy rollout. Adds `cargo xtask
policy-report` which runs every advisory check and aggregates the
results into target/policy/policy-report.{md,json}.

## Behavior

policy-report invokes each of the seven check functions in advisory
mode:

  check_file_policy::check
  checks::check_generated
  checks::check_executable_files
  checks::check_dependency_surfaces
  workflow_checks::check_workflow_surfaces
  workflow_checks::check_process_policy
  workflow_checks::check_network_policy

Each sub-check writes its own target/policy/*-report.json (as in
previous PRs). policy-report then reads back the seven JSON artifacts,
lifts each one's `summary` block, picks a headline metric per area
(unreceipted | invalid_policy_refs | unknown_total | missing_fields |
expired | stale | unused — first non-zero wins; falls back to a
universe-size-keyed "clean" row), and emits a unified MD + JSON.

## Decisions

- **Advisory only.** No --mode flag. Promotion to blocking is the job
  of PRs 10/11/12; they tighten CI's invocation of the individual
  checks. Adding modes here would duplicate state.

- **Re-run vs read-only.** policy-report always re-runs the sub-checks
  so the aggregated report reflects current state, not stale artifacts
  from a prior session. Cost is small — each sub-check is a few
  hundred ms.

- **Untyped JSON re-read for aggregation.** Each sub-report has a
  different schema. Lifting the `summary` block as `serde_json::Value`
  is simpler and more robust than maintaining seven typed deserializers
  in xtask. The headline metric is computed from the same Value.

## Acceptance

- cargo check --workspace --locked passes.
- cargo clippy -p xtask --all-targets --locked -- -D warnings clean.
- cargo fmt --all -- --check clean.
- `cargo xtask policy-report` ran on this branch:
    wrote unified policy-report (7 areas, 7 headline rows)
- target/policy/policy-report.md and target/policy/policy-report.json
  both produced.

Headline on current main:

  | Non-Rust file policy | unreceipted    | 945 |
  | Generated files      | _clean_        | 0 (universe) |
  | Executable files     | _clean_        | 0 (universe) |
  | Dependency surfaces  | _clean_        | 16 (universe) |
  | Workflow surfaces    | _clean_        | 9 (universe) |
  | Process policy       | unknown_total  | 10 |
  | Network policy       | _clean_        | 9 (universe) |

The two non-clean rows match the documented expected state: 945
unreceipted files because PR 2's seeded receipts intentionally cover
only a slice of the workspace, and 10 false-positive process tokens
because the PR 8 detector is grep-style (see #220 PR body OPEN
question).

## Out of scope

- PR 10 (#209) — wire the advisory checks into CI as a job and upload
  target/policy/ as an artifact.
- PR 11 (#210) — promote file/generated/executable/dependency/workflow
  checks to blocking.
- PR 12 (#211) — promote process/network checks to blocking.

Closes #208.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(policy): check workflow, process, and network surfaces (rollout PR 8/12)

1 participant