feat(policy): propose non-Rust allowlist entries (#205, rollout PR 6/12)#218
Conversation
Sixth PR in the 12-PR file-policy rollout. Adds
`cargo xtask non-rust propose`, which finds tracked non-Rust files
that are not yet covered by `policy/non-rust-allowlist.toml` and
writes draft entries to `target/policy/`:
target/policy/non-rust-proposed-allowlist.toml
target/policy/non-rust-proposal.md
The proposer NEVER mutates `policy/non-rust-allowlist.toml`. The
draft TOML is a starting point; a human (or follow-up agent) reviews,
edits, and intentionally copies entries into the real ledger.
## Behavior
- Walks `git ls-files -z` minus `*.rs` for the tracked non-Rust set.
- Loads `policy/non-rust-allowlist.toml` and the `[[file]]` / `[[glob]]`
selectors.
- Subtracts the covered set; the remainder is the unreceipted list.
- Writes one `[[file]]` entry per unreceipted path with TODO fields:
kind = "TODO"
surface = "TODO"
classification = "TODO"
owner = "TODO"
reason = "TODO: explain why this non-Rust surface remains. If
scheduled for conversion to Rust/xtask, say so and add
an `expires` date."
created = "<today>"
review_after = "<today + 90 days>"
- Writes a Markdown proposal that groups entries by top-level directory
and explains the review workflow.
## Decisions
- **Default review window: 90 days.** Honest enough for a first-pass
receipt; a reviewer can shorten or extend per entry. Matches what the
hand-seeded receipts in #214 used.
- **TOML rendered by hand, not via `toml::to_string`.** Keeps comment
blocks at the top and the order deterministic. The proposed-allowlist
schema is documented inline so reviewers see the explanation in the
file itself.
- **Duplicated allowlist deserialization + tracked-file enumeration
from check_file_policy.rs.** The shared helpers fit in ~30 lines
each; a DRY refactor is a small follow-up PR rather than scope creep
here.
## Acceptance
- `cargo check --workspace --locked` passes.
- `cargo clippy -p xtask --all-targets --locked -- -D warnings` clean.
- `cargo fmt --all -- --check` clean.
- `cargo xtask non-rust propose` ran on this branch:
`wrote 945 proposed entries to target/policy/non-rust-proposed-
allowlist.toml (and matching non-rust-proposal.md)`
- Both artifacts produced; the TOML parses as valid TOML.
- `policy/non-rust-allowlist.toml` is unchanged (proposer never
touches the real ledger).
Closes #205.
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Summary by CodeRabbit
WalkthroughThis PR introduces ChangesNon-Rust Allowlist Proposal
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Droid finished @EffortlessSteven's task —— View job Droid validation complete for PR #218. Review summary: Findings:
Validation signal: |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@xtask/src/propose.rs`:
- Around line 207-213: The loop that generates [[file]] entries currently writes
kind = "TODO" for every path; change it to compute a kind by inspecting the file
extension (e.g., call or implement an extension heuristic helper like
infer_kind_from_extension(path) or match on Path::extension()) and write that
value instead of the hardcoded "TODO"; update the block around the for path in
unreceipted loop (the out.push_str calls that write kind = ...) to use the
computed kind string (escaped/quoted like the other fields) so generated entries
are prefilled by extension-based heuristics.
- Around line 98-101: Change the allowlist loading to validate and compile glob
patterns once in load_allowlist instead of storing raw strings: in the loop that
currently pushes Selector::Pattern(p) (the code around load_allowlist/for raw in
doc.glob), call Glob::new(&p).with_context(|| format!("invalid glob pattern in
allowlist: {}", p))? and push a Selector::Pattern with the resulting GlobMatcher
(adjust the Selector::Pattern variant to hold a GlobMatcher instead of String).
Update any users of Selector::Pattern (e.g., any_match()) to use the precompiled
matcher rather than re-compiling, so invalid patterns surface as errors at load
time rather than being treated as no-match.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c78a0828-3dd1-414f-8c0e-4bcc71486f09
📒 Files selected for processing (2)
xtask/src/main.rsxtask/src/propose.rs
| for raw in doc.glob { | ||
| if let Some(p) = raw.pattern { | ||
| entries.push(Selector::Pattern(p)); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In the Rust globsetcrate, doesGlob::new(...)return an error for malformed patterns, and wouldGlob::new(pat).map(...).unwrap_or(false) effectively suppress that error and treat it as a non-match?
💡 Result:
Yes. In globset, Glob::new(&str) returns a Result, so it can fail when parsing a glob pattern (e.g., malformed patterns). The docs show Glob::new returns Result<Glob, Error> [1]. The source confirms Glob::new(glob: &str) -> Result<Glob, Error> and delegates to GlobBuilder::build [2]. Given that, Glob::new(pat).map(...).unwrap_or(false) does suppress the error, because unwrap_or(false) converts an Err into the provided default value false. In other words, a malformed pattern causes the Result to be Err, and the expression evaluates to false rather than propagating/panicking. Note: this only covers errors from Glob::new(pat). If you later do compile_matcher or matching, those are separate operations (and compile_matcher is not fallible per the API) [1].
Citations:
- 1: https://docs.rs/globset/latest/globset/struct.Glob.html
- 2: https://docs.rs/globset/0.4.6/src/globset/glob.rs.html
🏁 Script executed:
# First, let's locate and examine the file
find . -name "propose.rs" -type fRepository: EffortlessMetrics/shipper
Length of output: 92
🏁 Script executed:
# Read the propose.rs file to see the code structure
cat -n ./xtask/src/propose.rs | head -150Repository: EffortlessMetrics/shipper
Length of output: 5999
🏁 Script executed:
# Check for all uses of the Selector enum
rg "Selector::" ./xtask/src/propose.rs -nRepository: EffortlessMetrics/shipper
Length of output: 266
🏁 Script executed:
# Check what's imported from globset
rg "use globset" ./xtask/src/propose.rs -nRepository: EffortlessMetrics/shipper
Length of output: 91
Compile glob selectors at load time; fail early on invalid patterns instead of silently treating them as non-matches.
Malformed glob patterns in the allowlist are currently stored as strings and re-compiled on every any_match() call via Glob::new(pat).map(...).unwrap_or(false). This suppresses compilation errors, silently treating invalid patterns as no-match, which can incorrectly inflate the unreceipted set.
Validate and compile globs once in load_allowlist (lines 98–102), storing GlobMatcher instead of String in the Selector::Pattern variant. Add error context via .with_context() to surface which pattern is malformed.
Suggested fix
-use globset::Glob;
+use globset::{Glob, GlobMatcher};
@@
enum Selector {
Path(String),
- Pattern(String),
+ Pattern(GlobMatcher),
}
@@
for raw in doc.glob {
if let Some(p) = raw.pattern {
- entries.push(Selector::Pattern(p));
+ let matcher = Glob::new(&p)
+ .with_context(|| format!("invalid glob pattern in {}: {}", path.display(), p))?
+ .compile_matcher();
+ entries.push(Selector::Pattern(matcher));
}
}
@@
fn any_match(entries: &[Selector], path: &str) -> bool {
entries.iter().any(|sel| match sel {
Selector::Path(p) => p == path,
- Selector::Pattern(pat) => Glob::new(pat)
- .map(|g| g.compile_matcher().is_match(path))
- .unwrap_or(false),
+ Selector::Pattern(matcher) => matcher.is_match(path),
})
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@xtask/src/propose.rs` around lines 98 - 101, Change the allowlist loading to
validate and compile glob patterns once in load_allowlist instead of storing raw
strings: in the loop that currently pushes Selector::Pattern(p) (the code around
load_allowlist/for raw in doc.glob), call Glob::new(&p).with_context(||
format!("invalid glob pattern in allowlist: {}", p))? and push a
Selector::Pattern with the resulting GlobMatcher (adjust the Selector::Pattern
variant to hold a GlobMatcher instead of String). Update any users of
Selector::Pattern (e.g., any_match()) to use the precompiled matcher rather than
re-compiling, so invalid patterns surface as errors at load time rather than
being treated as no-match.
| for path in unreceipted { | ||
| out.push_str("[[file]]\n"); | ||
| out.push_str(&format!("path = \"{}\"\n", toml_escape(path))); | ||
| out.push_str("kind = \"TODO\"\n"); | ||
| out.push_str("surface = \"TODO\"\n"); | ||
| out.push_str("classification = \"TODO\"\n"); | ||
| out.push_str("owner = \"TODO\"\n"); |
There was a problem hiding this comment.
Populate kind using extension heuristics in generated entries.
Each proposed [[file]] entry currently hardcodes kind = "TODO", which misses the proposal objective to prefill kind from file-extension heuristics and leaves most of the triage burden manual.
Suggested direction
fn render_toml(unreceipted: &[String], today: &str, review_after: &str) -> String {
@@
for path in unreceipted {
+ let kind = infer_kind(path);
out.push_str("[[file]]\n");
out.push_str(&format!("path = \"{}\"\n", toml_escape(path)));
- out.push_str("kind = \"TODO\"\n");
+ out.push_str(&format!("kind = \"{}\"\n", kind));
out.push_str("surface = \"TODO\"\n");
@@
}
+
+fn infer_kind(path: &str) -> &'static str {
+ let ext = Path::new(path).extension().and_then(|s| s.to_str());
+ match ext {
+ Some("md" | "rst" | "txt") => "documentation",
+ Some("toml" | "yaml" | "yml" | "json") => "configuration",
+ Some("sh" | "ps1" | "bat") => "script",
+ _ => "TODO",
+ }
+}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for path in unreceipted { | |
| out.push_str("[[file]]\n"); | |
| out.push_str(&format!("path = \"{}\"\n", toml_escape(path))); | |
| out.push_str("kind = \"TODO\"\n"); | |
| out.push_str("surface = \"TODO\"\n"); | |
| out.push_str("classification = \"TODO\"\n"); | |
| out.push_str("owner = \"TODO\"\n"); | |
| for path in unreceipted { | |
| let kind = infer_kind(path); | |
| out.push_str("[[file]]\n"); | |
| out.push_str(&format!("path = \"{}\"\n", toml_escape(path))); | |
| out.push_str(&format!("kind = \"{}\"\n", kind)); | |
| out.push_str("surface = \"TODO\"\n"); | |
| out.push_str("classification = \"TODO\"\n"); | |
| out.push_str("owner = \"TODO\"\n"); | |
| } | |
| fn infer_kind(path: &str) -> &'static str { | |
| let ext = Path::new(path).extension().and_then(|s| s.to_str()); | |
| match ext { | |
| Some("md" | "rst" | "txt") => "documentation", | |
| Some("toml" | "yaml" | "yml" | "json") => "configuration", | |
| Some("sh" | "ps1" | "bat") => "script", | |
| _ => "TODO", | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@xtask/src/propose.rs` around lines 207 - 213, The loop that generates
[[file]] entries currently writes kind = "TODO" for every path; change it to
compute a kind by inspecting the file extension (e.g., call or implement an
extension heuristic helper like infer_kind_from_extension(path) or match on
Path::extension()) and write that value instead of the hardcoded "TODO"; update
the block around the for path in unreceipted loop (the out.push_str calls that
write kind = ...) to use the computed kind string (escaped/quoted like the other
fields) so generated entries are prefilled by extension-based heuristics.
| out.push_str(&format!( | ||
| "Generated by `cargo xtask non-rust propose` on {}.\n\n", | ||
| today | ||
| )); | ||
| out.push_str("## How to use\n\n"); | ||
| out.push_str(&format!( |
There was a problem hiding this comment.
[P1] Malformed Markdown link — {PROPOSED_TOML} is a literal string, not a variable interpolation
The format argument passed to push_str uses {PROPOSED_TOML} inside the string literal, but push_str does not perform format! interpolation. This means the generated non-rust-proposal.md will contain the literal text {PROPOSED_TOML} instead of non-rust-proposed-allowlist.toml, producing a broken Markdown reference like [non-rust-proposed-allowlist.toml](./{PROPOSED_TOML}). This is a one-line fix: either remove the braces entirely (since the string is a literal, not a format string) or change push_str to push_str(&format!(...)) with proper {} interpolation.
| out.push_str(&format!( | |
| "Generated by `cargo xtask non-rust propose` on {}.\n\n", | |
| today | |
| )); | |
| out.push_str("## How to use\n\n"); | |
| out.push_str(&format!( | |
| out.push_str("1. Review each draft entry in [`non-rust-proposed-allowlist.toml`](./non-rust-proposed-allowlist.toml).\n"); |
Summary
Sixth PR in the 12-PR file-policy rollout. Adds
cargo xtask non-rust propose— finds tracked non-Rust files not yet receipted inpolicy/non-rust-allowlist.tomland writes draft entries totarget/policy/. Never mutates the real ledger.Issue
Closes #205. Depends on #202 (ledger, merged), #212 (xtask scaffold, merged), #204 (file-policy checker, merged). Refines #180. Tracks #109.
Behavior
git ls-files -zminus*.rs(same source as inventory + check-file-policy).policy/non-rust-allowlist.toml's[[file]]and[[glob]]selectors.[[file]]entry with TODO fields andcreated = today,review_after = today + 90d.target/policy/non-rust-proposed-allowlist.toml— draft TOML.target/policy/non-rust-proposal.md— review workflow + entries grouped by top-level directory.Decisions
review_afterwindow: 90 days. Matches the hand-seeded receipts in chore(policy): add non-Rust policy allowlist ledgers (#202, rollout PR 2/12) #214. Reviewer can adjust per entry.check_file_policy.rs. Each helper is ~30 lines. A DRY refactor is a small follow-up rather than scope creep here.Live result on this branch
945 matches the unreceipted count
check-file-policy --mode advisorysurfaced in #217. The proposal is broken down by top-level dir for scannability —.githubandcratesdominate.Acceptance
cargo check --workspace --lockedpasses.cargo clippy -p xtask --all-targets --locked -- -D warningsclean.cargo fmt --all -- --checkclean.cargo xtask non-rust proposeproduces both artifacts.policy/non-rust-allowlist.tomlis not touched bygit statusafter the run.Follow-ups
check-generated,check-executable-files,check-dependency-surfaces.check-workflow-surfaces,check-process-policy,check-network-policy.policy-reportaggregates inventory + all checkers + this proposal status.workspace_root/enumerate_non_rusthelpers into acommon.rsmodule after this PR lands.