Four named presets for installation, each bundling scanner thresholds, tool interception scope, rate limits, logging verbosity, and digest cache behavior into a composable SecurityConfig.

- Open: permissive, dev-friendly, review auto-passes
- Balanced: production defaults, web fetch + search
- Hardened: all tools, tighter heuristics, verbose logging
- Paranoid: no context heuristics, exec scanning, trace logging

Added InterceptedToolSet to middleware for per-profile tool scoping. 480 tests passing (6 new).
- Fixed test_borderline_unicode_mixed_content: was testing content with
literal \u{200B} (not actual zero-width chars). Replaced with proper
injection+discussion context test.
- Removed #[ignore] from all 14 scanner tests — they all pass now.
- adversary-detector: 20 tests → 34 tests (was 14 ignored, now 0).
- Full suite: 494 tests passing.
Documents the SHA-256 digest caching behavior (URL+hash → verdict), the three-layer scanning pipeline, discussion context heuristic, human overrides, and the four named security profiles.
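The "URL + content hash → verdict" caching idea can be sketched as a small map keyed on both values. A hedged, dependency-free illustration — `DefaultHasher` stands in for SHA-256, and the type and method names are illustrative, not the crate's actual `DigestStore` API:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Illustrative verdict type; the crate's real verdict enum may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Clean,
    Review,
}

/// DefaultHasher stands in for SHA-256 so the sketch stays dependency-free.
fn digest(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

struct DigestCache {
    entries: HashMap<(String, u64), Verdict>,
}

impl DigestCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Hit only when the content is byte-identical; changed content misses
    /// the cache, forcing a rescan.
    fn get(&self, url: &str, content: &str) -> Option<Verdict> {
        self.entries.get(&(url.to_string(), digest(content))).copied()
    }

    fn put(&mut self, url: &str, content: &str, verdict: Verdict) {
        self.entries.insert((url.to_string(), digest(content)), verdict);
    }
}

fn main() {
    let mut cache = DigestCache::new();
    cache.put("https://example.com", "body-v1", Verdict::Clean);
    cache.put("https://example.com/policy", "body", Verdict::Review);
    assert_eq!(cache.get("https://example.com", "body-v1"), Some(Verdict::Clean));
    assert_eq!(cache.get("https://example.com", "body-v2"), None); // changed content -> rescan
    assert_eq!(cache.get("https://example.com/policy", "body"), Some(Verdict::Review));
}
```

This captures the "cache after clean scan, rescan on change" behavior: the same URL with modified content produces a different digest and misses the cache.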
Domains in skip_protection_domains get fetched and returned as-is with a Clean verdict, bypassing all scanning layers. Supports exact match and *.domain.com wildcards for subdomains.

Use cases:
- Trusted internal domains (APIs, dashboards)
- Controlled testing / CI/CD pipelines
- Known-safe CDN hosts

Note: skip_protection bypasses ALL layers (structural + semantic + remote). For "cache after clean scan, rescan on change" use digest caching instead.

- extract_host() helper in lib.rs (no url crate dependency)
- ScannerConfig.skip_protection_domains field
- OutpostProxy checks skip_protection before digest cache
- 4 new tests (exact match, wildcard, empty list, extract_host)
- README section documenting the feature
- Full suite: 498 tests passing
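The exact-plus-wildcard matching described above (where, per the crate's wildcard test, `*.example.com` matches the apex domain as well as subdomains) can be sketched with a hypothetical helper — the name and shape are illustrative, not the crate's API:

```rust
/// Hedged sketch of skip-protection domain matching (hypothetical helper,
/// not the crate's API). Exact patterns match the host directly; a
/// "*.domain" pattern matches subdomains and, as the crate's wildcard test
/// shows, the apex domain itself.
fn host_matches(pattern: &str, host: &str) -> bool {
    if let Some(suffix) = pattern.strip_prefix("*.") {
        host == suffix || host.ends_with(&format!(".{suffix}"))
    } else {
        host == pattern
    }
}

fn main() {
    assert!(host_matches("trusted.example.com", "trusted.example.com"));
    assert!(!host_matches("trusted.example.com", "example.com"));
    assert!(host_matches("*.example.com", "sub.example.com"));
    assert!(host_matches("*.example.com", "deep.sub.example.com"));
    assert!(host_matches("*.example.com", "example.com")); // apex matches too
    assert!(!host_matches("*.example.com", "example.org"));
}
```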
- Added adversary-detector section: digest caching, skip protection, profiles
- Updated Security First table with new features
- Replaced outpost references with adversary-detector
- Added README links to adversary-detector and clashd crate docs
- Updated message flow diagram

Pattern: high-level summary in root README -> deep-dive in crate README
- Replace manual Default impls with #[derive(Default)] + #[default] on variants for LogVerbosity and SecurityProfile enums
- cargo fmt fixes all formatting issues
- CI: integration-tests.yml lint job now passes
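The first bullet relies on Rust's derivable `Default` for enums (stable since Rust 1.62), where `#[default]` marks the default variant. A minimal illustration — choosing `Balanced` as the default variant is an assumption for this sketch:

```rust
/// Sketch of the refactor: derive(Default) with the #[default] variant
/// attribute replaces a hand-written `impl Default`. The default variant
/// chosen here (Balanced) is an assumption for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
enum SecurityProfile {
    Open,
    #[default]
    Balanced,
    Hardened,
    Paranoid,
}

fn main() {
    assert_eq!(SecurityProfile::default(), SecurityProfile::Balanced);
}
```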
Pull request overview
This PR expands adversary-detector with “security profile” presets and a new “skip protection” feature to bypass scanning entirely for configured trusted domains, along with documentation updates describing the scanning pipeline and profiles.
Changes:
- Added `skip_protection_domains` support (exact + `*.` wildcard) to bypass scanning for trusted domains.
- Introduced named `SecurityProfile`/`SecurityConfig` presets (open/balanced/hardened/paranoid) plus related config structs.
- Updated documentation (root README + crate README) and re-enabled previously ignored scanner tests; added new host/skip-protection tests.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| README.md | Documents content scanning, skip protection, and security profiles at the project level. |
| crates/adversary-detector/src/scanner.rs | Adds skip-protection config + matching logic; updates/re-enables tests and adds new host/skip tests. |
| crates/adversary-detector/src/proxy.rs | Adds proxy-side skip-protection bypass during fetch and stores skip-protection patterns. |
| crates/adversary-detector/src/profiles.rs | New module defining security profiles and derived security configuration. |
| crates/adversary-detector/src/middleware.rs | Adds InterceptedToolSet intended for profile-driven interception configuration. |
| crates/adversary-detector/src/lib.rs | Exposes profiles module and introduces extract_host() helper used by skip protection. |
| crates/adversary-detector/README.md | Adds crate-level documentation for caching, skip protection, and profiles. |
```rust
/// Extract the host from a URL string.
/// Strips scheme (http://, https://), takes the hostname (before first `/` or `:`).
/// Returns empty string if parsing fails.
```
The doc comment says this returns an empty string when parsing fails, but the implementation never returns "" for malformed input (e.g. "not-a-url" returns the whole string). Either adjust the doc comment to match the actual behavior or change the function to return ""/None on invalid URLs.
Suggested change:

```rust
/// Extract the host-like leading segment from a URL string.
/// Strips `http://` or `https://` if present, then takes the substring before
/// the first `/` or `:`. This helper does not perform full URL validation; for
/// inputs without a scheme it returns the leading segment unchanged.
```
```rust
// Take up to first `:` (port) or `/` (path)
let end = rest
    .find(':')
    .or_else(|| rest.find('/'))
    .unwrap_or(rest.len());
```
extract_host chooses the first : anywhere in the remaining string before looking for /. This mis-parses hosts when the path contains a colon (e.g. "example.com/path:section"), and it won’t handle IPv6 bracket hosts correctly. Consider using url::Url (when a scheme is present) or, at minimum, taking the earliest of : and / after the host boundary.
Suggested change:

```rust
if let Some(bracketed) = rest.strip_prefix('[') {
    if let Some(close_offset) = bracketed.find(']') {
        let end = close_offset + 2;
        return &rest[..end];
    }
    return "";
}
// Take up to the earliest `:` (port) or `/` (path).
let end = match (rest.find(':'), rest.find('/')) {
    (Some(colon), Some(slash)) => colon.min(slash),
    (Some(colon), None) => colon,
    (None, Some(slash)) => slash,
    (None, None) => rest.len(),
};
```
```rust
/// Supports:
/// - Exact match: `"example.com"`
/// - Subdomain wildcard: `"*.example.com"` (matches `sub.example.com`)
///
```
The comment says "*.example.com" matches subdomains, but the implementation (and tests below) also treat it as matching the apex domain (example.com). Please either tighten the matching logic to exclude the apex domain or update the docs to reflect that *.example.com matches both the apex and subdomains.
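If the first resolution is chosen (tighten the matching to exclude the apex), a subdomain-only matcher might look like the following sketch — the helper name is hypothetical and differs from the crate's current logic, which matches the apex too:

```rust
/// Hypothetical subdomain-only matcher (not the crate's current logic):
/// require at least one label before the suffix, so "example.com" itself
/// does not match "*.example.com".
fn matches_subdomain_only(pattern: &str, host: &str) -> bool {
    match pattern.strip_prefix("*.") {
        Some(suffix) => host != suffix && host.ends_with(&format!(".{suffix}")),
        None => false,
    }
}

fn main() {
    assert!(matches_subdomain_only("*.example.com", "sub.example.com"));
    assert!(matches_subdomain_only("*.example.com", "deep.sub.example.com"));
    assert!(!matches_subdomain_only("*.example.com", "example.com")); // apex excluded
    assert!(!matches_subdomain_only("example.com", "example.com")); // not a wildcard
}
```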
```rust
}

#[test]
fn test_extract_host() {
    assert_eq!(extract_host("https://example.com/path"), "example.com");
    assert_eq!(extract_host("http://example.com:8080/path"), "example.com");
    assert_eq!(extract_host("https://sub.example.com"), "sub.example.com");
    assert_eq!(extract_host("example.com/path"), "example.com");
    assert_eq!(extract_host("https://localhost:3000"), "localhost");
    assert_eq!(extract_host("not-a-url"), "not-a-url");
}

#[test]
fn test_skip_protection_exact_match() {
    let config = ScannerConfig {
        skip_protection_domains: vec!["trusted.example.com".into()],
        ..Default::default()
    };
    assert!(config.is_skip_protected("https://trusted.example.com/path"));
    assert!(!config.is_skip_protected("https://untrusted.example.com/path"));
    assert!(!config.is_skip_protected("https://example.com/path"));
}

#[test]
fn test_skip_protection_wildcard() {
    let config = ScannerConfig {
        skip_protection_domains: vec!["*.example.com".into()],
        ..Default::default()
    };
    assert!(config.is_skip_protected("https://example.com/path"));
    assert!(config.is_skip_protected("https://sub.example.com/path"));
    assert!(config.is_skip_protected("https://deep.sub.example.com/path"));
    assert!(!config.is_skip_protected("https://example.org/path"));
}

#[test]
fn test_skip_protection_empty_list() {
    let config = ScannerConfig::default();
    assert!(!config.is_skip_protected("https://anything.com"));
```
These #[test] functions are defined outside the existing #[cfg(test)] mod tests block, which is inconsistent with the rest of the crate’s test organization and can lead to unused-code warnings in non-test builds depending on lint settings. Consider moving them into the existing test module (or adding #[cfg(test)] around them).
```rust
pub fn new(
    scanner: OutpostScanner,
    store: DigestStore,
    logger: AuditLogger,
    override_on_review: bool,
) -> Self {
    Self {
        scanner,
        store: Arc::new(Mutex::new(store)),
        logger,
        client: reqwest::Client::builder()
            .timeout(std::time::Duration::from_secs(30))
            .build()
            .expect("proxy reqwest client"),
        override_on_review,
        skip_protection_domains: Vec::new(),
    }
```
OutpostProxy::new always initializes skip_protection_domains to empty, so callers constructing a proxy via new(scanner, ...) will never get skip-protection behavior even if the scanner’s config contains skip_protection_domains. Consider deriving this from the provided scanner/config (or delegating to ScannerConfig::is_skip_protected) so both constructors behave consistently.
```rust
impl InterceptedToolSet {
```
InterceptedToolSet is introduced as the mechanism used by SecurityConfig to vary which tools are scanned, but OutpostMiddleware::should_intercept still uses the hard-coded INTERCEPTED_TOOLS list. As-is, the profile’s intercepted_tools cannot affect runtime behavior. Consider wiring InterceptedToolSet into OutpostMiddleware (e.g., store it on the middleware and consult it in should_intercept).
Suggested change:

```rust
impl Default for InterceptedToolSet {
    fn default() -> Self {
        Self::from_intercepted_tools()
    }
}

impl InterceptedToolSet {
    /// Build a tool set from the legacy hard-coded intercepted tool list.
    ///
    /// This preserves the previous middleware behavior for callers that do not
    /// explicitly provide a profile-specific set.
    pub fn from_intercepted_tools() -> Self {
        Self {
            tools: INTERCEPTED_TOOLS.iter().map(|tool| (*tool).to_string()).collect(),
        }
    }
```
```rust
use adversary_detector::{SecurityConfig, SecurityProfile};

let config = SecurityConfig::from_profile(SecurityProfile::Balanced);
let proxy = OutpostProxy::with_config(config.scanner, logger).await;
```
The example uses OutpostProxy::with_config(...), but the public constructor in this crate is OutpostProxy::from_config(config, logger).await (and there is no with_config). Updating the snippet will prevent copy/paste breakage for users.
Suggested change:

```rust
let proxy = OutpostProxy::from_config(config, logger).await;
```
```toml
skip_protection_domains = [
    "api.internal.example.com",  # exact match
    "*.trusted-cdn.com",         # wildcard — all subdomains
]
```
The docs describe *.trusted-cdn.com as matching “all subdomains”, but the current skip-protection implementation also treats *.domain as matching the apex domain itself (domain). Please clarify the README wording (or adjust matching) so operators don’t unintentionally skip-scan the apex domain.
```rust
pub fn paranoid() -> Self {
    Self {
        profile: SecurityProfile::Paranoid,
        scanner: ScannerConfig {
            discussion_ratio_threshold: 0.0,
            min_signals_for_ratio: 0,
            override_on_review: false,
            ..Default::default()
        },
```
SecurityConfig::paranoid() sets discussion_ratio_threshold to 0.0 and min_signals_for_ratio to 0, but the scanner’s heuristic downgrades when (discussion/injection) > threshold. With threshold = 0.0, any content with at least one discussion signal will downgrade to Review, which is the opposite of “never downgrade”. Consider disabling the heuristic explicitly (e.g., set min_signals_for_ratio to a very large value or add a boolean flag).
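A toy model of the heuristic as this comment describes it (`should_downgrade` is an illustrative name, not the scanner's API) shows why threshold 0.0 has the inverted effect:

```rust
/// Toy model of the discussion-context downgrade heuristic as described in
/// the review comment (illustrative, not the scanner's actual code): the
/// verdict is downgraded when discussion/injection exceeds the threshold.
fn should_downgrade(discussion: usize, injection: usize, min_signals: usize, threshold: f64) -> bool {
    if discussion < min_signals || injection == 0 {
        return false;
    }
    (discussion as f64 / injection as f64) > threshold
}

fn main() {
    // Paranoid as written (threshold 0.0, min_signals 0): a single discussion
    // signal already triggers a downgrade, the opposite of "never downgrade".
    assert!(should_downgrade(1, 5, 0, 0.0));
    // One way to disable the heuristic instead: an unreachable signal floor.
    assert!(!should_downgrade(1, 5, usize::MAX, 0.0));
}
```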
```rust
/// Stricter scanning for security-conscious environments.
///
/// - Scans all tool results including `email_fetch` and `exec`
/// - Tighter discussion context (0.15 ratio — less likely to downgrade)
/// - Review verdicts blocked by default
/// - Strict rate limits (30 req/min)
/// - Verbose logging with content snippets
```
The Hardened profile docs say it scans all tools including exec, but the actual intercepted_tools for hardened uses InterceptedToolSet::all_tools() which excludes exec. Please align the documentation and the tool set so operators get the expected behavior.
The ruleset requires status-check contexts named fmt-and-clippy, test, and build, but the workflow was reporting them as 'Format & Clippy', 'Test Suite (...)', and 'Release Build'. Drop the display names so jobs report under their job keys, and add a 'test' aggregator job that fans in from the matrix so a single 'test' context reports.
Critical fixes:
- Skip protection now checked BEFORE HTTP fetch (don't touch trusted domains)
- extract_host rejects URLs without scheme (prevents bare string matching)
- OutpostProxy delegates skip_protection to ScannerConfig (no duplicate logic)

Medium fixes:
- Wire InterceptedToolSet from profiles into middleware
- Respect audit_logging config in middleware

Low fixes:
- Remove duplicate is_skip_protected from proxy

38 tests passing, clippy clean
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.
```rust
/// Strips scheme (http://, https://), takes the hostname (before first `/` or `:`).
/// Returns empty string if the URL has no scheme (rejects bare strings).
pub fn extract_host(url: &str) -> &str {
    let rest = match url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))
    {
        Some(r) => r,
        None => return "", // no scheme = not a URL
    };
    // Take up to first `:` (port) or `/` (path)
    let end = rest
        .find(':')
        .or_else(|| rest.find('/'))
        .unwrap_or(rest.len());
    &rest[..end]
```
extract_host only terminates the host at : or /, so URLs like https://example.com?x=1 (no /) will return example.com?x=1 as the host. That will make skip_protection_domains fail to match for common URL forms (query/fragment without a path). Consider using a real URL parser (e.g. the url crate) or at least also splitting on ? and # (and ideally handling IPv6 bracket hosts).
Suggested change:

```rust
/// Returns the parsed hostname for valid URLs and an empty string for invalid
/// URLs or URLs without a host component.
pub fn extract_host(url: &str) -> &str {
    match url::Url::parse(url) {
        Ok(parsed) => parsed.host_str().unwrap_or(""),
        Err(_) => "",
    }
```
```rust
/// Complete security configuration derived from a named profile.
///
/// This is the single source of truth for all security-related settings.
/// Construct from a [`SecurityProfile`] or override individual fields.
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct SecurityConfig {
    /// The named profile this config was derived from (for display/logging).
    pub profile: SecurityProfile,

    /// Scanner configuration (layer thresholds, service URL, etc.)
    pub scanner: ScannerConfig,

    /// Which tool results to intercept and scan.
    pub intercepted_tools: InterceptedToolSet,

    /// Rate limiting per source.
    pub rate_limit: RateLimitConfig,

    /// Logging verbosity.
    pub log_verbosity: LogVerbosity,

    /// Whether to enable the digest cache. Disabling forces a rescan every time.
    /// Default: `true` for all profiles except `Paranoid`.
    pub enable_digest_cache: bool,
```
SecurityConfig derives Default, but the derived defaults don’t match the documented/profile defaults (e.g. enable_digest_cache will default to false, audit_logging to false, TTL to 0, etc.). This can easily create an inconsistent config when callers use SecurityConfig::default(). Consider implementing Default manually to return SecurityConfig::balanced() (or removing Default to force from_profile).
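A minimal sketch of the suggested fix, with the struct reduced to two fields for illustration — the TTL value is an assumption:

```rust
/// Reduced sketch of the suggested fix: implement `Default` by hand so
/// `SecurityConfig::default()` equals a documented profile instead of the
/// derived zero-values. Fields and the TTL value are illustrative.
#[derive(Debug, Clone)]
struct SecurityConfig {
    enable_digest_cache: bool,
    digest_cache_ttl_secs: u64,
}

impl SecurityConfig {
    fn balanced() -> Self {
        Self {
            enable_digest_cache: true,   // documented default for Balanced
            digest_cache_ttl_secs: 3600, // assumed TTL, for illustration only
        }
    }
}

impl Default for SecurityConfig {
    fn default() -> Self {
        // derive(Default) would yield `false` / `0` here, contradicting docs
        Self::balanced()
    }
}

fn main() {
    let cfg = SecurityConfig::default();
    assert!(cfg.enable_digest_cache);
    assert_eq!(cfg.digest_cache_ttl_secs, 3600);
}
```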
```rust
use adversary_detector::{SecurityConfig, SecurityProfile};

let config = SecurityConfig::from_profile(SecurityProfile::Balanced);
let proxy = OutpostProxy::with_config(config.scanner, logger).await;
```
The example uses OutpostProxy::with_config(...), but the proxy constructor in this crate is OutpostProxy::from_config(...) (and the snippet also omits importing/creating logger). Updating this example will prevent copy/paste compile errors for users.
Suggested change:

```rust
use adversary_detector::{AuditLogger, OutpostProxy, SecurityConfig, SecurityProfile};

let logger = AuditLogger::default();
let config = SecurityConfig::from_profile(SecurityProfile::Balanced);
let proxy = OutpostProxy::from_config(config.scanner, logger).await;
```
Four named presets control scanning depth, rate limits, and logging:

| Profile | Scans | Discussion Ratio | Review | Rate |
|---------|-------|-----------------|--------|------|
| **Open** | web_fetch only | 0.5 | auto-pass | 120/min |
| **Balanced** | web + search | 0.3 | needs approval | 60/min |
| **Hardened** | all tools | 0.15 | blocked | 30/min |
| **Paranoid** | all + exec | 0.0 | blocked | 15/min |
This section claims security profiles control rate limits and logging. In the current code, rate_limit, log_verbosity, enable_digest_cache, and digest_cache_ttl_secs from SecurityConfig are not applied anywhere (only tool interception + audit logging are wired). Either implement enforcement/wiring for these settings or adjust the README to reflect what profiles actually affect today.
Suggested change:

Four named presets currently control scanning/tool scope and review behavior:

| Profile | Scans | Discussion Ratio | Review |
|---------|-------|-----------------|--------|
| **Open** | web_fetch only | 0.5 | auto-pass |
| **Balanced** | web + search | 0.3 | needs approval |
| **Hardened** | all tools | 0.15 | blocked |
| **Paranoid** | all + exec | 0.0 | blocked |

Note: profile-related config fields such as `rate_limit`, `log_verbosity`, `enable_digest_cache`, and `digest_cache_ttl_secs` may exist in configuration, but they are not currently enforced by the gateway codepath documented here. Today, profiles affect tool interception/policy behavior; audit logging is wired separately.
```rust
// Step 1: skip_protection — bypass ALL scanning for trusted domains
// Check before HTTP fetch to avoid touching untrusted servers entirely.
```
The comment says skip protection is checked “before HTTP fetch to avoid touching untrusted servers entirely,” but the code still performs the HTTP fetch (it only bypasses scanning). Consider rewording the comment to avoid suggesting that the network request is skipped.
Suggested change:

```rust
// Step 1: skip_protection — bypass ALL scanning for trusted domains.
// This check happens before the scanner pipeline; the HTTP fetch still
// occurs in this branch for URLs that are explicitly skip-protected.
```
```rust
}

#[test]
fn test_extract_host() {
    assert_eq!(extract_host("https://example.com/path"), "example.com");
    assert_eq!(extract_host("http://example.com:8080/path"), "example.com");
    assert_eq!(extract_host("https://sub.example.com"), "sub.example.com");
    assert_eq!(extract_host("https://localhost:3000"), "localhost");
    // URLs without scheme are rejected (prevents bare string matching)
    assert_eq!(extract_host("example.com/path"), "");
    assert_eq!(extract_host("not-a-url"), "");
    assert_eq!(extract_host("random-text-not-a-url"), "");
}

#[test]
fn test_skip_protection_exact_match() {
    let config = ScannerConfig {
        skip_protection_domains: vec!["trusted.example.com".into()],
        ..Default::default()
    };
    assert!(config.is_skip_protected("https://trusted.example.com/path"));
    assert!(!config.is_skip_protected("https://untrusted.example.com/path"));
    assert!(!config.is_skip_protected("https://example.com/path"));
}

#[test]
fn test_skip_protection_wildcard() {
    let config = ScannerConfig {
        skip_protection_domains: vec!["*.example.com".into()],
        ..Default::default()
    };
    assert!(config.is_skip_protected("https://example.com/path"));
    assert!(config.is_skip_protected("https://sub.example.com/path"));
    assert!(config.is_skip_protected("https://deep.sub.example.com/path"));
    assert!(!config.is_skip_protected("https://example.org/path"));
}

#[test]
fn test_skip_protection_empty_list() {
    let config = ScannerConfig::default();
    assert!(!config.is_skip_protected("https://anything.com"));
```
The new unit tests are placed outside the existing #[cfg(test)] mod tests block. While they’ll run under cargo test, keeping tests inside the dedicated test module is more consistent with the rest of the file and avoids compiling these functions into non-test builds if they aren’t eliminated. Consider moving them under the existing mod tests.
```diff
@@ -15,7 +15,6 @@ jobs:
   # Check formatting and linting
   # ─────────────────────────────────────────────────────────────────────────────
   fmt-and-clippy:
```
Removing explicit job name: fields changes the generated check-run names in GitHub Actions (they’ll fall back to the job id). If branch protection / rulesets require the previous check names (e.g. “Format & Clippy”, “Release Build”), this could break required status checks. Consider keeping the name: fields unless the change is intentional and rulesets are updated.
Suggested change:

```yaml
fmt-and-clippy:
  name: Format & Clippy
```
- Rate limiting: Added token-bucket rate limiter to OutpostProxy
- Outbound scanning: Added on_outbound_message to ToolHook and implemented it in OutpostMiddleware
- Digest TTL: Implemented TTL check in DigestStore::get and wired it through ScannerConfig
- Fixed associated tests and imports
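The token-bucket idea from the first bullet can be sketched as follows; type names, field names, and the parameters are illustrative, not OutpostProxy's actual implementation:

```rust
use std::time::Instant;

/// Minimal token-bucket sketch of a per-source rate limiter; names and
/// numbers are illustrative, not OutpostProxy's actual implementation.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last: Instant::now() }
    }

    /// Refill proportionally to elapsed time, then spend one token if available.
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last).as_secs_f64();
        self.last = now;
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // Small bucket for demonstration: burst of 2, refilling 1 token/sec.
    let mut bucket = TokenBucket::new(2.0, 1.0);
    assert!(bucket.try_acquire());
    assert!(bucket.try_acquire());
    assert!(!bucket.try_acquire()); // burst exhausted, must wait for refill
}
```

Capping `tokens` at `capacity` is what bounds the burst size while still allowing the steady-state rate (e.g. a 60/min profile limit would map to roughly one token per second).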
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.
```rust
}

#[test]
fn test_extract_host() {
    assert_eq!(extract_host("https://example.com/path"), "example.com");
    assert_eq!(extract_host("http://example.com:8080/path"), "example.com");
    assert_eq!(extract_host("https://sub.example.com"), "sub.example.com");
    assert_eq!(extract_host("https://localhost:3000"), "localhost");
    // URLs without scheme are rejected (prevents bare string matching)
    assert_eq!(extract_host("example.com/path"), "");
    assert_eq!(extract_host("not-a-url"), "");
    assert_eq!(extract_host("random-text-not-a-url"), "");
}

#[test]
fn test_skip_protection_exact_match() {
    let config = ScannerConfig {
        skip_protection_domains: vec!["trusted.example.com".into()],
        ..Default::default()
    };
    assert!(config.is_skip_protected("https://trusted.example.com/path"));
    assert!(!config.is_skip_protected("https://untrusted.example.com/path"));
    assert!(!config.is_skip_protected("https://example.com/path"));
}

#[test]
fn test_skip_protection_wildcard() {
    let config = ScannerConfig {
        skip_protection_domains: vec!["*.example.com".into()],
        ..Default::default()
    };
    assert!(config.is_skip_protected("https://example.com/path"));
    assert!(config.is_skip_protected("https://sub.example.com/path"));
    assert!(config.is_skip_protected("https://deep.sub.example.com/path"));
    assert!(!config.is_skip_protected("https://example.org/path"));
}

#[test]
fn test_skip_protection_empty_list() {
    let config = ScannerConfig::default();
    assert!(!config.is_skip_protected("https://anything.com"));
```
These new unit tests are outside the #[cfg(test)] mod tests block. That means they'll also be compiled into non-test builds (even though the #[test] harness only runs in test mode), which is non-idiomatic and can bloat release builds. Move these #[test] functions inside the existing #[cfg(test)] test module (or wrap them in #[cfg(test)]).
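The fix the comment asks for looks roughly like this. The `ScannerConfig` and `is_skip_protected` bodies below are simplified stand-ins so the sketch is self-contained — the point is only the `#[cfg(test)] mod tests` wrapper, which keeps test code out of non-test builds.

```rust
// Stand-in for the crate's real ScannerConfig; only here so the sketch compiles.
pub struct ScannerConfig {
    pub skip_protection_domains: Vec<String>,
}

impl ScannerConfig {
    pub fn is_skip_protected(&self, url: &str) -> bool {
        let host = url
            .strip_prefix("https://")
            .or_else(|| url.strip_prefix("http://"))
            .and_then(|rest| rest.split(|c| c == '/' || c == ':' || c == '?').next())
            .unwrap_or("");
        self.skip_protection_domains.iter().any(|d| d.as_str() == host)
    }
}

// Everything below compiles only under `cargo test`, so release builds
// carry none of it -- this is the move the review comment asks for.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn skip_protection_exact_match() {
        let config = ScannerConfig {
            skip_protection_domains: vec!["trusted.example.com".into()],
        };
        assert!(config.is_skip_protected("https://trusted.example.com/path"));
        assert!(!config.is_skip_protected("https://example.com/path"));
    }
}
```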
}
let v = s
    .scan("https://security-blog.com", content, ScanContext::WebFetch)
    .await;
// Should be Review (not Unsafe) due to discussion context
matches!(v, OutpostVerdict::Review { .. });
}
In test_discussion_context_suppression, the final matches!(v, OutpostVerdict::Review { .. }); expression is not asserted, so the test will always pass regardless of the verdict. Change this to an assert!(matches!(...)) (or equivalent) so the test actually verifies behavior, especially now that the test is no longer #[ignore]d.
@@ -311,7 +345,6 @@
}
Similarly, test_base64_blob_review ends with a bare matches!(v, OutpostVerdict::Review { .. }); that isn't asserted. This makes the test a no-op and it will always pass. Please turn it into an assertion now that the test is enabled.
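The pitfall both comments describe is easy to reproduce in isolation: `matches!` returns a `bool`, and as a bare expression statement that bool is silently discarded, so the test can never fail. The enum below is a hypothetical stand-in for `OutpostVerdict`, not the crate's type.

```rust
// Hypothetical stand-in for the crate's OutpostVerdict enum.
#[derive(Debug)]
enum Verdict {
    Clean,
    Review { reason: String },
}

fn assert_is_review(v: &Verdict) {
    // BUG pattern: as a bare statement, `matches!(v, Verdict::Review { .. });`
    // evaluates to a bool that is immediately discarded -- a test body
    // consisting only of that line always passes.
    let _silently_discarded: bool = matches!(v, Verdict::Review { .. });

    // FIX: assert on the result so a wrong verdict actually fails the test.
    assert!(
        matches!(v, Verdict::Review { .. }),
        "expected Review, got {v:?}"
    );
}
```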
/// Time until the next request would be allowed (for cooldown messaging).
fn cooldown_remaining(&self, source: &str) -> Option<Duration> {
    let _bucket = self.buckets.get(source)?;
    let tokens_per_sec = self.config.max_requests_per_minute as f64 / 60.0;
    if tokens_per_sec <= 0.0 {
        return Some(Duration::from_secs(self.config.cooldown_seconds));
    }
    // Time to accumulate 1 token
    let secs_per_token = 1.0 / tokens_per_sec;
    Some(Duration::from_secs_f64(secs_per_token.ceil()))
RateLimitConfig::cooldown_seconds is documented as a cooldown after hitting the limit, but the current token-bucket implementation never enforces a cooldown, and cooldown_remaining() doesn't use cooldown_seconds (except when max_requests_per_minute is 0). This makes both the behavior and the “Try again in …” messaging misleading. Either implement an actual cooldown window (e.g., track last-denied timestamp per source) or remove the cooldown setting/wording and compute a true retry-after based on bucket state.
Suggested change:

/// Time until the next request would be allowed under the current token-bucket state.
fn cooldown_remaining(&self, source: &str) -> Option<Duration> {
    let bucket = self.buckets.get(source)?;
    if bucket.0 > 0 {
        return Some(Duration::ZERO);
    }
    let tokens_per_sec = self.config.max_requests_per_minute as f64 / 60.0;
    if tokens_per_sec <= 0.0 {
        return None;
    }
    let secs_per_token = 1.0 / tokens_per_sec;
    let elapsed = Instant::now().duration_since(bucket.1).as_secs_f64();
    let remaining_secs = (secs_per_token - elapsed).max(0.0);
    Some(Duration::from_secs_f64(remaining_secs))
struct RateLimiter {
    /// Configured rate limits
    config: RateLimitConfig,
    /// Per-source state: (tokens available, last refill time)
    buckets: HashMap<String, (u32, Instant)>,
}
The per-host RateLimiter state uses an unbounded HashMap<String, ...> with no eviction/TTL. In a long-running proxy that sees many unique hosts, this can grow without bound and increase memory usage over time. Consider adding eviction (LRU/TTL), bounding the map size, or periodically cleaning out inactive buckets.
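One possible shape for that eviction — cap the map at a constant and drop the least-recently-seen source when full. A later commit in this thread mentions a `RATE_LIMITER_MAX_SOURCES` constant with LRU eviction; the cap value and everything else below are illustrative assumptions, not the actual code.

```rust
use std::collections::HashMap;
use std::time::Instant;

const RATE_LIMITER_MAX_SOURCES: usize = 1024; // assumed cap, not the real value

struct Buckets {
    inner: HashMap<String, (u32, Instant)>,
}

impl Buckets {
    fn new() -> Self {
        Self { inner: HashMap::new() }
    }

    /// Insert or refresh a bucket, evicting the least-recently-seen
    /// source when the map is at capacity.
    fn touch(&mut self, source: &str, tokens: u32) {
        if !self.inner.contains_key(source) && self.inner.len() >= RATE_LIMITER_MAX_SOURCES {
            // O(n) scan is fine at this size; a linked-hash-map or the
            // `lru` crate would make eviction O(1) if it ever matters.
            if let Some(oldest) = self
                .inner
                .iter()
                .min_by_key(|(_, (_, last))| *last)
                .map(|(key, _)| key.clone())
            {
                self.inner.remove(&oldest);
            }
        }
        self.inner.insert(source.to_owned(), (tokens, Instant::now()));
    }

    fn len(&self) -> usize {
        self.inner.len()
    }
}
```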
let annotated = format!(
    "[⚠ OUTPOST REVIEW (outbound): {reason}]
 {}",
    content
);
This outbound annotation format string includes a literal newline plus a leading space before the message body, so outbound content will be prefixed with an extra space on the first line. If that's unintentional, prefer an explicit \n without leading whitespace (and keep formatting consistent with the inbound review annotation).
Suggested change:

let annotated =
    format!("[⚠ OUTPOST REVIEW (outbound): {reason}]\n{}", content);
/// Logging verbosity.
pub log_verbosity: LogVerbosity,

/// Whether to enable the digest cache. Disabling forces a rescan every time.
/// Default: `true` for all profiles except `Paranoid`.
pub enable_digest_cache: bool,

/// Maximum age of a digest cache entry before it's rescanned (seconds).
/// `0` = never expires (only content-hash invalidates).
pub digest_cache_ttl_secs: u64,
SecurityConfig includes enable_digest_cache and a top-level digest_cache_ttl_secs, but the proxy currently consults ScannerConfig::digest_cache_ttl_secs directly and never reads enable_digest_cache. This duplication makes it easy for overrides to become inconsistent (e.g., disable digest cache in SecurityConfig but still use it in ScannerConfig). Consider making a single source of truth (either remove the duplicate fields or ensure proxy construction derives scanner/proxy settings from SecurityConfig).
- RateLimiter: burst allowance, per-source isolation, cooldown calculation
- DigestStore: TTL expiration behavior, zero TTL (no expiration)
- OutpostMiddleware: outbound scanning disabled/enabled, clean/unsafe content

46 tests passing (was 38, +8 new tests)
- Fix bare matches! assertions (now actually asserting)
- Move tests inside #[cfg(test)] mod tests block
- Fix extract_host for query params without path (?x=1)
- Add RATE_LIMITER_MAX_SOURCES constant with LRU eviction

Fixes review comments from copilot-pull-request-reviewer[bot]
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
/// Strips scheme (http://, https://), takes the hostname (before first `/`, `:`, or `?`).
/// Returns empty string if the URL has no scheme (rejects bare strings).
pub fn extract_host(url: &str) -> &str {
    let rest = match url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))
    {
        Some(r) => r,
        None => return "", // no scheme = not a URL
    };
    // Take up to first `:` (port), `/` (path), or `?` (query)
    let end = rest
        .find(':')
        .or_else(|| rest.find('/'))
        .or_else(|| rest.find('?'))
        .unwrap_or(rest.len());
    &rest[..end]
extract_host picks the delimiter using find(':').or_else(find('/')).or_else(find('?')), which does not choose the earliest delimiter. This can mis-parse hosts for URLs with : in the path/query or IPv6/userinfo, and it can let callers bypass host-based rate limiting / skip-protection matching by crafting URLs (e.g. https://example.com/a:1). Consider parsing with the url crate, or at minimum computing the minimum index across :, /, and ? (and handling uppercase schemes).
Suggested change:

/// Accepts `http://` and `https://` schemes (case-insensitive), isolates the
/// authority, removes optional userinfo, and returns the hostname without a
/// trailing port. Returns empty string if the URL has no supported scheme.
pub fn extract_host(url: &str) -> &str {
    let (scheme, rest) = match url.split_once("://") {
        Some(parts) => parts,
        None => return "", // no scheme = not a URL
    };
    if !scheme.eq_ignore_ascii_case("http") && !scheme.eq_ignore_ascii_case("https") {
        return "";
    }
    // Authority ends before the first path, query, or fragment delimiter.
    let authority_end = rest.find(['/', '?', '#']).unwrap_or(rest.len());
    let authority = &rest[..authority_end];
    // Drop optional userinfo (`user:pass@host` -> `host`).
    let host_port = authority
        .rsplit_once('@')
        .map(|(_, host)| host)
        .unwrap_or(authority);
    // Bracketed IPv6 literal: `[::1]` or `[::1]:8080`.
    if let Some(stripped) = host_port.strip_prefix('[') {
        return match stripped.find(']') {
            Some(end) => &stripped[..end],
            None => "", // malformed bracketed host
        };
    }
    // For non-IPv6 hosts, strip a trailing numeric `:port` if present.
    if let Some((host, port)) = host_port.rsplit_once(':') {
        if !port.is_empty() && port.bytes().all(|b| b.is_ascii_digit()) {
            return host;
        }
    }
    host_port
async fn on_outbound_message(&self, _content: &str, _context: &str) -> HookOutcome {
    HookOutcome::PassThrough(_content.to_owned())
The default on_outbound_message implementation uses the _content parameter (_content.to_owned()). With CI running cargo clippy -- -D warnings, this will trigger Clippy's used_underscore_binding lint. Rename the parameter to content (and keep _context if intentionally unused) to avoid failing Clippy.
Suggested change:

async fn on_outbound_message(&self, content: &str, _context: &str) -> HookOutcome {
    HookOutcome::PassThrough(content.to_owned())
- RateLimiter: use configured cooldown_seconds instead of calculated token time
- Outbound annotation: single-line format without leading newline/space
- SecurityConfig: manual Default impl matching Balanced profile
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
/// Extract the host from a URL string.
/// Strips scheme (http://, https://), takes the hostname (before first `/`, `:`, or `?`).
/// Returns empty string if the URL has no scheme (rejects bare strings).
pub fn extract_host(url: &str) -> &str {
    let rest = match url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))
    {
        Some(r) => r,
        None => return "", // no scheme = not a URL
    };
    // Take up to first `:` (port), `/` (path), or `?` (query)
    let end = rest
        .find(':')
        .or_else(|| rest.find('/'))
        .or_else(|| rest.find('?'))
        .unwrap_or(rest.len());
    &rest[..end]
}
extract_host does not correctly parse valid URLs that include userinfo (https://user:pass@host/...) or IPv6 literals (https://[::1]/). Because it stops at the first :///?, it can return incorrect “hosts” (e.g., user), which breaks rate limiting and can make skip_protection_domains matching unreliable / bypassable. Prefer using a real URL parser (e.g., reqwest::Url::parse(url)?.host_str()) and normalize case; add test cases for userinfo and IPv6.
// Step 1: skip_protection — bypass ALL scanning for trusted domains
// Check before HTTP fetch to avoid touching untrusted servers entirely.
if self.scanner.config().is_skip_protected(url) {
    debug!(url, "outpost: skip_protection bypass");
    let content = match self.http_get(url).await {
        Ok(c) => c,
        Err(e) => {
            return OutpostFetchResult::Blocked {
                reason: format!("HTTP fetch failed: {e}"),
                digest: String::new(),
                url: url.to_owned(),
            };
        }
    };
    let digest = sha256_hex(&content);
    self.logger
        .log(ScanContext::WebFetch, url, &OutpostVerdict::Clean, false)
        .await;
    return OutpostFetchResult::Ok { content, digest };
}
skip_protection_domains changes fetch() behavior (bypasses the entire scan pipeline and forces a Clean verdict), but there’s no proxy-level test asserting this behavior. Since proxy.rs already has integration tests with wiremock, add a test that configures skip_protection_domains, returns malicious content from the mock server, and verifies fetch() returns Ok (and does not persist a scanned verdict).
fn cooldown_remaining(&self, _source: &str) -> Option<Duration> {
    // Use configured cooldown_seconds when rate limited
    Some(Duration::from_secs(self.config.cooldown_seconds))
RateLimiter::cooldown_remaining always returns Some(Duration::from_secs(...)) and never returns None, which is both misleading (it’s not actually “time until next allowed”) and likely to trigger Clippy’s unnecessary_wraps lint under -D warnings. Consider returning a plain Duration (or computing a real retry-after and returning Option<Duration> only when appropriate).
Suggested change:

fn cooldown_remaining(&self, source: &str) -> Option<Duration> {
    let (tokens, last_refill, _) = self.buckets.get(source)?;
    if *tokens > 0 {
        return None;
    }
    if self.config.max_requests_per_minute == 0 {
        return Some(Duration::from_secs(self.config.cooldown_seconds));
    }
    let refill_interval =
        Duration::from_secs_f64(60.0 / self.config.max_requests_per_minute as f64);
    let elapsed = Instant::now().saturating_duration_since(*last_refill);
    let retry_after = refill_interval.saturating_sub(elapsed);
    if retry_after.is_zero() {
        None
    } else {
        Some(retry_after)
    }
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
/// Whether to enable the digest cache. Disabling forces a rescan every time.
/// Default: `true` for all profiles except `Paranoid`.
pub enable_digest_cache: bool,

/// Maximum age of a digest cache entry before it's rescanned (seconds).
/// `0` = never expires (only content-hash invalidates).
pub digest_cache_ttl_secs: u64,

/// Whether to scan outbound messages (agent → user) in addition to inbound.
/// Default: `false` for `Open`/`Balanced`, `true` for `Hardened`/`Paranoid`.
pub scan_outbound: bool,

/// Whether to enable audit logging of all security decisions.
pub audit_logging: bool,
The SecurityConfig fields enable_digest_cache, digest_cache_ttl_secs, and audit_logging are presented as global controls, but they aren’t wired into the proxy/digest-cache or proxy audit logging (only middleware checks audit_logging). This makes the profile-derived config misleading and can cause unexpected behavior. Consider either (1) plumbing these flags into OutpostProxy::fetch (skip store get/set and skip audit logger calls when disabled), or (2) removing/renaming the fields so they clearly apply only to middleware/scanner.
pub fn paranoid() -> Self {
    let digest_cache_ttl_secs = 0;
    Self {
        profile: SecurityProfile::Paranoid,
        scanner: ScannerConfig {
            discussion_ratio_threshold: 0.0,
            min_signals_for_ratio: 0,
            override_on_review: false,
            digest_cache_ttl_secs,
            ..Default::default()
        },
        intercepted_tools: InterceptedToolSet::all_including_exec(),
        rate_limit: RateLimitConfig::paranoid(),
        log_verbosity: LogVerbosity::Trace,
        enable_digest_cache: false,
        digest_cache_ttl_secs,
SecurityProfile::Paranoid sets enable_digest_cache = false but also sets scanner.digest_cache_ttl_secs = 0, and the proxy interprets TTL=0 as “never expires” (i.e., strongest caching). As written, the paranoid profile won’t actually force rescans as described in the docs/comments. Align the implementation so paranoid truly disables digest caching (e.g., skip digest store reads/writes entirely for that profile) or update the profile semantics/docs.
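The semantic clash is easiest to see when the two knobs are written out as a single freshness predicate. This is a sketch derived from the doc comments quoted above (`enable_digest_cache = false` should force rescans; `ttl = 0` means never expire), not the crate's actual cache code.

```rust
use std::time::Duration;

// Assumed semantics from the SecurityConfig doc comments in this PR:
//   enable_digest_cache = false -> never serve a cached verdict
//   digest_cache_ttl_secs = 0   -> cache entries never expire
fn cache_hit_allowed(enable_digest_cache: bool, ttl_secs: u64, entry_age: Duration) -> bool {
    if !enable_digest_cache {
        // Paranoid intent: always rescan. If the proxy only consults
        // ScannerConfig::digest_cache_ttl_secs, this branch is never taken
        // and ttl 0 silently means "cache forever" instead.
        return false;
    }
    ttl_secs == 0 || entry_age.as_secs() <= ttl_secs
}
```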
// Scan with Outbound context
let verdict = self
    .scanner
    .scan(context, content, ScanContext::Outbound)
    .await;

if self.config.audit_logging {
    self.logger
        .log(ScanContext::Outbound, context, &verdict, false)
Outbound scanning passes the context string into scanner.scan and audit logging as the url field. Since AuditEntry calls this field url, logs and remote-layer payloads will contain non-URL data, making audits harder to interpret and potentially confusing remote services that expect a URL. Consider introducing a distinct identifier field for outbound (e.g., message_id/channel_context) or prefixing/encoding it so it’s clearly not a URL.
Suggested change:

// Outbound `context` is not necessarily a URL. Wrap it in an explicit synthetic
// identifier so downstream logs/payloads do not misrepresent it as one.
let outbound_target = format!("outbound-context:{context:?}");
// Scan with Outbound context
let verdict = self
    .scanner
    .scan(&outbound_target, content, ScanContext::Outbound)
    .await;

if self.config.audit_logging {
    self.logger
        .log(ScanContext::Outbound, &outbound_target, &verdict, false)
Merged #4 instead |
Pull request was closed
Adds §11–§15 to the draft RFC, broadening from code-correctness to user-story and indirect threat models per review:

- §11 Indirect threat models (10 scenarios): substituted-value exfil by upstream, upstream logging, agent-to-agent exfil, pre-substitution artifacts, memory persistence, error-message side-channels, indirect disclosure bypass (chaos-paper #3), adversarial third-party messages, name-leakage as signal, and a mapping of all 8 chaos lessons to our secret-gateway risk surface.
- §12 User story failures (10 scenarios): first-run UX, .env migration, key rotation, "I need the value" (HMAC/JWT signing), non-HTTP protocols, blocked legitimate requests, cross-machine secret sync, mobile-without-LAN `!secure request`, request preview/dry-run, and perceived complexity cost.
- §13 Legitimate cases we struggle with: HMAC/JWT, binary bodies, streams, WebSocket sessions, OAuth device-flow, mTLS certs, per-user per-request secrets.
- §14 Explicitly out of scope: host compromise, fnox root compromise, user-side misuse, compromised model weights, supply chain, timing.
- §15 Research pointers: 1Password op:// refs, Doppler, AWS Secrets Manager per-secret destination binding, Vault response-wrapping, SPIFFE/SPIRE, chaos-engineering mindset.

Key architectural implication from §11.1 + §11.8: substitution must be bound per-secret to per-destination, and eligibility to substitute must be tagged at the ref site in code the agent wrote, not blindly at every outbound request. Spike item before committing substantial code to §29 (substitution implementation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Security-proxy now shares onecli-client's vault resolver rather than its own parallel startup-time env scan. Per-request, on cache miss, it consults env → fnox → vaultwarden and caches the result.

This fixes the silently-broken rotation behaviour documented in `docs/rfcs/consolidation-findings.md` finding #5: previously, a rotated key added to env or fnox after security-proxy started was invisible until restart.

Changes:
- Add `onecli-client = { path = "../onecli-client" }` as a path dep.
- New `CredentialInjector::ensure_cached(provider) -> bool` that returns fast when the DashMap already has the provider, else calls `onecli_client::vault::get_secret(provider)` and inserts on success.
- Public wrapper `detect_provider_pub(host)` so the proxy hot path can determine the provider name without duplicating the pattern table.
- Proxy.rs resolves + caches before calling inject() on each request.
- Two behavior tests covering cache-hit (no resolver call) and nothing-resolves (cache stays clean, no `Bearer ""` footgun). Written in given/when/then doc form as per today's test-quality review discussion.

Left for follow-up commits on this branch:
- Merge onecli-client's `/vault/:secret` + `/proxy/:provider` routes into security-proxy (move the handlers, then delete main.rs).
- Reconcile env-var conventions (ZEROGATE_KEY_* vs <NAME>_API_KEY — finding #1).
- Per-provider auth header schemes (finding #3 — onecli only does Bearer; security-proxy has anthropic's x-api-key path).
- Remove hardcoded `vault.enjyn.com` fallback (finding #6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
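The ensure_cached flow that commit describes — return fast on a cache hit, otherwise consult the resolver chain and cache only on success — reduces to roughly this shape. A plain `HashMap` and a closure stand in for the DashMap and `onecli_client::vault::get_secret`; the real code differs.

```rust
use std::collections::HashMap;

struct CredentialInjector {
    cache: HashMap<String, String>,
}

impl CredentialInjector {
    fn new() -> Self {
        Self { cache: HashMap::new() }
    }

    /// Returns true if a credential for `provider` is cached after the call.
    /// `resolve` stands in for the env -> fnox -> vaultwarden chain.
    fn ensure_cached(
        &mut self,
        provider: &str,
        resolve: impl Fn(&str) -> Option<String>,
    ) -> bool {
        if self.cache.contains_key(provider) {
            return true; // fast path: no resolver call on a warm cache
        }
        match resolve(provider) {
            Some(secret) => {
                self.cache.insert(provider.to_owned(), secret);
                true
            }
            // Nothing resolved: leave the cache clean so we never
            // inject an empty `Bearer ""` header downstream.
            None => false,
        }
    }
}
```

Because resolution happens per request on cache miss, a key rotated into env or fnox after startup becomes visible on the next miss instead of requiring a restart.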
* feat(messages): numbered choice rendering + name-or-number matcher
`OutboundMessage::render_text_fallback` now produces a clean numbered
list with an explicit "(reply with name or number)" hint for each
ChoiceControl, instead of the prior shell-style "- Label: \`!command\`"
rendering that exposed internal command syntax. The user's reply is
the entry point — the channel side resolves it to a command via the
new matcher rather than asking the user to type the command verbatim.
Adds `ChoiceControl::match_reply(&str) -> Match`. Match tiers:
1. 1-based number ("2", "#3")
2. Exact label match (case/whitespace/punctuation-insensitive)
3. Unique label prefix
4. Unique label substring (≥ 2 chars)
Returns `Match::Ambiguous` when multiple labels collide so the channel
can re-prompt instead of dispatching the wrong action. Returns
`Match::None` for freeform text that doesn't look like a selection,
so channels can fall through to normal command/agent dispatch.
`Match::OutOfRange` when the user typed a number outside [1, N].
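The four tiers plus the `Ambiguous`/`OutOfRange`/`None` outcomes can be sketched end to end. This is a hypothetical, self-contained version with simplified normalization; the real `ChoiceControl::match_reply` lives in the messages crate and may differ in detail.

```rust
#[derive(Debug, PartialEq)]
enum Match {
    Selected(usize), // 0-based index into the choice labels
    Ambiguous,
    OutOfRange,
    None,
}

/// Case/whitespace/punctuation-insensitive form of a label or reply.
fn normalize(s: &str) -> String {
    s.chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(|c| c.to_lowercase())
        .collect()
}

fn match_reply(labels: &[&str], reply: &str) -> Match {
    let trimmed = reply.trim();
    // Tier 1: 1-based number, with optional leading '#' ("2", "#3").
    if let Ok(n) = trimmed.trim_start_matches('#').parse::<usize>() {
        return if (1..=labels.len()).contains(&n) {
            Match::Selected(n - 1)
        } else {
            Match::OutOfRange
        };
    }
    let needle = normalize(trimmed);
    if needle.is_empty() {
        return Match::None; // pure punctuation/whitespace is not a selection
    }
    let norm: Vec<String> = labels.iter().map(|l| normalize(l)).collect();
    // Tier 2: exact normalized label match.
    if let Some(i) = norm.iter().position(|l| *l == needle) {
        return Match::Selected(i);
    }
    // Tier 3: unique prefix; Tier 4: unique substring (>= 2 chars).
    let tiers: [fn(&str, &str) -> bool; 2] = [
        |label, n| label.starts_with(n),
        |label, n| n.chars().count() >= 2 && label.contains(n),
    ];
    for matches_tier in tiers {
        let hits: Vec<usize> = norm
            .iter()
            .enumerate()
            .filter(|(_, l)| matches_tier(l, &needle))
            .map(|(i, _)| i)
            .collect();
        match hits.len() {
            0 => continue,                  // fall through to the next tier
            1 => return Match::Selected(hits[0]),
            _ => return Match::Ambiguous,   // re-prompt instead of guessing
        }
    }
    Match::None // freeform text: channel falls through to normal dispatch
}
```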
Universal value: every text-fallback channel (Signal, WhatsApp,
Matrix, SMS, mock) gets a better UX immediately. Native-rendering
channels (Telegram inline keyboards) are unaffected.
The matcher is currently allow(dead_code) — wiring it into each
channel's inbound dispatch (with per-identity pending-choice state +
TTL) is a follow-up PR. The matcher itself ships first because the
design is non-trivial and worth landing as a stable utility.
Test count: +8 (existing render test updated; new tests cover hint
text, multiple controls, numeric/label/prefix/substring/ambiguous/
out-of-range/freeform-fallthrough cases).
* fix(messages): keep render_text_fallback shape; ship matcher only
Address reviewer regression concern (PR #114): rendering without the
matcher wired makes ChoiceControls non-actionable on text-only
channels (Signal/WhatsApp/Matrix/SMS/mock). Reverting the render
change here so PR #114 becomes purely additive — adds the matcher
utility but does not change user-visible output.
A follow-up PR will land the proper UX (numbered render + matcher
wiring + per-channel pending-choice state) as one cohesive change.
Also fix the byte-vs-char length check in match_reply per reviewer
feedback: `len()` is byte count, which would let a single multi-byte
Unicode glyph through the substring guard. Use `chars().count()`
for the semantic 'at least 2 visible characters' check.
---------
Co-authored-by: Librarian <librarian@glusman.me>
Description
Brief description of the changes in this PR.
Type of Change
Testing
- cargo test passes for all affected crates
- cargo clippy passes with no warnings
- cargo fmt is clean

Checklist
Related Issues
Fixes # (issue number)
Breaking Changes
List any breaking changes and migration steps:
Additional Notes
Any additional context or screenshots.