Expand config file poisoning detection and scan infrastructure#40
Expand config file poisoning detection and scan infrastructure#40
Conversation
- Add Severity::Info (below Low, maps to Action::Allow) - Bump schema_version from 2 to 3 - Add HttpiePipeShell and XhPipeShell RuleIds - Add httpie/xh to PATTERN_TABLE, is_source_command, pipe detection - Add 4 golden fixtures and 7 unit tests for new rules - Update docs/compatibility.md for schema v3
… math operators, whitespace Extend scan_bytes() with 4 new detection categories: - UnicodeTags (Critical): hidden ASCII encoding via U+E0000-E007F, with decoder - VariationSelector (Info): VS1-256, low severity per ADR-12 - InvisibleMathOperator (Medium): U+2061-2064 - InvisibleWhitespace (Medium): hair space, thin space, narrow NBSP Also adds CGJ (U+034F), soft hyphen (U+00AD), and word joiner (U+2060) to existing zero-width detection. 8 new test fixtures.
…ector Covers invisible_whitespace, invisible_math_operator, and variation_selector in exec context. Adds VS17-256 (U+E0100) paste fixture for supplementary range coverage.
…ules Allow fixtures with non-empty expected_rules (e.g. variation_selector) should still be validated by tier-1 coverage — only truly clean fixtures (empty expected_rules) should be skipped.
…fusion detection Phase 2 — FP Reduction: - Suppress ZWJ/ZWNJ when both neighbors are in the same joining script (Arabic, Devanagari, Thai, etc.) — prevents false positives on legitimate text - Elevate zero-width in ASCII-only content to Critical severity - Suppress BOM (U+FEFF) at byte offset 0 as file-encoding artifact Phase 3 — OCR Confusion: - New ocr_confusions.tsv (17 entries: multi-char rn→m, cl→d, etc. + single-char digit/letter confusions + smart quotes + typographic dashes) - Build-time compilation with lowercase canonical validation - ocr_normalize() in hostname rules with max 3 consecutive substitutions (ADR-9) - Detects rnicrosoft.com, paypa1.com, g00gle.com as confusable domains - OCR scoped to known-domain comparisons only (not path rules) per ADR-9
Detect dangerous environment variable exports (LD_PRELOAD, BASH_ENV, PYTHONPATH, API keys) and network access to cloud metadata endpoints (169.254.169.254) and private network ranges. Seven new RuleId variants with appropriate severity mapping. 20 new test fixtures, 11 unit tests.
Implement file/directory scanning for AI config file security: - ScanContext::FileScan bypasses tier-1 fast-exit, runs byte scan + configfile rules - Config poisoning detection: invisible Unicode, non-ASCII in JSON, prompt injection (18 patterns), MCP config validation (HTTP scheme, raw IP, shell metachar, duplicate names, wildcard tools) - Centralized scanner in tirith-core with priority sorting, binary skip, dir walking, max_files cap - CLI `tirith scan` subcommand with --file, --stdin, --json, --ci, --fail-on, --ignore - Added file_path and human_view/agent_view to AnalysisContext and Finding - 8 new RuleId variants, 14 golden fixtures, 268 tests passing
Critical: char boundary panic in check_prompt_injection — context slicing with byte offsets from regex can land mid-char on multibyte input. Added floor/ceil_char_boundary helpers. High: McpDuplicateServerName was dead — serde_json deduplicates keys before rule logic. Replaced with raw JSON token scanner that detects duplicates before parsing. High: ConfigNonAscii missed dotfiles (.cursorrules, .mcprc) — Path::extension() returns None for dotfiles. Now checks basename directly for known dotfile names. Medium: expanded known config coverage from 18 to 22 patterns — added .claude/CLAUDE.md, .vscode/settings.json, .cursor/rules, .github/AGENTS.md. Low: scan priority list had duplicate copilot-instructions.md and overly broad generic basenames (settings.json, config.json). Now only prioritizes generic names when found inside known config dirs. 270 tests passing.
The raw JSON token scanner in check_mcp_duplicate_names could panic on unterminated strings: `i += 2` for escape sequences could overshoot bytes.len(), causing out-of-bounds slice at `&content[key_start..i]`. Fix: guard escape skip with bounds check, track whether closing quote was found, and break on unterminated strings instead of slicing. Added test with 5 malformed JSON variants (trailing backslash, unterminated string, empty key, truncated input).
push_segment() incorrectly treated VAR=VALUE as the command token. Now skips leading environment variable assignments to find the real command. Adds pub is_env_assignment() helper for use by engine bypass detection. Fixes: TIRITH=0 curl evil.com now correctly identifies curl as command. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add command-aware output-flag skipping for curl (-o/--output) and wget (-O/-OFILE/--output-document). Extract URLs from command+args instead of raw segment text to avoid matching URLs in env-prefix values. Add conservative non-TLD file extensions (.png, .jpg, .mp4, etc.) to schemeless host exclusion list. Fixes issue #33. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…paths Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add checkpoint system, MCP dispatcher, and cloaking detection modules. Replace silent `let _ =` error suppression with diagnostic eprintln messages in audit logging and CLI output writers.
- Extend configfile rules with comprehensive .env, MCP config, and AI config detection - Add full directory scanning with recursive traversal and file-type routing - Add configfile fixture tests (500+ lines of test cases) - Add benchmark entries for scan and configfile performance - Wire scan context through CLI check/paste/score commands Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
||
| // Raw IP address in URL | ||
| if let Some(host) = extract_host_from_url(url) { | ||
| if host.parse::<std::net::Ipv4Addr>().is_ok() || host.parse::<std::net::Ipv6Addr>().is_ok() |
There was a problem hiding this comment.
🟡 Medium rules/configfile.rs:1053
IPv6 host handling is incorrect: extract_host_from_url treats : as a delimiter and Ipv6Addr::parse doesn’t accept bracketed hosts. Suggest: if the host starts with [, read to the matching ], strip brackets before parsing, then search for :, / or ? after the ].
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/rules/configfile.rs around line 1053:
IPv6 host handling is incorrect: `extract_host_from_url` treats `:` as a delimiter and `Ipv6Addr::parse` doesn’t accept bracketed hosts. Suggest: if the host starts with `[`, read to the matching `]`, strip brackets before parsing, then search for `:`, `/` or `?` after the `]`.
Evidence trail:
crates/tirith-core/src/rules/configfile.rs lines 1072-1080 (REVIEWED_COMMIT): `extract_host_from_url` function uses `.find(['/', ':', '?'])` as delimiter, which breaks IPv6 addresses that contain colons. Lines 1053-1054 show the parsed host is passed to `host.parse::<std::net::Ipv6Addr>()`. Rust std library documentation confirms `Ipv6Addr::from_str` does not accept bracketed format.
| }; | ||
| // Get host:port (before first /) | ||
| let host_port = after_userinfo.split('/').next().unwrap_or(after_userinfo); | ||
| return Some(strip_port(host_port)); |
There was a problem hiding this comment.
🟢 Low rules/command.rs:453
strip_port misparses IPv6. Unbracketed addresses and unmatched [ are handled via rfind(':'), causing truncation or bogus port stripping. Suggest: if [ is present without ], return unchanged; if there are multiple : and no brackets, treat as IPv6 and skip port stripping (or document this limitation).
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/rules/command.rs around line 453:
`strip_port` misparses IPv6. Unbracketed addresses and unmatched `[` are handled via `rfind(':')`, causing truncation or bogus port stripping. Suggest: if `[` is present without `]`, return unchanged; if there are multiple `:` and no brackets, treat as IPv6 and skip port stripping (or document this limitation).
Evidence trail:
crates/tirith-core/src/rules/command.rs lines 468-482 at REVIEWED_COMMIT: `strip_port` function. Line 469-472: IPv6 bracket handling only covers the case where `]` is found. Line 474-477: falls back to `rfind(':')` for all other cases. Unbracketed IPv6 `2001:db8::1` → `rfind(':')` finds last `:`, `"1".parse::<u16>().is_ok()` → true, returns `2001:db8::` (truncated). Unmatched bracket `[::1` → `starts_with('[')` true, `find(']')` returns None, falls through to `rfind(':')`, returns `[:` (garbage).
| { | ||
| 1 | ||
| } else if result.total_findings() > 0 { | ||
| 2 |
There was a problem hiding this comment.
🟢 Low cli/scan.rs:69
Exit code 2 is used for both incomplete scans and non-Critical/High findings, contradicting the comment which documents findings (exit 1) > clean (exit 0). Consider returning 1 for any findings, or updating the comment to reflect the intended behavior.
| 2 | |
| 1 |
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith/src/cli/scan.rs around line 69:
Exit code 2 is used for both incomplete scans and non-Critical/High findings, contradicting the comment which documents `findings (exit 1) > clean (exit 0)`. Consider returning 1 for any findings, or updating the comment to reflect the intended behavior.
Evidence trail:
crates/tirith/src/cli/scan.rs lines 54-72 at REVIEWED_COMMIT:
- Line 54: Comment states 'Exit code precedence: incomplete scan (exit 2) > findings (exit 1) > clean (exit 0).'
- Line 59: Returns 2 for incomplete scan
- Line 67: Returns 1 only for Critical/High findings
- Line 69: Returns 2 for non-Critical/High findings (total_findings() > 0)
- Line 71: Returns 0 for clean
The comment claims 'findings (exit 1)' but non-Critical/High findings actually exit 2, confirming the discrepancy.
| human_view: None, | ||
| agent_view: None, | ||
| }); | ||
| return; |
There was a problem hiding this comment.
🟡 Medium rules/command.rs:417
The return statements exit the entire function after the first finding, so if a command accesses multiple suspicious endpoints, only the first is reported. Consider removing these return statements to continue checking all arguments and segments.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/rules/command.rs around line 417:
The `return` statements exit the entire function after the first finding, so if a command accesses multiple suspicious endpoints, only the first is reported. Consider removing these `return` statements to continue checking all arguments and segments.
Evidence trail:
crates/tirith-core/src/rules/command.rs lines 385-430 at REVIEWED_COMMIT. Function `check_network_destination` signature at line 385 shows `findings: &mut Vec<Finding>`. Nested loops at lines 386 and 395. `return;` statements at approximately lines 414 and 429 exit the entire function after the first finding is pushed, preventing detection of subsequent suspicious endpoints in remaining args or segments.
| /// File extensions that indicate binary content (skip scanning). | ||
| fn is_binary_extension(name: &str) -> bool { | ||
| let binary_exts = [ | ||
| ".png", ".jpg", ".jpeg", ".gif", ".bmp", ".ico", ".svg", ".webp", ".mp3", ".mp4", ".wav", |
There was a problem hiding this comment.
🟢 Low src/scan.rs:833
.svg files are XML text, not binary, and can contain embedded JavaScript. Consider removing .svg from this list so the scanner can detect scripts in SVG images.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/scan.rs around line 833:
`.svg` files are XML text, not binary, and can contain embedded JavaScript. Consider removing `.svg` from this list so the scanner can detect scripts in SVG images.
Evidence trail:
crates/tirith-core/src/scan.rs lines 830-840 at REVIEWED_COMMIT - The `is_binary_extension` function includes `.svg` in the `binary_exts` array on line 833. The function comment on line 830 states 'File extensions that indicate binary content (skip scanning)'. SVG files are XML text format that can contain embedded JavaScript, making this classification incorrect for a security scanner.
…nges, exit codes - Extract values after '=' in --flag=value args for network destination checks - Handle env -C/-S and time -f/-o flags that consume the next argument - Fix IPv6 bracket handling in extract_host_from_url for MCP config checks - Add link-local, CGNAT, 0.0.0.0/8, and multicast to private IP ranges - Unify --fail-on exit code logic so threshold is always respected - Fix trailing-slash bypass in schemeless host file extension filter - Use static Lazy<Regex> for intervening-verb pattern in is_negated - Apply negation post-filter to weak injection patterns - Add skipped_config_paths to JSON scan output Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| // Find the "mcpServers" or "servers" object, then collect its top-level keys. | ||
| // We use serde_json::Deserializer::from_str to get raw token positions. | ||
| // Simpler approach: find the servers object brace, then extract top-level string keys. | ||
| let servers_key_pos = content |
There was a problem hiding this comment.
🟡 Medium rules/configfile.rs:932
Suggestion: Avoid ad‑hoc substring parsing of structured data in this module. Substring scans can misparse JSON keys and URLs (e.g., keys inside strings; userinfo before host). Prefer structured parsers (e.g., a JSON token stream for the servers object and a URL parser that strips userinfo), or add strict post‑checks if a parser isn’t feasible.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/rules/configfile.rs around line 932:
Suggestion: Avoid ad‑hoc substring parsing of structured data in this module. Substring scans can misparse JSON keys and URLs (e.g., keys inside strings; userinfo before host). Prefer structured parsers (e.g., a JSON token stream for the servers object and a URL parser that strips userinfo), or add strict post‑checks if a parser isn’t feasible.
Evidence trail:
crates/tirith-core/src/rules/configfile.rs lines 927-1020 (check_mcp_duplicate_names function using raw find for JSON key detection), lines 1074-1087 (extract_host_from_url function), lines 1054-1060 (how extract_host_from_url is used for IP detection)
|
|
||
| /// Run a file scan operation. | ||
| pub fn scan(config: &ScanConfig) -> ScanResult { | ||
| let matcher = ConfigPathMatcher::new(&config.path, vec![]); |
There was a problem hiding this comment.
🟢 Low src/scan.rs:140
When config.path is a file, using it as repo_root for ConfigPathMatcher breaks classification. Consider using the file's parent directory instead: config.path.parent().unwrap_or(&config.path).
| let matcher = ConfigPathMatcher::new(&config.path, vec![]); | |
| let repo_root = if config.path.is_file() { | |
| config.path.parent().unwrap_or(&config.path) | |
| } else { | |
| &config.path | |
| }; | |
| let matcher = ConfigPathMatcher::new(repo_root, vec![]); |
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/scan.rs around line 140:
When `config.path` is a file, using it as `repo_root` for `ConfigPathMatcher` breaks classification. Consider using the file's parent directory instead: `config.path.parent().unwrap_or(&config.path)`.
Evidence trail:
crates/tirith-core/src/scan.rs:140 - `ConfigPathMatcher::new(&config.path, vec![])` uses config.path as repo_root
crates/tirith-core/src/scan.rs:482-484 - when path is_file(), it's pushed directly to files list
crates/tirith-core/src/rules/configfile.rs:239-244 - `path.strip_prefix(&self.repo_root)` normalizes absolute paths
crates/tirith-core/src/rules/configfile.rs:262 - `if components.is_empty() { return ConfigMatch::NotConfig; }` - empty components after stripping causes incorrect NotConfig result
| .file_name() | ||
| .and_then(|n| n.to_str()) | ||
| .map(|s| s.to_string()); | ||
| let (scanned, dir_skipped) = scan_config_dir_recursive( |
There was a problem hiding this comment.
🟡 Medium src/scan.rs:718
is_valid_config_extension_for_dir doesn't handle .roo/rules-{slug}/*.md paths (pattern 7 in is_known). Files like vendor/pkg/.roo/rules-myslug/evil.md won't be detected by the excluded-tree probe. Consider adding the rules-{slug} pattern check to that function.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/scan.rs around line 718:
`is_valid_config_extension_for_dir` doesn't handle `.roo/rules-{slug}/*.md` paths (pattern 7 in `is_known`). Files like `vendor/pkg/.roo/rules-myslug/evil.md` won't be detected by the excluded-tree probe. Consider adding the `rules-{slug}` pattern check to that function.
Evidence trail:
crates/tirith-core/src/rules/configfile.rs:168-228 (is_valid_config_extension_for_dir implementation - no rules-{slug} pattern), crates/tirith-core/src/rules/configfile.rs:340-363 (is_known pattern 7 handling .roo/rules-{slug}/*.md), crates/tirith-core/src/rules/configfile.rs:56-79 (KNOWN_CONFIG_DEEP_DIRS constant - only has '.roo/rules' not '.roo/rules-{slug}'), crates/tirith-core/src/scan.rs:790-794 (excluded-tree probe uses is_valid_config_extension_for_dir)
| let name = f.to_string_lossy(); | ||
| if config | ||
| .ignore_patterns | ||
| .iter() |
There was a problem hiding this comment.
🟡 Medium src/scan.rs:170
Ignore filtering is inconsistent: ignore_patterns are documented as globs but implemented as substring matches, and applied to basenames here while full paths elsewhere. Suggest implementing glob matching on a single target (e.g., repo‑relative full paths) and updating the docs.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/scan.rs around line 170:
Ignore filtering is inconsistent: `ignore_patterns` are documented as globs but implemented as substring matches, and applied to basenames here while full paths elsewhere. Suggest implementing glob matching on a single target (e.g., repo‑relative full paths) and updating the docs.
Evidence trail:
crates/tirith-core/src/scan.rs line 18: `/// Glob patterns to ignore.` (documentation claims glob patterns)
crates/tirith-core/src/scan.rs line 521: `let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");` (name is basename in collect_recursive_with_probing)
crates/tirith-core/src/scan.rs lines 593-595: `if ignore_patterns.iter().any(|pat| name.contains(pat.as_str()))` (substring match on basename)
crates/tirith-core/src/scan.rs line 167: `let name = f.to_string_lossy();` (name is full path in main scan)
crates/tirith-core/src/scan.rs lines 168-171: Same substring match but on full path
| || octets[0] == 0 // 0.0.0.0/8 | ||
| || octets[0] >= 224; // 224.0.0.0/4 multicast + reserved | ||
| } | ||
| false |
There was a problem hiding this comment.
🟡 Medium rules/command.rs:505
Private network detection only checks IPv4 literals, so hostnames (e.g., localhost) and IPv6 (::1, fe80::/10, fc00::/7) bypass it. Suggest expanding is_private_ip (and, if needed, extract_host_from_arg) to handle these cases, or document the limitation.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/rules/command.rs around line 505:
Private network detection only checks IPv4 literals, so hostnames (e.g., `localhost`) and IPv6 (`::1`, `fe80::/10`, `fc00::/7`) bypass it. Suggest expanding `is_private_ip` (and, if needed, `extract_host_from_arg`) to handle these cases, or document the limitation.
Evidence trail:
crates/tirith-core/src/rules/command.rs lines 486-499 (is_private_ip function definition showing IPv4-only parsing), line 418 (usage in PrivateNetworkAccess detection). The function uses `host.parse::<std::net::Ipv4Addr>()` which fails for hostnames and IPv6, causing the function to return `false` and bypassing the security check.
- Fix redundant closures in golden_fixtures.rs and cli_integration.rs (clippy) - Fix uninlined format args in cli_integration.rs and scan.rs (clippy 1.93) - Fix Windows test: use cfg(windows) paths for absolute path tests - Remove dead path_str variable in check_mcp_config - Use .expect() instead of .ok() for compile-time regex patterns - Handle -p/-v/-V flags in resolve_command_wrapper - Use exact match for TIRITH=0 bypass detection Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unused ci parameter from run_stdin/run_single_file (--fail-on now handles all threshold logic unconditionally) - Use find_iter() for weak patterns so negated first matches don't skip the entire pattern, matching strong/legacy pattern behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Merge origin/main (glibc build fix) - Fix single & segment boundary in split_raw_words (security) - Use exact match == TIRITH=0 (prevents false bypass) - Skip flags in resolve_command_wrapper - Remove dead code in is_tirith_command - Remove quote-strip from is_env_assignment Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ible char support Takes main's bypass detection (PowerShell, quoted values, command-word requirement, command -v/-V, time -f/-o, quote stripping, digit-prefix rejection) and integrates PR's unique additions: FileScan context with configfile rules, and invisible character tier-1/tier-3 gating for unicode tags, variation selectors, math operators, and invisible whitespace. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| while let Some((current, depth)) = stack.pop() { | ||
| if depth > max_depth { | ||
| skipped += 1; | ||
| continue; | ||
| } | ||
| if file_count >= CONFIG_DIR_FILE_CAP { |
There was a problem hiding this comment.
🟢 Low src/scan.rs:754
The depth > max_depth check on line 755 appears unreachable—entries are only pushed when depth < max_depth (line 785-786), so popped entries never exceed max_depth. Consider removing this dead code or clarifying if the bounds logic was intended differently.
while let Some((current, depth)) = stack.pop() {
- if depth > max_depth {
- skipped += 1;
- continue;
- }
if file_count >= CONFIG_DIR_FILE_CAP {🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/scan.rs around lines 754-759:
The `depth > max_depth` check on line 755 appears unreachable—entries are only pushed when `depth < max_depth` (line 785-786), so popped entries never exceed `max_depth`. Consider removing this dead code or clarifying if the bounds logic was intended differently.
Evidence trail:
crates/tirith-core/src/scan.rs lines 745-800 at REVIEWED_COMMIT. Key lines: 750 (initial stack with depth 0), 754 (pop loop), 755 (the unreachable check `depth > max_depth`), 785-786 (push condition `if depth < max_depth { stack.push((path, depth + 1)); }`). The push condition ensures max pushed depth is `max_depth`, so the `> max_depth` check is unreachable.
| let path = entry.path(); | ||
| let name = path.file_name().and_then(|n| n.to_str()).unwrap_or(""); |
There was a problem hiding this comment.
🟢 Low src/scan.rs:520
If file_name() returns None or contains non-UTF8, name becomes "", bypassing is_excluded_dir, starts_with('.'), and is_binary_extension checks. Consider using continue for these cases instead of falling back to empty string.
- let path = entry.path();
- let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
+ let path = entry.path();
+ let name = match path.file_name().and_then(|n| n.to_str()) {
+ Some(n) => n,
+ None => continue,
+ };🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/scan.rs around lines 520-521:
If `file_name()` returns `None` or contains non-UTF8, `name` becomes `""`, bypassing `is_excluded_dir`, `starts_with('.')`, and `is_binary_extension` checks. Consider using `continue` for these cases instead of falling back to empty string.
Evidence trail:
crates/tirith-core/src/scan.rs:520 - `let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");`
crates/tirith-core/src/scan.rs:538 - `if is_excluded_dir(name)` check uses the potentially empty name
crates/tirith-core/src/scan.rs:562 - `if name.starts_with('.')` check uses the potentially empty name
crates/tirith-core/src/scan.rs:586 - `if is_binary_extension(name)` check uses the potentially empty name
crates/tirith-core/src/scan.rs:819-820 - `is_excluded_dir` checks `DEFAULT_EXCLUDE_DIRS.contains(&name)` - empty string not in list
crates/tirith-core/src/scan.rs:831-838 - `is_binary_extension` checks `ends_with(ext)` - empty string won't match any extension
| }; | ||
| // Get host:port (before first /) | ||
| let host_port = after_userinfo.split('/').next().unwrap_or(after_userinfo); | ||
| return Some(strip_port(host_port)); |
There was a problem hiding this comment.
🟢 Low rules/command.rs:460
URLs like http:// or http:///path return Some("") instead of None. Consider checking if the host is empty before returning.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/rules/command.rs around line 460:
URLs like `http://` or `http:///path` return `Some("")` instead of `None`. Consider checking if the host is empty before returning.
Evidence trail:
File: crates/tirith-core/src/rules/command.rs lines 447-474 (REVIEWED_COMMIT) - `extract_host_from_arg` function shows line 460: `return Some(strip_port(host_port));` with no check for empty host. Lines 477-490 - `strip_port` function returns the input unchanged if empty string. For `http://`: after_scheme="", split('/').next()="", returns Some(""). For `http:///path`: split '/path' by '/' → ["", "path"], first element is "", returns Some("").
| "medium" => Severity::Medium, | ||
| "high" => Severity::High, | ||
| "critical" => Severity::Critical, | ||
| _ => Severity::Critical, |
There was a problem hiding this comment.
🟢 Low cli/scan.rs:143
Unrecognized severity values silently default to Critical, which can mask typos (e.g., --fail-on hig). Consider returning a Result or panicking with a helpful error message listing valid options.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith/src/cli/scan.rs around line 143:
Unrecognized severity values silently default to `Critical`, which can mask typos (e.g., `--fail-on hig`). Consider returning a `Result` or panicking with a helpful error message listing valid options.
Evidence trail:
crates/tirith/src/cli/scan.rs lines 137-146 at REVIEWED_COMMIT: `parse_severity` function uses a wildcard `_` match arm that returns `Severity::Critical` for any unrecognized input string, confirming silent defaulting behavior.
|
|
||
| /// Scan a single file and return its results. | ||
| /// | ||
| /// If `skipped_config_paths` is provided and the file is a config file that |
There was a problem hiding this comment.
🟢 Low src/scan.rs:257
Doc comment mentions skipped_config_paths parameter that doesn't exist in this function's signature. Consider removing that sentence since it describes internal behavior not exposed to callers.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/scan.rs around line 257:
Doc comment mentions `skipped_config_paths` parameter that doesn't exist in this function's signature. Consider removing that sentence since it describes internal behavior not exposed to callers.
Evidence trail:
crates/tirith-core/src/scan.rs lines 256-274 at REVIEWED_COMMIT. Doc comment at lines 257-259 mentions `skipped_config_paths` parameter. Function signature at line 260 shows only `file_path: &Path` and `matcher: &ConfigPathMatcher` parameters.
Summary
Test plan
tirith scan .on a repo with .env and MCP configs🤖 Generated with Claude Code
Note
Add file/tree scan with config-file poisoning detection, expand invisible Unicode and command/network rules, and introduce MCP stdio dispatcher with a 10 MiB line limit
Introduce a repository and file scanning API and CLI subcommand; add config-file path matching and OCR-confusable hostname checks; extend byte scanning and terminal rules to detect Unicode tags, variation selectors, invisible math operators, invisible whitespace, and context-aware ZWJ/ZWNJ; add environment-variable, HTTPie/xh pipe-to-shell, and network destination rules; bump JSON schema to 3 and add
INFOseverity; implement a stdio MCP dispatcher with a 10 MiB per-line cap.📍Where to Start
Start with the scan entrypoint in
scan::scanin crates/tirith-core/src/scan.rs, then review integration inengine::analyzein crates/tirith-core/src/engine.rs and CLI wiring in crates/tirith/src/cli/scan.rs.Macroscope summarized 5646cc2.