Skip to content

Add Pro-tier enrichment pipeline and verdict extensions#41

Merged
sheeki03 merged 23 commits intomainfrom
feat/part8-pro-enrichments
Feb 24, 2026
Merged

Add Pro-tier enrichment pipeline and verdict extensions#41
sheeki03 merged 23 commits intomainfrom
feat/part8-pro-enrichments

Conversation

@sheeki03
Copy link
Owner

@sheeki03 sheeki03 commented Feb 24, 2026

Summary

  • Add dual-view enrichment for rendered content findings (human vs AI agent perspective)
  • Add paranoia-level filtering with tier-gated depth control
  • Extend verdict structs with human_view, agent_view, and enrichment fields
  • Improve URL handling in extract.rs for enrichment context

Test plan

  • All existing tests pass
  • Manual: verify enriched output on rendered content findings with Pro tier

🤖 Generated with Claude Code

Note

Add Pro-tier enrichment pipeline, paranoia filtering, and clipboard HTML detection with rules::terminal::check_clipboard_html and CLI paste --html (50+ char discrepancy threshold)

Introduce rendered-content and PDF checks, tier-gated enrichment and early-access filtering in engine::analyze, new MCP server/resources/tools, and a Unix fetch cloaking command; extend Finding with mitre_id/custom_rule_id, add clipboard HTML to AnalysisContext, and expose new RuleId variants.

📍Where to Start

Start with engine::analyze in engine.rs, then review rendered detection in rules::rendered::check in rendered.rs and clipboard handling in rules::terminal::check_clipboard_html in terminal.rs.

Macroscope summarized eb70868.

sheeki03 and others added 14 commits February 9, 2026 23:32
Implements tirith as an MCP server over stdio, making all detection
capabilities available to AI agents (Claude Code, Cursor, Windsurf).

MCP module (crates/tirith-core/src/mcp/):
- dispatcher.rs: State machine (AwaitingInit → Initialized → Ready),
  pre-init enforcement (-32002), version negotiation (2025-11-25 primary,
  2025-06-18 fallback), parse error handling (-32700), 9 unit tests
- types.rs: JSON-RPC 2.0 + MCP protocol structs, version negotiation
- tools.rs: 7 tools (6 cross-platform + tirith_fetch_cloaking Unix-only
  stub), structured + text responses, directory scan capped at 50 files
- resources.rs: tirith://project-safety resource with cwd scanning
- CLI: tirith mcp-server subcommand (thin stdio wrapper)

Community-only — no tier gating. License gates added in Part 7.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…validation

Critical: notifications/initialized before initialize no longer transitions
to Ready — only honored when state is already Initialized, preventing
clients from bypassing pre-init enforcement.

High: Two-phase parsing separates true parse errors (-32700, null id) from
structurally invalid requests (-32600, preserves id when recoverable).
Missing "method" field now correctly returns -32600 instead of -32700.

Medium: Validates jsonrpc == "2.0" and rejects non-string/number/null id
types (object, array, bool) with -32600. Removes unused JsonRpcRequest
struct from types.rs.

9 new tests covering all three fixes plus string id, null id, and
missing jsonrpc field edge cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduces the paid tier infrastructure (Community/Pro/Team/Enterprise)
with offline license key validation. Removes all file scan caps — detection
is always free with full coverage per ADR-13.

Key changes:
- license.rs: Tier enum, base64 JSON key decoder, env/file resolution
- rule_metadata.rs: Time-boxed early-access gates with Critical bypass
- engine.rs: Enrichment stubs (enrich_pro/enrich_team) + early-access filter
- verdict.rs: LicenseRequired RuleId variant
- Removed 50-file caps from scan.rs, cli/scan.rs, mcp/tools.rs, mcp/resources.rs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Mark decode_tier() and current_tier() as unsigned bootstrap format,
  explicitly not a security boundary (finding #1)
- Remove last remaining max_files: Some(50) in read_project_safety() (finding #2)
- Extract filter_early_access_with() taking a metadata slice parameter,
  rewrite tests to exercise actual filter logic with injected rule meta:
  critical bypass, medium suppression, pro pass-through, no-metadata (finding #3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ranoia tiers (Part 8)

Phase 6: Rendered content scanning (CSS hiding, WCAG 2.0 color hiding,
HTML hidden/aria-hidden attributes, markdown/HTML comments) with 8 new
RuleIds. Detection runs for all tiers; dual-view enrichment (human_view/
agent_view) populated for Pro.

Phase 10: Server-side cloaking detection (Unix-only, reqwest). Fetches
URL with 6 user-agents, normalizes HTML, word-level diff with >10 char
threshold. Pro gets full diff text; Free gets detection boolean.

Phase 11: Clipboard rich-text analysis via --html flag on paste command.
Detects CSS hiding and length discrepancy in clipboard HTML.

Phase 13: Checkpoint/rollback system with auto-trigger on destructive
commands (rm -rf, git reset --hard, etc). SHA-256 file hashing,
configurable retention. Pro-gated in core per ADR-6.

Phase 15: Paranoia tiers 3-4 (Low/Info findings) gated to Pro. Filter
applied in core scan module and wired into CLI check/paste and MCP tools.

Phase 19: PDF hidden text detection via lopdf. Flags sub-pixel scale
transforms.

Additional fixes from review:
- UTF-8 safe truncation in rendered.rs and cloaking.rs (char-based, not
  byte-based) preventing panics on multibyte boundaries
- Pro enrichment implemented for 7 finding types with dual-view output
- Paranoia filtering applied in core scan_single_file/scan_stdin so all
  consumers (CLI, MCP, resources) benefit automatically
- License gates in core checkpoint module, not CLI (ADR-6)
- LicenseRequired finding emitted when scan truncated by tier cap
- Scan file cap: Community 50 files, Pro unlimited (CLI + MCP + resources)
- Doctor reports cloaking_available/webhooks_available from cfg(unix)
- checkpoint list functional for all tiers with license hint for Free

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
push_segment() incorrectly treated VAR=VALUE as the command token. Now
skips leading environment variable assignments to find the real command.
Adds pub is_env_assignment() helper for use by engine bypass detection.

Fixes: TIRITH=0 curl evil.com now correctly identifies curl as command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bypass detection now finds TIRITH=0 in inline env prefixes and env
wrappers (env -i TIRITH=0, /usr/bin/env TIRITH=0), not just process
env. Handles -u value-taking flag and -- option terminator.

Self-invocation guard allows tirith's own commands (tirith diff, etc.)
in single-segment inputs. Resolves through env/command/time wrappers.
Uses canonicalized path comparison for path-form invocations with
fallback to literal-only matching when canonicalization fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add command-aware output-flag skipping for curl (-o/--output) and wget
(-O/-OFILE/--output-document). Extract URLs from command+args instead
of raw segment text to avoid matching URLs in env-prefix values.

Add conservative non-TLD file extensions (.png, .jpg, .mp4, etc.) to
schemeless host exclusion list. Fixes issue #33.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add checkpoint ID validation, file count/size limits, and SSRF
protection for cloaking URL fetches. Add 10 MiB line limit to MCP
dispatcher. Replace silent error suppression with diagnostic messages
in audit logging and CLI output writers.
- Add dual-view enrichment (human vs AI agent perspective) for rendered content findings
- Add paranoia-level filtering with tier-gated depth control
- Extend verdict structs with human_view, agent_view, and enrichment fields
- Improve extract.rs URL handling for enrichment context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +383 to +390
for entry in &manifest {
if entry.is_dir {
continue;
}
if seen_paths.contains(&entry.original_path) {
continue;
}
let backup = files_dir.join(&entry.sha256);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low src/checkpoint.rs:383

The second loop skips files already in seen_paths, so backup integrity isn't validated for deleted/modified files. Consider checking backup existence for all files, or at minimum documenting that backup corruption is only detected for unchanged files.

 for entry in &manifest {
         if entry.is_dir {
             continue;
         }
-        if seen_paths.contains(&entry.original_path) {
-            continue;
-        }
         let backup = files_dir.join(&entry.sha256);
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/checkpoint.rs around lines 383-390:

The second loop skips files already in `seen_paths`, so backup integrity isn't validated for deleted/modified files. Consider checking backup existence for all files, or at minimum documenting that backup corruption is only detected for unchanged files.

Evidence trail:
crates/tirith-core/src/checkpoint.rs lines 350-379 (first loop adds deleted/modified files to seen_paths), lines 383-398 (second loop skips seen_paths when checking backup integrity), lines 388-389 (explicit skip: `if seen_paths.contains(&entry.original_path) { continue; }`), lines 310-318 (restore function only prints to stderr and continues if backup missing).


// Raw IP address in URL
if let Some(host) = extract_host_from_url(url) {
if host.parse::<std::net::Ipv4Addr>().is_ok() || host.parse::<std::net::Ipv6Addr>().is_ok()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium rules/configfile.rs:500

Suggestion: extract_host_from_url mis-parses when userinfo or bracketed IPv6 is present, and keeping [] causes Ipv6Addr::parse to fail. Consider skipping past userinfo (@), treating [] as a single host and stripping the brackets, then bounding the host by /, :, or ?.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/rules/configfile.rs around line 500:

Suggestion: `extract_host_from_url` mis-parses when userinfo or bracketed IPv6 is present, and keeping `[]` causes `Ipv6Addr::parse` to fail. Consider skipping past userinfo (`@`), treating `[`…`]` as a single host and stripping the brackets, then bounding the host by `/`, `:`, or `?`.

Evidence trail:
crates/tirith-core/src/rules/configfile.rs lines 520-526 (extract_host_from_url function), lines 500-501 (Ipv6Addr::parse call) at REVIEWED_COMMIT. The function uses `find(['/', ':', '?'])` which incorrectly matches colons inside IPv6 brackets and colons in userinfo passwords. No `@` handling for userinfo. Rust std::net::Ipv6Addr does not parse bracketed addresses.

CI fixes:
- Fix redundant closure clippy lint in golden_fixtures.rs
- Fix approx_constant clippy lint in rendered.rs test (3.14 -> 3.25)
- Fix uninlined_format_args clippy lints across test files
- Pin time crate <0.3.47 in lockfile for MSRV 1.83 compat

Review fixes (Critical/High/Medium):
- checkpoint.rs: path traversal validation on manifest entries in restore
- checkpoint.rs: reject "." as checkpoint ID
- checkpoint.rs: propagate create_dir_all and sha256_file errors
- engine.rs: env -C/-S options take args, skip 2 positions like -u
- engine.rs: recalculate verdict action after paranoia filtering
- rendered.rs: validate ASCII hex before slicing, clamp RGB 0-255
- terminal.rs: strip script/style block content before tag removal
- cloaking.rs: skip comparison when baseline fetch returns empty body

Review fixes (Low):
- mcp/tools.rs: shell-quote URL in synthetic curl command
- mcp/tools.rs: use is_file() instead of exists() for file inputs
- extract.rs: handle trailing slash in schemeless host detection
"tirith mcp-server: line exceeds {MAX_LINE_BYTES} byte limit, dropping"
);
// Drain remainder of this oversized line (up to next newline)
let mut discard = Vec::new();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 High mcp/dispatcher.rs:40

When draining oversized lines, read_until has no size limit, so an attacker can exhaust memory with a newline-free stream. Consider using a bounded read loop (e.g., read fixed chunks until newline or limit reached) instead of unbounded read_until.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/mcp/dispatcher.rs around line 40:

When draining oversized lines, `read_until` has no size limit, so an attacker can exhaust memory with a newline-free stream. Consider using a bounded read loop (e.g., read fixed chunks until newline or limit reached) instead of unbounded `read_until`.

Evidence trail:
crates/tirith-core/src/mcp/dispatcher.rs lines 22-44 at REVIEWED_COMMIT: Line 40-42 shows `let mut discard = Vec::new(); ... let _ = input.read_until(b'\n', &mut discard);` - this `read_until` reads directly from `input` with no size bound, while the main read path at lines 29-31 uses `.take(MAX_LINE_BYTES as u64 + 1)` to limit reads.

static SCRIPT_STYLE: Lazy<Regex> =
Lazy::new(|| Regex::new(r"(?is)<(?:script|style)[^>]*>.*?</(?:script|style)>").unwrap());
static TAGS: Lazy<Regex> = Lazy::new(|| Regex::new(r"<[^>]*>").unwrap());
static ENTITIES: Lazy<Regex> = Lazy::new(|| Regex::new(r"&[a-zA-Z]+;|&#\d+;").unwrap());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low rules/terminal.rs:408

The ENTITIES regex doesn't match hex entities like A. Consider adding &#x[a-fA-F0-9]+; to the pattern to avoid inflated visible_len calculations.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/rules/terminal.rs around line 408:

The `ENTITIES` regex doesn't match hex entities like `A`. Consider adding `&#x[a-fA-F0-9]+;` to the pattern to avoid inflated `visible_len` calculations.

Evidence trail:
crates/tirith-core/src/rules/terminal.rs:408 (REVIEWED_COMMIT) - The ENTITIES regex is `r"&[a-zA-Z]+;|&#\d+;"` which matches named entities and decimal numeric entities but not hexadecimal entities (`&#x[0-9a-fA-F]+;`). HTML spec supports hex numeric character references (e.g., https://www.w3.org/TR/html52/syntax.html#character-references). The function doc comment states it extracts "approximate visible text content" and ENTITIES.replace_all is used to strip entities for length calculation.

Comment on lines +378 to +380
let current_sha = sha256_file(current_path)
.map_err(|e| format!("diff hash {}: {e}", entry.original_path))?;
if current_sha != entry.sha256 {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low src/checkpoint.rs:378

Race condition: file could be deleted between exists() check and sha256_file(). Consider catching NotFound errors from sha256_file and treating them as Deleted rather than failing the entire operation.

-        let current_sha = sha256_file(current_path)
-            .map_err(|e| format!("diff hash {}: {e}", entry.original_path))?;
-        if current_sha != entry.sha256 {
+        let current_sha = match sha256_file(current_path) {
+            Ok(sha) => sha,
+            Err(e) if e.contains("No such file") => {
+                seen_paths.insert(entry.original_path.clone());
+                diffs.push(DiffEntry {
+                    path: entry.original_path.clone(),
+                    status: DiffStatus::Deleted,
+                    checkpoint_sha256: entry.sha256.clone(),
+                    current_sha256: None,
+                });
+                continue;
+            }
+            Err(e) => return Err(format!("diff hash {}: {e}", entry.original_path)),
+        };
+        if current_sha != entry.sha256 {
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/checkpoint.rs around lines 378-380:

Race condition: file could be deleted between `exists()` check and `sha256_file()`. Consider catching `NotFound` errors from `sha256_file` and treating them as `Deleted` rather than failing the entire operation.

Evidence trail:
crates/tirith-core/src/checkpoint.rs lines 367-378 (REVIEWED_COMMIT): shows the exists() check on line 367 followed by sha256_file() call on line 377-378 with error propagation via ?. crates/tirith-core/src/checkpoint.rs lines 593-605 (REVIEWED_COMMIT): sha256_file() implementation shows it uses fs::File::open(path) which returns NotFound error if file doesn't exist.

}

ScanResult {
scanned_count: file_results.len(),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low src/scan.rs:111

scanned_count only counts files where scan_single_file returned Some. Files that fail or return None are neither in scanned_count nor skipped_count. Consider setting scanned_count to files.len() (attempted) or tracking failed files separately.

Suggested change
scanned_count: file_results.len(),
scanned_count: files.len(),
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/scan.rs around line 111:

`scanned_count` only counts files where `scan_single_file` returned `Some`. Files that fail or return `None` are neither in `scanned_count` nor `skipped_count`. Consider setting `scanned_count` to `files.len()` (attempted) or tracking failed files separately.

Evidence trail:
crates/tirith-core/src/scan.rs lines 103-115 (REVIEWED_COMMIT): loop pushes to `file_results` only on `Some`, `scanned_count: file_results.len()`. Lines 93-99: `skipped_count` only set for truncation. Lines 118-128: `scan_single_file` returns `None` on metadata/read failures via `.ok()?`.

@sheeki03 sheeki03 force-pushed the feat/part8-pro-enrichments branch from 19a5029 to 40c0b2d Compare February 24, 2026 16:39
time 0.3.47 fixes the CVE but requires Rust 1.88. Pin to 0.3.46
and ignore the advisory — lopdf (our only consumer) doesn't parse
user-provided RFC 2822 dates, so the DoS is not exploitable.
@sheeki03 sheeki03 force-pushed the feat/part8-pro-enrichments branch from 40c0b2d to 3fefd7d Compare February 24, 2026 16:46
A single & (background operator) should break command segments just like
|, ;, &&, and newline. Without this, `safe && malicious` could slip the
malicious part past analysis when only the first segment is checked.
Some(c) => c,
None => return tool_error("Missing required parameter: command"),
};
let shell = match args
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low mcp/tools.rs:172

The shell match only handles "powershell" explicitly—"fish" and "pwsh" fall through to Posix. Consider using ShellType::from_str which already handles all variants.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/mcp/tools.rs around line 172:

The `shell` match only handles `"powershell"` explicitly—`"fish"` and `"pwsh"` fall through to `Posix`. Consider using `ShellType::from_str` which already handles all variants.

Evidence trail:
crates/tirith-core/src/mcp/tools.rs lines 173-178 (REVIEWED_COMMIT) - shows manual match only handling "powershell" explicitly with catch-all to Posix
crates/tirith-core/src/tokenize.rs lines 6-21 (REVIEWED_COMMIT) - shows ShellType enum with Fish variant and FromStr implementation that handles "fish" and "pwsh" correctly

sheeki03 and others added 2 commits February 24, 2026 22:55
- Merge origin/main (glibc build fix)
- Use exact match == TIRITH=0 (prevents false bypass)
- Skip flags in resolve_command_wrapper
- Remove dead code in is_tirith_command
- Remove quote-stripping from is_env_assignment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +183 to +186
"tools/call" => {
let result = handle_tools_call(&params);
JsonRpcResponse::ok(id, serde_json::to_value(result).unwrap_or(json!({})))
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low mcp/dispatcher.rs:183

Serialization errors are being swallowed, causing invalid or missing JSON‑RPC responses. Suggest handling serde failures explicitly: log them and surface an -32603 error instead of returning {} or nothing.

-                "tools/call" => {
-                    let result = handle_tools_call(¶ms);
-                    JsonRpcResponse::ok(id, serde_json::to_value(result).unwrap_or(json!({})))
-                }
+                "tools/call" => {
+                    let result = handle_tools_call(¶ms);
+                    match serde_json::to_value(result) {
+                        Ok(v) => JsonRpcResponse::ok(id, v),
+                        Err(e) => JsonRpcResponse::err(id, JsonRpcError {
+                            code: -32603,
+                            message: format!("Internal error: {e}"),
+                            data: None,
+                        }),
+                    }
+                }
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/mcp/dispatcher.rs around lines 183-186:

Serialization errors are being swallowed, causing invalid or missing JSON‑RPC responses. Suggest handling serde failures explicitly: log them and surface an `-32603` error instead of returning `{}` or nothing.

Evidence trail:
crates/tirith-core/src/mcp/dispatcher.rs lines 183-186 (REVIEWED_COMMIT): The `tools/call` handler contains `JsonRpcResponse::ok(id, serde_json::to_value(result).unwrap_or(json!({})))` which uses `.unwrap_or(json!({}))` to silently swallow serialization errors and return an empty object as a success response.

// If we truncated (didn't consume all chars), add marker
if needs_truncation {
// Trim any incomplete bracket marker at the end
if let Some(last_open) = out.rfind('[') {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low src/engine.rs:761

The incomplete bracket trimming logic (lines 761-764) appears to be dead code. Since escape sequences like [LBRACK] are only appended when they fit completely (line 752-755), and verbatim characters never include [, there's no scenario where out ends with an unmatched [. Consider removing this block or documenting why it's kept as defensive code.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/engine.rs around line 761:

The incomplete bracket trimming logic (lines 761-764) appears to be dead code. Since escape sequences like `[LBRACK]` are only appended when they fit completely (line 752-755), and verbatim characters never include `[`, there's no scenario where `out` ends with an unmatched `[`. Consider removing this block or documenting why it's kept as defensive code.

Evidence trail:
crates/tirith-core/src/engine.rs lines 726, 741-748, 752-755, 761-764 at REVIEWED_COMMIT. Line 726 shows `[` is explicitly matched and escaped to `[LBRACK]`. Lines 752-755 show escape sequences are only added if they fit completely (`if out.len() + escaped.len() > content_cap { break; }`). Lines 741-748 show the verbatim branch excludes `[` since it's handled earlier. Lines 761-764 show the dead code that trims incomplete brackets - a condition that can never be true given the implementation.

sheeki03 and others added 2 commits February 24, 2026 23:46
- Use JSON-RPC -32603 (Internal error) for resources/read failures
  instead of -32602 (Invalid params) — uri validation is separate
- Skip duplicate audit log_verdict in check.rs and paste.rs when
  bypass was honored (analyze() already logged it)
- Fix "chars" → "bytes" in audit redact message (Rust .len() returns
  byte count)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ogical test

- engine.rs: Use sanitized length (full_len) in truncation marker instead
  of raw input bytes (total), and document defensive bracket-trimming code
- license.rs: Reject tokens with non-string exp values (e.g. numeric
  timestamps) that previously bypassed expiry validation entirely
- license.rs: Replace tautological assertion in test_current_tier_defaults_community
  with a simple no-panic check
- license.rs: Add test_decode_numeric_exp_rejected regression test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sheeki03 and others added 2 commits February 24, 2026 23:56
- Quoted TIRITH='0' and TIRITH="0" were not detected as bypass because
  split_raw_words preserves quotes but comparison was literal "TIRITH=0".
  Extract is_tirith_bypass_assignment() helper that strips quotes from
  the value portion before comparing.
- Increase MARKER_RESERVE from 40 to 50 to accommodate the longer
  "bytes sanitized" marker text for very large inputs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve all conflicts by taking main's improved code as baseline,
then surgically adding PR 41's unique features on top:

- Pro/Team enrichment (enrich_pro, enrich_team, evidence_summary)
- Paranoia filtering (filter_findings_by_paranoia, filter_findings_by_paranoia_vec)
- Rendered content rules in FileScan pipeline
- Clipboard HTML analysis (clipboard_html field + check_clipboard_html)
- Finding struct: mitre_id + custom_rule_id fields
- sanitize_view for enrichment text safety
- ConfigSuspiciousIndicator added to all exhaustive RuleId matches

All 443 tests pass. cargo fmt + clippy clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sheeki03 sheeki03 merged commit 849f625 into main Feb 24, 2026
10 checks passed
@sheeki03 sheeki03 deleted the feat/part8-pro-enrichments branch February 24, 2026 19:48
assert_eq!(strip_html_tags("<p>Hello</p>"), "Hello");
assert_eq!(strip_html_tags("<div><span>A</span> <b>B</b></div>"), "A B");
assert_eq!(strip_html_tags("No tags here"), "No tags here");
assert_eq!(strip_html_tags("&amp; &lt;"), "");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low rules/terminal.rs:517

The expected output "" seems incorrect. & and < are HTML entities, not tags. Consider updating the assertion to expect "& <" (if decoding) or "& <" (if preserving), or rename the function if it intentionally strips entities too.

Suggested change
assert_eq!(strip_html_tags("&amp; &lt;"), "");
assert_eq!(strip_html_tags("& <"), "& <");
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/rules/terminal.rs around line 517:

The expected output `""` seems incorrect. `&` and `<` are HTML entities, not tags. Consider updating the assertion to expect `"& <"` (if decoding) or `"& <"` (if preserving), or rename the function if it intentionally strips entities too.

Evidence trail:
crates/tirith-core/src/rules/terminal.rs:517 - test assertion `assert_eq!(strip_html_tags("&amp; &lt;"), "");`
crates/tirith-core/src/rules/terminal.rs:422-438 - function implementation with doc comment "Strip HTML tags to extract approximate visible text content" but line 430 defines ENTITIES regex and line 435 strips entities
crates/tirith-core/src/rules/terminal.rs:392-395 - usage context showing function is used for length comparison in clipboard hidden content detection

Comment on lines +204 to +206
fn test_decode_case_insensitive() {
let key = make_key("PRO", "2099-12-31");
assert_eq!(decode_tier(&key), Some(Tier::Pro));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low src/license.rs:204

Test name suggests case insensitivity is verified, but only uppercase "PRO" is tested. Consider adding assertions for "pro" or "Pro" to actually validate case-insensitive behavior.

Suggested change
fn test_decode_case_insensitive() {
let key = make_key("PRO", "2099-12-31");
assert_eq!(decode_tier(&key), Some(Tier::Pro));
fn test_decode_case_insensitive() {
let key = make_key("PRO", "2099-12-31");
assert_eq!(decode_tier(&key), Some(Tier::Pro));
let key_lower = make_key("pro", "2099-12-31");
assert_eq!(decode_tier(&key_lower), Some(Tier::Pro));
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/license.rs around lines 204-206:

Test name suggests case insensitivity is verified, but only uppercase `"PRO"` is tested. Consider adding assertions for `"pro"` or `"Pro"` to actually validate case-insensitive behavior.

Evidence trail:
crates/tirith-core/src/license.rs lines 204-207 (test `test_decode_case_insensitive` at REVIEWED_COMMIT) - test only uses `make_key("PRO", ...)` with uppercase.
crates/tirith-core/src/license.rs lines 113-119 (decode_tier implementation at REVIEWED_COMMIT) - shows `.to_lowercase()` is used, confirming case-insensitivity exists but test doesn't verify multiple case variants.

let d = pdf_operand_to_f64(&op.operands[3]).unwrap_or(1.0);
let scale = a.abs().min(d.abs());
if (0.0..1.0).contains(&scale) {
current_scale = scale;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium rules/rendered.rs:531

The cm operator concatenates matrices, so scales should multiply with the existing CTM rather than replace it. Consider changing line 531 to current_scale *= scale; to properly accumulate transformations.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/rules/rendered.rs around line 531:

The `cm` operator concatenates matrices, so scales should multiply with the existing CTM rather than replace it. Consider changing line 531 to `current_scale *= scale;` to properly accumulate transformations.

Evidence trail:
crates/tirith-core/src/rules/rendered.rs lines 525-535 (REVIEWED_COMMIT) - shows `cm` handler setting `current_scale = scale;` instead of multiplying. PDF specification via pdfjs-docs (https://github.com/MeiKatz/pdfjs-docs): 'Modify the current transformation matrix (CTM) by concatenating the specified matrix' confirms cm concatenates (multiplies) matrices. libharu issue #169 (https://github.com/libharu/libharu/issues/169): 'cm operator is an operator that multiplies LEFT to the current matrix' further confirms matrix multiplication semantics.

id,
JsonRpcError {
code: -32602,
code: -32603, // Internal error (not invalid params — uri validated above)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low mcp/dispatcher.rs:306

The error code -32603 is incorrect for unknown resources. The validation above only checks that uri exists and is a string—not that it refers to a valid resource. When read_content returns "Unknown resource", that's still invalid params (-32602), not an internal error.

Suggested change
code: -32603, // Internal error (not invalid params — uri validated above)
code: -32602,
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/mcp/dispatcher.rs around line 306:

The error code `-32603` is incorrect for unknown resources. The validation above only checks that `uri` exists and is a string—not that it refers to a valid resource. When `read_content` returns "Unknown resource", that's still invalid params (`-32602`), not an internal error.

Evidence trail:
crates/tirith-core/src/mcp/dispatcher.rs lines 280-310 (REVIEWED_COMMIT): shows validation only checks uri exists/is a string (lines 281-297), then uses -32603 for read_content errors (line 306) with misleading comment. crates/tirith-core/src/mcp/resources.rs line 75: shows read_content returns Err("Unknown resource: {uri}") for unrecognized URIs. JSON-RPC 2.0 specification defines -32602 as 'Invalid params' for invalid method parameters and -32603 as 'Internal error' for server-side errors.

"clip:rect(0...)",
),
(
Regex::new(r#"(?i)position\s*:\s*(?:absolute|fixed)[^;]*(?:left|top)\s*:\s*-9999"#)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium rules/rendered.rs:66

The [^;]* between position:absolute/fixed and left/top:-9999 won't match across semicolons, but valid CSS separates these with ; (e.g., position:absolute;left:-9999px). Consider changing [^;]* to [^}]* or similar to span multiple declarations.

Suggested change
Regex::new(r#"(?i)position\s*:\s*(?:absolute|fixed)[^;]*(?:left|top)\s*:\s*-9999"#)
Regex::new(r#"(?i)position\s*:\s*(?:absolute|fixed)[^}]*(?:left|top)\s*:\s*-9999"#)
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/tirith-core/src/rules/rendered.rs around line 66:

The `[^;]*` between `position:absolute/fixed` and `left/top:-9999` won't match across semicolons, but valid CSS separates these with `;` (e.g., `position:absolute;left:-9999px`). Consider changing `[^;]*` to `[^}]*` or similar to span multiple declarations.

Evidence trail:
crates/tirith-core/src/rules/rendered.rs line 66: `Regex::new(r#"(?i)position\s*:\s*(?:absolute|fixed)[^;]*(?:left|top)\s*:\s*-9999"#)` - The regex uses `[^;]*` which matches any character except semicolons, but valid CSS like `position:absolute;left:-9999px` uses semicolons between declarations, so the pattern fails to match properly formatted CSS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant