Add FP reduction (script-aware ZWJ/ZWNJ) and OCR confusion detection #45

Merged
sheeki03 merged 11 commits into main from
feat/part3-fp-reduction-ocr-confusion
Feb 24, 2026

@sheeki03 sheeki03 commented Feb 24, 2026

Summary

  • Add false positive reduction: script-aware ZWJ/ZWNJ suppression, BOM handling
  • Add OCR confusion character detection
  • Includes Parts 1-2 (Info severity, HTTPie, invisible char hardening) and bug fixes

Test plan

  • All existing tests pass

🤖 Generated with Claude Code

Note

Add inline bypass detection and self-invocation resolution in crates/tirith-core/src/engine.rs while introducing a duplicate is_tirith_command that breaks the build

Introduce engine.find_inline_bypass, engine.split_raw_words, and wrapper resolvers for env, command, and time, add engine.is_self_invocation, and accidentally duplicate engine.is_tirith_command, causing a compile error in engine.rs.

📍Where to Start

Start with find_inline_bypass and the new tokenization path in engine.rs, then review the duplicate is_tirith_command definition at the bottom of the file.

Macroscope summarized 1e25ba1.

sheeki03 and others added 7 commits February 21, 2026 13:58
push_segment() incorrectly treated VAR=VALUE as the command token. Now
skips leading environment variable assignments to find the real command.
Adds pub is_env_assignment() helper for use by engine bypass detection.

Fixes: TIRITH=0 curl evil.com now correctly identifies curl as command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
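
The env-assignment skipping described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the body of `is_env_assignment` is an assumption based on the usual shell rule for variable names, and `command_word` is a hypothetical helper standing in for the relevant part of `push_segment()`.

```rust
/// Returns true if `word` looks like a shell assignment of the form NAME=value.
/// Assumed rules: NAME starts with a letter or underscore and continues with
/// letters, digits, or underscores. The real helper may be stricter or looser.
pub fn is_env_assignment(word: &str) -> bool {
    let Some((name, _value)) = word.split_once('=') else {
        return false;
    };
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Hypothetical helper: returns the first word that is not an assignment,
/// which is the real command once leading assignments are skipped.
pub fn command_word<'a>(words: &[&'a str]) -> Option<&'a str> {
    words.iter().copied().find(|w| !is_env_assignment(w))
}
```

With this sketch, the words `TIRITH=0`, `curl`, `evil.com` yield `curl` as the command, matching the fix described in the commit message.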
Add command-aware output-flag skipping for curl (-o/--output) and wget
(-O/-OFILE/--output-document). Extract URLs from command+args instead
of raw segment text to avoid matching URLs in env-prefix values.

Add conservative non-TLD file extensions (.png, .jpg, .mp4, etc.) to
schemeless host exclusion list. Fixes issue #33.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
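
As a rough illustration of the output-flag skipping described in the commit above, the sketch below drops the file name that belongs to an output flag before looking for URLs. The function name `url_candidates` and the rule "any non-flag argument is a candidate" are assumptions made for this example; only the flags named in the commit message are handled.

```rust
/// Hypothetical helper: returns the arguments that may be URLs, skipping
/// output-file flags and the file names attached to them.
pub fn url_candidates<'a>(cmd: &str, args: &[&'a str]) -> Vec<&'a str> {
    let mut out = Vec::new();
    let mut iter = args.iter().copied();
    while let Some(arg) = iter.next() {
        // Flags whose file name arrives as the next, separate word.
        let separate = match cmd {
            "curl" => arg == "-o" || arg == "--output",
            "wget" => arg == "-O" || arg == "--output-document",
            _ => false,
        };
        // wget also accepts the file name glued to the flag, e.g. -OFILE.
        let attached = cmd == "wget" && arg.len() > 2 && arg.starts_with("-O");
        if separate {
            iter.next(); // also drop the file name that follows the flag
        } else if !attached && !arg.starts_with('-') {
            out.push(arg);
        }
    }
    out
}
```

For `curl -o out.png https://example.com/a`, only the URL survives, so a name like `out.png` is never mistaken for a schemeless host.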
…paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
return true;
}
// Path form: try canonicalize and compare to current_exe
if cmd.contains('/') {

🟡 Medium src/engine.rs:281

On Windows, paths may use backslashes. Consider also checking for \ so paths like .\tirith or C:\path\to\tirith.exe are handled correctly.

Suggested change
- if cmd.contains('/') {
+ if cmd.contains('/') || cmd.contains('\\') {

Evidence trail:
crates/tirith-core/src/engine.rs lines 276-293 (REVIEWED_COMMIT): `is_tirith_command` function checks `cmd.contains('/')` at line 281 to determine if cmd is a path. Windows backslash paths would not match this condition.

crates/tirith/src/cli/init.rs line 71 (REVIEWED_COMMIT): `#[cfg(windows)]` confirms the project explicitly supports Windows.
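
The suggested check can be pulled into a small predicate, sketched below. Both `looks_like_path` and `base_name` are hypothetical names for this example and do not appear in the PR; the sketch treats either separator as a path marker on every platform.

```rust
/// Hypothetical predicate: true if `cmd` is written in path form, using
/// either the Unix separator or the Windows separator.
pub fn looks_like_path(cmd: &str) -> bool {
    cmd.contains('/') || cmd.contains('\\')
}

/// Hypothetical helper: returns the final path component, splitting on
/// either separator. Returns `cmd` unchanged when there is no separator.
pub fn base_name(cmd: &str) -> &str {
    cmd.rsplit(|c| c == '/' || c == '\\').next().unwrap_or(cmd)
}
```

One trade-off to keep in mind: on Unix a backslash is a legal file name character, so accepting it unconditionally slightly widens what counts as a path there. Gating the backslash branch behind `cfg(windows)` would be an alternative.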

- Run cargo fmt --all
- Fix uninlined_format_args clippy lints in cli_integration tests
- Add .cargo/audit.toml ignoring RUSTSEC-2026-0009 (time crate DoS,
  not exploitable in our usage, fix requires Rust 1.88)
- Add same ignore to deny.toml
sheeki03 and others added 2 commits February 24, 2026 22:57
- Merge origin/main (glibc build fix)
- Fix single & segment boundary in split_raw_words (security)
- Use exact match == TIRITH=0 (prevents false bypass)
- Skip flags in resolve_command_wrapper
- Remove dead code in is_tirith_command

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
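
To illustrate why the exact match matters, the sketch below contrasts it with a prefix check. The prefix variant is purely hypothetical and shown only for contrast; the commit message does not say what the earlier comparison looked like, and both function names are invented for this example.

```rust
/// Exact comparison, as described in the commit message: only the literal
/// word TIRITH=0 counts as a bypass request.
pub fn is_bypass_word(word: &str) -> bool {
    word == "TIRITH=0"
}

/// Hypothetical looser check, for contrast only: any word that merely
/// starts with TIRITH=0 would be accepted.
pub fn is_bypass_word_prefix(word: &str) -> bool {
    word.starts_with("TIRITH=0")
}
```

A word such as `TIRITH=01` passes the prefix variant but fails the exact one, which is the kind of false bypass the exact comparison rules out.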
i += 1;
}
}
'"' => {

🟢 Low src/engine.rs:134

Quoted strings preserve quote characters (e.g., "TIRITH=0" becomes literal "TIRITH=0"), so the comparison == "TIRITH=0" won't match quoted input. Consider stripping outer quotes from each word before returning.

Evidence trail:
crates/tirith-core/src/engine.rs lines 134-149 (REVIEWED_COMMIT): `split_raw_words` function explicitly pushes opening quote at line 135 (`current.push(ch);`) and closing quote at line 148 (`current.push(chars[i]);`), preserving quote characters in the returned string.

crates/tirith-core/src/engine.rs line 46 (REVIEWED_COMMIT): comparison `if words[idx] == "TIRITH=0"` compares against literal string without quotes.

Result: Input `"TIRITH=0" cmd` produces word `"\"TIRITH=0\""` which fails equality test against `"TIRITH=0"`.
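
One way to act on the reviewer's suggestion is sketched below. `strip_outer_quotes` is a hypothetical helper, not part of the PR, and it only removes one matching pair of outer quotes; it does not attempt full shell unquoting (escapes, adjacent quoted fragments, and so on).

```rust
/// Hypothetical helper: removes one matching pair of outer double or single
/// quotes from `word`. Words without a matching pair are returned unchanged.
pub fn strip_outer_quotes(word: &str) -> &str {
    for q in ['"', '\''] {
        // Both the opening and the closing quote must be present.
        if let Some(inner) = word.strip_prefix(q).and_then(|w| w.strip_suffix(q)) {
            return inner;
        }
    }
    word
}
```

Applied to each word returned by `split_raw_words`, this would let `"TIRITH=0"` (with quotes) compare equal to the unquoted literal, while leaving unterminated quotes untouched.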

Resolve 12 file conflicts by taking main's versions for shell hooks,
deny.toml, doctor.rs, extract.rs, and tokenize.rs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sheeki03 sheeki03 merged commit 002f621 into main Feb 24, 2026
4 of 9 checks passed
@sheeki03 sheeki03 deleted the feat/part3-fp-reduction-ocr-confusion branch February 24, 2026 21:58