Skip to content

fix(approval): de-obfuscate shell idioms in dangerous-command detection#25795

Open
GodsBoy wants to merge 1 commit into
NousResearch:mainfrom
GodsBoy:fix/approval-gate-obfuscation-hardening
Open

fix(approval): de-obfuscate shell idioms in dangerous-command detection#25795
GodsBoy wants to merge 1 commit into
NousResearch:mainfrom
GodsBoy:fix/approval-gate-obfuscation-hardening

Conversation

@GodsBoy

@GodsBoy GodsBoy commented May 14, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Extends the approval gate's pre-match normalization to de-obfuscate shell idioms that hide an otherwise-matched dangerous command, and widens or adds DANGEROUS_PATTERNS entries for idiom classes the regex tables did not model. This is the RG-4 follow-up to advisory GHSA-6cjf-cff6-j9mg, filed per SECURITY.md section 3.2's invitation to harden the in-process heuristic. It does not change the security posture: the boundary remains OS-level isolation (section 2.2); this makes the cooperative-mode catch more complete.

Related Issue

Fixes #25793

Type of Change

  • 🔒 Security fix

Changes Made

tools/approval.py:

  • _normalize_command_for_detection now runs a bounded fixed-point de-obfuscation loop (_deobfuscate) covering ${IFS} / $IFS expansion, ANSI-C $'...' quoting (hex / octal / named / unicode escapes), empty and adjacent quote splits, and eval / command / builtin wrapper-prefix stripping. Non-idempotent rules are drained to a local fixed point per iteration (_drain), so single-rule nesting of any depth collapses regardless of the iteration cap. Input that still does not converge is surfaced for approval rather than trusted as a clean no-match.
  • DANGEROUS_PATTERNS: widened the chmod world-writable pattern to accept octal-prefixed modes (0777 / 0666) and the shell--c pattern to include dash / ash; added base64-decode-piped-to-shell and world-writable-path script-execution patterns. The two new patterns are command-position anchored, so a shell binary name appearing as a non-command token does not false-positive in the tui_gateway shell.exec hard gate.
  • check_all_command_guards passes the normalized command to the smart-approval LLM path, so it assesses the de-obfuscated form.
  • Gave two gateway-protection patterns distinct descriptions. They previously shared one description, and since the description is the approval key, approving one silently approved the other.

tests/tools/test_approval.py:

  • New regression tests for every idiom class, each pairing a true-positive catch with a false-positive guard, plus idempotency, non-convergence, description-uniqueness, and shell.exec-path coverage.

How to Test

  1. scripts/run_tests.sh tests/tools/test_approval.py: 238 tests pass (new suites: TestShellObfuscationBypass, TestProngBPatternWidening, TestProngBStructuralPatterns, TestObfuscationHardeningConsolidation, TestSmartApproveNormalization).
  2. Broader detection-layer regression: scripts/run_tests.sh tests/tools/ tests/test_tui_gateway_server.py tests/gateway/test_approve_deny_commands.py: 595 pass.
  3. Each new test pairs a true-positive (the obfuscated form is now caught) with a false-positive guard (a legitimate command that must not be flagged).

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(approval):)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix (two files: tools/approval.py, tests/tools/test_approval.py)
  • I've run the test suite via scripts/run_tests.sh (the repo's canonical runner) and all relevant tests pass
  • I've added tests for my changes
  • I've tested on my platform: Ubuntu 24.04

Documentation & Housekeeping

  • I've updated relevant documentation: docstrings on the new functions and detailed pattern comments; no user-facing docs needed
  • N/A: no config keys added or changed
  • N/A: no architecture or workflow change
  • Cross-platform: the change is pure regex and string normalization with no OS-specific behavior
  • N/A: no tool description or schema change

Post-Deploy Monitoring & Validation

This is an in-process detection-heuristic change; there is no runtime service to deploy.

  • What to watch: reports of legitimate commands being newly prompted or blocked (false positives). The two new DANGEROUS_PATTERNS entries are command-position anchored specifically to avoid false-positive hard-blocks in tui_gateway shell.exec, and every idiom class ships a false-positive guard test.
  • Healthy signal: the existing detection-layer tests stay green; no uptick in approval-prompt complaints for previously-unprompted commands.
  • Failure signal / rollback trigger: a confirmed false positive on a common command. The change is a self-contained two-file diff and reverts cleanly.
  • Debug aid: _deobfuscate emits a logger.debug line when the iteration cap is hit without converging; that log indicates pathologically nested input reaching the normalizer.

Notes for reviewers

  • Command substitution is out of scope. Decoding the payload inside $(...) or backticks is deliberately not attempted, consistent with the existing narrow kill $(pgrep patterns.
  • Non-convergence routing is deliberate. When de-obfuscation does not converge within the iteration cap, the input is unidentified (still obfuscated); it is routed to the DANGEROUS approval-prompt tier rather than HARDLINE, because unconditionally blocking an input we cannot classify is more aggressive than the plan's hardline restraint. This is still a strict improvement over the pre-change baseline, where deeply nested obfuscation slipped through entirely.
  • command <hardline-word> consistency. Because wrapper-stripping exposes the inner token at a command-start position, command reboot-helper hardline-matches the same way ; reboot-helper already does today. This is a documented, consistent extension of pre-existing hardline-pattern behavior, not a new defect; tightening the hardline anchors was deliberately kept out of scope (it touches the unconditional-block tier).
  • PoC hygiene: bypass strings appear only in the regression-test fixtures, which is the one legitimate public place for them.

Extend the approval gate's pre-match normalization to de-obfuscate shell
idioms that hide an otherwise-matched dangerous command, and widen/add
dangerous-pattern entries for idiom classes the regex tables did not model.
Follow-up to GHSA-6cjf-cff6-j9mg per SECURITY.md section 3.2's invitation to
harden the in-process heuristic.

Normalization (_normalize_command_for_detection) now runs a bounded
fixed-point de-obfuscation loop covering ${IFS} expansion, ANSI-C $'...'
quoting, empty-quote splits, and eval/command/builtin wrapper-prefix
stripping. Non-idempotent rules are drained to a local fixed point per
iteration so single-rule nesting of any depth collapses regardless of the
iteration cap; input that still does not converge is surfaced for approval
rather than trusted as a clean no-match.

Pattern-table changes: chmod octal-prefixed world-writable modes, alternate
shell binaries (dash/ash) via -c, base64-decode piped to a shell, and shell
execution of a script in a world-writable or transient path. The base64 and
script-indirection patterns are command-position anchored so a shell binary
name appearing as a non-command token does not false-positive in the
tui_gateway shell.exec hard gate.

Also passes the normalized command to the smart-approval LLM path so it
assesses the de-obfuscated form, and gives two gateway-protection patterns
distinct descriptions (a shared description silently cross-approved them).

No change to the security posture or the OS-isolation boundary; this is a
heuristic extension of the cooperative-mode catch. Adds regression tests
covering each idiom class with true-positive and false-positive guards.
@alt-glitch alt-glitch added type/security Security vulnerability or hardening P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder comp/tools Tool registry, model_tools, toolsets labels May 14, 2026
@GodsBoy

GodsBoy commented May 14, 2026

Copy link
Copy Markdown
Contributor Author

CI note: the test job is red, but the failures are pre-existing on main and unrelated to this PR.

  • This PR touches exactly two files: tools/approval.py and tests/tools/test_approval.py. The CI run shows 22904 passed, 46 failed, 1 error, with zero failures in test_approval.py — every approval test, including the new ones, passes.
  • The 46 failures plus 1 error are all in unrelated modules: test_bedrock_* and test_tts_kittentts fail on missing botocore / numpy in the CI test environment, and there are pre-existing failures in test_auxiliary_client, test_provider_parity, test_dingtalk, test_feishu_bot_admission, test_wecom_callback, test_weixin, test_matrix, test_platform_http_client_limits, test_compression_feasibility, test_context_compressor_summary_continuity, test_switch_model_context, test_registry, test_transcription, test_update_autostash, test_plugin_discovery, and test_background_review. None of them import or exercise tools/approval.py.
  • The same test job is red on the latest main push (run 25867607075), so this is a pre-existing state, not a regression introduced here.
  • Locally, scripts/run_tests.sh tests/tools/test_approval.py is 238/238 green, and the broader detection-layer suite (tests/tools/, tests/test_tui_gateway_server.py, tests/gateway/test_approve_deny_commands.py) is 595/595 green.

All other checks are green: ruff enforcement, ruff + ty diff, check-attribution, build-amd64, build-arm64, nix (macOS + Ubuntu), e2e, Windows footguns, and the supply-chain scan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder comp/tools Tool registry, model_tools, toolsets P2 Medium — degraded but workaround exists type/security Security vulnerability or hardening

Projects

None yet

Development

Successfully merging this pull request may close these issues.

approval gate misses shell-idiom-obfuscated dangerous commands (RG-4 / GHSA-6cjf-cff6-j9mg follow-up)

2 participants