security: extend skill scanner to detect threats in markdown skill definitions by Alex-Alaniz · Pull Request #10705 · openclaw/openclaw

Alex-Alaniz · 2026-02-06T21:37:24Z

Summary

Extends the skill scanner to detect security threats in markdown files (.md), closing a gap where malicious content in SKILL.md skill definitions could bypass code-only scanning.

Motivation: The ClawHavoc advisory revealed 341+ malicious ClawHub skills. While VirusTotal scans code files for known malware signatures, skill metadata lives in markdown (SKILL.md) — which was previously unscanned. Attackers can embed download-and-execute patterns, obfuscated payloads, hidden Unicode (Trojan Source / CVE-2021-42574), and executable data URIs in skill documentation that gets injected into LLM system prompts.

Changes:

Split SCANNABLE_EXTENSIONS into CODE_EXTENSIONS and MARKDOWN_EXTENSIONS to enable file-type-specific rule routing
Added isMarkdown() helper and route scanSource() to use markdown-specific rules for .md files
New markdown line rules: hidden-unicode (zero-width + bidi override characters), markdown-data-uri (executable MIME types)
New markdown source rules: markdown-download-exec (curl|bash patterns), markdown-encoded-payload (large base64 in code blocks), markdown-hex-payload (hex-encoded sequences)
Existing code rules (eval, child_process, etc.) are isolated from markdown files to prevent false positives on documentation examples

Rule isolation: Code-specific rules only fire on code files; markdown-specific rules only fire on .md files. This prevents breaking existing scanner behavior while extending coverage.

Test plan

All 39 tests pass (13 original + 26 new)
New tests cover: zero-width Unicode, RTL overrides, data URIs, curl|bash, wget|sh, large base64 blocks, hex payloads
Rule isolation verified: code rules don't fire on .md, markdown rules don't fire on .ts
Clean SKILL.md produces zero findings
Directory scanning and summary counting work with mixed file types
oxlint passes with 0 warnings/errors
tsgo type checking passes

pnpm vitest run src/security/skill-scanner.test.ts

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Greptile Overview

Greptile Summary

Extends the skill scanner to treat .md as scannable and routes markdown files through a new set of markdown-specific rules.
Adds markdown line rules to detect hidden Unicode/BiDi characters and executable data: URIs.
Adds markdown source rules to flag download-and-execute patterns, large base64 blocks in fenced code, and hex-encoded payload strings.
Updates tests to cover new markdown findings, directory scanning of SKILL.md, summary counting, and rule isolation between code vs markdown.

Confidence Score: 3/5

This PR is likely safe to merge, but a couple of detection rules are brittle and can cause missed detections or noisy false positives.
Core change (routing .md to markdown-specific rules and expanding scannable extensions) is straightforward and well-tested. Main concerns are (1) regex statefulness in scanSource if any rule regex acquires g/y flags, and (2) markdown base64 and download/exec heuristics being either overly broad (false positives) or too narrow (missed common patterns), which undermines scanner correctness.
src/security/skill-scanner.ts

…finitions The skill scanner previously only analyzed JS/TS files for malicious patterns, leaving SKILL.md files completely unscanned. This extends coverage to markdown files with rules targeting embedded technical threats: - Hidden Unicode (zero-width, bidi overrides) used to conceal malicious content - Download-and-execute patterns (curl|wget piped to shell) - Large base64 payloads in code fences (obfuscated content) - Hex-encoded payloads in markdown body - Data URIs with executable MIME types Code rules and markdown rules are applied independently — code patterns (eval, child_process) do not false-positive on markdown documentation, and markdown rules do not fire on JS/TS files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

greptile-apps

_{1 file reviewed, 3 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-06T21:39:21Z

+    pattern: /(?:curl|wget)\s+[^\n|]*\|\s*(?:sh|bash|zsh|node|python|perl|ruby)\b/,
+  },
+  {
+    ruleId: "markdown-encoded-payload",
+    severity: "warn",
+    message: "Large base64 block detected in markdown (possible obfuscated payload)",
+    pattern: /```[^\n]*\n[A-Za-z0-9+/=\s]{400,}\n```/,
+  },


Markdown base64 rule misfires

markdown-encoded-payload matches any 400+ chars of [A-Za-z0-9+/=\s] inside a fenced block (src/security/skill-scanner.ts:186-190). That character class also matches lots of non-base64 content (e.g., long ASCII text, repeated = padding, or whitespace-heavy blocks), so this will reliably generate false positives on large documentation/code samples. If the intent is “base64-looking payload”, it needs a stricter heuristic (e.g., require high base64 density + valid padding + line lengths typical of base64, or require a nearby decode usage marker) to avoid flagging benign docs.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/security/skill-scanner.ts Line: 183:190 Comment: **Markdown base64 rule misfires** `markdown-encoded-payload` matches any 400+ chars of `[A-Za-z0-9+/=\s]` inside a fenced block (`src/security/skill-scanner.ts:186-190`). That character class also matches lots of non-base64 content (e.g., long ASCII text, repeated `=` padding, or whitespace-heavy blocks), so this will reliably generate false positives on large documentation/code samples. If the intent is “base64-looking payload”, it needs a stricter heuristic (e.g., require high base64 density + valid padding + line lengths typical of base64, or require a nearby decode usage marker) to avoid flagging benign docs. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-02-06T21:39:22Z

+    message: "Download-and-execute pattern detected in markdown content",
+    pattern: /(?:curl|wget)\s+[^\n|]*\|\s*(?:sh|bash|zsh|node|python|perl|ruby)\b/,
+  },
+  {


Download/exec misses common forms

markdown-download-exec only matches curl|wget ... | <shell> with no newline and requires the pipe in the same line (src/security/skill-scanner.ts:180-184). This will miss very common patterns that are functionally identical in markdown docs, e.g. curl ... | sudo bash, curl ... | bash -s --, bash <(curl ...), or multi-line pipes inside fenced blocks. Given this PR’s goal (closing markdown bypass), this rule as written will fail to flag a large slice of real-world “download and execute” snippets.

This is a correctness gap (missed detections), not just coverage preference.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/security/skill-scanner.ts Line: 182:185 Comment: **Download/exec misses common forms** `markdown-download-exec` only matches `curl|wget ... | <shell>` with no newline and requires the pipe in the same line (`src/security/skill-scanner.ts:180-184`). This will miss very common patterns that are functionally identical in markdown docs, e.g. `curl ... | sudo bash`, `curl ... | bash -s --`, `bash <(curl ...)`, or multi-line pipes inside fenced blocks. Given this PR’s goal (closing markdown bypass), this rule as written will fail to flag a large slice of real-world “download and execute” snippets. This is a correctness gap (missed detections), not just coverage preference. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-02-06T21:39:24Z

Additional Comments (1)

src/security/skill-scanner.ts
Stateful regex breaks matching

In scanSource, source rules use rule.pattern.test(lines[i]) inside a loop (src/security/skill-scanner.ts:275-281). Several of the new markdown rules (and potentially existing ones) could reasonably be written with g/y flags later; if that happens, repeated .test() calls on the same RegExp instance will advance lastIndex and produce false negatives (including skipping the first matching line). Safer pattern: use a non-stateful regex for per-line checks (e.g., new RegExp(rule.pattern.source, rule.pattern.flags.replace(/g|y/g, ""))) or reset rule.pattern.lastIndex = 0 before each .test()/.exec() call.

Also applies to the earlier line-rule use of rule.pattern.exec(line) (src/security/skill-scanner.ts:231).

Prompt To Fix With AI

This is a comment left during a code review.
Path: src/security/skill-scanner.ts
Line: 275:281

Comment:
**Stateful regex breaks matching**

In `scanSource`, source rules use `rule.pattern.test(lines[i])` inside a loop (`src/security/skill-scanner.ts:275-281`). Several of the new markdown rules (and potentially existing ones) could reasonably be written with `g`/`y` flags later; if that happens, repeated `.test()` calls on the same `RegExp` instance will advance `lastIndex` and produce false negatives (including skipping the first matching line). Safer pattern: use a non-stateful regex for per-line checks (e.g., `new RegExp(rule.pattern.source, rule.pattern.flags.replace(/g|y/g, ""))`) or reset `rule.pattern.lastIndex = 0` before each `.test()`/`.exec()` call.

Also applies to the earlier line-rule use of `rule.pattern.exec(line)` (`src/security/skill-scanner.ts:231`).

How can I resolve this? If you propose a fix, please make it concise.

openclaw-barnacle · 2026-02-21T04:30:29Z

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

Alex-Alaniz · 2026-02-23T15:51:19Z

@openclaw-bot and @openclaw-barncale @thewilloftheshadow thewilloftheshadow force-pushed the main branch from bfc1ccb to f92900f
last week

Alex-Alaniz · 2026-02-23T15:52:20Z

This pull request has been automatically marked as stale due to inactivity. Please add updates or it will be closed.

@thewilloftheshadow thewilloftheshadow force-pushed the main branch from bfc1ccb to f92900f
last week

openclaw-barnacle · 2026-03-08T04:10:11Z

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

openclaw-barnacle · 2026-03-11T04:11:17Z

Closing due to inactivity.
If you believe this PR should be revived, post in #pr-thunderdome-dangerzone on Discord to talk to a maintainer.
That channel is the escape hatch for high-quality PRs that get auto-closed.

greptile-apps Bot reviewed Feb 6, 2026

View reviewed changes

thewilloftheshadow force-pushed the main branch from bfc1ccb to f92900f Compare February 15, 2026 18:46

openclaw-barnacle Bot added the stale Marked as stale due to inactivity label Feb 21, 2026

openclaw-barnacle Bot removed the stale Marked as stale due to inactivity label Feb 24, 2026

openclaw-barnacle Bot added the stale Marked as stale due to inactivity label Mar 8, 2026

openclaw-barnacle Bot closed this Mar 11, 2026

Alex-Alaniz mentioned this pull request Mar 11, 2026

security: scan markdown skill definitions for injection threats #43469

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

security: extend skill scanner to detect threats in markdown skill definitions#10705

security: extend skill scanner to detect threats in markdown skill definitions#10705
Alex-Alaniz wants to merge 1 commit into
openclaw:mainfrom
Alex-Alaniz:security/scan-markdown-skill-definitions

Alex-Alaniz commented Feb 6, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

greptile-apps Bot left a comment

Uh oh!

greptile-apps Bot Feb 6, 2026

Uh oh!

greptile-apps Bot Feb 6, 2026

Uh oh!

greptile-apps Bot commented Feb 6, 2026

Uh oh!

openclaw-barnacle Bot commented Feb 21, 2026

Uh oh!

Alex-Alaniz commented Feb 23, 2026

Uh oh!

Alex-Alaniz commented Feb 23, 2026

Uh oh!

openclaw-barnacle Bot commented Mar 8, 2026

Uh oh!

openclaw-barnacle Bot commented Mar 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Alex-Alaniz commented Feb 6, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Greptile Overview

Greptile Summary

Confidence Score: 3/5

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Feb 6, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Feb 6, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot commented Feb 6, 2026

Uh oh!

openclaw-barnacle Bot commented Feb 21, 2026

Uh oh!

Alex-Alaniz commented Feb 23, 2026

Uh oh!

Alex-Alaniz commented Feb 23, 2026

Uh oh!

openclaw-barnacle Bot commented Mar 8, 2026

Uh oh!

openclaw-barnacle Bot commented Mar 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Alex-Alaniz commented Feb 6, 2026 •

edited by greptile-apps Bot

Loading