security: scan SKILL.md and HEARTBEAT.md for injection patterns (#17 + #21)#31592
security: scan SKILL.md and HEARTBEAT.md for injection patterns (#17 + #21)#31592let5sne wants to merge 4 commits intoopenclaw:mainfrom
Conversation
Implements the 'rejection log' insight from reflections: every skipped verification is a loan against future trust. This module creates an auditable ledger of deferred security checks. Features: - Track debt by category (security_audit, skill_scan, api_health, etc.) - Risk scoring (1-10) with age multiplier (2x after 1 week) - Resolve + prune workflow - Summary API for dashboard/doctor integration New file: src/security/verification-debt.ts (~150 lines) Fixes improvement item openclaw#28 from openclaw-improvement-ideas.md.
Adds debt score display to doctor command output: - Shows total/unresolved debt count - Highlights high-risk items (score >= 7) - Breaks down by category - Suggests remediation command New file: src/commands/doctor-verification-debt.ts (~60 lines) Modified: src/commands/doctor.ts (import + call)
- workspace.ts: call scanSource() on SKILL.md content during loadSkillEntries, warn on critical findings (dangerous-exec, dynamic-code-execution, etc.) - heartbeat-runner.ts: call detectSuspiciousPatterns() on HEARTBEAT.md content before injecting into agent prompt, warn on suspicious patterns Both checks are warn-only (non-blocking), purely additive. Fixes openclaw#21 (detectSkillPatterns wired into load flow) Fixes openclaw#17 (HEARTBEAT.md injection scan)
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 21b74f7bdd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } catch { | ||
| return { version: 1, entries: [], lastPruned: Date.now() }; |
There was a problem hiding this comment.
Preserve debt state when load fails
loadVerificationDebt treats every read/parse error as “no debt” by returning an empty ledger, so a malformed verification-debt.json or an EACCES read failure is silently interpreted as a clean state; doctor can then report security debt is up to date, and any later write path (for example addVerificationDebt) will overwrite the existing file from an empty baseline, losing unresolved entries. Please only default on ENOENT and surface other errors.
Useful? React with 👍 / 👎.
Greptile SummaryAdded warn-only security scans for SKILL.md and HEARTBEAT.md files, plus verification debt tracking infrastructure. The scans detect malicious patterns in skill files (
The verification debt tracking infrastructure is fully implemented but not yet wired up—no code actually calls Confidence Score: 4/5
Last reviewed commit: 21b74f7 |
mdlmarkham
left a comment
There was a problem hiding this comment.
🔧 Hephaestus Security Review
Summary
This PR adds security scanning for SKILL.md and HEARTBEAT.md files to detect malicious injection patterns. Overall, this is a positive security improvement with a conservative warn-only approach.
✅ Strengths
-
Uses existing, tested modules -
scanSourceanddetectSuspiciousPatternsare already implemented with rule sets covering:- Shell/exec injection (dangerous-exec)
- Dynamic code execution (eval, Function constructor)
- Crypto-mining patterns
- Environment harvesting + network exfil
- Prompt injection patterns ("ignore previous instructions", etc.)
-
Non-blocking approach - Warn-only is safe for initial rollout, avoids breaking legitimate use cases while gaining visibility.
-
Good integration points - Scanning happens at load time (skills) and preflight (heartbeat), right before content enters agent context.
-
Verification debt concept - Interesting approach to tracking deferred security checks.
⚠️ Questions/Concerns
-
Warn-only is conservative but may miss active attacks - If a malicious skill is loaded, it still loads. Consider a future config option like
security.blockCriticalFindings: truefor high-security deployments. -
Verification debt module is added but not integrated - The
verification-debt.tsmodule defines the debt tracking system, but I don't see it being used to actually record when verifications are skipped. Is this intentional scaffolding for future work? -
Debt file location -
state/verification-debt.jsonis workspace-local. An attacker with workspace file access could tamper with debt records. Consider whether this matters for the threat model. -
No tests for new integration paths - The scanner modules have tests, but the new callsites in
workspace.tsandheartbeat-runner.tsdon't appear to have test coverage. Might be worth adding smoke tests.
🔍 Pattern Coverage Note
The suspicious patterns in external-content.ts cover common prompt injection attempts but attack techniques evolve. Examples not currently matched:
- Roleplay-based injections ("You are DAN...")
- Multi-step injections that split commands across messages
- Unicode homoglyph tricks
This is fine for now - security scanners are defense-in-depth, not perfect defense.
Verdict
Approve with minor suggestions. The warn-only approach is appropriate for initial deployment. Consider:
- Adding tests for the integration points
- Clarifying the verification debt integration path
- Future: consider config option to block on critical findings
Good security-forward change. 👍
Summary
Two warn-only security scans wired into existing load flows:
Fix #21 — SKILL.md scan in
loadSkillEntriessrc/agents/skills/workspace.ts: callscanSource()on SKILL.md content when buildingskillEntries. Warns on critical findings (dangerous-exec, dynamic-code-execution, crypto-mining, env-harvesting). Non-blocking.Fix #17 — HEARTBEAT.md scan in
runHeartbeatPreflightsrc/infra/heartbeat-runner.ts: calldetectSuspiciousPatterns()on HEARTBEAT.md content after reading, before injecting into agent prompt. Warns on suspicious patterns. Non-blocking.Changes
src/agents/skills/workspace.ts: +8 lines (import + scan in map loop)src/infra/heartbeat-runner.ts: +7 lines (import + scan after readFile)Both checks are purely additive and warn-only — no behavior change on clean input.