Skip to content

security: scan SKILL.md and HEARTBEAT.md for injection patterns (#17 + #21)#31592

Open
let5sne wants to merge 4 commits intoopenclaw:mainfrom
let5sne:fix/skill-heartbeat-injection-scan
Open

security: scan SKILL.md and HEARTBEAT.md for injection patterns (#17 + #21)#31592
let5sne wants to merge 4 commits intoopenclaw:mainfrom
let5sne:fix/skill-heartbeat-injection-scan

Conversation

@let5sne
Copy link
Contributor

@let5sne let5sne commented Mar 2, 2026

Summary

Two warn-only security scans wired into existing load flows:

Fix #21 — SKILL.md scan in loadSkillEntries

src/agents/skills/workspace.ts: call scanSource() on SKILL.md content when building skillEntries. Warns on critical findings (dangerous-exec, dynamic-code-execution, crypto-mining, env-harvesting). Non-blocking.

Fix #17 — HEARTBEAT.md scan in runHeartbeatPreflight

src/infra/heartbeat-runner.ts: call detectSuspiciousPatterns() on HEARTBEAT.md content after reading, before injecting into agent prompt. Warns on suspicious patterns. Non-blocking.

Changes

  • src/agents/skills/workspace.ts: +8 lines (import + scan in map loop)
  • src/infra/heartbeat-runner.ts: +7 lines (import + scan after readFile)

Both checks are purely additive and warn-only — no behavior change on clean input.

OpenClaw Explorer added 4 commits March 2, 2026 07:06
Implements the 'rejection log' insight from reflections: every skipped
verification is a loan against future trust. This module creates an
auditable ledger of deferred security checks.

Features:
- Track debt by category (security_audit, skill_scan, api_health, etc.)
- Risk scoring (1-10) with age multiplier (2x after 1 week)
- Resolve + prune workflow
- Summary API for dashboard/doctor integration

New file: src/security/verification-debt.ts (~150 lines)

Fixes improvement item openclaw#28 from openclaw-improvement-ideas.md.
Adds debt score display to doctor command output:
- Shows total/unresolved debt count
- Highlights high-risk items (score >= 7)
- Breaks down by category
- Suggests remediation command

New file: src/commands/doctor-verification-debt.ts (~60 lines)
Modified: src/commands/doctor.ts (import + call)
- workspace.ts: call scanSource() on SKILL.md content during loadSkillEntries,
  warn on critical findings (dangerous-exec, dynamic-code-execution, etc.)
- heartbeat-runner.ts: call detectSuspiciousPatterns() on HEARTBEAT.md content
  before injecting into agent prompt, warn on suspicious patterns

Both checks are warn-only (non-blocking), purely additive.

Fixes openclaw#21 (detectSkillPatterns wired into load flow)
Fixes openclaw#17 (HEARTBEAT.md injection scan)
@openclaw-barnacle openclaw-barnacle bot added commands Command implementations agents Agent runtime and tooling size: M labels Mar 2, 2026
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 21b74f7bdd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +46 to +47
} catch {
return { version: 1, entries: [], lastPruned: Date.now() };

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve debt state when load fails

loadVerificationDebt treats every read/parse error as “no debt” by returning an empty ledger, so a malformed verification-debt.json or an EACCES read failure is silently interpreted as a clean state; doctor can then report security debt is up to date, and any later write path (for example addVerificationDebt) will overwrite the existing file from an empty baseline, losing unresolved entries. Please only default on ENOENT and surface other errors.

Useful? React with 👍 / 👎.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 2, 2026

Greptile Summary

Added warn-only security scans for SKILL.md and HEARTBEAT.md files, plus verification debt tracking infrastructure. The scans detect malicious patterns in skill files (dangerous-exec, dynamic-code-execution, crypto-mining, env-harvesting) and injection patterns in heartbeat files (ignore previous instructions, etc.). Both are non-blocking and only log warnings.

  • src/agents/skills/workspace.ts — calls scanSource() on SKILL.md content and logs critical findings
  • src/infra/heartbeat-runner.ts — calls detectSuspiciousPatterns() on HEARTBEAT.md content before injecting into agent prompt
  • src/security/verification-debt.ts — new module providing debt tracking API (functions: addVerificationDebt, resolveVerificationDebt, calculateDebtScore, etc.)
  • src/commands/doctor-verification-debt.ts — integrates debt display into doctor command
  • src/commands/doctor.ts — calls noteVerificationDebt() before outro

The verification debt tracking infrastructure is fully implemented but not yet wired up—no code actually calls addVerificationDebt() to create debt entries, so the doctor command will always report zero debt until future PRs connect it.

Confidence Score: 4/5

  • Safe to merge - additive warn-only features with sound implementation
  • The security scans are correctly implemented as warn-only features that don't alter existing behavior. Code quality is solid with proper error handling. Minor concerns: (1) verification debt infrastructure is added but never called, which means it's dead code until wired up in future PRs, and (2) PR scope is broader than the description suggests. However, these are process issues rather than code quality or safety issues.
  • No files require special attention - all changes are straightforward additive features

Last reviewed commit: 21b74f7

Copy link

@mdlmarkham mdlmarkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 Hephaestus Security Review

Summary

This PR adds security scanning for SKILL.md and HEARTBEAT.md files to detect malicious injection patterns. Overall, this is a positive security improvement with a conservative warn-only approach.

✅ Strengths

  1. Uses existing, tested modules - scanSource and detectSuspiciousPatterns are already implemented with rule sets covering:

    • Shell/exec injection (dangerous-exec)
    • Dynamic code execution (eval, Function constructor)
    • Crypto-mining patterns
    • Environment harvesting + network exfil
    • Prompt injection patterns ("ignore previous instructions", etc.)
  2. Non-blocking approach - Warn-only is safe for initial rollout, avoids breaking legitimate use cases while gaining visibility.

  3. Good integration points - Scanning happens at load time (skills) and preflight (heartbeat), right before content enters agent context.

  4. Verification debt concept - Interesting approach to tracking deferred security checks.

⚠️ Questions/Concerns

  1. Warn-only is conservative but may miss active attacks - If a malicious skill is loaded, it still loads. Consider a future config option like security.blockCriticalFindings: true for high-security deployments.

  2. Verification debt module is added but not integrated - The verification-debt.ts module defines the debt tracking system, but I don't see it being used to actually record when verifications are skipped. Is this intentional scaffolding for future work?

  3. Debt file location - state/verification-debt.json is workspace-local. An attacker with workspace file access could tamper with debt records. Consider whether this matters for the threat model.

  4. No tests for new integration paths - The scanner modules have tests, but the new callsites in workspace.ts and heartbeat-runner.ts don't appear to have test coverage. Might be worth adding smoke tests.

🔍 Pattern Coverage Note

The suspicious patterns in external-content.ts cover common prompt injection attempts but attack techniques evolve. Examples not currently matched:

  • Roleplay-based injections ("You are DAN...")
  • Multi-step injections that split commands across messages
  • Unicode homoglyph tricks

This is fine for now - security scanners are defense-in-depth, not perfect defense.

Verdict

Approve with minor suggestions. The warn-only approach is appropriate for initial deployment. Consider:

  1. Adding tests for the integration points
  2. Clarifying the verification debt integration path
  3. Future: consider config option to block on critical findings

Good security-forward change. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling commands Command implementations size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Question: Heartbeat behavior with group chat activity

2 participants