
Feature: OSS Security Forensics Skill — Supply Chain Investigation, Evidence Recovery, and Forensic Analysis (inspired by RAPTOR) #384

@teknium1

Overview

RAPTOR includes an elaborate OSS Security Forensics system — a multi-agent investigation framework for researching open-source supply chain attacks. It uses GitHub Archive (BigQuery), Wayback Machine, GitHub API, local git analysis, and IOC extraction to reconstruct attack timelines, recover deleted evidence, and produce forensic investigation reports. This is RAPTOR's most sophisticated module, spanning 9 specialized agent definitions, 5 skill files (1800+ lines combined), and a complete Pydantic v2 evidence schema system.

Supply chain attacks are one of the most critical threats in modern software development (SolarWinds, XZ Utils, event-stream, ua-parser-js). Hermes Agent currently has no capability for investigating suspicious activity in open-source repositories, recovering deleted content, or conducting structured forensic analysis. The existing OSINT skill (#355) covers public records investigation (campaign finance, government contracts) — a completely different domain.

This issue proposes an OSS Security Forensics skill that adapts RAPTOR's investigation framework into a Hermes Agent skill, enabling users to investigate suspicious GitHub activity, recover deleted commits and PRs, analyze supply chain compromise indicators, and produce structured forensic reports with evidence chains.


Research Findings

How RAPTOR's OSS Forensics Works

Architecture: Multi-Agent Investigation Framework

RAPTOR defines 9 specialized agents that work together in an eight-phase orchestration (Phases 0 through 7):

Phase 0: Initialize — Create investigation working directory, initialize evidence.json
Phase 1: Parse Prompt — Extract repos, actors, dates, URLs, and IOCs from investigation request
Phase 2: Parallel Evidence Collection — Spawn 5 specialist investigators simultaneously:

  1. GH Archive Investigator — Queries tamper-proof GitHub event history via BigQuery. Covers 12 GitHub event types (PushEvent, PullRequestEvent, IssuesEvent, etc.). Detects force pushes via zero-commit PushEvents, automation abuse via WorkflowRunEvents, and deleted content that remains in the immutable archive.

  2. GitHub API Investigator — Queries current GitHub API for commits, issues, PRs, files, branches, tags, releases, forks. Cross-references with archive data to detect discrepancies (content present in archive but missing from current API = deletion detected).

  3. Wayback Machine Investigator — Uses CDX API (web.archive.org/cdx/search/cdx) to search archived snapshots of GitHub pages. Recovers deleted READMEs, issues, PRs, wiki pages, release notes, and fork network pages.

  4. Local Git Investigator — Analyzes the local git repository including dangling commits (force-pushed but not garbage-collected), reflog, blame, and diff analysis.

  5. IOC Extractor — Extracts Indicators of Compromise from vendor security reports: commit SHAs, file paths, API keys, secrets, IP addresses, domains, package names, actor usernames.
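A first pass of the IOC Extractor's job could be a regex sweep over a vendor report. The sketch below is a minimal illustration, assuming plain-text input; `extract_iocs`, the patterns, and the category keys are hypothetical, not RAPTOR's implementation. It covers only commit SHAs, IPv4 addresses, and GitHub @-mentions (naive domain patterns collide with file names such as `setup.py`), and every hit is a candidate that later verification must confirm.

```python
import ipaddress
import re

# Hypothetical first-pass patterns; they surface candidates, not confirmed IOCs.
SHA_RE = re.compile(r"\b[0-9a-f]{40}\b")                              # full 40-char git SHA-1
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")                  # dotted quad, validated below
ACTOR_RE = re.compile(r"(?<![\w/])@([A-Za-z0-9][A-Za-z0-9-]{0,38})")  # @mention, skips emails


def extract_iocs(text: str) -> dict[str, list[str]]:
    """Group de-duplicated candidate IOCs by type, preserving first-seen order."""
    found: dict[str, list[str]] = {"commit_sha": [], "ip_address": [], "actor_username": []}

    def add(kind: str, value: str) -> None:
        if value not in found[kind]:
            found[kind].append(value)

    for sha in SHA_RE.findall(text.lower()):
        add("commit_sha", sha)
    for candidate in IPV4_RE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects out-of-range octets such as 999
        except ValueError:
            continue
        add("ip_address", candidate)
    for login in ACTOR_RE.findall(text):
        add("actor_username", login)
    return found
```

Each candidate would then be recorded as an IOC observation with the report as its source, so later phases can cite it by ID.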

Phase 3: Hypothesis Formation — An oss-hypothesis-former-agent synthesizes evidence from all 5 investigators into testable hypotheses. Each hypothesis must cite evidence by ID. Follow-up evidence requests trigger additional collection rounds (max N iterations).

Phase 4: Evidence Verification — An oss-evidence-verifier-agent validates each piece of evidence against original sources using a ConsistencyVerifier.

Phase 5: Hypothesis Validation — An oss-hypothesis-checker-agent mechanically validates hypotheses:

  • Rejects hypotheses with uncited or unverified evidence
  • Checks for logical consistency
  • Produces rebuttals that feed back into hypothesis refinement (max iterations enforced)
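The citation part of this check is mechanical enough to sketch. The version below assumes a hypothetical citation syntax (`[EV-<n>]` inside the hypothesis text) and an evidence store reduced to a map of ID to verified/unverified; neither comes from RAPTOR. It covers only the citation rules; logical consistency still needs the agent's reasoning.

```python
import re
from dataclasses import dataclass, field

# Assumed citation syntax: evidence IDs in square brackets, e.g. "[EV-0007]".
CITATION_RE = re.compile(r"\[(EV-\d+)\]")


@dataclass
class Hypothesis:
    id: str
    claim: str


@dataclass
class CheckResult:
    hypothesis_id: str
    accepted: bool
    problems: list[str] = field(default_factory=list)


def check_citations(h: Hypothesis, evidence_status: dict[str, bool]) -> CheckResult:
    """Reject hypotheses whose citations are missing, unknown, or unverified.

    evidence_status maps each evidence ID in the store to True (verified) or False (unverified).
    """
    problems: list[str] = []
    cited = list(dict.fromkeys(CITATION_RE.findall(h.claim)))  # de-duplicate, keep order
    if not cited:
        problems.append("no evidence cited")
    for ev_id in cited:
        if ev_id not in evidence_status:
            problems.append(f"{ev_id}: not in evidence store (possible fabrication)")
        elif not evidence_status[ev_id]:
            problems.append(f"{ev_id}: not yet verified")
    return CheckResult(h.id, accepted=not problems, problems=problems)
```

The `problems` list is a natural shape for the rebuttal that goes back to the hypothesis former on the next iteration.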

Phase 6: Report Generation — An oss-report-generator-agent produces the final forensic report.

Phase 7: Completion

GitHub Archive Skill (958 lines)

The most detailed skill file. Key capabilities:

  • BigQuery cost optimization: Dry runs before every query, column selection (only query needed fields), date range narrowing, _TABLE_SUFFIX filtering
  • Safe query template: safe_gharchive_query() wrapper with automatic cost controls
  • 12 event type coverage: Push, PR, Issue, IssueComment, Watch, Fork, Create, Delete, Release, Member, Public, WorkflowRun
  • Real-world examples: Amazon Q/aws-toolkit-vscode incident investigation, Istio supply chain token leak
  • Deleted content detection: Cross-reference archive events against current repo state
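The cost controls above can be sketched without touching BigQuery. The helpers `build_gharchive_query`, `estimate_cost_usd`, and `enforce_budget` are hypothetical stand-ins, not RAPTOR's `safe_gharchive_query()`. The sketch assumes GH Archive's daily tables are named `githubarchive.day.YYYYMMDD` and uses the $6.25/TiB on-demand rate quoted in this issue. Values are format-checked and then interpolated into the SQL; a production version would more likely use BigQuery query parameters.

```python
import re
from datetime import date

PRICE_PER_TIB_USD = 6.25  # on-demand rate cited in this issue; confirm against current BigQuery pricing
REPO_RE = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")
EVENT_RE = re.compile(r"^[A-Za-z]+Event$")


def build_gharchive_query(repo: str, start: date, end: date, event_types: list[str]) -> str:
    """Build a narrow query over GH Archive daily tables (githubarchive.day.YYYYMMDD)."""
    if not event_types or not REPO_RE.match(repo) or not all(EVENT_RE.match(t) for t in event_types):
        raise ValueError("unexpected repo or event type format")
    if end < start:
        raise ValueError("end date precedes start date")
    types = ", ".join(f"'{t}'" for t in event_types)
    # Wildcard over `20*` tables; _TABLE_SUFFIX (YYMMDD) limits which daily tables are scanned.
    return (
        "SELECT type, created_at, actor.login AS actor, payload\n"
        "FROM `githubarchive.day.20*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start:%y%m%d}' AND '{end:%y%m%d}'\n"
        f"  AND repo.name = '{repo}'\n"
        f"  AND type IN ({types})\n"
        "ORDER BY created_at"
    )


def estimate_cost_usd(bytes_processed: int) -> float:
    """Convert a dry-run byte count into an approximate on-demand cost."""
    return bytes_processed / 2**40 * PRICE_PER_TIB_USD


def enforce_budget(bytes_processed: int, max_usd: float) -> float:
    """Refuse to run a query whose dry-run estimate exceeds the budget."""
    cost = estimate_cost_usd(bytes_processed)
    if cost > max_usd:
        raise RuntimeError(f"query would scan {bytes_processed:,} bytes (~${cost:.2f}), over ${max_usd:.2f}")
    return cost
```

The byte count would come from a dry run, for example `bq query --use_legacy_sql=false --dry_run '<sql>'`, or `total_bytes_processed` on a job submitted with `google.cloud.bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)`. The `payload` column can be large, so dropping it when only timestamps and actors are needed is an easy saving.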

Evidence Schema System (Pydantic v2)

Complete typed evidence framework:

  • 12 Event types: PushEvent, PullRequestEvent, IssueEvent, etc.
  • 10 Observation types: CommitObservation, IssueObservation, IOC, etc.
  • 15 IOC types: COMMIT_SHA, FILE_PATH, API_KEY, SECRET, IP_ADDRESS, DOMAIN, PACKAGE_NAME, ACTOR_USERNAME, etc.
  • EvidenceSource enum: gharchive, git, github, wayback, security_vendor
  • VerificationInfo: source + URL + BigQuery table reference
  • EvidenceStore: Add/query/filter/save/load/verify evidence collections
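A cut-down version of this schema, suitable for a lightweight `evidence-store.py`, might look like the following. It uses stdlib `dataclasses` in place of RAPTOR's Pydantic v2 models to stay dependency-free. The `EvidenceSource` values come from the list above; the field names, `EvidenceStore` methods, and single-JSON-file layout are assumptions for illustration.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from pathlib import Path


class EvidenceSource(str, Enum):
    GHARCHIVE = "gharchive"
    GIT = "git"
    GITHUB = "github"
    WAYBACK = "wayback"
    SECURITY_VENDOR = "security_vendor"


@dataclass
class VerificationInfo:
    source: EvidenceSource
    url: str | None = None
    bq_table: str | None = None  # e.g. "githubarchive.day.20250701"
    verified: bool = False


@dataclass
class Evidence:
    id: str
    kind: str      # e.g. "PushEvent", "CommitObservation", "IOC"
    summary: str
    verification: VerificationInfo
    data: dict = field(default_factory=dict)


class EvidenceStore:
    """In-memory evidence collection persisted as a single JSON file."""

    def __init__(self) -> None:
        self._items: dict[str, Evidence] = {}

    def add(self, ev: Evidence) -> None:
        if ev.id in self._items:
            raise ValueError(f"duplicate evidence id: {ev.id}")
        self._items[ev.id] = ev

    def filter(self, source: EvidenceSource | None = None, verified: bool | None = None) -> list[Evidence]:
        return [
            ev for ev in self._items.values()
            if (source is None or ev.verification.source == source)
            and (verified is None or ev.verification.verified == verified)
        ]

    def save(self, path: str | Path) -> None:
        rows = []
        for ev in self._items.values():
            row = asdict(ev)
            row["verification"]["source"] = ev.verification.source.value  # store a plain string
            rows.append(row)
        Path(path).write_text(json.dumps(rows, indent=2))

    @classmethod
    def load(cls, path: str | Path) -> EvidenceStore:
        store = cls()
        for row in json.loads(Path(path).read_text()):
            ver = row.pop("verification")
            ver["source"] = EvidenceSource(ver["source"])
            store.add(Evidence(verification=VerificationInfo(**ver), **row))
        return store
```

Rejecting duplicate IDs matters more than it looks: hypotheses cite evidence by ID, so an ID must never silently point at different content.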

Deleted Commit Recovery Skill (303 lines)

Key insight: A force push does not delete the overwritten commits from GitHub's servers; they generally remain retrievable by SHA.

3 access methods:

  1. Direct web URL: github.com/<owner>/<repo>/commit/<sha> (append .patch or .diff for raw output)
  2. REST API: api.github.com/repos/<owner>/<repo>/git/commits/<sha>
  3. Git fetch: git fetch origin <sha> (works even after force push)

Real-world example: In the Istio supply chain case, a personal access token (PAT) was recovered from a force-pushed commit; the finding earned a $25k bounty.

Anti-Hallucination Patterns

  • "STAY IN YOUR LANE": Every agent has explicit role boundaries. E.g., "You are a SPECIALIST INVESTIGATOR for GH Archive BigQuery collection ONLY. You do NOT query GitHub API, recover deleted content, or perform local git forensics."
  • Evidence-first: Every claim must cite evidence by ID. No assertion without citation.
  • Mechanical validation: Hypothesis checker uses grep-like checks to detect fabricated evidence.
  • Proof-required disproval: Can't dismiss a hypothesis without explaining why.

Current State in Hermes Agent

What we have:

  • domain-intel skill — passive DNS/WHOIS/SSL reconnaissance
  • web_extract tool — can fetch archived web pages, GitHub pages
  • terminal tool — can run git commands, curl, API calls
  • execute_code tool — can run Python scripts for data analysis
  • delegate_task tool — can spawn parallel sub-agents (maps to RAPTOR's parallel investigators)
  • github-* skills — repository management, issues, PRs, code review
  • session_search — past session recall (useful for multi-session investigations)

What we don't have:

  • No forensic investigation framework
  • No GitHub Archive / BigQuery integration
  • No deleted content recovery workflow
  • No evidence chain construction or verification
  • No supply chain attack investigation capability
  • No IOC extraction or analysis
  • No hypothesis formation/validation methodology for investigations

Relevant existing issues:


Implementation Plan

Skill vs. Tool Classification

This should be a skill because:

  • All data sources are accessible via existing tools: web_extract (Wayback Machine, GitHub pages), terminal (git commands, curl for GitHub API, BigQuery CLI), execute_code (Python data processing)
  • Evidence management is file-based (JSON evidence stores)
  • The investigation methodology is instructions + reasoning — exactly what skills are for
  • Multi-agent orchestration maps naturally to delegate_task sub-agents
  • No custom Python integration needed in the agent harness
  • No binary data, streaming, or real-time events

Bundled vs. Skills Hub: Recommend Skills Hub. OSS forensics is specialized (security teams, incident responders, open-source maintainers). Full capability requires BigQuery credentials for GitHub Archive access.

Category: security (same category as Code Security Audit and Binary Security Analysis skills)

What We'd Need

  1. SKILL.md — Investigation workflow with trigger conditions, phase descriptions, and agent orchestration instructions
  2. references/github-archive-guide.md — How to query GH Archive via BigQuery (adapted from RAPTOR's 958-line skill, with cost optimization guidance)
  3. references/evidence-types.md — Event types, observation types, IOC types, and evidence source taxonomy
  4. references/recovery-techniques.md — How to recover deleted commits, PRs, issues from GitHub and Wayback Machine
  5. references/investigation-templates.md — Common investigation scenarios (supply chain attack, maintainer compromise, credential leak, typosquatting)
  6. scripts/evidence-store.py — Lightweight evidence management script (add, query, filter, verify, export)
  7. templates/forensic-report.md — Structured report template with executive summary, timeline, evidence citations, findings, recommendations
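For `references/recovery-techniques.md`, the Wayback step can be sketched as building a CDX query and turning its JSON output into replayable snapshot URLs. The helper names and defaults (prefix matching, HTTP 200 captures only, `collapse=digest` to drop adjacent identical captures, a 500-row limit) are assumptions. The parser relies on CDX JSON output starting with a header row of field names.

```python
import json
from typing import Optional
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target: str, match_type: str = "prefix", start: Optional[str] = None,
                  end: Optional[str] = None, limit: int = 500) -> str:
    """Build a CDX search URL for archived captures of `target` (e.g. a GitHub issue path)."""
    params = {
        "url": target,
        "matchType": match_type,     # exact | prefix | host | domain
        "output": "json",
        "filter": "statuscode:200",  # skip redirects and error captures
        "collapse": "digest",        # drop adjacent captures with identical content
        "limit": str(limit),
    }
    if start:
        params["from"] = start  # timestamp prefix such as "2024" or "20240615"
    if end:
        params["to"] = end
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(body: str) -> list[dict[str, str]]:
    """Turn CDX JSON output (header row + data rows) into dicts with a replayable snapshot URL."""
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []
    header, *entries = rows
    snapshots = []
    for entry in entries:
        rec = dict(zip(header, entry))
        rec["snapshot_url"] = f"https://web.archive.org/web/{rec['timestamp']}/{rec['original']}"
        snapshots.append(rec)
    return snapshots
```

The query URL could be fetched with `web_extract` or `curl`, and each `snapshot_url` opened the same way to recover the archived page content.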

Phased Rollout

Phase 1: Basic Investigation Framework

  • Git-based analysis: clone repo, analyze commit history, detect force pushes, find deleted branches
  • GitHub API investigation: fetch commit details, PR history, issue timeline, contributor changes
  • Deleted commit recovery: direct URL access, git fetch of orphaned SHAs
  • Basic evidence collection with JSON evidence store
  • Structured investigation report generation (Markdown)
  • IOC extraction from user-provided vendor reports
  • Trigger: "investigate this repo", "check for supply chain compromise", "recover deleted commits"
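For the Phase 1 "detect force pushes" step in a local repository, one option is to list commits that no ref reaches. The sketch below shells out to `git fsck --unreachable --no-reflogs` and parses its `unreachable commit <sha>` / `dangling commit <sha>` lines; the function names are hypothetical. A fresh clone normally contains only reachable objects, so this is most useful on a long-lived clone that fetched before the force push.

```python
import subprocess


def parse_fsck(output: str) -> list[str]:
    """Extract commit SHAs from `git fsck` lines like 'unreachable commit <sha>'."""
    shas = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] in ("dangling", "unreachable") and parts[1] == "commit":
            shas.append(parts[2])
    return shas


def find_unreachable_commits(repo_path: str) -> list[str]:
    """List commits not reachable from any ref (reflog entries are not counted as references)."""
    result = subprocess.run(
        ["git", "fsck", "--unreachable", "--no-reflogs"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return parse_fsck(result.stdout)
```

Each SHA can then be inspected with `git show <sha>` and logged as `git`-sourced evidence.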

Phase 2: Advanced Evidence Collection

  • Wayback Machine integration via CDX API for archived GitHub page recovery
  • Google BigQuery / GitHub Archive integration for tamper-proof event history
  • Parallel investigation via delegate_task (spawn multiple evidence collectors)
  • Hypothesis formation and validation loop (with evidence-backed assertions)
  • Cross-referencing between data sources to detect discrepancies
  • Timeline reconstruction with event correlation
  • IOC-based search across related repositories
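Cross-referencing and timeline reconstruction reduce, at their simplest, to set difference and a merge sort. The sketch below follows the rule from Phase 2: a SHA present in GH Archive but absent from the current API is a deletion candidate. It is still a candidate, since it could also sit on a branch the API query did not cover, and it should be checked with one of the recovery methods. `SourceEvent` and both helpers are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class SourceEvent:
    source: str          # e.g. "gharchive", "github", "wayback", "git"
    timestamp: datetime  # use timezone-aware UTC datetimes across all sources
    description: str
    sha: str | None = None


def deletion_candidates(archive_shas: Iterable[str], api_shas: Iterable[str]) -> list[str]:
    """SHAs recorded in GH Archive push events that the current GitHub API no longer shows."""
    return sorted(set(archive_shas) - set(api_shas))


def build_timeline(*sources: Iterable[SourceEvent]) -> list[SourceEvent]:
    """Merge per-source event lists into one chronological timeline."""
    merged = [ev for events in sources for ev in events]
    return sorted(merged, key=lambda ev: (ev.timestamp, ev.source))
```

Mixing naive and timezone-aware datetimes would make the sort raise `TypeError`, which is why the comment asks for UTC-aware timestamps everywhere.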

Phase 3: Full Forensic Workflow


Pros & Cons

Pros

  • Critical capability — Supply chain attacks are among the most serious software security threats; investigation tooling is scarce
  • Rare in the AI agent space — Apart from RAPTOR itself, few agent frameworks offer structured forensic investigation
  • Uses existing Hermes tools: web_extract, terminal, delegate_task, and execute_code cover all data access needs
  • Real-world validated — RAPTOR's approach was tested on actual incidents (Amazon Q, Istio)
  • Phase 1 has zero new dependencies — Just git + GitHub API + the agent's reasoning
  • MIT-licensed patterns — RAPTOR's investigation methodology is freely adaptable
  • High signal-to-noise — Evidence-backed hypotheses with mechanical validation reduce false conclusions
  • Complements existing skills — Works with github-* skills for repo access, domain-intel for infrastructure OSINT

Cons / Risks

  • Highly specialized audience — Security incident responders, OSS maintainers, security researchers
  • BigQuery costs — GitHub Archive queries cost money ($6.25 per TiB scanned at on-demand rates). Skill must include cost controls and dry runs.
  • Google Cloud dependency — Full GH Archive access requires GCP credentials
  • Investigation quality — Forensic work requires rigorous reasoning; LLMs may draw false conclusions. Anti-hallucination patterns are essential.
  • Scope creep — Forensic investigation is deep; must stay focused on GitHub/OSS domain
  • Ethical considerations — Investigation tools could be misused for harassment or stalking. Skill should include ethical use guidelines.
  • API rate limits — GitHub API has rate limits (5000/hour authenticated). Investigation of large repos needs throttling.
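Throttling can key off the rate-limit headers GitHub returns on REST responses. The sketch below honors `retry-after` first, which GitHub sends for secondary rate limits. Otherwise it waits until `x-ratelimit-reset`, a UTC epoch timestamp, once `x-ratelimit-remaining` falls to a safety margin. The function name and the `reserve=50` margin are assumptions.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None, reserve: int = 50) -> float:
    """Return how long to pause before the next GitHub REST call.

    `reserve` is an arbitrary safety margin, not a GitHub setting.
    """
    h = {k.lower(): v for k, v in headers.items()}
    if "retry-after" in h:  # secondary rate limit: wait exactly this many seconds
        return float(h["retry-after"])
    if "x-ratelimit-remaining" not in h or "x-ratelimit-reset" not in h:
        return 0.0
    if int(h["x-ratelimit-remaining"]) > reserve:
        return 0.0
    now = time.time() if now is None else now
    return max(0.0, float(h["x-ratelimit-reset"]) - now)
```

Calling `time.sleep(seconds_to_wait(response.headers))` after each response keeps a long investigation inside the hourly budget without hand-tuned delays.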

Open Questions

  1. Should BigQuery integration be required for Phase 1, or can we defer it and still deliver useful forensic capability with just git + GitHub API + Wayback?
  2. How should the evidence store be structured? Simple JSON file per investigation, or something more robust?
  3. Should the hypothesis formation/validation use delegate_task for separate "hypothesis former" and "hypothesis checker" sub-agents, or handle it in a single conversation?
  4. Should the skill include proactive monitoring (scheduled scans of watched repos for suspicious activity)?
  5. How should we handle credentials for BigQuery and the GitHub API: rely on the user's environment, or integrate with the Agent-Vault skill (#364)?
