Feature: OSS Security Forensics Skill — Supply Chain Investigation, Evidence Recovery, and Forensic Analysis (inspired by RAPTOR)

## Overview

[RAPTOR](https://github.com/gadievron/raptor) includes an elaborate OSS Security Forensics system — a multi-agent investigation framework for researching open-source supply chain attacks. It uses GitHub Archive (BigQuery), Wayback Machine, GitHub API, local git analysis, and IOC extraction to reconstruct attack timelines, recover deleted evidence, and produce forensic investigation reports. This is RAPTOR's most sophisticated module, spanning 9 specialized agent definitions, 5 skill files (1800+ lines combined), and a complete Pydantic v2 evidence schema system.

Supply chain attacks are one of the most critical threats in modern software development (SolarWinds, XZ Utils, event-stream, ua-parser-js). Hermes Agent currently has no capability for investigating suspicious activity in open-source repositories, recovering deleted content, or conducting structured forensic analysis. The existing OSINT skill (#355) covers public records investigation (campaign finance, government contracts) — a completely different domain.

This issue proposes an **OSS Security Forensics** skill that adapts RAPTOR's investigation framework into a Hermes Agent skill, enabling users to investigate suspicious GitHub activity, recover deleted commits and PRs, analyze supply chain compromise indicators, and produce structured forensic reports with evidence chains.

---

## Research Findings

### How RAPTOR's OSS Forensics Works

#### Architecture: Multi-Agent Investigation Framework

RAPTOR defines 9 specialized agents that work together in a phased orchestration (Phase 0 through Phase 7):

**Phase 0: Initialize** — Create investigation working directory, initialize evidence.json
**Phase 1: Parse Prompt** — Extract repos, actors, dates, URLs, and IOCs from investigation request
**Phase 2: Parallel Evidence Collection** — Spawn 5 specialist investigators simultaneously:

1. **GH Archive Investigator** — Queries tamper-proof GitHub event history via BigQuery. Covers 12 GitHub event types (PushEvent, PullRequestEvent, IssuesEvent, etc.). Detects force pushes via zero-commit PushEvents, automation abuse via WorkflowRunEvents, and deleted content that remains in the immutable archive.

2. **GitHub API Investigator** — Queries current GitHub API for commits, issues, PRs, files, branches, tags, releases, forks. Cross-references with archive data to detect discrepancies (content present in archive but missing from current API = deletion detected).

3. **Wayback Machine Investigator** — Uses CDX API (`web.archive.org/cdx/search/cdx`) to search archived snapshots of GitHub pages. Recovers deleted READMEs, issues, PRs, wiki pages, release notes, and fork network pages.

4. **Local Git Investigator** — Analyzes the local git repository including dangling commits (force-pushed but not garbage-collected), reflog, blame, and diff analysis.

5. **IOC Extractor** — Extracts Indicators of Compromise from vendor security reports: commit SHAs, file paths, API keys, secrets, IP addresses, domains, package names, actor usernames.
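
Two of the checks described above (zero-commit PushEvents as a force-push signal, and archive-versus-API discrepancies as a deletion signal) can be sketched in a few lines. This is a minimal illustration rather than RAPTOR's code: the function names are hypothetical, and the `size`, `before`, `head`, and `ref` payload fields are assumed to follow the GitHub Events API PushEvent shape.

```python
import json
from typing import Iterable


def force_push_candidates(events: Iterable[dict]) -> list[dict]:
    """Return PushEvents that report zero commits (the heuristic described above)."""
    hits = []
    for ev in events:
        if ev.get("type") != "PushEvent":
            continue
        payload = ev.get("payload") or {}
        if isinstance(payload, str):  # archive rows may store the payload as a JSON string
            payload = json.loads(payload)
        size = payload.get("size")  # assumed field: number of commits in the push
        if size is not None and int(size) == 0:
            hits.append({
                "actor": ev.get("actor"),
                "ref": payload.get("ref"),
                "before": payload.get("before"),  # branch tip that the push replaced
                "head": payload.get("head"),
                "created_at": ev.get("created_at"),
            })
    return hits


def missing_from_current(archive_shas: Iterable[str], current_shas: Iterable[str]) -> list[str]:
    """SHAs seen in the archive but absent from the current API view."""
    return sorted(set(archive_shas) - set(current_shas))


sample_events = [
    {"type": "PushEvent", "actor": "alice", "created_at": "2024-01-02T10:00:00Z",
     "payload": {"ref": "refs/heads/main", "size": 2, "before": "a" * 40, "head": "b" * 40}},
    {"type": "PushEvent", "actor": "mallory", "created_at": "2024-01-02T11:00:00Z",
     "payload": json.dumps({"ref": "refs/heads/main", "size": 0,
                            "before": "b" * 40, "head": "c" * 40})},
    {"type": "WatchEvent", "actor": "bob", "created_at": "2024-01-02T12:00:00Z", "payload": {}},
]

candidates = force_push_candidates(sample_events)
deleted = missing_from_current(["a" * 40, "b" * 40, "c" * 40], ["a" * 40, "c" * 40])
print(candidates)
print(deleted)
```

The `before` SHA of a flagged event is the natural input for the deleted commit recovery methods, since it identifies the branch tip that was overwritten.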

**Phase 3: Hypothesis Formation** — An `oss-hypothesis-former-agent` synthesizes evidence from all 5 investigators into testable hypotheses. Each hypothesis must cite evidence by ID. Follow-up evidence requests trigger additional collection rounds (max N iterations).

**Phase 4: Evidence Verification** — An `oss-evidence-verifier-agent` validates each piece of evidence against original sources using a `ConsistencyVerifier`.

**Phase 5: Hypothesis Validation** — An `oss-hypothesis-checker-agent` mechanically validates hypotheses:
- Rejects hypotheses with uncited or unverified evidence
- Checks for logical consistency
- Produces rebuttals that feed back into hypothesis refinement (max iterations enforced)
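
The first rejection rule above is mechanical enough to express directly. The sketch below assumes a hypothetical `Hypothesis` shape and an evidence store represented as a plain dict keyed by evidence ID; neither is taken from RAPTOR's schema.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    claim: str
    evidence_ids: list[str] = field(default_factory=list)


def check_hypothesis(hypothesis: Hypothesis, store: dict[str, dict]) -> list[str]:
    """Return a list of problems; an empty list means the hypothesis passes."""
    problems = []
    if not hypothesis.evidence_ids:
        problems.append("no evidence cited")
    for eid in hypothesis.evidence_ids:
        if eid not in store:
            problems.append(f"{eid}: not in evidence store")
        elif not store[eid].get("verified", False):
            problems.append(f"{eid}: not verified")
    return problems


# Hypothetical evidence store contents for illustration.
evidence = {
    "EV-001": {"verified": True},
    "EV-002": {"verified": False},
}

ok = check_hypothesis(Hypothesis("main was force-pushed", ["EV-001"]), evidence)
bad = check_hypothesis(Hypothesis("token was leaked", ["EV-002", "EV-404"]), evidence)
print(ok)
print(bad)
```

The returned problem strings could serve as the rebuttal text that is fed back into hypothesis refinement.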

**Phase 6: Report Generation** — An `oss-report-generator-agent` produces the final forensic report.

**Phase 7: Completion**

#### GitHub Archive Skill (958 lines)

The most detailed skill file. Key capabilities:
- **BigQuery cost optimization**: Dry runs before every query, column selection (only query needed fields), date range narrowing, `_TABLE_SUFFIX` filtering
- **Safe query template**: `safe_gharchive_query()` wrapper with automatic cost controls
- **12 event type coverage**: Push, PR, Issue, IssueComment, Watch, Fork, Create, Delete, Release, Member, Public, WorkflowRun
- **Real-world examples**: Amazon Q/aws-toolkit-vscode incident investigation, Istio supply chain token leak
- **Deleted content detection**: Cross-reference archive events against current repo state
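
To make the cost-optimization bullets concrete, here is a hedged sketch of a query builder that enforces column selection, a bounded date range, and `_TABLE_SUFFIX` filtering. It is not RAPTOR's `safe_gharchive_query()`: the function name, the column allowlist, and the 31-day cap are assumptions, and the `githubarchive.day.20*` wildcard assumes the public daily tables are named `githubarchive.day.YYYYMMDD`. It only builds SQL text; a dry run would still be needed before execution.

```python
import re
from datetime import date

# Assumed allowlist of GH Archive columns; adjust to the fields an investigation needs.
ALLOWED_COLUMNS = {"type", "actor.login", "repo.name", "created_at", "payload"}
REPO_RE = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")
MAX_DAYS = 31  # hypothetical cap that keeps the scanned range narrow


def build_gharchive_query(repo: str, start: date, end: date, columns: list[str]) -> str:
    """Build a narrow GH Archive query: explicit columns, bounded daily tables."""
    if not REPO_RE.match(repo):
        raise ValueError(f"unexpected repo name: {repo!r}")
    unknown = set(columns) - ALLOWED_COLUMNS
    if not columns or unknown:
        raise ValueError(f"columns must come from the allowlist, got: {sorted(unknown)}")
    if end < start or (end - start).days > MAX_DAYS:
        raise ValueError("date range must be ordered and at most 31 days")
    return (
        f"SELECT {', '.join(columns)}\n"
        "FROM `githubarchive.day.20*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start:%y%m%d}' AND '{end:%y%m%d}'\n"
        f"  AND repo.name = '{repo}'\n"
        "ORDER BY created_at"
    )


query = build_gharchive_query(
    "istio/istio", date(2024, 1, 1), date(2024, 1, 7), ["type", "actor.login", "created_at"]
)
print(query)
```

Validating the repo name against a strict pattern keeps the interpolated string literal safe; a production version would more likely use BigQuery query parameters.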

#### Evidence Schema System (Pydantic v2)

Complete typed evidence framework:
- **12 Event types**: PushEvent, PullRequestEvent, IssueEvent, etc.
- **10 Observation types**: CommitObservation, IssueObservation, IOC, etc.
- **15 IOC types**: COMMIT_SHA, FILE_PATH, API_KEY, SECRET, IP_ADDRESS, DOMAIN, PACKAGE_NAME, ACTOR_USERNAME, etc.
- **EvidenceSource enum**: gharchive, git, github, wayback, security_vendor
- **VerificationInfo**: source + URL + BigQuery table reference
- **EvidenceStore**: Add/query/filter/save/load/verify evidence collections
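
For a sense of scale, a stripped-down version of the store can be sketched with the standard library alone. This deliberately swaps Pydantic v2 for `dataclasses`, keeps only the `EvidenceSource` values listed above, and uses hypothetical field names (`id`, `summary`, `url`, `verified`); RAPTOR's actual models are richer.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from enum import Enum
from pathlib import Path
from typing import Optional


class EvidenceSource(str, Enum):
    GHARCHIVE = "gharchive"
    GIT = "git"
    GITHUB = "github"
    WAYBACK = "wayback"
    SECURITY_VENDOR = "security_vendor"


@dataclass
class Evidence:
    id: str
    source: EvidenceSource
    summary: str
    url: str = ""
    verified: bool = False


class EvidenceStore:
    def __init__(self) -> None:
        self._items: dict[str, Evidence] = {}

    def add(self, ev: Evidence) -> None:
        if ev.id in self._items:
            raise ValueError(f"duplicate evidence id: {ev.id}")
        self._items[ev.id] = ev

    def filter(self, source: Optional[EvidenceSource] = None,
               verified: Optional[bool] = None) -> list[Evidence]:
        return [e for e in self._items.values()
                if (source is None or e.source == source)
                and (verified is None or e.verified == verified)]

    def save(self, path: Path) -> None:
        # Store the enum as its plain string value so the file stays tool-agnostic.
        rows = [{**asdict(e), "source": e.source.value} for e in self._items.values()]
        path.write_text(json.dumps(rows, indent=2), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "EvidenceStore":
        store = cls()
        for row in json.loads(path.read_text(encoding="utf-8")):
            row["source"] = EvidenceSource(row["source"])
            store.add(Evidence(**row))
        return store


store = EvidenceStore()
store.add(Evidence("EV-001", EvidenceSource.GHARCHIVE, "Zero-commit PushEvent on main", verified=True))
store.add(Evidence("EV-002", EvidenceSource.WAYBACK, "Archived copy of a deleted issue"))

with tempfile.TemporaryDirectory() as tmp:
    evidence_path = Path(tmp) / "evidence.json"
    store.save(evidence_path)
    reloaded = EvidenceStore.load(evidence_path)

print([e.id for e in reloaded.filter(verified=True)])
```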

#### Deleted Commit Recovery Skill (303 lines)

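Key insight: **A force push does not delete the overwritten commits from GitHub's servers.** They no longer belong to any branch, but they generally remain retrievable by their full SHA.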

3 access methods:
1. Direct web URL: `github.com/<owner>/<repo>/commit/<sha>.patch` (append `.patch` or `.diff`)
2. REST API: `api.github.com/repos/<owner>/<repo>/git/commits/<sha>`
3. Git fetch: `git fetch origin <sha>` (works even after force push)

Real-world example: in the Istio supply chain case, a personal access token (PAT) was recovered from a force-pushed commit (a finding worth a $25k bounty).
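
The three access methods reduce to string construction once the SHA is known. The helper below is a hypothetical sketch that only builds the URLs and the `git fetch` argument list from the patterns listed above; it makes no network calls, and it assumes the full 40-character SHA is required.

```python
import re

FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")
NAME_RE = re.compile(r"^[A-Za-z0-9_.-]+$")


def recovery_targets(owner: str, repo: str, sha: str) -> dict:
    """Build the web URLs, REST URL, and git command for one orphaned commit."""
    if not (NAME_RE.match(owner) and NAME_RE.match(repo)):
        raise ValueError("unexpected owner or repo name")
    if not FULL_SHA_RE.match(sha):
        raise ValueError("expected a full 40-character lowercase hex SHA")
    web = f"https://github.com/{owner}/{repo}/commit/{sha}"
    return {
        "patch_url": f"{web}.patch",
        "diff_url": f"{web}.diff",
        "api_url": f"https://api.github.com/repos/{owner}/{repo}/git/commits/{sha}",
        # Argument list suitable for subprocess.run inside a clone of the repo.
        "git_fetch": ["git", "fetch", "origin", sha],
    }


targets = recovery_targets("example-org", "example-repo", "0123456789abcdef" * 2 + "01234567")
for name, value in targets.items():
    print(name, value)
```

In a skill, these values would be handed to `web_extract` or `terminal` rather than fetched from Python directly.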

#### Anti-Hallucination Patterns

- **"STAY IN YOUR LANE"**: Every agent has explicit role boundaries. E.g., "You are a SPECIALIST INVESTIGATOR for GH Archive BigQuery collection ONLY. You do NOT query GitHub API, recover deleted content, or perform local git forensics."
- **Evidence-first**: Every claim must cite evidence by ID. No assertion without citation.
- **Mechanical validation**: Hypothesis checker uses grep-like checks to detect fabricated evidence.
- **Proof-required disproval**: Can't dismiss a hypothesis without explaining why.
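
A grep-like citation check over report text might look like the sketch below. The `[EV-001]` citation format and the rule that every non-heading line needs a citation are assumptions made for illustration, not RAPTOR's documented behavior.

```python
import re

CITATION_RE = re.compile(r"\[(EV-\d{3,})\]")  # hypothetical evidence ID format


def audit_citations(report_text: str, known_ids: set[str]) -> dict:
    """Flag lines with no citation and citations that match no stored evidence."""
    uncited_lines = []
    unknown_ids = set()
    for number, line in enumerate(report_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and Markdown headings
        cited = CITATION_RE.findall(stripped)
        if not cited:
            uncited_lines.append(number)
        unknown_ids.update(eid for eid in cited if eid not in known_ids)
    return {"uncited_lines": uncited_lines, "unknown_ids": sorted(unknown_ids)}


report = "\n".join([
    "## Findings",
    "The attacker force-pushed over main [EV-001].",
    "A token was exposed in the overwritten commit [EV-009].",
    "The maintainer account was probably phished.",
])

audit = audit_citations(report, known_ids={"EV-001", "EV-002"})
print(audit)
```

Because the check is plain pattern matching, it cannot be talked out of a finding: an ID either exists in the store or it does not.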

---

## Current State in Hermes Agent

**What we have:**
- `domain-intel` skill — passive DNS/WHOIS/SSL reconnaissance
- `web_extract` tool — can fetch archived web pages, GitHub pages
- `terminal` tool — can run `git` commands, `curl`, API calls
- `execute_code` tool — can run Python scripts for data analysis
- `delegate_task` tool — can spawn parallel sub-agents (maps to RAPTOR's parallel investigators)
- `github-*` skills — repository management, issues, PRs, code review
- `session_search` — past session recall (useful for multi-session investigations)

**What we don't have:**
- No forensic investigation framework
- No GitHub Archive / BigQuery integration
- No deleted content recovery workflow
- No evidence chain construction or verification
- No supply chain attack investigation capability
- No IOC extraction or analysis
- No hypothesis formation/validation methodology for investigations

**Relevant existing issues:**
- #355 — OSINT Investigation Skill (different domain: public records/campaign finance, not GitHub/OSS security)
- #382 — Code Security Audit Skill (complementary — finds vulns in code, this investigates compromised repos)
- #383 — Binary Security Analysis Skill (complementary — different analysis domain)
- #346 — Structured Memory System (could store evidence graphs long-term)

---

## Implementation Plan

### Skill vs. Tool Classification

This should be a **skill** because:
- All data sources are accessible via existing tools: `web_extract` (Wayback Machine, GitHub pages), `terminal` (git commands, curl for GitHub API, BigQuery CLI), `execute_code` (Python data processing)
- Evidence management is file-based (JSON evidence stores)
- The investigation methodology is instructions + reasoning — exactly what skills are for
- Multi-agent orchestration maps naturally to `delegate_task` sub-agents
- No custom Python integration needed in the agent harness
- No binary data, streaming, or real-time events

**Bundled vs. Skills Hub:** Recommend **Skills Hub**. OSS forensics is specialized (security teams, incident responders, open-source maintainers). Full capability requires BigQuery credentials for GitHub Archive access.

**Category:** `security` (same category as Code Security Audit and Binary Security Analysis skills)

### What We'd Need

1. **SKILL.md** — Investigation workflow with trigger conditions, phase descriptions, and agent orchestration instructions
2. **references/github-archive-guide.md** — How to query GH Archive via BigQuery (adapted from RAPTOR's 958-line skill, with cost optimization guidance)
3. **references/evidence-types.md** — Event types, observation types, IOC types, and evidence source taxonomy
4. **references/recovery-techniques.md** — How to recover deleted commits, PRs, issues from GitHub and Wayback Machine
5. **references/investigation-templates.md** — Common investigation scenarios (supply chain attack, maintainer compromise, credential leak, typosquatting)
6. **scripts/evidence-store.py** — Lightweight evidence management script (add, query, filter, verify, export)
7. **templates/forensic-report.md** — Structured report template with executive summary, timeline, evidence citations, findings, recommendations

### Phased Rollout

**Phase 1: Basic Investigation Framework**
- Git-based analysis: clone repo, analyze commit history, detect force pushes, find deleted branches
- GitHub API investigation: fetch commit details, PR history, issue timeline, contributor changes
- Deleted commit recovery: direct URL access, git fetch of orphaned SHAs
- Basic evidence collection with JSON evidence store
- Structured investigation report generation (Markdown)
- IOC extraction from user-provided vendor reports
- Trigger: "investigate this repo", "check for supply chain compromise", "recover deleted commits"
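
IOC extraction from a user-provided report is largely pattern matching, so a Phase 1 version could start as small as the sketch below. It covers three of the IOC types named earlier; the regexes are illustrative assumptions (for example, the `ghp_` plus 36 characters token shape) and will produce false positives on real reports, such as version strings that look like IPv4 addresses.

```python
import ipaddress
import re

# Illustrative patterns only; real reports need broader and stricter rules.
PATTERNS = {
    "COMMIT_SHA": re.compile(r"\b[0-9a-f]{40}\b"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "API_KEY": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def _is_ipv4(candidate: str) -> bool:
    try:
        ipaddress.IPv4Address(candidate)
        return True
    except ValueError:
        return False


def extract_iocs(text: str) -> dict[str, list[str]]:
    """Return de-duplicated, sorted matches per IOC type."""
    found = {}
    for ioc_type, pattern in PATTERNS.items():
        matches = set(pattern.findall(text))
        if ioc_type == "IP_ADDRESS":
            matches = {m for m in matches if _is_ipv4(m)}  # drop out-of-range octets
        found[ioc_type] = sorted(matches)
    return found


fake_sha = "1234567890abcdef" * 2 + "12345678"
fake_token = "ghp_" + "x" * 36  # placeholder, not a real credential
vendor_report = (
    f"Malicious commit {fake_sha} contacted 203.0.113.7 (not 999.1.1.1). "
    f"The workflow log exposed {fake_token} before the force push."
)

iocs = extract_iocs(vendor_report)
print(iocs)
```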

**Phase 2: Advanced Evidence Collection**
- Wayback Machine integration via CDX API for archived GitHub page recovery
- Google BigQuery / GitHub Archive integration for tamper-proof event history
- Parallel investigation via `delegate_task` (spawn multiple evidence collectors)
- Hypothesis formation and validation loop (with evidence-backed assertions)
- Cross-referencing between data sources to detect discrepancies
- Timeline reconstruction with event correlation
- IOC-based search across related repositories
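
For the Wayback Machine item, the sketch below shows how a CDX query URL could be built and how the JSON response (a header row followed by data rows) could be turned into snapshot links. The helper names are hypothetical, the parameter choices (`matchType=prefix`, `collapse=digest`, the `fl` field list) are one reasonable configuration rather than a recommendation from RAPTOR, and the sample rows are made up so the example runs offline.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(target: str, from_ts: str = "", to_ts: str = "", prefix: bool = True) -> str:
    """Build a CDX search URL for a page or URL prefix (timestamps are YYYYMMDD strings)."""
    params = {
        "url": target,
        "output": "json",
        "fl": "timestamp,original,statuscode",
        "collapse": "digest",  # skip consecutive captures with identical content
    }
    if prefix:
        params["matchType"] = "prefix"
    if from_ts:
        params["from"] = from_ts
    if to_ts:
        params["to"] = to_ts
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshots_from_cdx(rows: list[list[str]]) -> list[dict]:
    """Convert CDX JSON rows (first row is the header) into dicts with a snapshot URL."""
    if not rows:
        return []
    header, *data = rows
    snapshots = []
    for row in data:
        record = dict(zip(header, row))
        record["snapshot_url"] = (
            f"https://web.archive.org/web/{record['timestamp']}/{record['original']}"
        )
        snapshots.append(record)
    return snapshots


cdx_url = build_cdx_url("github.com/example-org/example-repo/issues", "20240101", "20240131")

# Made-up response rows in the shape the CDX server returns for output=json.
sample_rows = [
    ["timestamp", "original", "statuscode"],
    ["20240105120000", "https://github.com/example-org/example-repo/issues/42", "200"],
]
snapshots = snapshots_from_cdx(sample_rows)
print(cdx_url)
print(snapshots[0]["snapshot_url"])
```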

**Phase 3: Full Forensic Workflow**
- Multi-session investigation support (save/resume investigations using persistent files)
- Integration with #346 (Structured Memory) for persistent evidence graphs
- Integration with #355 (OSINT Skill) for cross-domain investigations
- Automated monitoring: scheduled checks on watched repos for suspicious changes
- Evidence export for legal/compliance use (chain of custody documentation)
- Investigation playbooks for common scenarios (XZ Utils-style attacks, dependency confusion, typosquatting)

---

## Pros & Cons

### Pros
- **Critical capability** — Supply chain attacks are among the most serious software security threats (see the incidents cited in the Overview), and investigation tooling is scarce
- **Rare in the AI agent space** — Apart from RAPTOR, we are not aware of another agent framework that offers structured forensic investigation
- **Uses existing Hermes tools** — `web_extract`, `terminal`, `delegate_task`, `execute_code` cover all data access needs
- **Real-world validated** — RAPTOR's approach was tested on actual incidents (Amazon Q, Istio)
- **Phase 1 has zero new dependencies** — Just git + GitHub API + the agent's reasoning
- **MIT-licensed patterns** — RAPTOR's investigation methodology is freely adaptable
- **High signal-to-noise** — Evidence-backed hypotheses with mechanical validation reduce false conclusions
- **Complements existing skills** — Works with `github-*` skills for repo access, `domain-intel` for infrastructure OSINT

### Cons / Risks
- **Highly specialized audience** — Security incident responders, OSS maintainers, security researchers
- **BigQuery costs** — GitHub Archive queries cost money ($6.25/TiB). Skill must include cost controls and dry runs.
- **Google Cloud dependency** — Full GH Archive access requires GCP credentials
- **Investigation quality** — Forensic work requires rigorous reasoning; LLMs may draw false conclusions. Anti-hallucination patterns are essential.
- **Scope creep** — Forensic investigation is deep; must stay focused on GitHub/OSS domain
- **Ethical considerations** — Investigation tools could be misused for harassment or stalking. Skill should include ethical use guidelines.
- **API rate limits** — GitHub API has rate limits (5000/hour authenticated). Investigation of large repos needs throttling.
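
The two numeric risks above translate into simple guardrails. The sketch below uses the figures quoted in this list ($6.25/TiB and 5000 requests/hour); the function names and the budget threshold are hypothetical, and the byte count is assumed to come from a BigQuery dry run.

```python
PRICE_PER_TIB_USD = 6.25   # on-demand price quoted above
BYTES_PER_TIB = 2 ** 40
GITHUB_LIMIT_PER_HOUR = 5000  # authenticated rate limit quoted above


def estimate_cost_usd(bytes_processed: int) -> float:
    """Convert a dry-run byte estimate into an approximate on-demand cost."""
    return bytes_processed / BYTES_PER_TIB * PRICE_PER_TIB_USD


def within_budget(bytes_processed: int, max_usd: float) -> bool:
    """Gate a query on a per-query budget before it is actually run."""
    return estimate_cost_usd(bytes_processed) <= max_usd


def min_request_interval_seconds(limit_per_hour: int = GITHUB_LIMIT_PER_HOUR) -> float:
    """Evenly spaced delay that stays under an hourly request limit."""
    return 3600 / limit_per_hour


fifty_gib = 50 * 2 ** 30
print(round(estimate_cost_usd(fifty_gib), 4))
print(within_budget(fifty_gib, max_usd=1.00))
print(min_request_interval_seconds())
```

A 50 GiB scan works out to roughly $0.31, and spacing GitHub API calls about 0.72 seconds apart keeps a long investigation under the hourly limit.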

---

## Open Questions

1. Should BigQuery integration be required for Phase 1, or can we defer it and still deliver useful forensic capability with just git + GitHub API + Wayback?
2. How should the evidence store be structured? Simple JSON file per investigation, or something more robust?
3. Should the hypothesis formation/validation use `delegate_task` for separate "hypothesis former" and "hypothesis checker" sub-agents, or handle it in a single conversation?
4. Should the skill include proactive monitoring (scheduled scans of watched repos for suspicious activity)?
5. How should we handle credentials for BigQuery and GitHub API — rely on user's environment, or integrate with #364 (Agent-Vault)?

---

## References

- [RAPTOR](https://github.com/gadievron/raptor) — Source repo (MIT license)
- [RAPTOR .claude/skills/oss-forensics/](https://github.com/gadievron/raptor/tree/main/.claude/skills/oss-forensics) — OSS Forensics skills
- [RAPTOR .claude/agents/](https://github.com/gadievron/raptor/tree/main/.claude/agents) — Agent definitions (9 forensics agents)
- [GitHub Archive](https://www.gharchive.org/) — Public GitHub event dataset
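- [GitHub data on BigQuery](https://console.cloud.google.com/marketplace/product/github/github-repos) — Google Cloud Marketplace listing; the GH Archive event tables are published under the `githubarchive` BigQuery project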
- [Wayback Machine CDX API](https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server) — Archived page search
- [XZ Utils backdoor](https://en.wikipedia.org/wiki/XZ_Utils_backdoor) — Example supply chain attack (2024)
- [Istio supply chain incident](https://istio.io/latest/blog/2024/cve-2024-23322/) — Referenced in RAPTOR
- Hermes Agent #355 — OSINT Investigation Skill (different domain, potential integration point)
- Hermes Agent #382 — Code Security Audit Skill (complementary)
- Hermes Agent #383 — Binary Security Analysis Skill (complementary)
- Hermes Agent #346 — Structured Memory System (potential evidence store integration)

