Feature: Binary Security Analysis Skill — Exploit Feasibility, Crash Triage, and Protection Analysis (inspired by RAPTOR)

## Overview

[RAPTOR](https://github.com/gadievron/raptor) includes a sophisticated binary security analysis module (`packages/exploit_feasibility/`, ~2500 lines in `api.py` alone) that performs comprehensive exploit feasibility assessment of compiled binaries. It analyzes memory protections, kernel mitigations, glibc defenses, ROP gadget availability, payload constraints, and input handler characteristics to determine whether a vulnerability in a binary is actually exploitable — and if so, what techniques would work.

Hermes Agent has no capability for analyzing compiled binaries, assessing exploit feasibility, triaging crashes, or understanding binary protections. This is a significant gap for users working with C/C++ codebases, embedded systems, CTF challenges, or security research.

This issue proposes a **Binary Security Analysis** skill that wraps standard binary analysis tools (checksec, readelf, objdump, nm, GDB/LLDB, ROPgadget) with LLM-powered interpretation, adapting RAPTOR's analysis patterns and expert personas into a Hermes Agent skill. The skill would also include the crash analysis and triage capabilities of RAPTOR's `packages/binary_analysis/` module (`crash_analyser.py` alone is 1325 lines).

---

## Research Findings

### How RAPTOR's Binary Analysis Works

#### Exploit Feasibility Module (packages/exploit_feasibility/)

The module performs layered analysis through `analyze_binary()`:

**Protection Analysis:**
- Binary protections via `checksec`: RELRO (Partial/Full), PIE (position-independent executable), NX/DEP (non-executable stack), stack canary, FORTIFY_SOURCE
- glibc mitigation analysis: pointer mangling, tcache hardening, safe linking, `__free_hook`/`__malloc_hook` removal status, `%n` format string verification
- Kernel mitigation analysis: ASLR level (`randomize_va_space` 0/1/2), `mmap_min_addr`, `ptrace_scope`
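
As a rough illustration of the binary-protection checks listed above, the sketch below derives similar signals from `readelf` text output instead of calling `checksec`. The function name `parse_protections` and its four-string interface are hypothetical and not taken from RAPTOR; the heuristics (GNU_STACK flags, GNU_RELRO plus BIND_NOW, ELF type DYN, `__stack_chk_fail`, `_chk` symbol suffixes) are the ones checksec-style tools commonly inspect.

```python
import re


def parse_protections(header: str, segments: str, dynamic: str, symbols: str) -> dict:
    """Infer protections from readelf text output (hypothetical helper).

    header  -> `readelf -h`,  segments -> `readelf -lW`,
    dynamic -> `readelf -d`,  symbols  -> `readelf -sW` (all as plain text).
    """
    # NX: the GNU_STACK segment must not carry the E (execute) flag.
    nx = False
    for line in segments.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "GNU_STACK":
            flags = "".join(tokens[6:-1])  # flag column may split, e.g. "R E"
            nx = "E" not in flags

    # RELRO: a GNU_RELRO segment gives Partial; BIND_NOW upgrades it to Full.
    relro = "none"
    if re.search(r"^\s*GNU_RELRO\b", segments, re.MULTILINE):
        bind_now = re.search(r"\bBIND_NOW\b|FLAGS_1.*\bNOW\b", dynamic)
        relro = "full" if bind_now else "partial"

    # PIE: position-independent executables have ELF type DYN
    # (shared libraries do too, so this is only a heuristic).
    pie = bool(re.search(r"^\s*Type:\s+DYN\b", header, re.MULTILINE))

    # Stack canaries and FORTIFY_SOURCE leave recognizable imported symbols.
    canary = "__stack_chk_fail" in symbols
    fortify = bool(re.search(r"\b__\w+_chk\b", symbols))

    return {"nx": nx, "relro": relro, "pie": pie, "canary": canary, "fortify": fortify}
```

In a skill, the four strings would come from `terminal` calls; keeping the parser separate from command execution makes it testable without a real binary.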

**ROP Gadget Analysis:**
- Scans for useful gadgets: `pop rdi; ret`, `pop rsi; ret`, `syscall; ret`, `leave; ret`, etc.
- Bad byte analysis per target address (null bytes, newlines in payload)
- One-gadget analysis with partial overwrite viability assessment
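
The per-address bad byte check above can be sketched in a few lines. This is an illustrative helper, not RAPTOR's implementation: the name `bad_byte_offsets`, the default bad-byte set (NUL and newline), and the `width` parameter are assumptions made for the example.

```python
import struct


def bad_byte_offsets(address: int, bad_bytes: bytes = b"\x00\n", width: int = 8) -> list[int]:
    """Return offsets of bad bytes in the little-endian encoding of an address.

    `width` limits the check to the low-order bytes that the payload
    actually has to write (for example, fewer than 8 on x86_64 when the
    high bytes of the target are already zero).
    """
    packed = struct.pack("<Q", address)[:width]
    return [offset for offset, byte in enumerate(packed) if byte in bad_bytes]


def usable_gadgets(gadgets: dict[str, int], bad_bytes: bytes = b"\x00\n", width: int = 8) -> dict[str, int]:
    """Keep only gadgets whose address can be delivered without bad bytes."""
    return {
        name: addr
        for name, addr in gadgets.items()
        if not bad_byte_offsets(addr, bad_bytes, width)
    }
```

For a non-PIE gadget at `0x401136`, a full 8-byte write contains NUL bytes at offsets 3 through 7, while a 3-byte partial write contains none, which is why the write width matters as much as the address itself.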

**Exploit Primitive Enumeration:**
- Arbitrary read/write detection
- Control flow hijack (RIP/RSP control)
- Heap control primitives
- Format string capabilities (call count, single-shot detection)

**Input Handler Analysis:**
- Detects input functions: `strcpy`, `gets`, `fgets`, `read`, `recv`, `scanf`
- Payload constraint analysis: bad bytes, maximum length, charset restrictions
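
One way to approximate the input-function detection above is to scan the imported symbols reported by `nm -D`. The sketch below is a hypothetical helper (`find_input_functions` is not a RAPTOR name); it assumes the usual `nm` text layout where undefined symbols are marked `U`, and it folds the glibc aliases `__isoc99_*` and `__*_chk` back to their base names.

```python
INPUT_FUNCTIONS = {"strcpy", "gets", "fgets", "read", "recv", "scanf"}


def find_input_functions(nm_output: str) -> set[str]:
    """Return the input functions a binary imports, given `nm -D` output."""
    found = set()
    for line in nm_output.splitlines():
        tokens = line.split()
        # Undefined (imported) symbols look like: "U gets@GLIBC_2.2.5"
        if len(tokens) < 2 or tokens[-2] != "U":
            continue
        name = tokens[-1].split("@", 1)[0]  # drop the symbol version suffix
        name = name.removeprefix("__isoc99_")  # C99 scanf family alias
        if name.startswith("__") and name.endswith("_chk"):
            name = name[2:-4]  # FORTIFY_SOURCE wrapper, e.g. __strcpy_chk
        if name in INPUT_FUNCTIONS:
            found.add(name)
    return found
```

Import scanning only shows which functions are linked, not which one handles attacker input, so the result is a starting point for the constraint analysis rather than a conclusion.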

**Output:**
Rich verdict with classification (exploitable / likely_exploitable / difficult / unlikely / blocked), concrete targets, viable techniques, and actionable guidance.

#### Crash Analysis Module (packages/binary_analysis/)

**CrashAnalyser class (crash_analyser.py, 1325 lines):**

10-step crash analysis pipeline:
1. Get binary info (`file`, `readelf`)
2. Detect ASan instrumentation
3. Run ASan analysis if available
4. Run debugger analysis (GDB on Linux, LLDB on macOS — auto-detected)
5. Get disassembly at crash site (`objdump`)
6. Analyze memory layout/protections (ASLR, stack canaries, NX/DEP)
7. Detect environmental crashes (debugger artifacts, sanitizer artifacts)
8. Analyze memory regions around crash address
9. Resolve function names (`addr2line`, symbol table, link register)
10. Compute stack hash for deduplication
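
Step 10 can be illustrated with a small sketch. The source does not say how RAPTOR computes its stack hash, so everything here is an assumption: the name `stack_hash`, hashing only the top five function names, and the use of a truncated SHA-256 digest.

```python
import hashlib


def stack_hash(frames: list[str], depth: int = 5) -> str:
    """Hash the top `depth` function names so equivalent crashes share a key.

    `frames` is ordered from the crashing frame outward. Addresses and line
    numbers are deliberately excluded so ASLR and rebuilds do not change the key.
    """
    top = [frame.strip() for frame in frames[:depth]]
    digest = hashlib.sha256("\n".join(top).encode("utf-8")).hexdigest()
    return digest[:16]


def deduplicate(crashes: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group crash IDs by stack hash. Input maps crash ID to its frame list."""
    buckets: dict[str, list[str]] = {}
    for crash_id, frames in crashes.items():
        buckets.setdefault(stack_hash(frames), []).append(crash_id)
    return buckets
```

A shallow depth merges crashes that differ only in how they reached the faulting code; a deeper one splits them, so the value is worth tuning per target.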

**Crash type classification** (signal-based + function-based + stack-trace-based):
- heap_overflow, stack_overflow, null_pointer_dereference, use_after_free, double_free
- format_string_vulnerability, integer_overflow, buffer_overflow, segmentation_fault
- division_by_zero, illegal_instruction, bus_error
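
A minimal sketch of the layered classification described above, assuming a simple priority order (sanitizer report first, then fault address, then signal). The function `classify_crash`, the 4 KiB null-page threshold, and the mapping of AddressSanitizer error strings to the labels in this list are illustrative choices, not RAPTOR's actual rules.

```python
from typing import Optional

# AddressSanitizer error strings mapped to the crash labels used in this issue.
ASAN_MARKERS = {
    "heap-use-after-free": "use_after_free",
    "attempting double-free": "double_free",
    "heap-buffer-overflow": "heap_overflow",
    "stack-buffer-overflow": "stack_overflow",
}

# Signal-only fallback. SIGFPE is simplified to division_by_zero here.
SIGNAL_TO_CRASH_TYPE = {
    "SIGSEGV": "segmentation_fault",
    "SIGFPE": "division_by_zero",
    "SIGILL": "illegal_instruction",
    "SIGBUS": "bus_error",
}

NULL_PAGE_LIMIT = 0x1000  # assumption: faults in the first page count as NULL derefs


def classify_crash(signal_name: str,
                   fault_address: Optional[int] = None,
                   asan_report: str = "") -> str:
    """Classify a crash from its signal, fault address, and optional ASan text."""
    for marker, crash_type in ASAN_MARKERS.items():
        if marker in asan_report:
            return crash_type
    if (signal_name == "SIGSEGV" and fault_address is not None
            and fault_address < NULL_PAGE_LIMIT):
        return "null_pointer_dereference"
    return SIGNAL_TO_CRASH_TYPE.get(signal_name, "unknown")
```

Function-based and stack-trace-based hints (for example, a crash inside `free`) would slot in as additional layers between the sanitizer check and the signal fallback.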

#### Crash Analysis Skills (.claude/skills/crash-analysis/)

4 specialized sub-skills:
- **rr Debugger**: Deterministic record-replay debugging with reverse execution. Includes `crash_trace.py` script for automated trace extraction (supports both regular and ASAN crashes).
- **Line Execution Checker**: C++17 tool that checks if specific source lines were executed using gcov data.
- **gcov Coverage**: Add gcov instrumentation to C/C++ projects for coverage-guided analysis.
- **Function Tracing**: Uses `-finstrument-functions` hooks with per-thread logs and Perfetto visualization output.

#### Expert Personas

RAPTOR loads specialized expert personas progressively:
- **Crash Analyst** (Charlie Miller/Halvar Flake persona, 284 lines): Systematic framework — crash type ID → register analysis → exploit primitives → mitigations → attack scenario → feasibility classification (Trivial/Moderate/Complex/Infeasible)
- **Offensive Security Researcher** (200 lines): Decision trees for format string, stack overflow, and heap exploitation. "6 Byte Rule" for x86_64 + strcpy. "Full RELRO Trap" explanation.
- **Exploit Developer** (Mark Dowd persona, 337 lines): 7 "Prime Directives" requiring working code, complete executability, safe testing, realistic constraints, honest assessment. Templates for every vulnerability type.

#### Anti-Hallucination Patterns

The crash analysis system uses a hypothesis/rebuttal loop:
- Crash analyzer writes hypothesis with mandatory evidence (>=3 actual debugger outputs, >=5 distinct memory addresses)
- Checker agent validates mechanically (grep for red flags: "expected output", "should show", "likely", "probably")
- If rejected, analyzer retries with feedback (max 3 iterations)
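
The mechanical check in this loop could look roughly like the sketch below. The thresholds and red-flag phrases come from the description above; the function name `check_hypothesis` and the convention that debugger evidence is pasted together with its `(gdb)` or `(lldb)` prompt are assumptions made for illustration.

```python
import re

RED_FLAGS = ("expected output", "should show", "likely", "probably")
MIN_DEBUGGER_OUTPUTS = 3
MIN_DISTINCT_ADDRESSES = 5


def check_hypothesis(text: str) -> list[str]:
    """Return the reasons a hypothesis is rejected (empty list means accepted)."""
    problems = []
    lowered = text.lower()

    # Speculative wording suggests the analyzer guessed instead of observing.
    # Plain substring matching is used, so "unlikely" also trips the "likely" flag.
    for phrase in RED_FLAGS:
        if phrase in lowered:
            problems.append(f"speculative phrase: {phrase!r}")

    # Assumption: each piece of debugger evidence starts with its prompt.
    outputs = re.findall(r"^\((?:gdb|lldb)\) \S", text, re.MULTILINE)
    if len(outputs) < MIN_DEBUGGER_OUTPUTS:
        problems.append(f"only {len(outputs)} debugger outputs, need {MIN_DEBUGGER_OUTPUTS}")

    addresses = set(re.findall(r"\b0x[0-9a-f]{4,16}\b", lowered))
    if len(addresses) < MIN_DISTINCT_ADDRESSES:
        problems.append(f"only {len(addresses)} distinct addresses, need {MIN_DISTINCT_ADDRESSES}")

    return problems
```

The returned list doubles as the feedback passed to the analyzer on retry, and the caller would stop after three rejected attempts.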

### Key Design Decisions

1. **Profile-based analysis**: `_get_profile_for_vuln_type()` auto-selects analysis strategy — web vulnerabilities skip memory mitigation checks entirely
2. **Same-tier LLM fallback**: During analysis, LLM fallback stays within the same tier (cloud or local) and never crosses tiers, which prevents inconsistent analysis quality
3. **Mandatory gates**: The `/exploit` command forces feasibility analysis BEFORE any exploit work. Lists specific things NOT to suggest when mitigations are present (e.g., "If Full RELRO, do NOT suggest GOT overwrites")
4. **Context persistence**: `save_exploit_context()` persists analysis to JSON files that survive context window compaction
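
The mandatory-gate idea (decision 3) can be sketched as a filter that runs before any technique is suggested. The rule table, technique names, and `apply_gates` function below are hypothetical; only the Full RELRO example comes from the description above, and the other two rules are added as well-known analogues.

```python
# Each rule: (technique, predicate over the protections dict, reason for the agent).
GATE_RULES = [
    ("got_overwrite", lambda p: p.get("relro") == "full",
     "Full RELRO maps the GOT read-only"),
    ("stack_shellcode", lambda p: p.get("nx") is True,
     "NX makes the stack non-executable"),
    ("malloc_hook_overwrite", lambda p: p.get("glibc", (0, 0)) >= (2, 34),
     "__malloc_hook and __free_hook were removed in glibc 2.34"),
]


def apply_gates(candidates: list[str], protections: dict) -> tuple[list[str], dict[str, str]]:
    """Split candidate techniques into allowed ones and rejected ones with reasons."""
    allowed: list[str] = []
    rejected: dict[str, str] = {}
    for technique in candidates:
        reason = next(
            (why for name, blocked, why in GATE_RULES
             if name == technique and blocked(protections)),
            None,
        )
        if reason is None:
            allowed.append(technique)
        else:
            rejected[technique] = reason
    return allowed, rejected
```

Returning the rejection reasons, not just the surviving techniques, gives the LLM explicit "do NOT suggest" text to carry into later turns.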

---

## Current State in Hermes Agent

**What we have:**
- No dedicated binary analysis capabilities, only general-purpose building blocks:
- `terminal` tool can run `checksec`, `readelf`, `objdump`, `gdb` etc. if installed
- `execute_code` can run Python scripts for analysis
- `delegate_task` can spawn sub-agents for parallel analysis

**What we don't have:**
- No skill for binary security assessment
- No crash triage workflow
- No exploit feasibility analysis
- No guided workflow for debugging tools (GDB, LLDB, rr) beyond raw `terminal` access
- No knowledge of binary protections or exploitation techniques

**Relevant existing issues:**
- #382 — Code Security Audit Skill (complementary — source code analysis vs. binary analysis)
- #344 — Multi-Agent Architecture (relevant for hypothesis/rebuttal loops)

---

## Implementation Plan

### Skill vs. Tool Classification

This should be a **skill** because:
- All analysis tools (checksec, readelf, objdump, nm, ROPgadget, GDB) are CLI tools callable via `terminal`
- The analysis is LLM-driven interpretation of tool outputs — perfectly suited to skill instructions
- No custom Python integration needed in the agent harness
- No streaming, real-time events, or binary data handling by the agent
- Expert personas are prompting patterns, not code

**Bundled vs. Skills Hub:** Recommend **Skills Hub**. Binary analysis is highly specialized (security researchers, CTF players, systems programmers). Required tools (checksec, ROPgadget, GDB) are not commonly installed on developer machines.

**Category:** `security` (same category as Code Security Audit skill)

### What We'd Need

1. **SKILL.md** — Workflow instructions covering binary protection analysis, crash triage, and exploit feasibility assessment. Includes adapted expert persona prompts.
2. **references/protections-guide.md** — Agent reference explaining each protection (RELRO, PIE, NX, canary, ASLR, FORTIFY) and what they prevent
3. **references/exploitation-techniques.md** — Decision trees for common exploitation paths (adapted from RAPTOR's offensive security researcher persona)
4. **references/crash-types.md** — Classification guide for crash types with investigation steps
5. **scripts/binary-audit.sh** — Helper script that runs checksec + readelf + basic analysis and outputs structured JSON

### Phased Rollout

**Phase 1: Binary Protection Analysis + Crash Triage**
- Detect and use available tools (checksec, readelf, objdump, file, strings, nm)
- Run comprehensive protection analysis on a binary
- Analyze crash files/core dumps with GDB (Linux) or LLDB (macOS)
- Classify crash type (heap overflow, UAF, format string, etc.)
- Present findings with human-readable explanations
- Assess basic exploitability based on protections
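
The tool-detection step at the start of Phase 1 could be as simple as the sketch below, which the agent might run via `execute_code`. The function names and the fallback order for debuggers are assumptions; `shutil.which` and `platform.system` are standard-library calls.

```python
import platform
import shutil
from typing import Optional

PHASE1_TOOLS = ("checksec", "readelf", "objdump", "file", "strings", "nm")


def detect_tools(tools: tuple[str, ...] = PHASE1_TOOLS) -> dict[str, bool]:
    """Report which analysis tools are available on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


def pick_debugger() -> Optional[str]:
    """Prefer LLDB on macOS and GDB elsewhere, falling back to whichever exists."""
    preferred = ["lldb", "gdb"] if platform.system() == "Darwin" else ["gdb", "lldb"]
    for debugger in preferred:
        if shutil.which(debugger):
            return debugger
    return None
```

Running this first lets the skill degrade gracefully, for example falling back to `readelf`-based checks when `checksec` is missing.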

**Phase 2: Deep Exploit Feasibility**
- ROP gadget analysis (via ROPgadget tool)
- Bad byte analysis for payload constraints
- Input handler detection and constraint mapping
- glibc mitigation analysis (version-aware)
- Full exploit feasibility verdict with technique recommendations
- Adapted expert persona prompts (crash analyst, exploit developer)
- Context persistence for multi-turn exploit development

**Phase 3: Fuzzing Integration + Advanced Analysis**
- AFL++ campaign orchestration (setup, run, monitor, collect crashes)
- Crash deduplication and ranking
- rr record-replay debugging integration (Linux x86_64 only)
- gcov coverage analysis for coverage-guided investigation
- Function tracing with Perfetto visualization
- Integration with Code Security Audit skill (#382) for combined source+binary analysis
- Batch crash triage (process multiple crashes, rank by severity)

---

## Pros & Cons

### Pros
- **Rare capability**: Integrated binary security analysis is uncommon among general-purpose AI agent frameworks
- **High-value for security researchers** — Automates tedious manual analysis steps
- **Expert-level prompting** — RAPTOR's personas encode decades of reverse engineering expertise
- **Platform-aware** — GDB on Linux, LLDB on macOS (mirrors RAPTOR's approach)
- **Progressive complexity** — Phase 1 is useful with just `file` and `readelf`; deeper tools add power
- **MIT-licensed source** — RAPTOR's analysis patterns and code are freely adaptable

### Cons / Risks
- **Highly specialized audience** — Most developers won't need binary exploitation analysis
- **Tool dependencies** — Full analysis requires checksec, ROPgadget, GDB, optionally rr and AFL++
- **Platform limitations** — rr only works on Linux x86_64; some tools Linux-only
- **Safety concerns** — Exploit generation capabilities need clear ethical usage guidelines
- **LLM accuracy** — Binary analysis requires precise reasoning; LLMs may hallucinate about register values or memory layouts. RAPTOR's anti-hallucination patterns (mandatory debugger output, mechanical checks) are essential.
- **Scope** — Could easily expand into a full exploit development framework; must stay focused on analysis/triage

---

## Open Questions

1. Should the skill include AFL++ fuzzing in Phase 1, or defer to Phase 3 as proposed?
2. How much of RAPTOR's `exploit_feasibility` Python code (~2500 lines in `api.py` alone) should we adapt, versus reimplementing it as skill instructions plus shell commands?
3. Should exploit PoC generation be included, or just analysis/triage? (Ethical considerations)
4. Should the skill work with remote binaries (download, analyze) or only local files?
5. How should we handle the hypothesis/rebuttal validation loop — via `delegate_task` sub-agents or iterative self-checking?

---

## References

- [RAPTOR](https://github.com/gadievron/raptor) — Source repo (MIT license)
- [RAPTOR packages/exploit_feasibility/](https://github.com/gadievron/raptor/tree/main/packages/exploit_feasibility) — Binary analysis module
- [RAPTOR packages/binary_analysis/](https://github.com/gadievron/raptor/tree/main/packages/binary_analysis) — Crash analysis module
- [RAPTOR packages/fuzzing/](https://github.com/gadievron/raptor/tree/main/packages/fuzzing) — AFL++ integration
- [RAPTOR .claude/skills/crash-analysis/](https://github.com/gadievron/raptor/tree/main/.claude/skills/crash-analysis) — Crash analysis skills
- [RAPTOR tiers/personas/](https://github.com/gadievron/raptor/tree/main/tiers/personas) — Expert security personas
- [checksec](https://github.com/slimm609/checksec.sh) — Binary protection checker
- [ROPgadget](https://github.com/JonathanSalwan/ROPgadget) — ROP gadget finder
- [AFL++](https://aflplus.plus/) — Coverage-guided fuzzer
- Hermes Agent #382 — Code Security Audit Skill (complementary)

