OpenPlanter is an open-source recursive investigation agent (MIT license, 1.4k stars) that acts as a "Community Edition of Palantir" — ingesting heterogeneous public datasets, resolving entities across them, and surfacing non-obvious connections through evidence-backed analysis. Its most distinctive and transferable feature is a complete OSINT investigation framework: a curated wiki of 16 public data sources with structured documentation and Python acquisition scripts, plus reusable investigation templates for entity resolution, cross-link analysis, and statistical timing correlation.
Hermes Agent currently has domain-intel (passive DNS/WHOIS/SSL reconnaissance) and arxiv (academic paper search) as its only research/OSINT capabilities. There is no framework for structured investigations across heterogeneous data sources, no entity resolution tooling, and no evidence chain construction. Adopting OpenPlanter's investigation framework would make Hermes Agent capable of genuine investigative work, from following campaign finance trails to cross-referencing government contracts with lobbying disclosures.
This should be a skill (likely Skills Hub rather than bundled, given the specialized audience) because the entire capability can be expressed as instructions + shell commands + existing Hermes tools (terminal, web_extract, read_file, write_file, search_files). The acquisition scripts use only Python stdlib. No custom tool integration needed.
Research Findings
How OpenPlanter's Investigation Framework Works
1. Data Source Wiki (16 Sources, 9 Categories)
OpenPlanter maintains a structured wiki of public data sources, each following a standardized 9-section template:
| Category | Sources |
| --- | --- |
| Campaign Finance | MA OCPF, FEC Federal |
| Government Contracts | Boston Open Checkbook, USASpending.gov, SAM.gov |
| Corporate Registries | MA Secretary of Commonwealth, SEC EDGAR |
| Financial | FDIC BankFind |
| Lobbying | Senate LD-1/LD-2 Disclosures |
| Nonprofits | ProPublica 990 / IRS 990 |
| Regulatory | EPA ECHO, OSHA Inspections |
| Sanctions | OFAC SDN List |
| International | ICIJ Offshore Leaks |
| Infrastructure | US Census Bureau ACS |
Template structure per source:
Summary — what it is, who publishes, why it matters
Data Schema — key fields, record types, table relationships
Coverage — jurisdiction, time range, update frequency, data volume
Cross-Reference Potential — which other sources can be joined and on what keys
Data Quality — known issues (formatting, missing fields, duplicates)
Acquisition Script — path to the Python script that downloads/transforms the data
Legal & Licensing — public records law, terms of use
References — official docs, data dictionaries
The cross-reference potential section is particularly valuable — it explicitly maps join keys between sources, creating a graph of data source interconnections that an agent can traverse to plan multi-source investigations.
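To make the "graph of data source interconnections" concrete, here is a minimal sketch of how an agent might plan a multi-source join path. The source names come from the table above, but the edges and join keys in `JOIN_KEYS` are hypothetical placeholders, not the wiki's actual Cross-Reference Potential entries.

```python
from collections import deque

# Hypothetical join keys for illustration only; the real wiki entries
# define which sources join and on what fields.
JOIN_KEYS = {
    ("FEC Federal", "USASpending.gov"): "employer / recipient name",
    ("USASpending.gov", "SAM.gov"): "entity identifier",
    ("SAM.gov", "SEC EDGAR"): "company name",
    ("Senate LD-1/LD-2 Disclosures", "FEC Federal"): "registrant / donor name",
}

def build_graph(join_keys):
    """Turn pairwise join keys into an undirected adjacency map."""
    graph = {}
    for (left, right), key in join_keys.items():
        graph.setdefault(left, {})[right] = key
        graph.setdefault(right, {})[left] = key
    return graph

def join_path(graph, start, goal):
    """Breadth-first search for the shortest chain of joins from start to goal."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            # Pair each hop with the key used to join across it.
            return [(a, b, graph[a][b]) for a, b in zip(path, path[1:])]
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None

GRAPH = build_graph(JOIN_KEYS)
PLAN = join_path(GRAPH, "FEC Federal", "SEC EDGAR")
```

Each element of `PLAN` is a `(from_source, to_source, join_key)` hop, which an agent could turn into an ordered list of fetch and join steps.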
2. Acquisition Scripts (Python Stdlib Only)
Each data source has a corresponding fetch script (scripts/fetch_fec.py, scripts/fetch_sec_edgar.py, etc.) that uses only Python stdlib (urllib.request, json, csv, xml.etree, argparse). Zero external dependencies for data acquisition.
Scripts handle: API pagination, rate limiting, data normalization, CSV/JSON output, error handling.
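As a rough illustration of what a stdlib-only fetch script involves, the sketch below shows pagination, crude rate limiting, and CSV normalization. It is not taken from OpenPlanter's scripts: the function names, the `page`/`per_page` parameters, and the `results` response key are all assumptions, and real APIs differ.

```python
import csv
import io
import json
import time
import urllib.parse
import urllib.request

def http_get_json(url, params):
    """Fetch one page of JSON using only the standard library."""
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{url}?{query}", timeout=30) as response:
        return json.load(response)

def fetch_all(url, get_json=http_get_json, per_page=100, max_pages=50, delay=0.5):
    """Walk a page-numbered API until an empty page or max_pages is reached."""
    rows = []
    for page in range(1, max_pages + 1):
        payload = get_json(url, {"page": page, "per_page": per_page})
        results = payload.get("results", [])  # assumed response shape
        if not results:
            break
        rows.extend(results)
        time.sleep(delay)  # crude rate limiting between requests
    return rows

def to_csv(rows, fieldnames):
    """Normalize rows to a fixed column set and return CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        # Missing fields become empty strings so every row has the same shape.
        writer.writerow({name: row.get(name, "") for name in fieldnames})
    return buffer.getvalue()
```

Passing the page fetcher in as `get_json` keeps the pagination logic testable offline, which matters for an agent that may run in a sandbox without network access.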
3. Investigation Templates
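Entity Resolution (entity_resolution.py, ~741 lines):
Three-tier name normalization, followed by three-tier matching with explicit confidence:
employer_exact / donor_exact: high confidence
employer_fuzzy / donor_fuzzy: medium confidence
employer_token_overlap: low confidence
Red flag analysis targeting pay-to-play indicators.
Cross-Link Analysis (cross_link_analysis.py, ~586 lines):
Alternative matching pipeline using pandas + optional rapidfuzz (token_sort_ratio at threshold 82). Detects contractor-donor matches and bundled donations.
Timing Analysis (timing_analysis.py, ~338 lines):
Statistical permutation testing for donation-contract timing correlation:
For each vendor-politician pair, calculate the mean distance from donations to the nearest contract award
Generate 1000 random award-date sets under the null hypothesis
P-value = fraction of permutations where mean distance ≤ observed
Effect size = (null_mean - observed) / null_std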
This is genuine computational investigative journalism methodology.
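The permutation test for donation-contract timing can be sketched in a few lines of stdlib Python. This is an illustration of the technique rather than OpenPlanter's `timing_analysis.py`: the function names are hypothetical, and drawing null award dates uniformly from a fixed window is an assumption about how the null hypothesis is simulated.

```python
import random
import statistics
from datetime import date

def mean_nearest_gap(donations, awards):
    """Mean absolute gap in days from each donation to its nearest award."""
    return statistics.mean(
        min(abs((d - a).days) for a in awards) for d in donations
    )

def timing_test(donations, awards, window_start, window_end, n_perm=1000, seed=0):
    """Permutation test: are donations closer to awards than chance predicts?"""
    rng = random.Random(seed)
    span = (window_end - window_start).days
    first_day = window_start.toordinal()
    observed = mean_nearest_gap(donations, awards)
    null = []
    for _ in range(n_perm):
        # Assumed null model: award dates fall uniformly within the window.
        fake_awards = [
            date.fromordinal(first_day + rng.randint(0, span)) for _ in awards
        ]
        null.append(mean_nearest_gap(donations, fake_awards))
    # P-value: share of permutations at least as tight as the observed gap.
    p_value = sum(1 for gap in null if gap <= observed) / n_perm
    null_std = statistics.pstdev(null)
    effect = (statistics.mean(null) - observed) / null_std if null_std else 0.0
    return {"observed": observed, "p_value": p_value, "effect_size": effect}
```

A small p-value here only says the observed clustering is unlikely under the simulated null; it is a lead for further investigation, not proof of a pay-to-play arrangement.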
Findings Builder (build_findings_json.py, ~163 lines):
Assembles structured investigation reports with machine-readable evidence chains: each finding has id, title, severity, confidence, summary, evidence list, and source files.
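A findings record with the fields listed above might be assembled as follows. This is a sketch, not the contents of `build_findings_json.py`: the `source_files` key name, the shape of each evidence item, and the high/medium/low severity scale are assumptions.

```python
import json

def make_finding(finding_id, title, severity, confidence, summary, evidence, source_files):
    """Build one finding; refuse claims that carry no supporting evidence."""
    if not evidence:
        raise ValueError(f"finding {finding_id} has no evidence")
    return {
        "id": finding_id,
        "title": title,
        "severity": severity,
        "confidence": confidence,
        "summary": summary,
        "evidence": list(evidence),
        "source_files": list(source_files),
    }

def build_report(findings):
    """Serialize findings, highest severity first, as machine-readable JSON."""
    order = {"high": 0, "medium": 1, "low": 2}  # assumed severity scale
    ranked = sorted(findings, key=lambda f: order.get(f["severity"], len(order)))
    return json.dumps({"findings": ranked}, indent=2)
```

Rejecting findings with an empty evidence list enforces, at construction time, that every claim in the report traces back to at least one record.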
Key Design Decisions
Wiki-as-knowledge-base — Data source documentation is structured for both humans AND AI agents. An agent reads the wiki entry and knows the API endpoint, auth requirements, schema, rate limits, and CLI commands.
Stdlib-only acquisition — Zero dependency installation for data fetching. Maximum portability.
Explicit confidence levels — Every entity match has a confidence tier (confirmed/probable/possible/unresolved), preventing false positives from being treated as certainties.
Evidence chain construction — Every claim traces to a specific record: claim → evidence → source → confidence.
Template-driven extensibility — Adding new data sources follows a copy-template-and-fill pattern.
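The explicit-confidence decision above can be illustrated with a small tiered matcher. This is a sketch under stated assumptions, not OpenPlanter's `entity_resolution.py`: the suffix list, both thresholds, and the high/medium/low/unresolved labels are illustrative, and `difflib.SequenceMatcher` over token-sorted names is a stdlib stand-in for rapidfuzz's `token_sort_ratio`, with somewhat different scoring.

```python
import re
from difflib import SequenceMatcher

# Assumed list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "llc", "corp", "co", "ltd", "company", "corporation"}

def normalize(name):
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return [t for t in tokens if t not in SUFFIXES]

def match(name_a, name_b, fuzzy_threshold=0.82, overlap_threshold=0.5):
    """Return (match_type, confidence) for two organization names."""
    a, b = normalize(name_a), normalize(name_b)
    if not a or not b:
        return ("none", "unresolved")
    if a == b:
        return ("exact", "high")
    # Compare token-sorted strings so word order does not affect the score.
    ratio = SequenceMatcher(None, " ".join(sorted(a)), " ".join(sorted(b))).ratio()
    if ratio >= fuzzy_threshold:
        return ("fuzzy", "medium")
    # Jaccard overlap of the token sets as the weakest tier.
    overlap = len(set(a) & set(b)) / len(set(a) | set(b))
    if overlap >= overlap_threshold:
        return ("token_overlap", "low")
    return ("none", "unresolved")
```

Returning the match type alongside the confidence label lets downstream reports state why two records were linked, so a low-confidence token overlap is never presented as a confirmed match.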
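Current State in Hermes Agent
What we have:
domain-intel skill: passive DNS/WHOIS/SSL/crt.sh reconnaissance. Domain-focused only.
arxiv skill: academic paper search via free REST API.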
web_search + web_extract tools — general web research.
terminal tool — can run Python scripts for data analysis.
execute_code tool — sandboxed Python execution with RPC tool access.
What we don't have:
No structured investigation framework
No entity resolution capabilities
No public data source catalog
No evidence chain tracking
No investigation templates (entity matching, cross-linking, timing analysis)
No red flag / anomaly detection patterns
The domain-intel skill is the closest analog but covers only DNS/infrastructure OSINT. This proposal covers the much broader space of financial, regulatory, corporate, and political OSINT.
Implementation Plan
Skill vs. Tool Classification
This should be a skill because:
The entire capability is instructions + shell commands + existing tools
Acquisition scripts are standalone Python (run via terminal)
Entity resolution scripts are standalone Python (run via terminal)
Data source documentation is reference material (read via read_file)
No custom Python integration or API key management needed in the agent harness
No binary data, streaming, or real-time events
Bundled vs. Skills Hub: This is specialized (investigative journalists, researchers, OSINT analysts) rather than broadly useful to most users. Recommend Skills Hub with documentation for how to install and use it.
What We'd Need
Skill SKILL.md — Investigation workflow instructions with trigger conditions
Reference wiki — Adapted data source entries (references/ directory)
Phased Rollout
Phase 1: Core Skill + Top Data Sources
Phase 2: Full Data Source Catalog + Advanced Analysis
Phase 3: Integration & Polish
delegate_task for parallel multi-source investigations
Pros & Cons
Pros
Cons / Risks
Open Questions
References
domain-intel skill: Existing OSINT capability (domain-focused only)