An AI-powered platform for detecting corruption in government contracts and public records.
Commons autonomously investigates San Francisco government data (contracts, campaign finance, business registrations) to uncover pay-to-play schemes, shell company networks, and conflict-of-interest patterns. Built for journalists, investigators, and civic watchdogs.
🌐 Live Demo: https://commons-ovyq.onrender.com
Corruption in government procurement happens at the intersection of three domains:
- Contracts: which companies win city work
- Campaign Finance: who funds politicians who award contracts
- Business Registration: hidden ownership of shell companies
Investigating these connections manually takes weeks. Commons does it in seconds.
- Knowledge Graph: 186K+ entities (persons, companies, city departments, contracts, campaigns) + 245K+ edges mapping their relationships
- AI Investigation Agent: Gemini Flash with function calling autonomously traverses the graph to find corruption patterns
- Interactive Frontend: Real-time visualization of entity networks, contract flows, and corruption findings
- Anonymous Tip Submission: Whistleblowers can submit tips without accounts
- Saved Investigations: Persist and publish findings for follow-up reporting
| Feature | What It Detects |
|---|---|
| Pay-to-Play Detection | Owners donate to politician → their company wins contracts from that politician |
| Shell Company Networks | Multiple newly-formed companies sharing addresses, winning from same departments |
| Conflict of Interest | City officials awarding contracts to companies their relatives own/work for |
| Pattern Aggregation | Which vendors get the most contracts? Which politicians receive the most donations? |
| Prior Investigation Search | Check if a company/person has already been investigated |
| Anonymous Tip Intake | One-time-retrieval tokens for secure tip submission |
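The pay-to-play row above can be read as a join between donations and contracts. The following is a minimal sketch of that idea, not the heuristic in `agent/patterns.py`: the `Donation` and `Contract` records, the `officers` mapping, the 365-day window, and all sample names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Donation:
    donor: str        # person who made the contribution
    recipient: str    # politician who received it
    made_on: date


@dataclass(frozen=True)
class Contract:
    vendor: str       # company that won the contract
    awarded_by: str   # politician tied to the awarding department
    awarded_on: date


def find_pay_to_play(donations, contracts, officers, window_days=365):
    """Return (donor, company, politician) triples where a company officer
    donated to a politician and the company won a contract from that
    politician within `window_days` after the donation.

    `officers` maps a company name to the set of people who own or run it.
    """
    hits = []
    for contract in contracts:
        for donation in donations:
            if donation.recipient != contract.awarded_by:
                continue
            if donation.donor not in officers.get(contract.vendor, set()):
                continue
            gap = (contract.awarded_on - donation.made_on).days
            if 0 <= gap <= window_days:
                hits.append((donation.donor, contract.vendor, donation.recipient))
    return hits


# Hypothetical sample data, not taken from the SF datasets.
donations = [Donation("A. Owner", "Supervisor X", date(2023, 1, 10))]
contracts = [
    Contract("Acme Hauling", "Supervisor X", date(2023, 6, 1)),
    Contract("Other Co", "Supervisor X", date(2023, 6, 1)),
]
officers = {"Acme Hauling": {"A. Owner"}}
print(find_pay_to_play(donations, contracts, officers))
```

Only "Acme Hauling" is flagged here, because "Other Co" has no officer among the donors. A real heuristic would likely also weigh donation size and contract value.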
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Preact + TypeScript + Vite | Interactive investigation UI + entity graph viz |
| Backend | Python 3.11 + Gemini 2.0 Flash | Investigation agent with function calling |
| Database | SQLite (local) / Turso/LibSQL (cloud) | Knowledge graph storage |
| Auth | Auth0 | Login for investigator accounts (RBAC not yet enforced) |
| Alerting | Server-Sent Events (SSE) | Real-time investigation progress streaming |
| Deployment | Render | Free-tier Python web service + static hosting |
| Optional | TrueFoundry AI Gateway | Token cost tracking + LLM observability |
| Optional | Overmind SDK | Synthetic evaluation of investigation quality |
1. SODA API (SF Open Data)
↓
Airbyte connectors extract contracts, donations, business registrations
↓
2. Entity Extraction + Linking
↓
Python pipeline fuzzy-matches companies/persons → deduplicates → normalizes
↓
3. Knowledge Graph
↓
SQLite with indexed tables: entities, edges, investigations
↓
4. Investigation Agent (Gemini Flash)
↓
Function calling: search → get_details → traverse → detect_patterns → aggregate
↓
5. Frontend (SSE stream)
↓
Real-time rendering of findings in graph, globe, narrative panels
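Step 2 of the pipeline (fuzzy-match, deduplicate, normalize) can be sketched with the standard library alone. This is an illustration under assumptions, not the code in `pipeline/entity_extraction.py`: the suffix list, the 0.9 similarity threshold, and the helper names `normalize_name` and `link_entities` are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Legal suffixes dropped before comparison (an illustrative, incomplete list).
_SUFFIXES = {"inc", "llc", "corp", "co", "ltd", "incorporated", "corporation"}


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in _SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def link_entities(names, threshold=0.9):
    """Group raw names under one canonical key when their normalized forms are similar."""
    canonical = {}  # normalized key -> list of raw names
    for raw in names:
        key = normalize_name(raw)
        # Reuse the first existing key that is close enough to this one.
        match = next(
            (k for k in canonical if SequenceMatcher(None, k, key).ratio() >= threshold),
            None,
        )
        canonical.setdefault(match or key, []).append(raw)
    return canonical


groups = link_entities(["Recology Inc", "Recology", "Recology, Inc.", "Acme LLC"])
print(groups)
```

The linear scan over existing keys is quadratic in the number of distinct names, so a pipeline at the 186K-entity scale would presumably need blocking or an index in front of the fuzzy comparison.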
When a user starts an investigation (e.g., "Investigate Recology's SF contracts"):
- Agent receives query → Frontend sends to /api/investigate?q=...
- Search phase: search_entity() finds Recology in the graph (186K entities searched by substring and fuzzy name matching)
- Detail phase: get_entity_details() retrieves Recology's metadata (contract count, locations, officers)
- Traversal phase: traverse_connections() runs a 2-3 hop BFS to find related entities
  - Recology → officers → their other companies
  - Recology → contract departments → those departments' other vendors
  - Recology → shared addresses → other companies at same address
- Pattern detection: detect_patterns() runs heuristics for:
  - Pay-to-play (officers donating to politicians who awarded contracts)
  - Shell company networks (newly-formed, shared infrastructure)
  - Unusually high contract values relative to company age
- Aggregation: aggregate_query() surfaces the BiggestPlayers (top vendors by total contract value)
- Synthesis: Agent compiles findings into a structured briefing with evidence chains and confidence scores
- Frontend renders: As SSE events stream in, the frontend updates entity graphs, highlights suspicious edges, and displays the narrative briefing
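The final step depends on the client decoding the event stream as it arrives. Below is a minimal sketch of that decoding in Python, assuming the backend uses standard SSE framing (`data:` lines terminated by a blank line) with one JSON step per event. That framing is an assumption rather than something confirmed by the repo, and `parse_sse` is a hypothetical helper.

```python
import json


def parse_sse(lines):
    """Yield one decoded JSON object per SSE event.

    `data:` lines accumulate until a blank line ends the event; lines that
    start with ':' are comments (often used as keep-alives) and are skipped.
    Other SSE fields such as `event:` or `id:` are ignored in this sketch.
    """
    buffer = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(":"):
            continue
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            yield json.loads("\n".join(buffer))
            buffer = []


# Hypothetical stream contents for illustration.
stream = [
    'data: {"id": "1", "type": "search", "details": "..."}',
    "",
    ": keep-alive",
    'data: {"id": "2", "type": "traverse", "details": "..."}',
    "",
]
steps = list(parse_sse(stream))
print([step["type"] for step in steps])
```

Because `parse_sse` accepts any iterable of text lines, it could be fed from a streaming HTTP response as easily as from the list used here.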
Prerequisites: Python 3.11+, Node.js 20+, GEMINI_API_KEY
# Clone and enter the project
git clone https://github.com/Eman-Gon/AWSDeepAgentsHackathon.git
cd AWSDeepAgentsHackathon
# Install Python dependencies
pip install -r requirements.txt
# Install frontend dependencies
cd homepage && npm install && cd ..
# Build the SQLite knowledge graph from SODA API
python -m pipeline.run_pipeline --local --sqlite
# Start the Python backend (port 8000)
python -m agent.server
# In another terminal, start the Vite dev server (port 5173)
cd homepage && npm run dev
# App available at http://localhost:5173
The repo includes a render.yaml blueprint that deploys both frontend and backend as a single web service.
One-time setup:
- Create a new Blueprint project pointing to this GitHub repo
- Set these environment variables:
  - GEMINI_API_KEY: your Google Gemini API key
  - VITE_AUTH0_DOMAIN, VITE_AUTH0_CLIENT_ID, VITE_AUTH0_REDIRECT_URI: for login (or set VITE_AUTH_DEV_BYPASS=true to skip auth)
- Click Deploy. The build script (build.sh) handles everything:
  - Installs Python + Node deps
  - Builds the Vite frontend
  - Fetches data from SODA API and seeds the SQLite graph
  - Starts the combined server on $PORT (provided by Render)
Stream investigation results in real-time:
curl "http://localhost:8000/api/investigate?q=Investigate+Recology" \
-H "Accept: text/event-stream"
Returns an SSE stream of AgentStep JSON objects:
{"id": "1", "type": "search", "details": "Found entity 'Recology Inc'..."}
{"id": "2", "type": "traverse", "details": "Explored 42 connections..."}
{"id": "3", "type": "findings", "details": "5 potential pay-to-play indicators..."}
List all saved investigations:
curl http://localhost:8000/api/investigations
Save an investigation:
curl -X POST http://localhost:8000/api/investigations \
-H "Content-Type: application/json" \
-d '{"entity_id": "recology", "findings": "...", "verdict": "suspicious"}'
Publish a finding (mark as ready for journalist):
curl -X PATCH http://localhost:8000/api/investigations/recology \
-H "Content-Type: application/json" \
-d '{"published": true}'
Submit a tip (returns one-time retrieval token):
curl -X POST http://localhost:8000/api/tips \
-H "Content-Type: application/json" \
-d '{"entity_name": "Shell Corp Inc", "allegation": "...", "evidence_url": "..."}'
Response:
{"token": "abc123def456..."}
Retrieve a tip (token is burned after retrieval):
curl http://localhost:8000/api/tips/abc123def456...
curl http://localhost:8000/api/health
# → {"status": "healthy"}
curl http://localhost:8000/api/tips
# (GET /api/tips returns "Method Not Allowed"; only POST and token-specific GETs work)
The investigation agent has access to these tools (via Gemini function calling):
| Function | Purpose | Example |
|---|---|---|
| search_entity(name, entity_type) | Fuzzy search for entities in the graph | search_entity("Recology", "company") |
| get_entity_details(entity_id) | Retrieve full metadata for an entity | get_entity_details("recology") |
| get_edges_for_entity(entity_id, relationship_type) | Get all edges of a specific type | get_edges_for_entity("recology", "CONTRACTED_WITH") |
| traverse_connections(entity_id, hops, direction) | BFS traversal (forward/backward/bidirectional) | traverse_connections("recology", hops=2, direction="bidirectional") |
| aggregate_query(relationship_type, limit, sort_by) | Find top entities by metric | aggregate_query("CONTRACTED_WITH", limit=10, sort_by="total_value") |
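To make the traversal row concrete, here is a minimal sketch of a hop-limited BFS with a direction switch. It works on an in-memory list of (source, relationship, target) triples rather than the SQLite graph, and `bfs_neighbors` is a hypothetical stand-in, not the signature or implementation of `traverse_connections` in `agent/graph_queries.py`.

```python
from collections import deque


def bfs_neighbors(edges, start, hops=2, direction="bidirectional"):
    """Breadth-first search over (source, relationship, target) triples.

    Returns a dict mapping each reachable entity id to its hop distance
    from `start`, never exploring further than `hops` edges away.
    """
    neighbors = {}
    for src, _rel, dst in edges:
        if direction in ("forward", "bidirectional"):
            neighbors.setdefault(src, set()).add(dst)
        if direction in ("backward", "bidirectional"):
            neighbors.setdefault(dst, set()).add(src)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # depth limit reached; do not expand this node
        for nxt in neighbors.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


# Hypothetical edges using the relationship types from this README.
edges = [
    ("officer_1", "OFFICER_OF", "recology"),
    ("officer_1", "OFFICER_OF", "otherco"),
    ("recology", "CONTRACTED_WITH", "dpw"),
    ("otherco", "REGISTERED_AT", "addr_1"),
]
print(bfs_neighbors(edges, "recology", hops=2))
print(bfs_neighbors(edges, "recology", hops=2, direction="forward"))
```

With two hops in both directions the search reaches `otherco` through the shared officer but stops before `addr_1`, which is three hops away. The forward-only call sees just the department.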
| Function | Purpose |
|---|---|
| check_campaign_finance(entity_name) | Find donations from/to entity |
| file_investigation(entity_id, findings_summary) | Persist investigation to DB |
| check_prior_investigations(entity_id_or_keyword) | Search saved investigations |
| publish_finding(investigation_id) | Mark investigation as published for journalist follow-up |
| Function | Purpose |
|---|---|
| detect_patterns(entity_id) | Run heuristics for corruption red flags |
| collect_airbyte_evidence(entity_id) | Fetch additional context from Airbyte enrichment API (if available) |
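One of the red flags `detect_patterns` is described as covering is a cluster of newly-formed companies at one address. The sketch below shows one way such a check could look. The tuple layout, the one-year age cutoff, the cluster size of two, and the name `flag_shared_addresses` are assumptions, and the real heuristics may differ.

```python
from collections import defaultdict
from datetime import date


def flag_shared_addresses(companies, as_of, max_age_days=365, min_cluster=2):
    """Return {address: [company names]} for addresses shared by several young companies.

    `companies` is an iterable of (name, address, formation_date) tuples.
    Only companies formed within `max_age_days` of `as_of` are counted.
    """
    by_address = defaultdict(list)
    for name, address, formed_on in companies:
        if (as_of - formed_on).days <= max_age_days:
            by_address[address].append(name)
    return {
        address: sorted(names)
        for address, names in by_address.items()
        if len(names) >= min_cluster
    }


# Hypothetical registrations for illustration.
companies = [
    ("Alpha LLC", "1 Main St", date(2024, 1, 5)),
    ("Beta LLC", "1 Main St", date(2024, 2, 1)),
    ("Old Co", "1 Main St", date(2010, 1, 1)),
    ("Gamma LLC", "9 Oak Ave", date(2024, 3, 1)),
]
print(flag_shared_addresses(companies, as_of=date(2024, 6, 1)))
```

Shared addresses alone are weak evidence (registered-agent offices host many unrelated firms), so a flag like this is best treated as a lead to combine with contract and department overlap.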
- person: Individual (business owner, city official, donor)
- company: Vendor, contractor, shell company
- department: City department (e.g., "Department of Public Works")
- contract: Specific city procurement
- campaign: Campaign finance record
- address: Physical location
| Type | Direction | Meaning |
|---|---|---|
| CONTRACTED_WITH | company → department | Company was awarded contracts by this city department |
| AWARDED_BY | contract → department | Contract was awarded by this department |
| DONATED_TO | person/company → recipient | Made campaign contribution |
| OFFICER_OF | person → company | Officer/director/owner of company |
| REGISTERED_AT | company → address | Business registered at this address |
| WORKS_FOR | person → department | Employee of city agency |
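The entity and edge types above map naturally onto two indexed tables. The following sketch builds such a layout in an in-memory SQLite database and runs a "top vendors by total contract value" query against it. The column names (notably `value`), the index names, and the sample rows are assumptions for illustration. The repo's actual schema is not shown in this README.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (
    id   TEXT PRIMARY KEY,
    type TEXT NOT NULL,      -- person, company, department, ...
    name TEXT NOT NULL
);
CREATE TABLE edges (
    source TEXT NOT NULL REFERENCES entities(id),
    target TEXT NOT NULL REFERENCES entities(id),
    type   TEXT NOT NULL,    -- CONTRACTED_WITH, DONATED_TO, ...
    value  REAL              -- assumed column: contract or donation amount
);
CREATE INDEX idx_edges_source ON edges(source, type);
CREATE INDEX idx_edges_target ON edges(target, type);
""")

# Hypothetical rows for illustration.
conn.executemany("INSERT INTO entities VALUES (?, ?, ?)", [
    ("vendor_a", "company", "Vendor A"),
    ("vendor_b", "company", "Vendor B"),
    ("dpw", "department", "Department of Public Works"),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?, ?)", [
    ("vendor_a", "dpw", "CONTRACTED_WITH", 100000.0),
    ("vendor_a", "dpw", "CONTRACTED_WITH", 50000.0),
    ("vendor_b", "dpw", "CONTRACTED_WITH", 120000.0),
])

# Aggregate edges of one type per source entity, largest total first.
top_vendors = conn.execute("""
    SELECT e.name, SUM(x.value) AS total_value
    FROM edges AS x JOIN entities AS e ON e.id = x.source
    WHERE x.type = ?
    GROUP BY e.id
    ORDER BY total_value DESC
    LIMIT ?
""", ("CONTRACTED_WITH", 10)).fetchall()
print(top_vendors)
```

Indexing edges on both `source` and `target` keeps forward and backward lookups cheap, which matters for bidirectional traversal.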
| Variable | Required? | Purpose |
|---|---|---|
| GEMINI_API_KEY | ✅ Yes | Google Gemini API key for the investigation agent |
| GOOGLE_API_KEY | ✅ (alt) | Alternative to GEMINI_API_KEY |
| TURSO_DATABASE_URL | Optional | LibSQL cloud database (defaults to local SQLite) |
| TURSO_AUTH_TOKEN | Optional | Auth token for LibSQL |
| VITE_AUTH0_DOMAIN | Optional | Auth0 tenant for login |
| VITE_AUTH0_CLIENT_ID | Optional | Auth0 SPA app ID |
| VITE_AUTH0_REDIRECT_URI | Optional | Auth0 callback URL |
| VITE_AUTH0_AUDIENCE | Optional | Auth0 API identifier |
| VITE_AUTH_DEV_BYPASS | Optional | Set to true to skip Auth0 entirely (dev only) |
| TRUEFOUNDRY_BASE_URL | Optional | TrueFoundry AI Gateway URL for LLM routing |
| TRUEFOUNDRY_API_KEY | Optional | TrueFoundry API key |
| OVERMIND_API_KEY | Optional | Overmind tracing for evaluation + optimization |
| SODA_APP_TOKEN | Optional | SODA API concurrency token (unauthenticated requests work too) |
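The fallbacks in the table (GOOGLE_API_KEY as an alternative key, local SQLite when no Turso URL is set) could be resolved at startup along these lines. This is a sketch, not the backend's actual configuration code: `load_config` and the default `sqlite:///commons.db` URL are hypothetical.

```python
import os


def load_config(env=None):
    """Resolve backend settings from environment variables, applying the documented fallbacks."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY")
    if not api_key:
        raise RuntimeError("Set GEMINI_API_KEY (or GOOGLE_API_KEY) before starting the agent")
    return {
        "api_key": api_key,
        # Hypothetical default: fall back to a local SQLite file when Turso is not configured.
        "database_url": env.get("TURSO_DATABASE_URL") or "sqlite:///commons.db",
        "database_token": env.get("TURSO_AUTH_TOKEN"),
        "soda_app_token": env.get("SODA_APP_TOKEN"),  # optional; requests work without it
    }


# Passing a plain dict makes the function easy to exercise without touching os.environ.
config = load_config({"GOOGLE_API_KEY": "example-key"})
print(config["database_url"])
```

Failing fast on a missing API key surfaces the problem at startup rather than on the first investigation request.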
agent/
investigator.py # Gemini agent with function calling
server.py # SSE HTTP server
graph_queries.py # All graph functions + investigation DB
patterns.py # Corruption detection heuristics
step_emitter.py # SSE event formatting
airbyte_enrichment.py # Optional Airbyte context
truefoundry_backend.py # Optional TrueFoundry routing
homepage/
src/
components/ # Reusable UI panels:
GraphPanel.ts # - Interactive entity graph (force-directed)
GlobePanel.ts # - Globe visualization of contract locations
NarrativePanel.ts # - Rich-text investigation briefing
FindingsPanel.ts # - Structured findings with severity
EntitiesPanel.ts # - Entity search/filter
SearchPanel.ts # - Investigation query input
TipsPanel.ts # - Tip submission form
api/
investigate.js # Proxies to backend /api/investigate
publish.js # Proxies to backend /api/*/publish
tips.js # Proxies to backend /api/tips
pipeline/
run_pipeline.py # Main ETL orchestrator
soda_source.py # SODA API connectors
entity_extraction.py # Entity matching + deduplication
aerospike_loader.py # (Optional) Graph load to Aerospike
data/
contracts.json # Pre-seeded SF contracts (150MB+, gitignored)
campaign_finance.json # Pre-seeded SF donations
businesses.json # Pre-seeded SF business registrations
test_e2e.py # End-to-end API test suite
render.yaml # Render Blueprint (deployment config)
build.sh # Build script for Render
Run the full E2E test suite (20 tests covering graph queries, API endpoints, and agent flow):
# Against local server (http://localhost:8000)
python test_e2e.py
# Against live Render deployment
GEMINI_API_KEY=your-key python test_e2e.py --backend-url https://commons-ovyq.onrender.com
Test coverage:
- ✅ Health check + static file serving
- ✅ Entity search (exact + fuzzy matching)
- ✅ Graph traversal (BFS with depth limits)
- ✅ Campaign finance lookups
- ✅ Pattern detection heuristics
- ✅ Investigation storage + retrieval
- ✅ Finding publication
- ✅ Anonymous tip submission (token generation + burning)
- ✅ SSE streaming (integration test)
- ✅ CORS preflight handling
- Implement the function in agent/graph_queries.py:
  def my_new_query(param1: str, param2: int) -> list[dict]:
      """Tool description for Gemini."""
      # Implementation
      return results
- Add a tool declaration in the agent/investigator.py TOOL_DECLARATIONS list:
  {
    "name": "my_new_query",
    "description": "What this tool does",
    "parameters": {
      "type": "object",
      "properties": {
        "param1": {"type": "string"},
        "param2": {"type": "integer"}
      },
      "required": ["param1", "param2"]
    }
  }
- Add dispatch in the TOOL_DISPATCH dict:
  "my_new_query": lambda args: graph_queries.my_new_query(
      args["param1"], args["param2"]
  )
- Test with the agent:
  python -m agent.server --port 8000
  curl "http://localhost:8000/api/investigate?q=Use+my_new_query+on+..."
- Edit components in homepage/src/components/
- Vite hot-reloads on save during dev
- Build for production: cd homepage && npm run build → outputs to homepage/dist/
Push to GitHub → Render auto-deploys from the Blueprint config. To force redeploy:
cd .render
render deploy --service-id srv-...
Or go to https://dashboard.render.com and click "Manual Deploy".
- SQLite only: The knowledge graph is a ~500MB SQLite file held in memory. For 10M+ entities, use Turso/LibSQL or Aerospike
- SODA API throttling: Requests are rate-limited during the pipeline run (~100 requests/min), so building the full graph from scratch can take 2-3 hours
- Basic auth: No fine-grained RBAC yet. Auth0 is present but not enforced, so all investigations are visible to all users
- No full-text search: Entity search uses simple substring matching + fuzzy name comparison (no ElasticSearch)
- No persistent tips DB: Tips are stored in memory and one-time tokens are kept as SHA-256 hashes; neither is synced across server instances
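The tip limitation above is easier to see in code. Here is a minimal sketch of an in-memory store that hands out a random token, keeps only its SHA-256 hash, and burns the tip on first retrieval. `TipStore` and its methods are hypothetical, and the backend's real implementation may differ.

```python
import hashlib
import secrets


class TipStore:
    """In-memory tip store keyed by the SHA-256 hash of a one-time token."""

    def __init__(self):
        self._tips = {}  # lives in this process only; lost on restart

    def submit(self, tip: dict) -> str:
        token = secrets.token_urlsafe(32)
        self._tips[self._digest(token)] = tip
        return token  # shown to the submitter once; only the hash is kept

    def retrieve(self, token: str):
        # pop() removes the entry, so a second retrieval returns None.
        return self._tips.pop(self._digest(token), None)

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode("utf-8")).hexdigest()


store = TipStore()
token = store.submit({"entity_name": "Shell Corp Inc", "allegation": "..."})
first = store.retrieve(token)
second = store.retrieve(token)
print(first is not None, second is None)
```

Because the dictionary lives inside a single process, a second server instance would never see the tip, which is exactly the limitation described above.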
- Graph database: Migrate from SQLite to Aerospike (key-value graph) for 10M+ entity scale
- Real-time sync: Stream data from SODA API incrementally instead of full nightly pipeline
- RBAC + audit log: Track who investigated what entity, add reviewer workflows
- Journalist dashboard: Published findings feed, tip status tracking, collaboration features
- Overmind integration: Automatically evaluate quality of findings + refine agent prompts
- Browser extension: One-click investigation of any SF vendor from city contracts web portal
- Mobile app: React Native version for field investigators
Commons was built at the AWS Deep Agents Hackathon using:
- Google Gemini 2.0 Flash for the investigation agent with function calling
- SODA API for SF government open data
- Preact for the lightweight UI framework
- Render for free-tier Python hosting
- Auth0 for authentication
- Airbyte for data connectors (optional enrichment)
- Overmind SDK for LLM observability (optional)
MIT License — See LICENSE file
Have questions, issues, or want to contribute?
- File an issue on GitHub
- Email: investigative-intelligence@commons.app (placeholder)
For journalism partnerships or data inquiries:
- Visit: https://commons.app (coming soon)
- Pay-to-play: When a donation to a politician is followed by that politician's department awarding contracts to the donor's company
- Shell company: A newly-formed company with minimal operations, often used to obscure ownership
- Entity linking: Process of matching multiple names (e.g., "Recology Inc", "Recology", "Recology, Inc.") to a single canonical entity in the graph
- SODA API: Socrata Open Data API — standard platform used by 100+ city/county/state governments in the US
- Function calling: LLM capability where the model decides which functions to call and in what order (vs. hardcoded flows)
- SSE (Server-Sent Events): HTTP protocol for pushing real-time updates to browsers (simpler than WebSockets)
- Aerospike: High-performance key-value database optimized for real-time analytics on billions of records