TrustGate

The approval layer AI workflows forgot.

vibeFORWARD Hackathon · Track 2 — Guardrails & Safe Use · New York City

TrustGate sits between an AI agent and the humans who have to sign off, running live guardrail checks on every AI decision and showing why each recommendation should or shouldn't be trusted. We're starting with AI-driven vendor invoice approval, a workflow every Fortune 500 runs and a prime target for Business Email Compromise, which accounted for $2.9 billion in FBI-reported losses in 2023 alone, a major share of it invoice and vendor fraud.


Why this problem, right now

Enterprises are deploying AI into high-stakes workflows — invoice approvals, insurance claims, hiring, policy interpretation — faster than they can govern them. When it goes wrong, they eat the liability. Today's "guardrail" is hope, plus a human rubber-stamp.

Four incidents that make this concrete:

| When | What happened | What it means |
|---|---|---|
| Mar 2024 | NYC's own MyCity chatbot told small business owners they could take worker tips, fire people for reporting harassment, and refuse Section 8 tenants, all of which is illegal under NYC law | The City of New York shipped an unguarded LLM into a compliance-sensitive workflow. It stayed live after The Markup reported it. |
| Feb 2024 | Air Canada was ruled legally liable for a fare its chatbot hallucinated; the tribunal rejected the "the AI is a separate entity" defense | AI hallucinations are no longer embarrassing; they are a balance-sheet liability. |
| 2023 | JPMorgan banned ChatGPT firm-wide for 250,000+ employees | The biggest US bank chose "turn it off" over "deploy without a trust layer." That trust layer is the opportunity. |
| Oct 2025 | Deloitte Australia issued a partial refund on a government report due to AI-hallucinated citations | Even Big 4 consultancies, with all the process controls money can buy, ship hallucinated compliance content to paying clients. |

And the regulatory tailwind: NYC Local Law 144 (in effect since July 2023) already requires bias audits for AI used in hiring decisions. AI-governance-as-a-product isn't speculative — it's a line item.


What we built

For every AI invoice recommendation, TrustGate runs four guardrail checks in parallel before the decision ever reaches the AP manager's queue:

1. Policy Check (LLM-grounded)

Claude compares the invoice against a written policy document (policy.md) and cites the exact clause violated. Catches over-limit amounts, unapproved vendor categories, missing PO references.
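A minimal sketch of how such a check could be wired, under stated assumptions: `build_policy_prompt`, `policy_check`, and the injected `ask_llm` callable are hypothetical names, not the contents of `guardrails.py`. The model call is passed in as a plain function so the sketch makes no claim about the Lava or Anthropic client API. The verbatim-clause grounding step is an extra safeguard assumed here, not something the README describes.

```python
import json
from typing import Callable

VALID_STATUSES = {"pass", "flag", "block"}


def build_policy_prompt(policy_text: str, invoice: dict) -> str:
    """Build a prompt asking the model to judge one invoice and cite a clause."""
    return (
        "You are an accounts-payable policy checker.\n"
        "Policy document:\n"
        f"{policy_text}\n\n"
        "Invoice (JSON):\n"
        f"{json.dumps(invoice, sort_keys=True)}\n\n"
        'Reply with JSON only: {"status": "pass" | "flag" | "block", '
        '"reason": "...", "evidence": ["<exact policy clause>"]}'
    )


def policy_check(policy_text: str, invoice: dict,
                 ask_llm: Callable[[str], str]) -> dict:
    """Run the check; ask_llm is any function that takes a prompt and returns text."""
    raw = ask_llm(build_policy_prompt(policy_text, invoice))
    try:
        verdict = json.loads(raw)
    except json.JSONDecodeError:
        # Fail closed: an unparseable reply must not silently pass.
        return {"status": "flag", "reason": "unparseable model reply", "evidence": [raw]}
    if not isinstance(verdict, dict) or verdict.get("status") not in VALID_STATUSES:
        return {"status": "flag", "reason": "invalid status in model reply", "evidence": [raw]}

    # Grounding step (assumption of this sketch): every cited clause must
    # appear verbatim in the policy text, otherwise the citation is suspect.
    cited = verdict.get("evidence", [])
    if not isinstance(cited, list):
        cited = [cited]
    missing = [c for c in cited if not isinstance(c, str) or c not in policy_text]
    if missing:
        return {"status": "flag",
                "reason": "cited clause not found in policy",
                "evidence": [str(c) for c in missing]}
    return verdict
```

Passing the model call in as a parameter keeps the check testable with a canned reply, which is handy when the demo Wi-Fi is not.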

2. Source Verification (Tavily-powered)

For any external claim (vendor legitimacy, business address, domain authenticity) we query Tavily for live web evidence and surface the source URLs. This is the guardrail that catches vendor impersonation fraud (lookalike domains, shell companies, vendors that don't exist on the open web), the pattern behind billions of dollars in FBI-reported Business Email Compromise losses.
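When no Tavily key is configured, Source Verification is described elsewhere in this README as falling back to a TLD heuristic. The sketch below shows one way such an offline check might look; it is an assumption, not the project's code. `split_domain`, `domain_heuristic`, the 0.8 similarity cutoff, and the example domains are all hypothetical, and the domain splitting is deliberately naive (it takes the last two labels, so it mishandles suffixes such as `co.uk`).

```python
from difflib import SequenceMatcher


def split_domain(domain: str) -> tuple[str, str]:
    """Naively split 'billing.acme-supplies.com' into ('acme-supplies', 'com')."""
    labels = domain.lower().strip(".").split(".")
    if len(labels) < 2:
        return labels[0], ""
    return labels[-2], labels[-1]


def domain_heuristic(invoice_domain: str, known_domain: str,
                     lookalike_cutoff: float = 0.8) -> dict:
    """Compare the domain on an invoice with the domain on the vendor record."""
    name, tld = split_domain(invoice_domain)
    known_name, known_tld = split_domain(known_domain)

    if (name, tld) == (known_name, known_tld):
        return {"status": "pass", "reason": "domain matches vendor record",
                "evidence": [known_domain]}
    if name == known_name:
        # Same name under a different TLD, e.g. .co instead of .com.
        return {"status": "block",
                "reason": f"TLD mismatch: .{tld} vs .{known_tld}",
                "evidence": [invoice_domain, known_domain]}

    similarity = SequenceMatcher(None, name, known_name).ratio()
    if similarity >= lookalike_cutoff:
        return {"status": "flag",
                "reason": f"lookalike domain (similarity {similarity:.2f})",
                "evidence": [invoice_domain, known_domain]}
    return {"status": "flag", "reason": "domain not on vendor record",
            "evidence": [invoice_domain]}
```

A heuristic like this only compares against a vendor record you already hold; it cannot tell you whether an unknown vendor exists, which is what the live search is for.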

3. Anomaly Check (statistical)

Flags invoices whose amount sits more than 2σ above the vendor's historical mean, and first-time-vendor invoices that exceed policy thresholds. Deliberately non-LLM: not every guardrail should depend on a model.
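The 2σ rule can be sketched with nothing but the standard library. This is an illustration under assumptions: `anomaly_check` is a hypothetical name, the sample standard deviation is one reasonable choice among several, and the `first_time_limit` default is a placeholder rather than a value taken from `policy.md`.

```python
from statistics import mean, stdev


def anomaly_check(amount: float, history: list[float],
                  z_threshold: float = 2.0,
                  first_time_limit: float = 10_000.0) -> dict:
    """Flag an invoice amount that is unusually high for this vendor."""
    if len(history) < 2:
        # Too little history for a standard deviation: fall back to a flat
        # limit for new vendors (placeholder value, an assumption).
        if amount > first_time_limit:
            return {"status": "flag",
                    "reason": "first-time vendor over threshold",
                    "evidence": [f"amount={amount:.2f}", f"limit={first_time_limit:.2f}"]}
        return {"status": "pass", "reason": "first-time vendor within threshold",
                "evidence": [f"amount={amount:.2f}"]}

    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        # All past invoices were identical, so any increase is out of pattern.
        status = "flag" if amount > mu else "pass"
        return {"status": status, "reason": "vendor history has zero variance",
                "evidence": [f"amount={amount:.2f}", f"mean={mu:.2f}"]}

    z = (amount - mu) / sigma
    status = "flag" if z > z_threshold else "pass"
    return {"status": status,
            "reason": f"z-score {z:.2f} vs threshold {z_threshold:.1f}",
            "evidence": [f"mean={mu:.2f}", f"stdev={sigma:.2f}", f"z={z:.2f}"]}
```

Only amounts above the mean are flagged, matching the "above a vendor's historical mean" wording; an unusually small invoice passes.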

4. Confidence Decomposition (explainability)

The AI agent's "94% confident" is meaningless to a human reviewer. TrustGate breaks that number into three weighted factors and flags when the factors don't support the headline confidence. (This is the Trust & Transparency virtue we borrow from Track 1.)
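One way to picture the decomposition is a weighted average compared against the headline number. Everything specific below is an assumption made for illustration: the README does not name the three factors, so `vendor_history`, `document_match`, and `policy_fit` are hypothetical, as are the weights and the 0.15 tolerance.

```python
def decompose_confidence(headline: float,
                         factors: dict[str, float],
                         weights: dict[str, float],
                         tolerance: float = 0.15) -> dict:
    """Flag when the weighted factor scores fall well short of the headline."""
    if set(factors) != set(weights):
        raise ValueError("factors and weights must use the same keys")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    # Weighted average of the factor scores, each expected in [0, 1].
    supported = sum(weights[k] * factors[k] for k in factors) / total_weight
    gap = headline - supported

    status = "flag" if gap > tolerance else "pass"
    return {
        "status": status,
        "reason": f"headline {headline:.2f} vs supported {supported:.2f}",
        "evidence": [f"{k}: score={factors[k]:.2f}, weight={weights[k]:.2f}"
                     for k in sorted(factors)],
    }


# Hypothetical factor names and values, for illustration only.
example = decompose_confidence(
    headline=0.94,
    factors={"vendor_history": 0.90, "document_match": 0.30, "policy_fit": 0.40},
    weights={"vendor_history": 0.40, "document_match": 0.30, "policy_fit": 0.30},
)
```

With these made-up inputs the weighted support is 0.57 against a headline of 0.94, so the check flags: the reviewer sees which factors are weak instead of a single opaque number.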

Each check returns { status: "pass" | "flag" | "block", reason, evidence }. Every verdict and every human decision is written to an append-only audit log that you can export as JSON, because in regulated workflows "we have an audit trail" is non-negotiable.
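The verdict shape and the append-only log could be sketched as follows. This is a stdlib stand-in, not the project's Pydantic models: the `check` field, the timestamp, the JSON Lines file layout, and the names `Verdict`, `append_audit`, and `export_audit` are assumptions added for the example.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Literal

Status = Literal["pass", "flag", "block"]


@dataclass(frozen=True)
class Verdict:
    check: str                      # which guardrail produced this (assumed field)
    status: Status
    reason: str
    evidence: list[str] = field(default_factory=list)


def append_audit(path: Path, invoice_id: str, verdict: Verdict) -> None:
    """Append one entry; the file is only ever opened in append mode."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "invoice_id": invoice_id,
        **asdict(verdict),
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def export_audit(path: Path) -> list[dict]:
    """Read the whole log back as a list of JSON-serialisable dicts."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Opening the file in append mode keeps application code from rewriting history, but it is not tamper-proofing; a production audit trail would need storage-level guarantees as well.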


Demo (90 seconds, on stage)

Five seeded invoices, each designed to showcase a different guardrail:

  1. Clean invoice: known vendor, normal amount → all green, one-click approve.
  2. Vendor impersonation: looks like Acme Supplies, but the domain is .co, not .com → Source Verification blocks, with Tavily evidence.
  3. Policy violation: $47,000 invoice under a $10k single-approver cap → Policy Check flags with cited clause.
  4. Historical anomaly: known vendor, 4× their usual amount → Anomaly Check flags with z-score.
  5. Opaque overconfidence: AI says 94% approve, but the rationale is thin → Confidence Decomposition flags, showing that the factors don't support the number.

The demo arc is the pitch arc.


Market

The AI-guardrails category is ~18 months old, so we triangulate from three measured adjacencies plus the cost-of-inaction.

Emerging category: AI TRiSM (Trust, Risk, Security Management), named by Gartner as a top strategic tech trend. Analyst estimates for AI governance tooling cluster around $1–3B today, growing to $10–15B by 2028–2030, with wide variance because the category is still being defined.

Established adjacencies TrustGate plugs into:

| Market | Size today | Why it matters |
|---|---|---|
| Accounts Payable automation | ~$3–5B (~$7–12B by 2030) | Our beachhead |
| Governance, Risk & Compliance (GRC) software | ~$50B+ | Adjacent buying center |
| AI model risk / ML observability (Arize, Fiddler, Credo AI) | ~$1–3B, fast-growing | Direct competitive set |

Cost of doing nothing (often more persuasive than TAM with enterprise judges):

  • $2.9B in FBI-reported Business Email Compromise losses in 2023 — invoice/vendor fraud is a major share
  • JPMorgan's 250,000-employee ChatGPT ban = a rough proxy for the productivity tax every enterprise pays when they have no trust layer
  • Air Canada = every chatbot hallucination now carries a legal price tag that didn't exist 24 months ago

Initial wedge (SOM): mid-market companies (500–5,000 employees) in regulated industries (financial services, healthcare, insurance, government contractors) running AI-assisted AP or procurement. That is tens of thousands of US companies, each a $10k–$100k/yr spender once the category matures. A 1% wedge is a few hundred customers, or on the order of $1M–$100M of ARR: seven to eight figures.


How it works — architecture at a glance

┌──────────────┐       ┌──────────────────────┐       ┌───────────────┐
│  AI Agent    │──────▶│  TrustGate Backend   │──────▶│  Approval     │
│ (recommends) │       │  (4 parallel checks) │       │  Queue UI     │
└──────────────┘       │                      │       └───────────────┘
                       │  • Policy (Claude)   │               │
                       │  • Source (Tavily)   │               ▼
                       │  • Anomaly (stats)   │       ┌───────────────┐
                       │  • Confidence (LLM)  │       │  AP Manager   │
                       └──────────────────────┘       │  approves /   │
                                │                     │  overrides    │
                                ▼                     └───────────────┘
                       ┌──────────────────────┐               │
                       │  Append-only Audit   │◀──────────────┘
                       │  Log (JSON export)   │
                       └──────────────────────┘

Stack

  • Frontend: single static HTML reference page (Tailwind CDN + vanilla JS); a Lovable-generated React + Tailwind app is to follow
  • Backend: Python FastAPI (single main.py, ~40 lines of async orchestration)
  • LLM: Anthropic Claude (claude-sonnet-4-6)
  • Search: Tavily API
  • Storage: in-memory + JSON file audit log (intentional — no DB fights at hackathon scope)

Repo layout

.
├── README.md          ← you are here
├── SPEC.md            ← full product + build spec
├── policy.md          ← invoice approval policy (fed to the Policy guardrail)
├── pyproject.toml     ← Python deps (managed by uv)
├── .env.example       ← copy to .env and fill in keys
├── src/
│   ├── main.py        ← FastAPI app + routes
│   ├── llm.py         ← Lava-proxied Claude helper
│   ├── guardrails.py  ← 4 guardrail checks (parallel)
│   ├── seed.py        ← 5 demo invoices
│   └── models.py      ← Pydantic schemas
└── hackathon-info.pdf ← vibeFORWARD brief

For the full build plan, seed-data design, API surface, UI spec, demo script, and judge-specific landing points, see SPEC.md.


Run it locally

Prerequisites

  • Python 3.11+
  • uv — fast Python package manager
    curl -LsSf https://astral.sh/uv/install.sh | sh   # macOS/Linux
    # or: brew install uv

Setup (one-time)

git clone https://github.com/naga-k/trustgate
cd trustgate

# install deps into an isolated .venv
uv sync

# copy the env template and fill in your keys
cp .env.example .env
# edit .env: LAVA_API_KEY is required, TAVILY_API_KEY is optional (falls back to heuristic)

Run the whole app

uv run uvicorn src.main:app --reload --port 8787

One process serves everything, API and frontend alike.

The reference frontend is intentionally a single static HTML file (Tailwind CDN + vanilla JS) so the whole system runs from one command with no build step, no node_modules, no deploy dependency. Teammates can replace it with a richer frontend (Lovable, Next.js, etc.) and just point at the same API.

Smoke test

# health
curl http://localhost:8787/api/health

# list all 5 seed invoices with their AI recommendations
curl http://localhost:8787/api/invoices | jq

# run all 4 guardrails on the vendor-impersonation invoice
curl -X POST http://localhost:8787/api/invoices/INV-8822/run-guardrails | jq

API routes

| Route | Purpose |
|---|---|
| GET /api/health | liveness check |
| GET /api/invoices | list all invoices with AI rec + latest verdicts |
| GET /api/invoices/{id} | full detail for one invoice |
| POST /api/invoices/{id}/run-guardrails | fire all 4 checks in parallel |
| POST /api/invoices/{id}/decide | record a human approve/review/block decision |
| GET /api/audit-log | append-only verdict + decision log |
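For scripting against these routes without curl, a small standard-library client might look like the sketch below. The helper names are hypothetical, and the sketch assumes the JSON routes return JSON bodies, as the `jq` usage in the smoke test suggests. The `decide` route is left out because its request body is not documented here.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8787"  # port used by the run command in this README


def build_request(method: str, path: str) -> Request:
    """Build (but do not send) a request for one of the documented routes."""
    return Request(f"{BASE_URL}{path}", method=method)


def call(method: str, path: str, timeout: float = 10.0):
    """Send the request and decode the JSON response body."""
    with urlopen(build_request(method, path), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def list_invoices():
    return call("GET", "/api/invoices")


def get_invoice(invoice_id: str):
    return call("GET", f"/api/invoices/{invoice_id}")


def run_guardrails(invoice_id: str):
    return call("POST", f"/api/invoices/{invoice_id}/run-guardrails")


def audit_log():
    return call("GET", "/api/audit-log")
```

With the server running, `run_guardrails("INV-8822")` mirrors the third smoke-test command.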

Environment variables

| Var | Required? | What |
|---|---|---|
| LAVA_API_KEY | yes | Lava gateway key (aks_live_... or aks_test_...). All Claude traffic routes through Lava. |
| TAVILY_API_KEY | optional | Tavily search key for live vendor-legitimacy verification. Without it, Source Verification falls back to a TLD-heuristic. |
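The required/optional split can be enforced at startup with a few lines. This is a sketch under assumptions: `Settings`, `load_settings`, and `source_mode` are hypothetical names, and the code reads variables that are already in the environment rather than parsing `.env` itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    lava_api_key: str
    tavily_api_key: Optional[str]

    @property
    def source_mode(self) -> str:
        """Which Source Verification path would be used with these settings."""
        return "tavily" if self.tavily_api_key else "tld-heuristic"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Fail fast on the required key; treat a blank optional key as absent."""
    lava = env.get("LAVA_API_KEY", "").strip()
    if not lava:
        raise RuntimeError(
            "LAVA_API_KEY is required; copy .env.example to .env and set it"
        )
    tavily = env.get("TAVILY_API_KEY", "").strip() or None
    return Settings(lava_api_key=lava, tavily_api_key=tavily)
```

Failing at startup is friendlier than failing on the first guardrail run, when the missing key would otherwise surface as an opaque upstream error.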

Team setup

After cloning:

uv sync                     # install deps
cp .env.example .env        # then paste your Lava key (and Tavily if you have one)
uv run uvicorn src.main:app --reload --port 8787

That's it. No global Python install, no pip install juggling, no venv activation ceremony — uv handles all of it.

For the frontend (Lovable-generated React app), see the frontend/ directory once it's committed, or point your Lovable project's API base URL at http://localhost:8787.


Team

Built at vibeFORWARD NYC.


License

MIT — see LICENSE (to be added).
