Anchor

Anchor is a narrow, production-shaped RAG system over a fixed corpus of official English-language RBI Master Directions and SEBI Master Circulars. It answers only from retrieved context, validates citations server-side, and refuses when support is weak.
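The server-side citation check described above could look roughly like the sketch below. It is illustrative only and not the repository's actual code: the function names, the whitespace/case normalization, and the rule that an answer with no surviving citations becomes a refusal are all assumptions. Only the `chunk_id` and `quote` fields and the `insufficient_support` reason come from the query contract later in this README.

```python
def _normalize(text):
    # Collapse whitespace and lowercase so minor formatting drift
    # does not cause a false mismatch (an assumed policy).
    return " ".join(text.split()).lower()


def validate_citations(citations, retrieved_chunks):
    """Keep citations whose chunk was retrieved and whose quote occurs in it.

    `retrieved_chunks` is a hypothetical mapping of chunk_id -> chunk text.
    """
    valid = []
    for citation in citations:
        chunk_text = retrieved_chunks.get(citation["chunk_id"])
        if chunk_text is None:
            continue  # cites a chunk that was never retrieved
        quote = _normalize(citation["quote"])
        if quote and quote in _normalize(chunk_text):
            valid.append(citation)
    return valid


def finalize(answer, citations, retrieved_chunks):
    """Refuse when no citation survives validation (assumed rule)."""
    valid = validate_citations(citations, retrieved_chunks)
    if not valid:
        return {"status": "refused", "answer": "",
                "refusal_reason": "insufficient_support", "citations": []}
    return {"status": "answered", "answer": answer,
            "refusal_reason": None, "citations": valid}
```

Matching on the quote text, and not only on the chunk ID, is what stops a model from attaching a real chunk ID to an invented excerpt.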

Scope

  • Corpus source of truth: corpus/manifest.yaml
  • Public surface: GET /, POST /query, GET /healthz
  • Internal endpoints: GET /readyz, GET /metrics
  • Out of scope: tax law, chat memory, uploads, user accounts, workflow engines, vector DBs other than PostgreSQL + pgvector

Stack

  • FastAPI
  • PostgreSQL + pgvector
  • Vertex AI for embeddings and generation
  • Cohere Rerank API
  • LangChain Core + langchain-google-vertexai as thin adapters
  • Langfuse for tracing
  • Next.js static export UI
  • nginx + systemd deployment artifacts

Repo Map

  • anchor/api: FastAPI app and endpoint wiring
  • anchor/pipeline: retrieval, RRF, refusal rules, generation orchestration, citation validation
  • anchor/ingest: manifest loading, fetch, parse, chunk, embed, upsert
  • anchor/db: connection pool, repository SQL, migrations
  • eval: golden set, smoke set, eval entrypoint
  • ui: static Next.js frontend
  • deploy: Dockerfile, nginx config, systemd units
  • docs: PRD, architecture, spec, eval summary, sanitized traces
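For readers unfamiliar with the RRF step listed under anchor/pipeline, here is a minimal sketch of reciprocal rank fusion. It is not taken from the repository. The function name, the constant `k = 60` (a commonly used default), and the idea that the fused lists come from a vector search and a keyword search are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranked list.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1 for the top hit of each list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical hits from two retrievers over the same corpus.
vector_hits = ["doc_a::chunk_000", "doc_a::chunk_001", "doc_b::chunk_002"]
keyword_hits = ["doc_a::chunk_001", "doc_b::chunk_002", "doc_c::chunk_003"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Because RRF uses only ranks, it can combine retrievers whose raw scores are on incompatible scales, such as cosine distance and a full-text relevance score.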

Local Setup

  1. Create the virtual environment and install dependencies.
make setup
  2. Copy .env.example to .env and fill in:
  • DATABASE_URL
  • VERTEX_PROJECT_ID
  • VERTEX_LOCATION
  • GOOGLE_APPLICATION_CREDENTIALS
  • COHERE_API_KEY
  • LANGFUSE_PUBLIC_KEY
  • LANGFUSE_SECRET_KEY
  3. Start PostgreSQL with pgvector.
docker compose up -d postgres
  4. Apply the schema.
make migrate
  5. Ingest the allowlisted corpus.
make ingest
  6. Run the API.
make dev
  7. In another shell, build or run the UI.
cd ui
npm install
npm run dev
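Before running the migrate or ingest steps, it can help to confirm that every variable from the .env step is actually set. The helper below is a hypothetical convenience script, not part of the repository. It only checks that each name listed above has a non-empty value and does not validate the values themselves.

```python
import os

# Variable names copied from the setup list above.
REQUIRED_ENV_VARS = (
    "DATABASE_URL",
    "VERTEX_PROJECT_ID",
    "VERTEX_LOCATION",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "COHERE_API_KEY",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_SECRET_KEY",
)


def missing_env_vars(env=None):
    """Return the required names that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```

Note that this reads the process environment, so the values from .env must already be exported or loaded by whatever launches the script.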

Query Contract

Request:

{
  "question": "What are the KYC requirements for small accounts?"
}

Answered response shape:

{
  "request_id": "uuid",
  "status": "answered",
  "answer": "Grounded plain-text answer.",
  "refusal_reason": null,
  "citations": [
    {
      "chunk_id": "rbi_kyc_2016::chunk_000",
      "doc_id": "rbi_kyc_2016",
      "doc_title": "Master Direction - Know Your Customer (KYC) Direction, 2016",
      "regulator": "RBI",
      "section_title": "Customer Due Diligence (CDD) Procedure",
      "page": 14,
      "source_url": "https://www.rbi.org.in/...",
      "quote": "Short supporting excerpt."
    }
  ],
  "disclaimer": "Demo only. Not legal or financial advice.",
  "latency_ms": 1800
}

Refusal response shape:

{
  "request_id": "uuid",
  "status": "refused",
  "answer": "",
  "refusal_reason": "insufficient_support",
  "citations": [],
  "disclaimer": "Demo only. Not legal or financial advice.",
  "latency_ms": 900
}
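A client can map both response shapes onto one typed structure. The sketch below is an assumption about how a consumer might do that, not code from the repository. The class and function names are hypothetical, and the field names and types are inferred from the two JSON examples above.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Citation:
    chunk_id: str
    doc_id: str
    doc_title: str
    regulator: str
    section_title: str
    page: int
    source_url: str
    quote: str


@dataclass(frozen=True)
class QueryResponse:
    request_id: str
    status: str                    # "answered" or "refused" in the examples
    answer: str
    refusal_reason: Optional[str]  # null when answered
    citations: Tuple[Citation, ...]
    disclaimer: str
    latency_ms: int

    @property
    def is_answered(self) -> bool:
        return self.status == "answered"


def parse_query_response(raw: str) -> QueryResponse:
    """Parse a /query response body; raises on missing or unexpected keys."""
    data = json.loads(raw)
    citations = tuple(Citation(**c) for c in data["citations"])
    return QueryResponse(**{**data, "citations": citations})
```

Constructing the dataclasses with keyword expansion means an unexpected or missing field fails loudly with a `TypeError`, which is useful while the contract is still settling.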

Important Commands

make setup
make migrate
make ingest
make test
make lint
make eval-smoke
make eval
make ui-build

Evaluation

Smoke eval is fixture-backed so CI can validate the benchmark path without cloud credentials:

./.venv/bin/python eval/run.py --smoke --fixture-mode --write-docs

Full eval runs the direct query service against the indexed corpus and requires real provider credentials plus a populated database:

./.venv/bin/python eval/run.py --write-docs

Deployment Notes

  • nginx serves ui/out and proxies /query and /healthz
  • systemd units live under deploy/systemd
  • nginx config lives at deploy/nginx/anchor.conf
  • the GitHub Actions deploy workflow expects a VPS with Python, Node, nginx, PostgreSQL + pgvector, and /etc/anchor/anchor.env

Review Aids
