Anchor is a narrow, production-shaped RAG system over a fixed corpus of official English-language RBI Master Directions and SEBI Master Circulars. It answers only from retrieved context, validates citations server-side, and refuses when support is weak.
- Corpus source of truth: `corpus/manifest.yaml`
- Public surface: `GET /`, `POST /query`, `GET /healthz`
- Internal endpoints: `GET /readyz`, `GET /metrics`
- Out of scope: tax law, chat memory, uploads, user accounts, workflow engines, vector DBs other than PostgreSQL + pgvector

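The server-side citation check and the refusal rule described above can be illustrated with a minimal sketch. This is not the code in `anchor/pipeline`; the `Chunk` and `Citation` types, the function names, and the normalized-substring quote match are assumptions made for illustration. Only the `insufficient_support` reason and the field names come from the response shapes in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    # Hypothetical view of a retrieved chunk.
    chunk_id: str
    text: str


@dataclass(frozen=True)
class Citation:
    # Hypothetical view of a citation proposed by the generator.
    chunk_id: str
    quote: str


def _normalize(text: str) -> str:
    # Collapse whitespace and case so line breaks in the source text
    # do not cause false mismatches.
    return " ".join(text.split()).casefold()


def validate_citations(citations, retrieved):
    """Keep citations whose chunk was retrieved and whose quote occurs in it."""
    by_id = {chunk.chunk_id: chunk for chunk in retrieved}
    valid = []
    for citation in citations:
        chunk = by_id.get(citation.chunk_id)
        if chunk is None:
            continue  # cites a chunk that was never retrieved
        quote = _normalize(citation.quote)
        if quote and quote in _normalize(chunk.text):
            valid.append(citation)
    return valid


def decide(answer, citations, retrieved):
    """Return an answered or refused payload (assumed rule: no valid citation means refuse)."""
    valid = validate_citations(citations, retrieved)
    if not answer.strip() or not valid:
        return {
            "status": "refused",
            "answer": "",
            "refusal_reason": "insufficient_support",
            "citations": [],
        }
    return {
        "status": "answered",
        "answer": answer,
        "refusal_reason": None,
        "citations": valid,
    }
```

The design point is that the generator's citations are treated as untrusted claims: anything that cannot be matched back to a retrieved chunk is dropped before the response is built.
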
- FastAPI
- PostgreSQL + pgvector
- Vertex AI for embeddings and generation
- Cohere Rerank API
- LangChain Core + `langchain-google-vertexai` as thin adapters
- Langfuse for tracing
- Next.js static export UI
- nginx + systemd deployment artifacts

- `anchor/api`: FastAPI app and endpoint wiring
- `anchor/pipeline`: retrieval, RRF, refusal rules, generation orchestration, citation validation
- `anchor/ingest`: manifest loading, fetch, parse, chunk, embed, upsert
- `anchor/db`: connection pool, repository SQL, migrations
- `eval`: golden set, smoke set, eval entrypoint
- `ui`: static Next.js frontend
- `deploy`: Dockerfile, nginx config, systemd units
- `docs`: PRD, architecture, spec, eval summary, sanitized traces

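The RRF step named under `anchor/pipeline` is reciprocal rank fusion, which merges several ranked lists by summing `1 / (k + rank)` for each item. The sketch below shows the general technique only. Which rankings Anchor fuses (for example a vector ranking and a keyword ranking) and the constant `k = 60` are assumptions; this README does not state them, and the function name is hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk ids into one list, best first.

    rankings: iterable of lists, each ordered best-first with no duplicates.
    k: smoothing constant (assumed value; 60 is a common default).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score descending; break ties by id for determinism.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


# Usage with two hypothetical rankings of the same query.
vector_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['b', 'a', 'd', 'c']
```

Because RRF uses only ranks, it needs no score calibration between retrievers, which is why it is a common choice for combining dense and lexical results.
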
- Create the virtual environment and install dependencies.

  ```bash
  make setup
  ```

- Copy `.env.example` to `.env` and fill in:
  - `DATABASE_URL`
  - `VERTEX_PROJECT_ID`
  - `VERTEX_LOCATION`
  - `GOOGLE_APPLICATION_CREDENTIALS`
  - `COHERE_API_KEY`
  - `LANGFUSE_PUBLIC_KEY`
  - `LANGFUSE_SECRET_KEY`
- Start PostgreSQL with pgvector.

  ```bash
  docker compose up -d postgres
  ```

- Apply the schema.

  ```bash
  make migrate
  ```

- Ingest the allowlisted corpus.

  ```bash
  make ingest
  ```

- Run the API.

  ```bash
  make dev
  ```

- In another shell, build or run the UI.

  ```bash
  cd ui
  npm install
  npm run dev
  ```

Request:

```json
{
  "question": "What are the KYC requirements for small accounts?"
}
```

Answered response shape:

```json
{
  "request_id": "uuid",
  "status": "answered",
  "answer": "Grounded plain-text answer.",
  "refusal_reason": null,
  "citations": [
    {
      "chunk_id": "rbi_kyc_2016::chunk_000",
      "doc_id": "rbi_kyc_2016",
      "doc_title": "Master Direction - Know Your Customer (KYC) Direction, 2016",
      "regulator": "RBI",
      "section_title": "Customer Due Diligence (CDD) Procedure",
      "page": 14,
      "source_url": "https://www.rbi.org.in/...",
      "quote": "Short supporting excerpt."
    }
  ],
  "disclaimer": "Demo only. Not legal or financial advice.",
  "latency_ms": 1800
}
```

Refusal response shape:

```json
{
  "request_id": "uuid",
  "status": "refused",
  "answer": "",
  "refusal_reason": "insufficient_support",
  "citations": [],
  "disclaimer": "Demo only. Not legal or financial advice.",
  "latency_ms": 900
}
```

```bash
make setup
make migrate
make ingest
make test
make lint
make eval-smoke
make eval
make ui-build
```

- Golden set: `eval/golden.jsonl`
- Smoke subset: `eval/smoke.jsonl`
- Latest summary: `docs/EVAL.md`

Smoke eval is fixture-backed so CI can validate the benchmark path without cloud credentials:

```bash
./.venv/bin/python eval/run.py --smoke --fixture-mode --write-docs
```

Full eval runs the direct query service against the indexed corpus and requires real provider credentials plus a populated database:

```bash
./.venv/bin/python eval/run.py --write-docs
```

- nginx serves `ui/out` and proxies `/query` and `/healthz`
- systemd units live under `deploy/systemd`
- nginx config lives at `deploy/nginx/anchor.conf`
- the GitHub Actions deploy workflow expects a VPS with Python, Node, nginx, PostgreSQL + pgvector, and `/etc/anchor/anchor.env`

- Architecture: `docs/ARCHITECTURE.md`
- Implementation contract: `docs/SPEC.md`
- Product requirements: `docs/PRD.md`
- Sanitized trace snapshot: `docs/traces/query_trace_sanitized.json`
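
For callers of `POST /query`, the answered and refusal response shapes documented in this README can be told apart by the `status` field. The following is a minimal client-side sketch, assuming the response body is exactly one of those two shapes. The `QueryResult` type and `parse_query_response` function are hypothetical names, not part of the Anchor codebase, and treating any other `status` value as an error is an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class QueryResult:
    # Hypothetical client-side view of a /query response.
    request_id: str
    answered: bool
    answer: str
    refusal_reason: Optional[str]
    citations: list = field(default_factory=list)


def parse_query_response(body: str) -> QueryResult:
    """Parse a /query response body into a QueryResult."""
    payload = json.loads(body)
    status = payload["status"]
    if status not in ("answered", "refused"):
        # Assumption: only the two documented statuses are valid.
        raise ValueError(f"unexpected status: {status!r}")
    return QueryResult(
        request_id=payload["request_id"],
        answered=(status == "answered"),
        answer=payload["answer"],
        refusal_reason=payload["refusal_reason"],
        citations=list(payload["citations"]),
    )


# Usage with the refusal shape from this README.
refusal_body = json.dumps({
    "request_id": "uuid",
    "status": "refused",
    "answer": "",
    "refusal_reason": "insufficient_support",
    "citations": [],
    "disclaimer": "Demo only. Not legal or financial advice.",
    "latency_ms": 900,
})
result = parse_query_response(refusal_body)
print(result.answered, result.refusal_reason)  # False insufficient_support
```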