NyaaAIk — AI-Powered Legal Research Assistant

Bharat Bricks Hacks 2026 · IIT Delhi Built on Databricks · Powered by Meta Llama 4 Maverick · Indian Law RAG

📖 Project Story & Motivation

The motivation behind NyaaAIk is to democratize the accessibility and affordability of legal resources in India. Navigating the Indian judicial system can be daunting and prohibitively expensive for the average citizen.

NyaaAIk is a Databricks-native legal assistant using Llama 4 Maverick and Sarvam Saaras v3. It employs a state-of-the-art Hybrid RAG architecture: Databricks Genie autonomously chunks statutory laws into Delta Tables, synced natively with Vector Search. Simultaneously, an LLM refines user queries for parallel semantic vector retrieval and live web scraping of recent rulings. This bridges the gap—providing courtroom-grade insight to advocates while translating complex statutory laws into actionable, plain-language guidance for everyday citizens.

What is NyaaAIk?

NyaaAIk is an AI legal assistant for Indian users that:

Searches the Bharatiya Nyaya Sanhita (BNS), BNSS, and BSA using RAG
Finds live court judgments from Indian Kanoon
Understands voice in Hindi, Tamil, Bengali, Telugu and 7 more Indian languages (Sarvam Saaras v3)
Analyzes uploaded PDFs, Word docs, and screenshots using Llama 4 Maverick vision
Responds in two modes: Advocate (courtroom English) and Citizen (plain language)
Works in Dark and Light themes

Tech Stack

Layer	Tool	Purpose
LLM	Meta Llama 4 Maverick (via Databricks AI Gateway)	Chat, RAG answers, document vision
Vector Search	Databricks Vector Search	Semantic search over BNS/BNSS/BSA legal corpus
Live Cases	Indian Kanoon API	Real-time court judgment retrieval
STT	Sarvam Saaras v3 (`saaras:v3`)	Indian language voice → text (10 languages)
Backend	Python + Flask	REST API server
Frontend	React + Vite	Single-page app
Deployment	Databricks Apps	Hosting on Databricks workspace
Secrets	Databricks Secret Scopes	Secure API key storage
Storage	In-memory (per-session)	Uploaded documents + chat context
Persistence	Browser localStorage	Chat history across sessions

🏛️ System Architecture & Data Flow

flowchart TD
    %% Base Inputs
    U[User Query] --> Refiner[LLM Query Refiner<br/>Llama 4 Maverick]

    %% Ingestion & Vector Search Pipeline
    subgraph DataPlatform[Databricks Data Intelligence Platform]
        direction TB
        BNS[(Delta Tables<br/>BNS / BNSS / BSA)]
        FallBk["/Planned: Chunk Set 2<br/>2000 Case Fallback<br/>Not done — data unavailable in cluster"/]

        Genie([Databricks Genie<br/>Auto-Chunking]) --> BNS
        Genie -.-> FallBk

        BNS --> VS1[(Databricks Vector Search<br/>Index 1)]
        FallBk -.-> VS2["/Planned: Vector Search Index 2<br/>Not done — data unavailable in cluster"/]
    end

    %% Live Searching
    subgraph WebScraping[Live Web Scraping]
        Kanoon([Indian Kanoon API])
    end

    %% Retrieval Layer
    Refiner -- Semantic Query --> VS1 & VS2
    Refiner -- Refined Keywords --> Kanoon

    VS1 -- Highly Relevant Statutes --> Synth
    VS2 -.->|Data Unavailable| Synth
    Kanoon -- Top 5-10 Live Cases --> Synth

    %% Synthesis Layer
    subgraph LLM[LLM Synthesis Engine]
        direction TB
        Persona["UI Prompts/Modifiers<br/>Advocate vs Citizen"] --> Synth["Meta Llama 4 Maverick<br/>Open Source LLM"]
        U -- Original Context --> Synth
    end

    %% Output
    Synth --> Out["Structured Legal Response<br/>+ Properly Cited Cases"]

    classDef db fill:#ff3621,stroke:#333,stroke-width:1px,color:#fff;
    classDef os fill:#1a73e8,stroke:#333,stroke-width:1px,color:#fff;
    classDef planned fill:#f0f0f0,stroke:#666,stroke-width:2px,stroke-dasharray: 5 5,color:#666;

    class VS1,Refiner db;
    class Synth os;
    class FallBk,VS2 planned;

Note: Highlighted components leverage the open-source Meta Llama 4 Maverick model and extensive Databricks enterprise infrastructure.

🗃️ Databricks Notebook Pipeline — How We Build the RAG Index

The entire data pipeline that powers NyaaAIk's legal knowledge base is run through two Databricks notebooks, executed in sequence inside the workspace. The diagram below shows exactly what each notebook does and how data flows end-to-end from raw law files to the deployed app.

End-to-End Pipeline Diagram

flowchart TD
    classDef notebook fill:#1a3c6e,stroke:#4a90d9,stroke-width:2px,color:#fff
    classDef delta fill:#cc3300,stroke:#ff6644,stroke-width:2px,color:#fff
    classDef volume fill:#2d6a4f,stroke:#52b788,stroke-width:2px,color:#fff
    classDef app fill:#1a5c1a,stroke:#52b788,stroke-width:2px,color:#fff
    classDef source fill:#444,stroke:#999,stroke-width:1px,color:#fff

    %% ── RAW DATA SOURCES ─────────────────────────────────────────────────
    subgraph Sources["Raw Data Sources"]
        CSV["BNS Sections CSV<br/>Volume upload or GitHub mirror"]
        PDF["Official Law PDFs<br/>BNS / BNSS / BSA / Constitution<br/>MHA Gazette / Indiacode.nic.in"]
        HF["Hugging Face Dataset<br/>IPC to BNS Mapping"]
    end

    %% ── NOTEBOOK 1 ───────────────────────────────────────────────────────
    subgraph NB1["NOTEBOOK 1 — india_legal_policy_ingest.ipynb"]
        direction TB
        N1_0["Step 0: Install Dependencies<br/>requests pandas bs4 pymupdf openpyxl"]
        N1_1["Step 1: Config and Helpers<br/>Read secrets from nyaya-dhwani scope<br/>CATALOG=main / SCHEMA=india_legal"]
        N1_1b["Step 1b: Sarvam Connectivity Test<br/>Optional ping to sarvam.ai"]
        N1_2["Step 2: Unity Catalog Setup<br/>CREATE DATABASE main.india_legal<br/>CREATE VOLUME legal_files"]
        N1_3["Step 3: Load BNS Sections<br/>1 Try Volume CSV / 2 GitHub mirrors<br/>3 PDF + ai_parse_document fallback"]
        N1_4["Step 3c: Download Official Law PDFs<br/>BNS / BNSS / BSA / Constitution<br/>Saved to Unity Catalog Volume"]
        N1_5["Step 4: BNS to IPC Mapping Table<br/>Fetch from Hugging Face<br/>Fallback: 7-row built-in stub"]
        N1_6["Step 5: Build legal_rag_corpus<br/>Merge BNS + IPC mapping chunks<br/>Save as Delta Table"]
        N1_7["Step 6: Verify Tables<br/>SHOW TABLES IN main.india_legal<br/>SELECT preview from legal_rag_corpus"]

        N1_0 --> N1_1 --> N1_1b --> N1_2 --> N1_3 --> N1_4 --> N1_5 --> N1_6 --> N1_7
    end

    %% ── DELTA TABLES ─────────────────────────────────────────────────────
    subgraph DeltaLayer["Delta Lake — main.india_legal"]
        T1["bns_sections<br/>Delta Table"]
        T2["bns_ipc_mapping<br/>Delta Table"]
        T3["legal_rag_corpus<br/>chunk_id / source / doc_type / title / text"]
        T4["bns_parsed_elements<br/>Optional — PDF ai_parse_document path"]
    end

    %% ── NOTEBOOK 2 ───────────────────────────────────────────────────────
    subgraph NB2["NOTEBOOK 2 — build_rag_index.ipynb"]
        direction TB
        N2_0["Cell 1: Set REPO_ROOT<br/>Edit path to your workspace clone"]
        N2_1["Cell 1 run: Install RAG Stack<br/>numpy / faiss-cpu 1.7.x / sentence-transformers<br/>pip install -e repo rag rag_embed"]
        N2_2["Cell 2: Load Corpus from Delta<br/>SELECT from legal_rag_corpus to Pandas"]
        N2_3["Cell 3: Embed All Chunks<br/>all-MiniLM-L6-v2 / normalize / float32"]
        N2_4["Cell 4: Save RAG Artifacts<br/>corpus.faiss / chunks.parquet / manifest.json<br/>to nyaya_index Volume"]
        N2_5["Cell 5: Smoke Test<br/>Load index / search theft under BNS / top 5"]

        N2_0 --> N2_1 --> N2_2 --> N2_3 --> N2_4 --> N2_5
    end

    subgraph NB3["NOTEBOOK 3 — setup_vector_search.py (Optional)"]
        direction TB
        N3_1["Create VS Endpoint: nyaya_vs_endpoint<br/>STANDARD type / 20-50ms latency"]
        N3_2["Enable CDF on legal_rag_corpus<br/>Required for Delta Sync index"]
        N3_3["Create Delta Sync Index<br/>legal_rag_corpus_index<br/>Embedding: databricks-bge-large-en"]
        N3_4["Trigger Sync and Wait for ONLINE<br/>Grant CAN_QUERY to App service principal"]

        N3_1 --> N3_2 --> N3_3 --> N3_4
    end

    subgraph NB4["NOTEBOOK 4 — run_benchmark.py (Evaluation)"]
        direction TB
        B1["Phase 1: Retrieval Eval<br/>FAISS vs Vector Search<br/>Recall@7 and MRR"]
        B2["Phase 2: MCQ Accuracy<br/>Internal + BhashaBench-Legal<br/>English and Hindi"]
        B3["Phase 3: Open-Ended Quality<br/>Keyword coverage / chunk recall"]
        B4["Phase 4: Multilingual<br/>Sarvam translation round-trip<br/>Save results to Delta"]

        B1 --> B2 --> B3 --> B4
    end

    %% ── VOLUME OUTPUT ────────────────────────────────────────────────────
    subgraph VolumeOut["Unity Catalog Volume — nyaya_index"]
        V1["nyaya_index/<br/>corpus.faiss — FAISS index<br/>chunks.parquet — metadata<br/>manifest.json — model info"]
    end

    %% ── DEPLOYED APP ─────────────────────────────────────────────────────
    subgraph AppLayer["Deployed NyaaAIk — Databricks Apps"]
        Retriever["retriever.py<br/>Vector Search + FAISS fallback"]
        Flask["Flask API app/main.py<br/>/api/chat / /api/transcribe / /api/upload"]
        React["React Frontend<br/>Advocate / Citizen / Voice / Upload"]
    end

    %% ── DATA FLOW ────────────────────────────────────────────────────────
    CSV --> N1_3
    HF  --> N1_5
    PDF --> N1_4

    N1_3 --> T1
    N1_5 --> T2
    N1_6 --> T3
    N1_3 -.->|PDF path only| T4

    T3 --> N2_2
    N2_4 --> V1

    T3 -.->|Optional VS setup| N3_2
    N3_3 --> VS_IDX[("legal_rag_corpus_index<br/>Databricks Vector Search")]

    V1  --> Retriever
    VS_IDX --> Retriever
    T3  --> Retriever
    Retriever --> Flask --> React

    V1 -.->|Evaluation| B1
    T3 -.->|Evaluation| B1

    %% ── CLASS ASSIGNMENTS ────────────────────────────────────────────────
    class NB1,NB2,NB3,NB4 notebook
    class T1,T2,T3,T4,VS_IDX delta
    class V1 volume
    class Retriever,Flask,React app
    class CSV,PDF,HF source

🚀 Running on Databricks — Notebook-by-Notebook Guide

Follow these steps in order inside your Databricks workspace. No local Python needed for data setup.

Prerequisites

Before running any notebook, ensure:

Databricks workspace with Unity Catalog enabled
Secret scope nyaya-dhwani created (see Deployment Step 4) containing:
- sarvam_api_key
- indian_kanoon_api_token
- datagov_api_key (optional)
Repo cloned into Workspace → Repos (see Step 8)
A compute cluster or Serverless notebook attached

⚠️ First-Time Workspace Setup — If You Get `catalog 'main' not found`

This error means either:

Unity Catalog is not fully configured in your workspace, or
The main catalog doesn't exist yet, or
You are on a trial workspace that uses a different default catalog name

Fix Option A — Create the catalog and schema manually (recommended):

Open any Databricks notebook, attach to a cluster, and run this before Notebook 1:

-- Run this in a SQL cell or notebook
CREATE CATALOG IF NOT EXISTS main;
CREATE SCHEMA  IF NOT EXISTS main.india_legal;
CREATE VOLUME  IF NOT EXISTS main.india_legal.legal_files;

Or via the UI: Catalog → + Add → Add a catalog → name it main → Create.

Fix Option B — Use your own catalog name:

If your workspace has a different catalog (e.g. hive_metastore or a custom one), open Notebook 1 and edit Step 1 to change:

CATALOG  = 'main'        # ← change to your catalog name, e.g. 'hive_metastore'
SCHEMA   = 'india_legal' # ← keep as-is or rename
VOLUME   = 'legal_files' # ← keep as-is or rename

⚠️ If you change CATALOG, also update app.yaml and setup_vector_search.py to use the same name consistently.

Verify Unity Catalog is active:

SHOW CATALOGS;
-- Should list 'main' and your workspace catalog

STEP 0 — Upload BNS / BNSS / BSA Data & Convert to Delta Tables using Genie

Do this before running Notebook 1. The notebooks need the raw legal text data to exist somewhere accessible. You have two paths:

Path A — Upload CSV to the Volume (Fastest)

Download the BNS sections CSV from Kaggle:
👉 kaggle.com/datasets/nandr39/bharatiya-nyaya-sanhita-dataset-bns
In your Databricks workspace, go to:
Catalog → main → india_legal → Volumes → legal_files → Upload to this volume
Upload bns_sections.csv (rename it exactly to bns_sections.csv if needed)
Notebook 1 → Step 3 will automatically find it at /Volumes/main/india_legal/legal_files/bns_sections.csv

Path B — Use Databricks Genie to Convert Raw Files to Delta Tables (Recommended for full corpus)

Databricks Genie can read raw PDFs, CSVs, and text directly from your Volume and convert them into structured Delta tables automatically.

Step-by-step:

Upload raw law files to the Volume:
- Go to Catalog → main → india_legal → Volumes → legal_files → Upload
- Upload any of the following you have:
  - bns_sections.csv — BNS section text (structured)
  - bns_2023.pdf — Official BNS Gazette PDF (MHA)
  - bnss_2023.pdf — BNSS Gazette PDF
  - bsa_2023.pdf — BSA Gazette PDF
  - constitution_of_india.pdf — Constitution PDF
Open Databricks Genie:
- Go to New → Genie Space in the left sidebar
- Or navigate to Genie from the sidebar

Tell Genie what to do — example prompts you can use:

I have uploaded bns_sections.csv to /Volumes/main/india_legal/legal_files/.
Please read it and save it as a Delta table called main.india_legal.bns_sections
with columns: section_number, section_title, section_text, source

Read the PDF at /Volumes/main/india_legal/legal_files/bns_2023.pdf,
extract all sections with their numbers and text,
and save as Delta table main.india_legal.bns_sections

Verify Genie created the table:

SELECT COUNT(*) FROM main.india_legal.bns_sections;
SELECT * FROM main.india_legal.bns_sections LIMIT 5;

Once the table exists, Notebook 1 Step 5 (Build Corpus) will pick it up automatically when building legal_rag_corpus

💡 Tip: Genie works best with structured CSVs. For PDFs, the notebook's ai_parse_document fallback in Step 3b does the same thing automatically — so if Genie struggles with a PDF, just upload the file to the volume and let Notebook 1 handle it.

💡 Tip: If you don't have any files yet — Notebook 1 Step 3 will automatically download the official PDFs from the MHA Gazette and extract sections for you. You don't need to upload anything manually in that case.

NOTEBOOK 1 — Data Ingestion

File: notebooks/india_legal_policy_ingest.ipynb

Open this notebook in your Databricks workspace. Run each cell in order:

Cell	What It Does
Step 0	Installs Python packages: `requests`, `pandas`, `beautifulsoup4`, `lxml`, `openpyxl`, `pymupdf`
Step 1	Reads secrets from `nyaya-dhwani` scope. Sets `CATALOG=main`, `SCHEMA=india_legal`, `VOLUME=legal_files`
Step 1b (optional)	Pings Sarvam API with a test message to confirm the key works
Step 2	Creates `main.india_legal` schema and `legal_files` Volume. If you get `catalog not found` — run the First-Time Setup SQL above first, or change `CATALOG` in Step 1
Step 3	Loads BNS sections — tries Volume CSV first, then GitHub mirrors, then PDF fallback
Step 3a	GitHub mirror fallback (only runs if Step 3 Volume CSV is missing)
Step 3b	PDF + `ai_parse_document` + Sarvam enrichment (only if CSV and mirrors both fail)
Step 3c	Downloads official law PDFs from MHA Gazette to the Unity Catalog Volume
Step 3d	Saves `bns_sections` as a Delta table
Step 4	Fetches BNS ↔ IPC mapping from Hugging Face. Falls back to a 7-row built-in stub
Build Corpus	Merges all sources into `legal_rag_corpus` Delta table (`chunk_id`, `source`, `doc_type`, `title`, `text`)
Verify	`SHOW TABLES` + preview query to confirm data is populated

Expected output after this notebook:

✅ Schema : main.india_legal
✅ Volume : /Volumes/main/india_legal/legal_files
🏛️  legal_rag_corpus: <N> total chunks
  BNS_2023         → <N> rows
  BNS_IPC_MAPPING  → <N> rows

💡 Tip — Manual CSV upload: If Step 3 fails to find a BNS CSV, download one from
Kaggle: Bharatiya Nyaya Sanhita Dataset,
upload via Catalog → Volumes → legal_files → Upload to this volume, then re-run Step 3.

NOTEBOOK 2 — Build the RAG Index

File: notebooks/build_rag_index.ipynb

⚠️ Run this notebook AFTER Notebook 1 completes — it reads from legal_rag_corpus.

Open this notebook in Databricks. Run each cell in order:

Cell	What It Does
Cell 1 — Edit REPO_ROOT	Set `REPO_ROOT` to your cloned repo path. Right-click the repo in the Workspace sidebar → Copy path. Example: `/Workspace/Users/you@domain.com/nyaya-dhwani-hackathon`
Cell 1 — Run	Installs RAG stack: `numpy<2`, `faiss-cpu<1.8`, `sentence-transformers`, `pyarrow`, `pandas`. Then runs `pip install -e <REPO_ROOT>[rag,rag_embed]` to make `import nyaya_dhwani` work
Cell 2	Loads corpus from Delta: `main.india_legal.legal_rag_corpus` → Pandas DataFrame
Cell 3	Embeds all text chunks using `sentence-transformers/all-MiniLM-L6-v2` with L2 normalization
Cell 4	Saves three RAG artifacts to the Unity Catalog Volume
Cell 5	Smoke test: loads the index and runs `"What is theft under BNS?"` → prints top-5 results

Expected output:

✅ pip: numpy 1.x, pandas<3, faiss-cpu 1.7.x, pyarrow, sentence-transformers
✅ import nyaya_dhwani → /Workspace/Users/.../src/nyaya_dhwani
✅ import faiss OK
(<N_chunks>, 384)   ← embedding dimensions
✅ Artifacts saved to /Volumes/main/india_legal/legal_files/nyaya_index/

Output artifacts saved to:

/Volumes/main/india_legal/legal_files/nyaya_index/
├── corpus.faiss      ← FAISS similarity index
├── chunks.parquet    ← chunk metadata (id, source, title, text)
└── manifest.json     ← embedding model name, catalog, schema, timestamp

💡 If import nyaya_dhwani fails: Ensure REPO_ROOT exactly matches the path shown in the Workspace sidebar. Do not add %restart_python in the install cell — it breaks the kernel before later cells run.

After Both Notebooks Complete

The retriever.py module in the deployed app automatically loads the FAISS index from the Volume path. No additional steps are needed — just deploy the app as described in Steps 9–10 below.

Repository Structure

nyaya-dhwani-hackathon/
│
├── app/
│   ├── main.py              ← Flask backend (all API routes)
│   └── static/              ← Built React app (DO NOT edit manually)
│       ├── index.html
│       └── assets/
│
├── notebooks/
│   ├── india_legal_policy_ingest.ipynb  ← NOTEBOOK 1: Data ingestion → Delta tables
│   ├── build_rag_index.ipynb            ← NOTEBOOK 2: Build FAISS index → Volume
│   ├── setup_vector_search.py           ← NOTEBOOK 3: Create VS endpoint + Delta Sync index (optional)
│   └── run_benchmark.py                 ← NOTEBOOK 4: RAG evaluation — retrieval / MCQ / multilingual
│
├── frontend/
│   ├── src/
│   │   ├── App.jsx          ← Root component + theme + state
│   │   ├── index.css        ← All styles (dark + light themes)
│   │   ├── components/
│   │   │   ├── Topbar.jsx        ← Header + theme toggle
│   │   │   ├── Sidebar.jsx       ← Chat history + docs list
│   │   │   ├── ControlsBar.jsx   ← Persona + Court + Style controls
│   │   │   ├── TopicsChips.jsx   ← Quick topic selector
│   │   │   ├── WelcomeScreen.jsx ← First-load suggestions
│   │   │   ├── ChatWindow.jsx    ← Message thread
│   │   │   ├── InputBar.jsx      ← Text input + mic + upload
│   │   │   └── UploadModal.jsx   ← Document upload UI
│   │   ├── hooks/
│   │   │   ├── useChat.js        ← Chat state + API calls
│   │   │   └── useSarvamVoice.js ← MediaRecorder → Sarvam STT
│   │   └── utils/
│   │       ├── api.js            ← All fetch calls to Flask
│   │       └── icons.jsx         ← SVG icon components
│   ├── package.json
│   └── vite.config.js
│
├── src/
│   └── nyaya_dhwani/
│       ├── sarvam_client.py  ← Sarvam API helpers (STT, TTS, translate)
│       ├── llm_client.py     ← Maverick LLM calls
│       ├── retriever.py      ← Databricks Vector Search + FAISS fallback
│       └── case_search.py    ← Indian Kanoon API integration
│
├── app.yaml                  ← Databricks App config
├── setup_secrets.py          ← Script to create secret scope
└── README.md

Complete Deployment Guide (Step by Step)

Follow every step in order. Do not skip.

STEP 1 — Prerequisites

Before starting, make sure you have:

A Databricks workspace (Community Edition will NOT work — you need Databricks Apps access)
Access to Databricks AI Gateway with Llama 4 Maverick configured
A Vector Search endpoint with the BNS legal corpus indexed
An Indian Kanoon API token → get it from indiankanoon.org
A Sarvam API key → get it from dashboard.sarvam.ai
Git installed locally
Node.js 18+ installed locally (for building the frontend)

STEP 2 — Clone the Repository

git clone https://github.com/gaganTakIITD/NyaaAIk.git
cd NyaaAIk

STEP 3 — Configure `app.yaml`

Open app.yaml. Update the values to match your Databricks workspace:

command: ["python", "app/main.py"]

env:
  - name: LLM_OPENAI_BASE_URL
    value: "https://<YOUR-GATEWAY-ID>.ai-gateway.cloud.databricks.com/mlflow/v1"

  - name: LLM_MODEL
    value: "databricks-llama-4-maverick"

  - name: NYAYA_RETRIEVAL_BACKEND
    value: "vector_search"

  - name: NYAYA_VS_ENDPOINT_NAME
    value: "nyaya_vs_endpoint"         # ← your Vector Search endpoint name

  - name: NYAYA_VS_INDEX_NAME
    value: "main.india_legal.legal_rag_corpus_index"  # ← your index

  - name: SARVAM_STT_MODEL
    value: "saaras:v3"

⚠️ Do NOT put API keys here. They are loaded from Databricks Secrets (Step 4).

STEP 4 — Set Up Databricks Secret Scope

This stores API keys securely. Run the following in a Databricks notebook (not locally):

Cell 1 — Get workspace context

import requests

ctx   = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
HOST  = ctx.apiUrl().get()
TOKEN = ctx.apiToken().get()
HEADERS = {"Authorization": f"Bearer {TOKEN}"}
print("Host:", HOST)

Cell 2 — Create the secret scope

r = requests.post(f"{HOST}/api/2.0/secrets/scopes/create",
    headers=HEADERS,
    json={"scope": "nyaya-dhwani", "initial_manage_principal": "users"})
print(r.status_code, r.text)
# 200 = created   |   RESOURCE_ALREADY_EXISTS = already exists (both OK)

Cell 3 — Store Indian Kanoon API token

r = requests.post(f"{HOST}/api/2.0/secrets/put",
    headers=HEADERS,
    json={
        "scope": "nyaya-dhwani",
        "key":   "indian_kanoon_api_token",
        "string_value": "YOUR_INDIAN_KANOON_TOKEN_HERE"
    })
print(r.status_code, r.text)

Cell 4 — Store Sarvam API key

r = requests.post(f"{HOST}/api/2.0/secrets/put",
    headers=HEADERS,
    json={
        "scope": "nyaya-dhwani",
        "key":   "sarvam_api_key",
        "string_value": "YOUR_SARVAM_API_KEY_HERE"
    })
print(r.status_code, r.text)

Cell 5 — Verify

r = requests.get(f"{HOST}/api/2.0/secrets/list?scope=nyaya-dhwani", headers=HEADERS)
print(r.json())
# Expected: {'secrets': [{'key': 'indian_kanoon_api_token', ...}, {'key': 'sarvam_api_key', ...}]}

✅ Values always show as [REDACTED] in Databricks — that is correct behaviour.

STEP 5 — Add App Resources (Grant Secret Access)

In the Databricks Apps UI, before deploying, add both secrets as resources so the app's service principal can read them:

Go to Compute → Apps → (your app) → Resources
Add first resource:
- Secret scope: nyaya-dhwani
- Secret key: indian_kanoon_api_token
- Permission: Can read
- Resource key: secret1
Add second resource:
- Secret scope: nyaya-dhwani
- Secret key: sarvam_api_key
- Permission: Can read
- Resource key: secret2

⚠️ Without these resource declarations, the app cannot read the secrets even if they exist in the scope.

STEP 5b — Run Notebook 1: Data Ingestion (inside Databricks)

Navigate to Workspace → Repos → NyaaAIk → notebooks
Open india_legal_policy_ingest.ipynb
Attach to a cluster (Standard or Serverless)
Run all cells in order — see the Notebook 1 guide above

STEP 5c — Run Notebook 2: Build RAG Index (inside Databricks)

Run this after Notebook 1 completes successfully.

Navigate to notebooks → build_rag_index.ipynb
Edit REPO_ROOT in Cell 1 to match your workspace path
Run all cells in order — see the Notebook 2 guide above

STEP 6 — Build the Frontend (Run Locally)

The built files must be committed to git. Run this on your local machine (not Databricks):

cd frontend
npm install
npm run build
cd ..

This outputs compiled files to app/static/. Verify:

app/static/index.html          ← must exist
app/static/assets/index-*.js   ← must exist
app/static/assets/index-*.css  ← must exist

⚠️ Never run npm run build on Databricks — Node.js is not available there.

STEP 7 — Push to GitHub

git add -A
git commit -m "deploy: update config and frontend build"
git push origin main

STEP 8 — Connect Repo to Databricks Workspace

In your Databricks workspace, go to Workspace → Repos → Add Repo
URL: https://github.com/gaganTakIITD/NyaaAIk.git
Click Create Repo
After creation, click Pull to get the latest code

STEP 9 — Deploy the App

Go to Compute → Apps → Create App
Choose Custom App
Set Source to the repo you added in Step 8
Set App file: app.yaml
Add the two secret resources (Step 5)
Click Deploy

The app starts. Watch the Logs tab for:

INFO - Indian Kanoon API: configured
INFO - Sarvam STT (Saaras): configured
INFO - Starting NyaaAIk on 0.0.0.0:8000

If you see NOT configured for any key → re-check Steps 4 and 5.

STEP 10 — Access the App

Click the URL shown in the Apps dashboard. You should see the NyaaAIk interface.

What to Do After Any Code Change

Every time you change code locally:

# If you changed frontend code (React/CSS):
cd frontend
npm run build
cd ..

# Always:
git add -A
git commit -m "your message"
git push origin main

# Then in Databricks:
# Workspace → Repos → NyaaAIk → Pull
# Compute → Apps → NyaaAIk → Deploy (or Restart)

What NOT to Do

❌ Don't	✅ Do instead
Put API keys in `app.yaml`	Use Databricks Secret Scope (Step 4)
Edit files inside `app/static/` manually	Edit source in `frontend/src/`, then `npm run build`
Run `npm run build` on Databricks	Build locally, commit the output
Commit after editing source but before building	Always build before committing
Use Windows line endings (CRLF) for Python/YAML	Keep `.gitattributes` — it enforces LF
Skip the resource declarations (Step 5)	Always add both secrets as App Resources
Run Notebook 2 before Notebook 1 finishes	Wait for `legal_rag_corpus` table to be created first

API Endpoints

Endpoint	Method	Purpose
`/api/chat`	POST	Main RAG query (persona, court, style, history, doc_ids)
`/api/upload`	POST	Upload PDF/DOCX/image for context
`/api/transcribe`	POST	Audio blob → Sarvam Saaras STT → transcript
`/api/topics`	GET	Legal topic seeds for quick chips
`/api/health`	GET	Health check
`/api/documents/:id`	DELETE	Remove an uploaded document

Features Reference

Persona

Persona	Response style
Advocate	Formal courtroom English, statutory citations (BNS/BNSS/BSA), ratio decidendi, legal doctrines
Citizen	Plain everyday language, jargon explained in brackets, practical step-by-step advice

Voice Input (Sarvam)

Click the mic button → select language → speak → stop
Supported: Hindi, English, Tamil, Telugu, Bengali, Marathi, Gujarati, Kannada, Malayalam, Punjabi
Audio is recorded in browser, sent to /api/transcribe, processed by Sarvam Saaras v3

Document Upload

PDF: text extracted with pypdf
DOCX: text extracted with python-docx
Images / Screenshots: sent as base64 to Llama 4 Maverick vision API
Ctrl+V: paste a screenshot directly into the input box
Documents are session-scoped (in server memory) — re-upload after app restarts

Theme

Dark: #131314 background, pure blue #4dabf7 accent
Light: #ffffff background, Google blue #1a73e8 accent
Toggle: ☀️/🌙 button top-right — saved to browser localStorage

Troubleshooting

Error	Cause	Fix
`catalog 'main' not found`	`main` catalog doesn't exist in this workspace	Run `CREATE CATALOG IF NOT EXISTS main;` in a SQL notebook first, or change `CATALOG = 'main'` in Notebook 1 Step 1
`schema 'main.india_legal' not found`	Schema not created yet	Run Step 2 of Notebook 1, or `CREATE SCHEMA IF NOT EXISTS main.india_legal;` manually
`volume 'legal_files' not found`	Volume not created yet	Run Step 2 of Notebook 1, or `CREATE VOLUME IF NOT EXISTS main.india_legal.legal_files;` manually
`Unexpected token '<'`	API returned HTML instead of JSON	Check app logs for Python errors
`Sarvam STT: NOT configured`	Key not loaded	Re-check secret scope + app resources
`Indian Kanoon: NOT configured`	Same	Same
UI loads but chat fails	Backend import error	Check Databricks app logs
`CRLF` / YAML parse error	Windows line endings in git	`.gitattributes` handles this; don't override
Voice mic button missing	Browser doesn't support MediaRecorder	Use Chrome or Edge
PDF upload fails	File > 20 MB	Compress or split the PDF
`Cannot import nyaya_dhwani`	Wrong `REPO_ROOT` in Notebook 2	Copy path from Workspace sidebar, set exactly
`legal_rag_corpus` table missing	Notebook 1 not run yet	Run `india_legal_policy_ingest.ipynb` first
FAISS index not found	Notebook 2 not run yet	Run `build_rag_index.ipynb` after Notebook 1
No BNS data — CSV missing	Volume CSV not uploaded	Download from Kaggle, upload to Volume, re-run Step 3
Sarvam 403 in Notebook	Key invalid or no billing	Check dashboard.sarvam.ai

Local Development (Optional)

To run locally for testing UI changes (no Databricks connection):

# Terminal 1 — backend (will error on Databricks-specific imports, use mock mode)
cd app
python main.py

# Terminal 2 — frontend dev server with proxy
cd frontend
npm run dev

Open http://localhost:5173

Note: Vector search and live cases won't work locally without Databricks credentials.

Built for Bharat Bricks Hacks 2026 · IIT Delhi · Team NyaaAIk

Name		Name	Last commit message	Last commit date
Latest commit History 129 Commits
.cursor		.cursor
app		app
docs		docs
frontend		frontend
notebooks		notebooks
src/nyaya_dhwani		src/nyaya_dhwani
tests		tests
.env.example		.env.example
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app.yaml		app.yaml
project_writeup.md		project_writeup.md
pyproject.toml		pyproject.toml
requirements-app.txt		requirements-app.txt
requirements.txt		requirements.txt
setup_secrets.py		setup_secrets.py

Folders and files

Latest commit

History

Repository files navigation

NyaaAIk — AI-Powered Legal Research Assistant

📖 Project Story & Motivation

What is NyaaAIk?

Tech Stack

🏛️ System Architecture & Data Flow

🗃️ Databricks Notebook Pipeline — How We Build the RAG Index

End-to-End Pipeline Diagram

🚀 Running on Databricks — Notebook-by-Notebook Guide

Prerequisites

⚠️ First-Time Workspace Setup — If You Get catalog 'main' not found

STEP 0 — Upload BNS / BNSS / BSA Data & Convert to Delta Tables using Genie

Path A — Upload CSV to the Volume (Fastest)

Path B — Use Databricks Genie to Convert Raw Files to Delta Tables (Recommended for full corpus)

NOTEBOOK 1 — Data Ingestion

NOTEBOOK 2 — Build the RAG Index

After Both Notebooks Complete

Repository Structure

Complete Deployment Guide (Step by Step)

STEP 1 — Prerequisites

STEP 2 — Clone the Repository

STEP 3 — Configure app.yaml

STEP 4 — Set Up Databricks Secret Scope

STEP 5 — Add App Resources (Grant Secret Access)

STEP 5b — Run Notebook 1: Data Ingestion (inside Databricks)

STEP 5c — Run Notebook 2: Build RAG Index (inside Databricks)

STEP 6 — Build the Frontend (Run Locally)

STEP 7 — Push to GitHub

STEP 8 — Connect Repo to Databricks Workspace

STEP 9 — Deploy the App

STEP 10 — Access the App

What to Do After Any Code Change

What NOT to Do

API Endpoints

Features Reference

Persona

Voice Input (Sarvam)

Document Upload

Theme

Troubleshooting

Local Development (Optional)

About

Resources

License

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

⚠️ First-Time Workspace Setup — If You Get `catalog 'main' not found`

STEP 3 — Configure `app.yaml`

Packages