Available for opportunities

GenAI Engineer

Santhosh Kammari

Sole architect of AI systems deployed at the Reserve Bank of India.
Shipping LLM pipelines, hybrid retrieval engines, and multi-agent systems to production.

13.8K LOC
RBI Decision Support System · Sole architect · 57 modules · 5 GPU microservices
3 weeks
End-to-end delivery · Ingestion pipeline → query engine → production APIs
40+ rules
Trade Finance Engine · UCP 600 compliance · deployed at banking clients
85% accuracy
BrowseComp Benchmark · Multi-agent system · 1,000+ web pages
Python · vLLM · Milvus · FastAPI · LangGraph · PyTorch · Qwen3 · RAG · Docker · Hugging Face

01. About

I'm a GenAI Engineer at Newgen Software (Number Theory Group) with 2+ years building production AI/ML systems. I specialize in LLM-powered document intelligence, hybrid retrieval systems (RAG + Text2SQL), and multi-agent pipelines for financial and regulatory document processing.

I've been the sole architect of AI systems deployed at the Reserve Bank of India and banking clients — handling everything from GPU-orchestrated ingestion pipelines to stateful multi-turn conversational AI.

I hold an Integrated Dual Degree (BTech + MTech) in Information Technology from IIITM Gwalior.

3+ Production Systems
17K+ Lines of Production Code
900+ GitHub Contributions
1 IEEE Publication

02. Experience

Data Scientist

Newgen Software — Number Theory Group
Jul 2023 — Present
  • Sole architect of RBI's AI Decision Support System — 114 commits, 57 modules, ~13.8K LOC built in 3 weeks. Covers ingestion pipeline, hybrid query engine, similarity rules engine, 5 FastAPI GPU microservices, and annotation UIs.
  • Built 8-stage GPU-orchestrated PDF ingestion with dual-model ensemble extraction (Qwen3-14B text + Qwen3-VL vision) and Pydantic-enforced JSON arbitration — handles stamps, handwriting, and tables that OCR alone misses.
  • Designed hybrid RAG + Text2SQL query engine: LLM classifier routing → GTE dense + BGE-M3 sparse Milvus search → Qwen3-Reranker re-scoring → majority-vote SQL generation with automatic SQL-to-vector fallback.
  • Developed multi-stage stateful chat memory with history-intent pre-classification, semantic retrieval, 4-variant query refinement, and sufficiency voting, cutting LLM calls from 14 to 2 on history-only paths.
  • Built 5-rule NBFC Similarity Engine running concurrent vector + SQL matching across 3 databases for regulatory compliance checks at the moment an application is received.
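
The majority-vote SQL generation with SQL-to-vector fallback mentioned in the bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the deployed code: `generate_sql`, `run_sql`, and `vector_search` are hypothetical callables standing in for the LLM, the database, and the vector search, and the whitespace normalization and two-vote threshold are my own choices for the example.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Optional, Sequence


def normalize_sql(sql: str) -> str:
    """Crude normalization: collapse whitespace and drop a trailing semicolon
    so near-identical candidates compare equal."""
    return " ".join(sql.strip().rstrip(";").split())


def majority_vote_sql(candidates: Sequence[str], min_votes: int = 2) -> Optional[str]:
    """Return the most frequent normalized candidate, or None if agreement is too weak."""
    if not candidates:
        return None
    counts = Counter(normalize_sql(c) for c in candidates)
    winner, votes = counts.most_common(1)[0]
    return winner if votes >= min_votes else None


def answer(
    question: str,
    generate_sql: Callable[[str], str],   # hypothetical: one LLM sample per call
    run_sql: Callable[[str], list],       # hypothetical: executes SQL, returns rows
    vector_search: Callable[[str], list], # hypothetical: semantic retrieval
    n_samples: int = 3,
) -> dict:
    """Sample several SQL candidates, execute the majority winner,
    and fall back to vector search when SQL cannot answer."""
    candidates = [generate_sql(question) for _ in range(n_samples)]
    sql = majority_vote_sql(candidates)
    if sql is not None:
        try:
            rows = run_sql(sql)
            if rows:
                return {"route": "sql", "result": rows}
        except Exception:
            # Any SQL failure is treated as "SQL could not answer" in this sketch.
            pass
    # No consensus, an execution error, or an empty result: use vector search.
    return {"route": "vector", "result": vector_search(question)}
```

In this sketch the fallback fires in three cases: the samples disagree, the winning query raises, or it returns no rows.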

AI/ML Engineer

Number Theory
Jul 2022 — Jun 2023
  • Primary author of Trade Finance Rule Engine — 3,949 lines implementing 40+ UCP 600 compliance rules deployed at banking clients. Dynamic rule routing through configurable schema dispatch.
  • Built BERT-based signature verification combining Tesseract OCR coordinate extraction, sentence-transformer cosine similarity, and fuzzywuzzy fallback for real-world signature variants.
  • Designed spatial coordinate engine (1,845 lines) for structured document extraction — declarative geolocation schemas resolved at runtime against live OCR coordinates, eliminating retraining per bank form variant.
  • Co-authored Bundle API orchestrating 8 downstream microservices with race condition fixes, JWT auth, rate limiting, and per-service retry logic.
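
The idea of routing rules through a configurable schema, as in the rule engine bullet above, might look something like the sketch below. The rule names, document fields, and `SCHEMA` layout are hypothetical and chosen only for illustration. They do not reproduce the actual engine or any specific UCP 600 article.

```python
from __future__ import annotations

from datetime import date
from typing import Callable

# Registry mapping a rule name to its check function.
RULES: dict[str, Callable[[dict], bool]] = {}


def rule(name: str):
    """Decorator that registers a check function under a rule name."""
    def register(fn: Callable[[dict], bool]) -> Callable[[dict], bool]:
        RULES[name] = fn
        return fn
    return register


@rule("amount_within_credit")  # hypothetical rule
def amount_within_credit(doc: dict) -> bool:
    return doc["invoice_amount"] <= doc["credit_amount"]


@rule("presented_before_expiry")  # hypothetical rule
def presented_before_expiry(doc: dict) -> bool:
    return doc["presentation_date"] <= doc["expiry_date"]


# Configurable schema: which rules apply to which document type.
# In a real system this could be loaded from JSON or a database.
SCHEMA: dict[str, list[str]] = {
    "commercial_invoice": ["amount_within_credit", "presented_before_expiry"],
    "bill_of_lading": ["presented_before_expiry"],
}


def evaluate(doc_type: str, doc: dict, schema: dict[str, list[str]] = SCHEMA) -> dict[str, bool]:
    """Run every rule the schema lists for this document type."""
    return {name: RULES[name](doc) for name in schema.get(doc_type, [])}
```

With this shape, adding a rule for a document type is a schema change rather than a code change in the dispatcher.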

03. Featured Work

Production

RBI Decision Support System

Sole architect of RBI's AI-powered regulatory analysis system. 8-stage GPU pipeline, hybrid RAG+Text2SQL engine, stateful chat memory, 5 FastAPI microservices — delivered end-to-end in 3 weeks.

Python · FastAPI · Milvus · vLLM · Qwen3

Production

Trade Finance Rule Engine

Primary author. 40+ UCP 600 compliance rules with BERT signature verification, spatial coordinate extraction, and cross-document numeric reconciliation. Deployed at banking clients.

Python · FastAPI · MongoDB · BERT · Tesseract

Research

Multi-Agent Deep Research

Multi-agent system operating over 1,000+ web pages with parallel sub-agents and context management. Achieved 85% accuracy on the BrowseComp benchmark.

Python · LangGraph · Multi-Agent

Production

3-Tier Retrieval & Reranking Engine

Authority-ranked retrieval over 5 Milvus collections. Alpha/Beta/Gamma tier search with Qwen3-Reranker cross-encoder scoring, query decomposition, and parallel sub-query execution.
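
One plausible reading of authority-ranked tier search, sketched with in-memory data instead of Milvus: hits from a higher-authority tier outrank hits from lower tiers, and reranker score breaks ties within a tier. The `Hit` type, the tier ordering, and the `min_score` cutoff are assumptions made for this example, not details of the production engine.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed authority order: alpha is the most authoritative tier.
TIER_ORDER = {"alpha": 0, "beta": 1, "gamma": 2}


@dataclass(frozen=True)
class Hit:
    doc_id: str
    tier: str     # "alpha", "beta", or "gamma"
    score: float  # reranker relevance score, higher is better


def authority_rank(hits: list[Hit], top_k: int = 3, min_score: float = 0.5) -> list[Hit]:
    """Drop weak hits, then order by tier authority first and score second."""
    kept = [h for h in hits if h.score >= min_score]
    kept.sort(key=lambda h: (TIER_ORDER[h.tier], -h.score))
    return kept[:top_k]
```

The `min_score` cutoff keeps a weak match in an authoritative tier from displacing a strong match in a lower one.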

FastAPI · Milvus · vLLM · Reranking

Production

DocVeda — Document Intelligence

Modular enterprise RAG platform: OmniDocs fetch → DOTS OCR → doclayout-YOLO parsing → semantic chunking → Milvus insert → LLM metadata filter → reranking → synthesis. CUDA 12.6 + ONNX backend.

FastAPI · Milvus · YOLO · Qwen3 · ONNX

Research

Embedding Fine-Tuning & Benchmarks

Fine-tuned domain embeddings with synthetic QA data — +4.2% Precision@1 over base GTE-large. Benchmarked 8 models across 20 metrics; findings drove production architecture decisions.
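
For reference, Precision@1 is the fraction of queries whose top-ranked result is relevant. A minimal sketch of the computation on toy data follows. The function name and input layout are my own, and the numbers are illustrative rather than taken from the benchmark.

```python
from __future__ import annotations

from typing import Sequence


def precision_at_1(ranked_ids: Sequence[Sequence[str]], relevant_ids: Sequence[set]) -> float:
    """Fraction of queries whose first retrieved document is in that query's relevant set.

    ranked_ids[i] is the ranked result list for query i,
    relevant_ids[i] is the set of documents judged relevant for query i.
    """
    if len(ranked_ids) != len(relevant_ids):
        raise ValueError("ranked_ids and relevant_ids must have the same length")
    if not ranked_ids:
        raise ValueError("at least one query is required")
    hits = sum(
        1
        for ranked, relevant in zip(ranked_ids, relevant_ids)
        if ranked and ranked[0] in relevant  # an empty result list counts as a miss
    )
    return hits / len(ranked_ids)


# Toy data: four queries, two of which have a relevant document at rank 1.
ranked = [["d1", "d2"], ["d5", "d3"], ["d7"], []]
relevant = [{"d1"}, {"d3"}, {"d7", "d8"}, {"d9"}]
score = precision_at_1(ranked, relevant)
```

Comparing a base model and a fine-tuned model then amounts to running both over the same queries and relevance labels and differencing the two scores.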

SentenceTransformers · Milvus · Hugging Face

04. Skills

Languages

Python · SQL

ML & Deep Learning

PyTorch · Hugging Face · Scikit-learn · NumPy · Pandas · OpenCV

LLMs & NLP

RAG · Text2SQL · Multi-hop QA · Prompt Engineering · Embedding Fine-tuning · Qwen3 · LLaMA · vLLM · Ollama · QLoRA · NLTK

Frameworks

LangChain · LangGraph · LlamaIndex · DSPy · AutoGen · OpenAI

Databases & Vector Stores

Milvus · FAISS · SQLite · MongoDB · MySQL

Engineering

FastAPI · Gradio · Docker · Git · Langfuse · Tesseract OCR · MLflow

06. Publication

Fraud Detection on Bank Payments using Machine Learning

IEEE — International Conference for Advancement in Technology (ICAT 2022)

ML classification on the BankSim dataset, achieving 96.64% accuracy.

DOI: 10.1109/ICAT54021.2022.9726104

07. Education

Indian Institute of Information Technology and Management, Gwalior

Integrated Dual Degree (BTech + MTech) — Information Technology

Aug 2018 — June 2023