GenAI Engineer
Santhosh Kammari
Sole architect of AI systems deployed at the Reserve Bank of India.
Shipping LLM pipelines, hybrid retrieval engines, and multi-agent systems to production.
01. About
I'm a GenAI Engineer at Newgen Software (Number Theory Group) with 2+ years building production AI/ML systems. I specialize in LLM-powered document intelligence, hybrid retrieval systems (RAG + Text2SQL), and multi-agent pipelines for financial and regulatory document processing.
I've been the sole architect of AI systems deployed at the Reserve Bank of India and at banking clients, handling everything from GPU-orchestrated ingestion pipelines to stateful multi-turn conversational AI.
I hold an Integrated Dual Degree (BTech + MTech) in Information Technology from IIITM Gwalior.
02. Experience
Data Scientist
Newgen Software (Number Theory Group)
- Sole architect of RBI's AI Decision Support System: 114 commits, 57 modules, ~13.8K LOC built in 3 weeks. Covers the ingestion pipeline, hybrid query engine, similarity rules engine, 5 FastAPI GPU microservices, and annotation UIs.
- Built 8-stage GPU-orchestrated PDF ingestion with dual-model ensemble extraction (Qwen3-14B text + Qwen3-VL vision) and Pydantic-enforced JSON arbitration — handles stamps, handwriting, and tables that OCR alone misses.
- Designed hybrid RAG + Text2SQL query engine: LLM classifier routing → GTE dense + BGE-M3 sparse Milvus search → Qwen3-Reranker re-scoring → majority-vote SQL generation with automatic SQL-to-vector fallback.
- Developed multi-stage stateful chat memory with history-intent pre-classification, semantic retrieval, 4-variant query refinement, and sufficiency voting, cutting LLM calls from 14 to 2 on history-only paths.
- Built 5-rule NBFC Similarity Engine running concurrent vector + SQL matching across 3 databases for regulatory compliance checking at application-receive time.
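The majority-vote SQL generation with SQL-to-vector fallback mentioned in the query-engine bullet above can be sketched roughly as follows. This is an illustrative sketch only, not the deployed code: `normalize_sql`, `pick_sql_by_majority`, `answer`, the `min_votes` threshold, and the `run_sql` / `vector_search` callables are all hypothetical names assumed for the example.

```python
from collections import Counter
from typing import Any, Callable, List, Optional, Sequence, Tuple


def normalize_sql(sql: str) -> str:
    """Collapse whitespace and drop a trailing semicolon so that
    trivially different candidates compare equal (an assumed heuristic)."""
    return " ".join(sql.strip().rstrip(";").split())


def pick_sql_by_majority(candidates: Sequence[str], min_votes: int = 2) -> Optional[str]:
    """Return the most common normalized candidate, or None when no
    candidate reaches `min_votes` (hypothetical threshold)."""
    counts = Counter(normalize_sql(c) for c in candidates if c.strip())
    if not counts:
        return None
    winner, votes = counts.most_common(1)[0]
    return winner if votes >= min_votes else None


def answer(
    candidates: Sequence[str],
    run_sql: Callable[[str], List[Any]],
    vector_search: Callable[[str], List[Any]],
    query: str,
) -> Tuple[str, List[Any]]:
    """Try the majority-voted SQL first; fall back to vector search when
    there is no consensus, the SQL fails, or it returns no rows."""
    sql = pick_sql_by_majority(candidates)
    if sql is not None:
        try:
            rows = run_sql(sql)
            if rows:
                return ("sql", rows)
        except Exception:
            # Any execution error triggers the vector fallback in this sketch.
            pass
    return ("vector", vector_search(query))
```

In this sketch the candidates would come from several independent LLM generations for the same question; how the production system compares candidates (string equality, AST comparison, or result-set comparison) is not stated in this document.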
AI/ML Engineer
Number Theory
- Primary author of the Trade Finance Rule Engine: 3,949 lines implementing 40+ UCP 600 compliance rules, deployed at banking clients. Dynamic rule routing through configurable schema dispatch.
- Built BERT-based signature verification combining Tesseract OCR coordinate extraction, sentence-transformer cosine similarity, and fuzzywuzzy fallback for real-world signature variants.
- Designed spatial coordinate engine (1,845 lines) for structured document extraction — declarative geolocation schemas resolved at runtime against live OCR coordinates, eliminating retraining per bank form variant.
- Co-authored Bundle API orchestrating 8 downstream microservices with race condition fixes, JWT auth, rate limiting, and per-service retry logic.
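As a rough illustration of the per-service retry logic mentioned in the Bundle API bullet above, the sketch below wraps a downstream call in a retry loop with exponential backoff. It is an assumption-laden example rather than the actual implementation: `RetryPolicy`, `call_with_retry`, and the default attempt count and delay are hypothetical, and the backoff strategy is a common choice that this document does not confirm.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical per-service settings; each downstream service
    could be given its own instance."""
    max_attempts: int = 3
    base_delay_s: float = 0.1


def call_with_retry(
    fn: Callable[[], T],
    policy: RetryPolicy,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on any exception with exponential backoff.
    Re-raises the last error once `max_attempts` is exhausted."""
    if policy.max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
            if attempt < policy.max_attempts:
                # Delay doubles on each failed attempt: base, 2*base, 4*base, ...
                sleep(policy.base_delay_s * 2 ** (attempt - 1))
    assert last_error is not None
    raise last_error
```

Passing `sleep` as a parameter keeps the function easy to test without real delays; a production version would likely also restrict which exception types are retried.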
03. Featured Work
RBI Decision Support System
Sole architect of RBI's AI-powered regulatory analysis system. 8-stage GPU pipeline, hybrid RAG+Text2SQL engine, stateful chat memory, 5 FastAPI microservices — delivered end-to-end in 3 weeks.
Trade Finance Rule Engine
Primary author. 40+ UCP 600 compliance rules with BERT signature verification, spatial coordinate extraction, and cross-document numeric reconciliation. Deployed at banking clients.
Multi-Agent Deep Research
Multi-agent system operating over 1,000+ web pages with parallel sub-agents and context management. Achieved 85% accuracy on the BrowseComp benchmark.
3-Tier Retrieval & Reranking Engine
Authority-ranked retrieval over 5 Milvus collections. Alpha/Beta/Gamma tier search with Qwen3-Reranker cross-encoder scoring, query decomposition, and parallel sub-query execution.
DocVeda — Document Intelligence
Modular enterprise RAG platform: OmniDocs fetch → DOTS OCR → doclayout-YOLO parsing → semantic chunking → Milvus insert → LLM metadata filter → reranking → synthesis. CUDA 12.6 + ONNX backend.
Embedding Fine-Tuning & Benchmarks
Fine-tuned domain embeddings with synthetic QA data — +4.2% Precision@1 over base GTE-large. Benchmarked 8 models across 20 metrics; findings drove production architecture decisions.
04. Skills
Languages
ML & Deep Learning
LLMs & NLP
Frameworks
Databases & Vector Stores
Engineering
05. Open Source
zero-to-hero-ai
Comprehensive AI/ML knowledge book as an interactive website. 18 chapters, 224 sections, 6,972 concepts.
HTML
multi-agent-deepresearch
Multi-agent deep research system using LLMs. 85% accuracy on BrowseComp benchmark.
Python
openai-agent-terminal
Streaming AI coding REPL with multi-agent orchestration — local vLLM, Claude, OpenCode, Copilot in a unified session.
Python
demystifying-ai
Making AI concepts accessible — educational content and interactive explanations.
HTML
06. Publication
Fraud Detection on Bank Payments using Machine Learning
IEEE — International Conference for Advancement in Technology (ICAT 2022)
ML classification on Banksim dataset achieving 96.64% accuracy.
DOI: 10.1109/ICAT54021.2022.9726104
07. Education
Indian Institute of Information Technology and Management, Gwalior
Integrated Dual Degree (BTech + MTech) — Information Technology
Aug 2018 — June 2023