Ashwin Mathur awinml

Ashwin Mathur

AI Engineer · Agentic RAG & Reranking · LLM Fine-Tuning & RL · Domain-Specific AI

I work on LLM systems for domain-specific applications in Finance, Bio-Medical, and Legal AI, spanning retrieval, agents, and model training. I’ve contributed to Haystack, MTEB, HuggingFace, and scikit-learn, and co-authored MMTEB, published at ICLR 2025. I currently develop open-source AI at AVNLP.

Developing Open-Source AI @ AVNLP

LLM Training & Alignment

Repository	Description
BioThink	Self-Reflective Bio-Medical Question Answering system that trains Qwen3-1.7B with QLoRA + GRPO using 5 custom reward functions (Relevance, Grounding, and Utility token enforcement, plus XML structure and GEval correctness); evaluated across 7 metrics, including faithfulness and answer correctness, via LLM-as-a-Judge.
LLM-Finetuning	Fine-tuning pipelines covering SFT, DPO, ORPO, KTO, and PPO; comparative benchmarking of QLoRA, LoRA, DoRA, P-Tuning, and Prefix-Tuning across ARC, FactScore, TriviaQA, and PopQA.
GRPO	GRPO implementations comparing reward functions (format/correctness), training frameworks (DeepSpeed and PyTorch), and reference-model handling strategies.
RAG-Model-Training	Fine-tuning LLMs for 6 RAG paradigms - Adaptive-RAG, Corrective RAG, RQ-RAG, Self-RAG, Agentic RAG, ReZero - via SFT and GRPO; uses Llama-3.2 and Llama-3-8B across finance, biomedical, and open-domain QA datasets.
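Several entries above train with GRPO, whose core idea is to score each sampled completion relative to the other completions for the same prompt rather than against a learned value function. The following is a minimal sketch of that group-relative advantage under stated assumptions, not code from these repositories: the function name `group_relative_advantages`, the `eps` value, and the example rewards are hypothetical, and it uses the population standard deviation where some implementations use the sample standard deviation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize rewards for a group of completions sampled from one prompt.

    Hypothetical helper: each advantage is the reward minus the group mean,
    divided by the group's population standard deviation. eps keeps the
    division finite when every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of the same prompt, e.g. the sum
# of a format reward and a correctness reward.
rewards = [2.0, 0.0, 1.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])
# [1.414, -1.414, 0.0, 0.0]
```

Completions that beat the group average get positive advantages and those below it get negative ones, so the policy update needs no separate critic model.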

Retrieval-Augmented Generation

Repository	Description
RAG-Pipelines	Agentic RAG pipelines with metadata enrichment, contextual reranking and structured generation.
DSPy-Optimizers	DSPy-based RAG optimization framework using MIPRO, COPRO, and BootstrapFewShot on FreshQA, HotpotQA, TriviaQA, and PubMedQA.
VectorDB	Production Haystack and LangChain pipelines for Hybrid Search, Parent-Child Retrieval, MMR, Metadata Filtering, Multi-Tenancy, and Re-ranking across Pinecone, Weaviate, Milvus, Qdrant, and Chroma - with benchmarks on TriviaQA, ARC, PopQA, FactScore, and Earnings Calls.
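MMR (Maximal Marginal Relevance), one of the retrieval strategies listed for VectorDB, greedily picks documents that are relevant to the query but not redundant with those already picked. Below is a minimal pure-Python sketch of that greedy rule, not the Haystack or LangChain implementation; the function names, the cosine-similarity helper, and the toy two-dimensional vectors are assumptions for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query: list[float], docs: list[list[float]], k: int, lambda_mult: float = 0.5) -> list[int]:
    """Return indices of up to k documents chosen by Maximal Marginal Relevance.

    Hypothetical helper: lambda_mult = 1.0 ranks purely by relevance, while
    smaller values penalize similarity to documents already selected.
    """
    selected: list[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, docs[i])
            # Redundancy is the highest similarity to any already-selected document.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[0.9, 0.1], [0.9, 0.12], [0.6, -0.8]]  # docs[1] nearly duplicates docs[0]
print(mmr(query, docs, k=2))  # [0, 2]
```

A plain top-2 by similarity would return indices 0 and 1; MMR skips the near-duplicate `docs[1]` in favor of the less similar but non-redundant `docs[2]`.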

Information Retrieval & Ranking

Repository	Description
LLM Rankers	LLM re-ranking library for IR and RAG. Implements Pairwise, Setwise, and Listwise ranking with RankZephyr and RankLlama; supports sliding windows, efficient sorting, and zero-shot inference.
Pairwise Ranking Prompting	Zero-shot pairwise reranking library (Heapsort, Sliding Window, All-Pairs strategies) with bidirectional comparison for position-bias mitigation; Pydantic-validated.
Reciprocal Rank Fusion and LLM Rankers	Hybrid retrieval with Reciprocal Rank Fusion (RRF); evaluates Diversity, Lost-in-the-Middle, and Similarity rankers against the BEIR suite (NDCG, MAP, Recall, Precision).
LLM-Blender	Ensembling framework combining PairRanker (pairwise ranking) and GenFuser (output merging) to synthesize superior responses from multiple open-source models.
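Reciprocal Rank Fusion, used in the hybrid-retrieval repository above, merges several ranked lists by giving each document the sum of `1 / (k + rank)` over every list it appears in, so items ranked well by multiple retrievers rise to the top. A minimal sketch follows; `k = 60` is the constant proposed in the original RRF paper, while the function name, document IDs, and the two input rankings are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Hypothetical helper: ranks are 1-based; a document missing from a list
    simply contributes nothing for that list.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_ranking = ["d1", "d2", "d3"]   # e.g. from a sparse (keyword) retriever
dense_ranking = ["d3", "d1", "d4"]  # e.g. from an embedding retriever
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print([doc_id for doc_id, _ in fused])  # ['d1', 'd3', 'd2', 'd4']
```

Because RRF uses only ranks, it needs no score normalization across retrievers whose raw scores live on different scales.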

Open-Source Contributions

Project	Contributions
Haystack	Evaluation Framework: Designed and built Haystack's pipeline evaluation from scratch - `StatisticalEvaluator`, `EvaluationResult`, and six metrics: Exact Match, F1, Semantic Answer Similarity, Recall, MRR, and MAP; HuggingFace TEI Embedders: Components supporting self-hosted Docker, the free Inference API, and paid HF Inference Endpoints; Diversity Ranker: Document reranker optimizing for maximum semantic diversity via sentence-transformer embeddings.
Haystack Core Integrations	INSTRUCTOR Embedders: Task- and domain-specific embedding components with instructable prompt prefixes; HF Optimum: Embedding inference with ONNX and TensorRT runtimes; Llama.cpp Generator: Text generation with quantized models; Pinecone: Vector DB integration with advanced metadata filtering.
voyage-embedders-haystack	Haystack integration for Voyage AI embedding and reranking models
MTEB	LegalBench: Added the complete LegalBench benchmark suite - 160+ legal domain classification and retrieval datasets; Japanese benchmarks: Integrated the JMTEB and JSICK Japanese embedding benchmarks.
HuggingFace Transformers	`BioGPTForSequenceClassification` implementation; ViT pre-training scripts without the Trainer class; HuggingFace Evaluate + scikit-learn integration docs.
scikit-learn, imbalanced-learn	Out-of-bag scores for Gradient Boosting; sparse matrix support for Silhouette Score; multi-class Average Precision (One-vs-Rest).
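Two of the retrieval metrics named for Haystack's evaluation framework, MRR and MAP, fit in a few lines. This is a textbook sketch, not Haystack's evaluator code, which may differ in details such as how average precision is normalized; the function names and the toy queries are hypothetical.

```python
from typing import Callable

# One query's result: (retrieved document IDs in rank order, set of relevant IDs).
Run = tuple[list[str], set[str]]


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Sum of precision@rank at each relevant hit, divided by the number of relevant documents."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_metric(metric: Callable[[list[str], set[str]], float], runs: list[Run]) -> float:
    """Average a per-query metric over all queries (MRR or MAP)."""
    return sum(metric(retrieved, relevant) for retrieved, relevant in runs) / len(runs)


runs: list[Run] = [
    (["a", "b", "c"], {"b"}),       # first relevant hit at rank 2
    (["x", "y", "z"], {"x", "z"}),  # relevant hits at ranks 1 and 3
]
print(mean_metric(reciprocal_rank, runs))    # MRR: 0.75
print(mean_metric(average_precision, runs))  # MAP: about 0.667
```

MRR rewards only the position of the first relevant document, while MAP also credits every later relevant hit, which is why the two diverge on the second query.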

Publications

MMTEB: Massive Multilingual Text Embedding Benchmark (ICLR 2025)

Largest multilingual text embedding benchmark: 500+ tasks across 250+ languages and 10 task categories. Contributed the complete LegalBench suite - 160+ legal domain classification and retrieval datasets.
