awinml/README.md

Ashwin Mathur

AI Engineer · Agentic RAG & Reranking · LLM Fine-Tuning & RL · Domain-Specific AI

LinkedIn · Email

I work on LLM systems for domain-specific applications in Finance, Bio-Medical, and Legal AI, spanning retrieval, agents, and model training. I’ve contributed to Haystack, MTEB, HuggingFace, and scikit-learn, and co-authored MMTEB, published at ICLR 2025. I develop open-source AI at AVNLP.

Developing Open-Source AI @ AVNLP

LLM Training & Alignment

| Repository | Description |
| --- | --- |
| BioThink | Self-reflective biomedical question answering system. Trains Qwen3-1.7B with QLoRA and GRPO using 5 custom reward functions (Relevance, Grounding, and Utility token enforcement, plus XML structure and GEval correctness); evaluated on 7 metrics, including faithfulness and answer correctness, via LLM-as-a-Judge. |
| LLM-Finetuning | Fine-tuning pipelines covering SFT, DPO, ORPO, KTO, and PPO; comparative benchmarking of QLoRA, LoRA, DoRA, P-Tuning, and Prefix-Tuning across ARC, FactScore, TriviaQA, and PopQA. |
| GRPO | GRPO implementations comparing reward functions (format/correctness), training frameworks (DeepSpeed and PyTorch), and reference-model handling strategies. |
| RAG-Model-Training | Fine-tuning LLMs for 6 RAG paradigms (Adaptive-RAG, Corrective RAG, RQ-RAG, Self-RAG, Agentic RAG, and ReZero) via SFT and GRPO; uses Llama-3.2 and Llama-3-8B across finance, biomedical, and open-domain QA datasets. |
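
As a rough illustration of the format-style rewards and group-relative scoring that GRPO training relies on, the sketch below scores a group of sampled completions and normalizes the rewards within the group. It is not taken from the repositories above: the `<think>`/`<answer>` tag layout, the function names, and the use of the population standard deviation are assumptions made for the example.

```python
import re
from statistics import mean, pstdev

# Hypothetical tag layout for illustration only; the actual XML structure
# enforced by the repositories above is not specified in this README.
STRUCTURE = re.compile(r"^<think>.+?</think>\s*<answer>.+?</answer>$", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed tag layout, else 0.0."""
    return 1.0 if STRUCTURE.match(completion.strip()) else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps). The
    population standard deviation is used here; implementations differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


completions = [
    "<think>Check the abstract.</think><answer>Yes</answer>",
    "Yes",
    "<think>Compare dosages.</think> <answer>No</answer>",
    "<answer>No</answer>",
]
rewards = [format_reward(c) for c in completions]
advantages = group_advantages(rewards)
```

Completions that follow the layout receive a positive advantage and the others a negative one; if every completion in a group earns the same reward, all advantages are zero and the group contributes no learning signal.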

Retrieval-Augmented Generation

| Repository | Description |
| --- | --- |
| RAG-Pipelines | Agentic RAG pipelines with metadata enrichment, contextual reranking, and structured generation. |
| DSPy-Optimizers | DSPy-based RAG optimization framework using MIPRO, COPRO, and BootstrapFewShot on FreshQA, HotpotQA, TriviaQA, and PubMedQA. |
| VectorDB | Production Haystack and LangChain pipelines for Hybrid Search, Parent-Child Retrieval, MMR, Metadata Filtering, Multi-Tenancy, and Re-ranking across Pinecone, Weaviate, Milvus, Qdrant, and Chroma, with benchmarks on TriviaQA, ARC, PopQA, FactScore, and Earnings Calls. |
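
For readers unfamiliar with MMR (Maximal Marginal Relevance), one of the retrieval strategies listed above, here is a minimal dependency-free sketch. It is an illustration only, not code from the VectorDB repository; the function names, the `lambda_` parameter name, and the toy vectors are assumptions.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr(query_vec, doc_vecs, top_k, lambda_=0.5):
    """Greedily pick documents that are relevant to the query but not redundant.

    score(d) = lambda_ * sim(query, d) - (1 - lambda_) * max sim(d, selected)
    Returns the indices of the selected documents in selection order.
    """
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < top_k:
        def score(i: int) -> float:
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_ * relevance - (1 - lambda_) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [0.9, 0.3]
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]  # docs 0 and 1 are near-duplicates
diverse = mmr(query, docs, top_k=2, lambda_=0.5)
relevance_only = mmr(query, docs, top_k=2, lambda_=1.0)
```

With `lambda_=1.0` the selection reduces to plain similarity ranking and returns both near-duplicates; lowering it trades some relevance for coverage.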

Information Retrieval & Ranking

| Repository | Description |
| --- | --- |
| LLM Rankers | LLM re-ranking library for IR and RAG. Implements Pairwise, Setwise, and Listwise ranking with RankZephyr and RankLlama; supports sliding windows, efficient sorting, and zero-shot inference. |
| Pairwise Ranking Prompting | Zero-shot pairwise reranking library (Heapsort, Sliding Window, All-Pairs strategies) with bidirectional comparison for position-bias mitigation; Pydantic-validated. |
| Reciprocal Rank Fusion and LLM Rankers | Hybrid retrieval with Reciprocal Rank Fusion (RRF); evaluates Diversity, Lost-in-the-Middle, and Similarity rankers against the BEIR suite (NDCG, MAP, Recall, Precision). |
| LLM-Blender | Ensembling framework combining PairRanker (pairwise ranking) and GenFuser (output merging) to synthesize superior responses from multiple open-source models. |
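
Reciprocal Rank Fusion, mentioned above, merges several ranked lists by summing `1 / (k + rank)` for each document across the lists. The sketch below is a generic illustration rather than the repository's implementation; the function name, the tie-breaking rule, and the commonly used default `k = 60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document ID for determinism.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy rankings standing in for a keyword retriever and a dense retriever.
keyword_ranking = ["a", "b", "c"]
dense_ranking = ["b", "c", "a"]
fused = reciprocal_rank_fusion([keyword_ranking, dense_ranking])
```

Because only ranks are used, the retrievers' raw scores never need to be put on a common scale, which is the main appeal of RRF for hybrid search.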

Open-Source Contributions

| Project | Contributions |
| --- | --- |
| Haystack | Evaluation Framework: Designed and built Haystack's pipeline evaluation from scratch, including StatisticalEvaluator, EvaluationResult, and six metrics: Exact Match, F1, Semantic Answer Similarity, Recall, MRR, and MAP.<br>HuggingFace TEI Embedders: Components supporting self-hosted Docker, the free Inference API, and paid HF Inference Endpoints.<br>Diversity Ranker: Document reranker optimizing for maximum semantic diversity via sentence-transformer embeddings. |
| Haystack Core Integrations | INSTRUCTOR Embedders: Task- and domain-specific embedding components with instructable prompt prefixes.<br>HF Optimum: Embedding inference with ONNX and TensorRT runtimes.<br>Llama.cpp Generator: Text generation with quantized models.<br>Pinecone: Vector DB integration with advanced metadata filtering. |
| voyage-embedders-haystack | Haystack integration for Voyage AI embedding and reranking models. |
| MTEB | LegalBench: Added the complete LegalBench benchmark suite of 160+ legal domain classification and retrieval datasets.<br>Integrated the Japanese embedding benchmarks JMTEB and JSICK. |
| HuggingFace Transformers | BioGPTForSequenceClassification implementation; ViT pre-training scripts without the Trainer class; HuggingFace Evaluate + scikit-learn integration docs. |
| scikit-learn, imbalanced-learn | Out-of-bag scores for Gradient Boosting; sparse matrix support for Silhouette Score; multi-class Average Precision (One-vs-Rest). |
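
Two of the evaluation metrics named in the Haystack row, Exact Match and MRR, are simple enough to sketch in a few lines. The functions below are a generic illustration and do not reflect Haystack's actual API; the function names and the whitespace/case normalization are assumptions.

```python
def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions equal to their reference.

    Comparison ignores case and surrounding whitespace (an assumption here).
    """
    if not predictions:
        return 0.0
    matches = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return matches / len(predictions)


def mean_reciprocal_rank(
    ranked_lists: list[list[str]], relevant_sets: list[set[str]]
) -> float:
    """Average of 1 / rank of the first relevant document per query.

    A query with no relevant document in its ranked list contributes 0.
    """
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


em = exact_match(["Paris", "berlin "], ["paris", "Rome"])
mrr = mean_reciprocal_rank(
    [["a", "b"], ["c", "d"], ["e"]],
    [{"b"}, {"c"}, {"z"}],
)
```

In this toy run one of two answers matches, and the first relevant documents sit at ranks 2 and 1 with one query missing entirely, so both metrics come out at one half.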

Publications

MMTEB: Massive Multilingual Text Embedding Benchmark (ICLR 2025)

Largest multilingual text embedding benchmark: 500+ tasks across 250+ languages and 10 task categories. Contributed the complete LegalBench suite of 160+ legal domain classification and retrieval datasets.

Pinned

  1. avnlp/biothink (Public)

    Self-Reflective Question Answering for Biomedical Reasoning

    Python 5 1

  2. avnlp/llm-finetuning (Public)

    Pipelines for Fine-Tuning LLMs using SFT and RLHF

    Python 6 3

  3. avnlp/rag-model-training (Public)

    Training code for advanced RAG techniques - Adaptive-RAG, Corrective RAG, RQ-RAG, Self-RAG, Agentic RAG, and ReZero. Reproduces paper methodologies to fine-tune LLMs via SFT and GRPO for adaptive r…

    Python 7 2

  4. avnlp/dspy-opt (Public)

    Advanced RAG pipeline optimization framework using DSPy. Implements modular RAG pipelines with Query-Rewriting, Sub-Query Decomposition, and Hybrid Search via Weaviate. Automates prompt tuning and …

    Python 7 1

  5. avnlp/rag-pipelines (Public)

    Advanced RAG Pipelines and Evaluation

    Python 12 1

  6. avnlp/rankers (Public)

    Modular LLM ranking library for Information Retrieval and RAG. Implements state-of-the-art Pairwise, Setwise, and Listwise ranking with structured generation and specialized models (RankZephyr, Ran…

    Python 5 1