Senior Data Scientist focused on building production-grade AI systems that drive measurable business outcomes.
My work sits at the intersection of predictive modeling, NLP, and LLM-powered applications, with a strong emphasis on deployment, evaluation, and real-world impact. I specialize in turning messy, high-volume data (text, PDFs, behavioral signals) into systems that improve revenue, retention, and decision-making.
- Churn modeling and revenue risk estimation
- Causal inference and uplift modeling
- Retrieval-augmented generation (RAG) systems
- Large-scale text and document processing (PDF pipelines)
- Production ML systems and MLOps
- Decision intelligence and experimentation
End-to-end system for estimating individual treatment effects (to decide who to target) and translating them into expected business impact.
- Uplift modeling to identify persuadable vs non-persuadable users
- Policy targeting simulation (who to intervene on)
- ROI estimation based on incremental lift
- Interactive dashboard for decision-making
- Designed to mirror real-world marketing / retention use cases
Tech: Python • Scikit-Learn • XGBoost • Streamlit
👉 https://causal-uplift-modeling.streamlit.app/
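
As a rough illustration of how uplift scores can feed a targeting policy and an ROI estimate, here is a minimal sketch. It is not the code behind the app above: the `User` fields, `target_policy`, `expected_roi`, and the value and cost figures are all hypothetical, and the per-user probabilities are assumed to come from some already fitted uplift model.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: str
    p_treated: float  # assumed model output: conversion probability if contacted
    p_control: float  # assumed model output: conversion probability if left alone


def uplift(user: User) -> float:
    """Estimated incremental conversion probability from intervening."""
    return user.p_treated - user.p_control


def target_policy(users, value_per_conversion, cost_per_contact):
    """Keep users whose expected incremental value exceeds the contact cost."""
    return [u for u in users if uplift(u) * value_per_conversion > cost_per_contact]


def expected_roi(targeted, value_per_conversion, cost_per_contact):
    """ROI = (incremental value - spend) / spend; 0.0 when nobody is targeted."""
    spend = cost_per_contact * len(targeted)
    if spend == 0:
        return 0.0
    incremental_value = sum(uplift(u) for u in targeted) * value_per_conversion
    return (incremental_value - spend) / spend


# Illustrative, made-up scores for three users.
users = [
    User("a", p_treated=0.30, p_control=0.10),  # persuadable
    User("b", p_treated=0.50, p_control=0.49),  # converts anyway
    User("c", p_treated=0.05, p_control=0.15),  # reacts badly to contact
]
targeted = target_policy(users, value_per_conversion=100.0, cost_per_contact=5.0)
roi = expected_roi(targeted, value_per_conversion=100.0, cost_per_contact=5.0)
print([u.user_id for u in targeted], round(roi, 2))
```

Only the persuadable user clears the cost threshold here, which is the point of targeting on uplift rather than on raw conversion probability.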
Production-style ML system for predicting churn and quantifying revenue at risk.
- Modular training + inference pipelines
- Behavioral feature engineering (engagement, activity, value trends)
- SHAP-based explainability for business users
- Revenue impact modeling and segmentation
- Streamlit dashboard for stakeholder insights
Tech: Python • XGBoost • Scikit-Learn • MLflow • Streamlit
👉 https://retention-risk-workbench.streamlit.app/
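
One common way to frame "revenue at risk" is as churn probability times recurring revenue, summed per risk segment. The sketch below assumes that definition; the field names, segment thresholds, and numbers are hypothetical and not taken from the workbench itself.

```python
from collections import defaultdict


def risk_segment(churn_prob, high=0.6, medium=0.3):
    """Bucket a churn probability into a coarse risk segment (thresholds are assumptions)."""
    if churn_prob >= high:
        return "high"
    if churn_prob >= medium:
        return "medium"
    return "low"


def revenue_at_risk(customers):
    """Expected revenue loss per segment: sum of churn_prob * monthly_revenue."""
    totals = defaultdict(float)
    for c in customers:
        totals[risk_segment(c["churn_prob"])] += c["churn_prob"] * c["monthly_revenue"]
    return dict(totals)


# Illustrative, made-up customers with model-scored churn probabilities.
customers = [
    {"id": "c1", "churn_prob": 0.8, "monthly_revenue": 200.0},
    {"id": "c2", "churn_prob": 0.4, "monthly_revenue": 50.0},
    {"id": "c3", "churn_prob": 0.1, "monthly_revenue": 500.0},
]
by_segment = revenue_at_risk(customers)
total_at_risk = sum(by_segment.values())
print(by_segment, total_at_risk)
```

Note that the low-risk but high-value customer contributes more expected loss than the medium-risk one, which is why ranking by probability alone can mislead.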
Production-style RAG application for answering questions over internal knowledge sources.
- Document ingestion + chunking pipelines
- Vector search + retrieval optimization
- LLM-based answer generation with grounding
- Deployed demo with end-to-end flow
Tech: OpenAI • FAISS • LangChain • Python
👉 https://grounded-conversation-rag.streamlit.app/
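
The ingestion, retrieval, and grounding steps can be sketched with the standard library alone. This is a toy stand-in, not the deployed pipeline: token-count vectors and a linear scan replace learned embeddings and a FAISS index, no LLM is called, and every function name below is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_words(text, size=8, overlap=2):
    """Split text into windows of `size` words that share `overlap` words with the previous window."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy embedding: lowercase token counts (a real system would use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query, context_chunks):
    """Assemble a grounded prompt that restricts the answer to retrieved context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


# Illustrative, made-up knowledge snippets.
docs = [
    "Refunds are processed within five business days.",
    "Our office is closed on public holidays.",
    "Passwords must be rotated every ninety days.",
]
top = retrieve("How long do refunds take?", docs, k=1)
prompt = build_prompt("How long do refunds take?", top)
print(prompt)
```

The overlap between chunks is a design choice: it reduces the chance that a sentence answering the question is split across two windows and retrieved in neither.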
Lightweight toolkit for improving NLP model robustness via data augmentation.
- Back translation
- Synonym replacement
- Embedding-based perturbations
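
Of the techniques listed, synonym replacement is the simplest to show in a few lines. The sketch below is an assumption about how such a step might look, not the toolkit's actual API: `synonym_replace` and the tiny `SYNONYMS` table are hypothetical, and a real setup would more likely draw synonyms from a lexical resource such as WordNet.

```python
import random

# Hypothetical synonym table for illustration only.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "pleased"],
}


def synonym_replace(sentence, synonyms, prob=0.5, seed=None):
    """Replace each word that has known synonyms with probability `prob`."""
    rng = random.Random(seed)  # local RNG so a seed makes the output reproducible
    out = []
    for word in sentence.split():
        options = synonyms.get(word.lower())
        if options and rng.random() < prob:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)


original = "the quick dog is happy"
augmented = synonym_replace(original, SYNONYMS, prob=1.0, seed=0)
print(augmented)
```

Words without an entry in the table pass through unchanged, so the augmented sentence keeps the same length and word order as the original.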
Languages
Python • SQL • R • PySpark
Machine Learning
Scikit-Learn • XGBoost • PyTorch • TensorFlow
LLM / NLP
OpenAI API • LangChain • Hugging Face • spaCy • RAG systems
Data & MLOps
Databricks • MLflow • Airflow • Spark • Feature Stores • Docker
Cloud
AWS • Azure • GCP
Visualization
Tableau • Power BI • Matplotlib
- Built NLP + LLM systems that significantly improved conversion in customer conversations
- Designed large-scale PDF ingestion pipelines for structured data extraction
- Developed RAG systems for internal knowledge retrieval and decision support
- Delivered ML systems with clear business framing (revenue at risk, uplift, targeting strategy)

