Senior Data Scientist & AI Automation Lead with 8+ years of experience delivering end-to-end AI, NLP, and Computer Vision solutions that drive measurable business impact in eCommerce and catalog automation. Proven ability to design, deploy, and scale AI-driven pipelines that improve efficiency, reduce cost, and enhance content quality across millions of SKUs.
At Noon.com, I architected production-grade multimodal ML pipelines integrating text (BERT, Gemma) and vision (CLIP, ResNet) models, achieving a 20+ point accuracy lift and a 26% relative improvement in UCR. Using Python, PyTorch, and Vertex AI, I automated catalog ingestion, deduplication, and taxonomy classification workflows, scaling them to 20M+ SKUs with high precision.
Key achievements include:
AI-Driven Catalog Ingestion & LLM Enrichment: Built a Python ETL + LLM pipeline that processed 6M SKUs from 1688.com, using regex + LLM-based policy detection to cut QC rejections by 31% and make onboarding 4x faster.
Multimodal Deduplication System: Combined Gemma text embeddings with image embeddings and clustered them with HDBSCAN, resolving duplicates for a 0.8-percentage-point absolute UCR gain.
Hierarchical Taxonomy Classification: Designed a BERT–CLIP fusion model for 5,000+ product subtypes, improving classification accuracy from 58% to 78%.
Generative AI for Infographics: Automated A+ content creation using Stable Diffusion (SD2), Flux 2.1, and ComfyUI, reducing manual design hours by 90%.
Core Stack: Python, PyTorch, TensorFlow, Transformers, Docker, Vertex AI, BigQuery, SQL, HDBSCAN, PEFT/LoRA, Stable Diffusion, ComfyUI, MLOps, Multimodal Fusion.
Business Impact: Delivered measurable ROI by automating catalog workflows, improving SKU data accuracy, shortening time-to-market, and building scalable AI-first catalog systems for high-traffic marketplaces.