Arjun Jagdale ArjunJagdale

👋 About Me

Turning research into production-ready ML systems. I'm an AI engineer working at the intersection of deep learning research and production engineering. I build everything from anti-spoofing CNNs to parameter-efficient transformers, and I actively contribute to core Hugging Face libraries.

class ArjunJagdale:
    def __init__(self):
        self.role = "AI Engineer & Open Source Contributor"
        self.focus_areas = ["Deep Learning", "Computer Vision", "NLP", "MLOps"]
        self.current_work = "Contributing to Hugging Face core libraries"
        self.interests = ["RAG Systems", "Model Compression", "Cloud-Native ML"]
    
    def get_expertise(self):
        return {
            "frameworks": ["PyTorch", "HuggingFace", "scikit-learn", "TensorFlow"],
            "specializations": ["Parameter-Efficient Fine-Tuning", "CNNs", "Transformers"],
            "cloud": ["IBM Cloud", "Google Cloud", "Docker", "Kubernetes"],
            "tools": ["LangChain", "LlamaIndex", "Gradio", "Git"]
        }

🔥 What I'm Working On

🚀 Contributing to Hugging Face — datasets & dataset-viewer libraries (7 merged PRs)
🧠 Research — Published paper on Retrieval-Augmented Systems with Dynamic Learning
🛠️ Building — Production ML pipelines with real-time inference and GPU optimization
📚 Learning — Parameter-efficient methods, vision-language models, cloud-native deployments

🛠️ Tech Arsenal

Languages & Core • ML & AI Frameworks • Cloud & DevOps • Tools & Libraries

🚀 Featured Projects

🎭 Anti-Spoof Face Classification

Custom CNN Architecture for Real vs. Spoof Detection

results = {
    "validation_accuracy": "99.39%",
    "test_accuracy": "99.29%",
    "dataset_size": "38K images",
    "parameters": "4.88M"
}

Built SpoofNet from scratch with BatchNorm, Dropout2D, AdaptiveAvgPooling
Real-time inference with MediaPipe + OpenCV on GPU
Automated face extraction pipeline with quality filters
Production-ready with only 2 false negatives, 40 false positives

Tech: PyTorch • OpenCV • MediaPipe • Gradio

🔬 LoRA Fine-Tuning for BERT

Parameter-Efficient Transformer Fine-Tuning

efficiency = {
    "parameter_reduction": "99.87%",
    "trainable_params": "147K vs 110M",
    "accuracy": "91.74%",
    "speedup": "750x faster training"
}

Custom LoRA implementation with rank-4 low-rank matrices
Injected into Q/V projections across 12 BERT encoder layers
Built end-to-end pipeline with HuggingFace Transformers
Optimized with AdamW while keeping the pretrained BERT weights frozen
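
The trainable-parameter figure above can be reproduced with simple arithmetic. The sketch below assumes the BERT-base hidden size of 768 (not stated above) and counts only the two low-rank matrices added to each adapted projection, ignoring any classifier head. The function name is illustrative.

```python
def lora_trainable_params(hidden_size, rank, projections_per_layer, num_layers):
    """Count LoRA adapter parameters for square hidden_size x hidden_size projections."""
    # Each adapted projection gets two matrices: A (rank x hidden) and B (hidden x rank).
    per_projection = 2 * rank * hidden_size
    return per_projection * projections_per_layer * num_layers


# Assumption: hidden size 768 (BERT-base). Rank 4, Q and V, and 12 layers are as listed above.
trainable = lora_trainable_params(
    hidden_size=768, rank=4, projections_per_layer=2, num_layers=12
)
total = 110_000_000  # approximate size of the frozen base model

reduction_pct = 100 * (1 - trainable / total)
print(f"trainable adapter parameters: {trainable:,}")
print(f"reduction in trainable parameters: {reduction_pct:.2f}%")
print(f"total / trainable: {total / trainable:.0f}x")
```

Under these assumptions the adapter count comes to 147,456, in line with the 147K figure, and the reduction rounds to 99.87%.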

Tech: PyTorch • HuggingFace • LoRA • SST-2

🎬 LSTM Sentiment Analysis

Bidirectional Sequence Modeling for IMDB Reviews

performance = {
    "test_accuracy": "85.19%",
    "train_test_gap": "4%",
    "architecture": "2-layer BiLSTM",
    "embedding_dim": 128
}

Bidirectional LSTM with 50% dropout for regularization
Gradient clipping (max norm=1.0) for stable training
Gradio deployment for real-time inference
Handles sequences up to 256 tokens with OOV support
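
The gradient-clipping step listed above rescales all gradients together whenever their combined L2 norm exceeds the limit. Below is a minimal pure-Python sketch of that rule, assuming a flat list of gradient values. In PyTorch, `torch.nn.utils.clip_grad_norm_` applies the same idea to model parameters; whether this project calls it is not stated, and the helper name here is hypothetical.

```python
import math


def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Scale `grads` so their combined L2 norm does not exceed `max_norm`."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # The scale is capped at 1.0, so gradients already under the limit are untouched.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


# A gradient with norm 5.0 is scaled down to (just under) norm 1.0.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
norm_after = math.sqrt(sum(g * g for g in clipped))

# A gradient already under the limit passes through unchanged.
unchanged, _ = clip_by_global_norm([0.1, 0.2], max_norm=1.0)

print(norm_before, norm_after, unchanged)
```

Clipping by the global norm preserves the direction of the update and only shortens it, which is why it is a common choice for recurrent models such as LSTMs.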

Tech: PyTorch • LSTM • Gradio • NLP

🍷 Wine Quality Prediction

Classical ML with Advanced Feature Engineering

model_comparison = {
    "Random_Forest": "97.2%",
    "Logistic_Regression": "96.5%",
    "SVM": "95.8%",
    "after_tuning": "+1-2%"
}

Comprehensive EDA with correlation heatmaps and PCA
GridSearchCV hyperparameter optimization
Multi-class classification on 178 samples, 13 features
Full evaluation with confusion matrices & classification reports

Tech: scikit-learn • Pandas • Matplotlib • NumPy

🌟 Open Source Contributions

Active contributor to Hugging Face — focusing on datasets infrastructure, compatibility fixes, and developer experience

📦 huggingface/datasets

#7831 • Fix ValueError in train_test_split with NumPy 2.0+

Resolved compatibility issue with NumPy 2.0+ by wrapping stratify column array access with np.asarray(). Maintains backward compatibility with NumPy 1.x while fixing array copy errors.

bug-fix compatibility numpy
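
As a rough illustration of the idea behind #7831: `np.asarray()` returns its input as-is when it is already a suitable array and converts it otherwise, so the same call works on NumPy 1.x and 2.x. NumPy 2.0 made `np.array(..., copy=False)` raise a `ValueError` when a copy cannot be avoided, which is presumably the kind of error the fix addresses. The helper name and sample labels below are hypothetical, not the code from the PR.

```python
import numpy as np


def as_label_array(column):
    """Return `column` as an ndarray, copying only when NumPy has to."""
    return np.asarray(column)


# Hypothetical stratify column: class labels for six examples.
labels = as_label_array([0, 1, 1, 0, 1, 0])
classes, counts = np.unique(labels, return_counts=True)

# Passing an existing ndarray returns the same object, with no copy made.
same_object = as_label_array(labels) is labels

print(classes.tolist(), counts.tolist(), same_object)
```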

#7648 • Fix misleading docstring examples across multiple methods

Updated docstrings for add_column(), select_columns(), select(), filter(), shard(), and flatten() to clarify that these methods return new datasets instead of modifying in-place. Significantly improves API documentation clarity.

documentation api-improvement datasets

#7623 • Fix: Raise error when data_dir and data_files are missing

Added validation check in FolderBasedBuilder to prevent silent fallback to current directory when loading folder-based datasets without required parameters. Improves user experience by catching errors early.

bug-fix validation datasets
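
The kind of early check described in #7623 can be sketched as follows. This is a hypothetical stand-in, not the actual `FolderBasedBuilder` code: the function name, return value, and error message are assumptions made for illustration.

```python
def resolve_data_source(data_dir=None, data_files=None):
    """Fail fast instead of silently scanning the current directory."""
    if data_dir is None and data_files is None:
        raise ValueError(
            "Specify at least one of `data_dir` or `data_files`; "
            "refusing to fall back to the current working directory."
        )
    return {"data_dir": data_dir, "data_files": data_files}


# Missing both arguments raises immediately with an actionable message.
try:
    resolve_data_source()
except ValueError as err:
    message = str(err)

# Supplying either argument passes validation.
source = resolve_data_source(data_dir="path/to/images")
print(message)
print(source)
```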

🔍 huggingface/dataset-viewer

#3223 • Add support for Date features in Croissant schema

Implemented support for Date, UTCDate, and UTCTime features in Croissant schema generation. Automatically infers correct dataType (sc:Date, sc:Time, or sc:DateTime) based on format string.

feature croissant schema

#3219 • Refactor: Replace get_empty_str_list with CONSTANT.copy

Eliminated shared mutable default values in dataclass fields by replacing helper functions with explicit constant copies. Makes configuration behavior more explicit and prevents subtle bugs.

refactor best-practices config

#3218 • Test: Add unit tests for get_previous_step_or_raise

Implemented comprehensive unit tests for cache retrieval function covering successful cache hits, missing cache scenarios, and error status handling. Improves code coverage and reliability.

testing unit-tests coverage

#3206 • Refactor: Use HfApi.update_repo_settings for gated datasets

Removed redundant custom implementations of update_repo_settings() across test utilities by leveraging official huggingface_hub API. Cleaned up 222 lines of code while maintaining full functionality.

refactor code-cleanup testing
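
For #3223, inferring a dataType from a format string could look roughly like the sketch below. The directive groupings, the fallback, and the function name are assumptions for illustration; the rule actually used in the PR is not shown here.

```python
import re

# Assumption: illustrative groupings of strftime directives, not the PR's actual rule.
DATE_DIRECTIVES = set("YymdjbBaA")
TIME_DIRECTIVES = set("HIMSfp")


def infer_croissant_datatype(format_string):
    """Map a strftime-style format string to a schema.org date/time type."""
    directives = set(re.findall(r"%([A-Za-z])", format_string))
    has_date = bool(directives & DATE_DIRECTIVES)
    has_time = bool(directives & TIME_DIRECTIVES)
    if has_date and has_time:
        return "sc:DateTime"
    if has_time:
        return "sc:Time"
    # Assumption: date-only formats, and formats with no recognized directive, map to sc:Date.
    return "sc:Date"


examples = ["%Y-%m-%d", "%H:%M:%S", "%Y-%m-%dT%H:%M:%S"]
inferred = {fmt: infer_croissant_datatype(fmt) for fmt in examples}
print(inferred)
```

The match is case-sensitive on purpose, since `%m` (month) and `%M` (minute) fall on opposite sides of the date/time split.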

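For #3219, the constant-copy pattern can be sketched with the standard library alone. The class and constant names below are hypothetical and do not come from dataset-viewer; the point is that `CONSTANT.copy` is passed as the `default_factory`, so every instance starts from its own copy of the default list.

```python
from dataclasses import dataclass, field

# Hypothetical module-level default, treated as read-only by convention.
DEFAULT_ALLOWED_EXTENSIONS = [".csv", ".parquet"]


@dataclass
class ExampleConfig:
    # The bound method `list.copy` acts as the factory: each instance gets a fresh list.
    allowed_extensions: list = field(default_factory=DEFAULT_ALLOWED_EXTENSIONS.copy)


first = ExampleConfig()
second = ExampleConfig()

# Mutating one instance affects neither the other instance nor the constant.
first.allowed_extensions.append(".json")
print(first.allowed_extensions, second.allowed_extensions, DEFAULT_ALLOWED_EXTENSIONS)
```

Compared with a helper function that builds the default, this keeps the default value visible in one named constant next to the dataclass.
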
📚 Research & Publications

Retrieval-Augmented System with Dynamic Learning from Web Content
Published research on RAG systems that dynamically learn from web content, combining retrieval mechanisms with adaptive learning strategies.

💬 Let's Connect

Building something interesting? I'm always open to collaborating on ML research, open source contributions, or production ML systems.

💡 Currently Exploring: RAG Systems • Model Compression • Cloud-Native ML Deployments
📍 Location: Pune, Maharashtra, India
🎓 Education: B.E. Electronics & Telecommunication @ Savitribai Phule Pune University
