Turning research into production-ready ML systems. I'm an AI engineer working at the intersection of deep learning research and production engineering, building everything from anti-spoofing CNNs to parameter-efficient transformers while actively contributing to core Hugging Face libraries.
```python
class ArjunJagdale:
    def __init__(self):
        self.role = "AI Engineer & Open Source Contributor"
        self.focus_areas = ["Deep Learning", "Computer Vision", "NLP", "MLOps"]
        self.current_work = "Contributing to Hugging Face core libraries"
        self.interests = ["RAG Systems", "Model Compression", "Cloud-Native ML"]

    def get_expertise(self):
        return {
            "frameworks": ["PyTorch", "HuggingFace", "scikit-learn", "TensorFlow"],
            "specializations": ["Parameter-Efficient Fine-Tuning", "CNNs", "Transformers"],
            "cloud": ["IBM Cloud", "Google Cloud", "Docker", "Kubernetes"],
            "tools": ["LangChain", "LlamaIndex", "Gradio", "Git"]
        }
```

- **Contributing** – Hugging Face `datasets` & `dataset-viewer` libraries (7 merged PRs)
- **Research** – Published paper on Retrieval-Augmented Systems with Dynamic Learning
- **Building** – Production ML pipelines with real-time inference and GPU optimization
- **Learning** – Parameter-efficient methods, vision-language models, cloud-native deployments
**Custom CNN Architecture for Real vs. Spoof Detection**

```python
results = {
    "validation_accuracy": "99.39%",
    "test_accuracy": "99.29%",
    "dataset_size": "38K images",
    "parameters": "4.88M"
}
```

Tech: PyTorch • OpenCV • MediaPipe • Gradio
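The published architecture isn't reproduced here, but a minimal sketch of a compact binary real-vs-spoof CNN in PyTorch looks like this (layer sizes are illustrative assumptions, not the 4.88M-parameter model above):

```python
import torch
import torch.nn as nn

class SpoofCNN(nn.Module):
    """Illustrative binary real-vs-spoof classifier, not the published model."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # global pooling keeps the head input-size agnostic
        )
        self.classifier = nn.Linear(128, 2)  # logits for real vs. spoof

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = SpoofCNN()
logits = model(torch.randn(4, 3, 224, 224))  # batch of 4 RGB face crops
```

The adaptive average pool is a common trick for keeping such a model small and resolution-independent, which matters for real-time inference.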
**Parameter-Efficient Transformer Fine-Tuning**

```python
efficiency = {
    "parameter_reduction": "99.87%",
    "trainable_params": "147K vs 110M",
    "accuracy": "91.74%",
    "speedup": "750x faster training"
}
```

Tech: PyTorch • HuggingFace • LoRA • SST-2
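The parameter reduction comes from LoRA's core idea: freeze the pretrained weights and train only a low-rank additive update. A minimal pure-PyTorch sketch of a LoRA-wrapped linear layer (dimensions and rank are illustrative, not the project's configuration):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus trainable low-rank update: W x + (B A) x * (alpha / r)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no-op at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(768, 768), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
# For a 768x768 layer at rank 8, only 12,288 of ~603K parameters train.
```

Zero-initializing `B` means the model starts exactly at the pretrained weights, which is what makes LoRA fine-tuning stable.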
**Bidirectional Sequence Modeling for IMDB Reviews**

```python
performance = {
    "test_accuracy": "85.19%",
    "train_test_gap": "4%",
    "architecture": "2-layer BiLSTM",
    "embedding_dim": 128
}
```

Tech: PyTorch • LSTM • Gradio • NLP
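A 2-layer BiLSTM with a 128-dim embedding, as listed above, can be sketched in a few lines of PyTorch (vocabulary size, hidden size, and the last-step pooling are my assumptions, not the project's exact choices):

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Illustrative 2-layer bidirectional LSTM for binary sentiment."""
    def __init__(self, vocab_size=20000, embedding_dim=128, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.head = nn.Linear(hidden * 2, 2)   # concat of forward + backward states

    def forward(self, tokens):
        out, _ = self.lstm(self.embed(tokens))
        return self.head(out[:, -1, :])        # last time step (simplest pooling)

model = BiLSTMClassifier()
logits = model(torch.randint(0, 20000, (4, 50)))  # batch of 4 sequences, length 50
```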
**Classical ML with Advanced Feature Engineering**

```python
model_comparison = {
    "Random_Forest": "97.2%",
    "Logistic_Regression": "96.5%",
    "SVM": "95.8%",
    "after_tuning": "+1-2%"
}
```

Tech: scikit-learn • Pandas • Matplotlib • NumPy
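A comparison like the one above is typically driven by a small scikit-learn loop. A sketch on a synthetic stand-in dataset (the real dataset and tuned hyperparameters aren't specified here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the actual (unspecified) dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Random_Forest": RandomForestClassifier(random_state=0),
    "Logistic_Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
}
# Fit each model and record held-out accuracy.
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
```

From here, the "+1-2% after tuning" row would come from a `GridSearchCV` pass over each model's hyperparameters.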
Active contributor to Hugging Face, focusing on datasets infrastructure, compatibility fixes, and developer experience.
- **#7831 • Fix ValueError in train_test_split with NumPy 2.0+** – Resolved a compatibility issue with NumPy 2.0+ by wrapping the stratify column array access.
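The exact wrapper used in the PR is elided above, but the underlying NumPy 2.0 breaking change is well known: `np.array(obj, copy=False)` now raises `ValueError` whenever a copy is unavoidable, and `np.asarray` is the recommended replacement. A minimal sketch of the pattern:

```python
import numpy as np

# Under NumPy 2.0, np.array(obj, copy=False) raises ValueError when a copy is
# unavoidable. np.asarray copies only when needed and behaves the same on 1.x and 2.x.
data = [1, 2, 2, 1, 1, 2]   # e.g. a stratify column backed by a Python list
col = np.asarray(data)      # safe replacement for np.array(data, copy=False)
```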
- **#7648 • Fix misleading docstring examples across multiple methods** – Updated docstring examples across multiple methods so they match actual behavior.
- **#7623 • Fix: Raise error when data_dir and data_files are missing** – Added a validation check in `FolderBasedBuilder` to prevent silent fallback to the current directory when loading folder-based datasets without the required parameters. Improves user experience by catching errors early.
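The fail-fast pattern described here is simple to sketch. The function and messages below are illustrative, not the actual `datasets` internals:

```python
# Hypothetical sketch of the early-validation pattern: raise immediately instead
# of silently falling back to the current directory.
def resolve_data_source(data_dir=None, data_files=None):
    if data_dir is None and data_files is None:
        raise ValueError(
            "At least one of `data_dir` or `data_files` must be specified."
        )
    return data_dir or data_files

try:
    resolve_data_source()           # neither argument given -> explicit error
except ValueError as err:
    message = str(err)
```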
- **#3223 • Add support for Date features in Croissant schema** – Implemented support for Date, UTCDate, and UTCTime features in Croissant schema generation. Automatically infers the correct dataType (`sc:Date`, `sc:Time`, or `sc:DateTime`) based on the format string.
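The inference rule can be illustrated with a small function mapping strftime-style format codes to a Croissant dataType. This is an illustrative re-implementation, not the actual dataset-viewer code:

```python
# Infer a Croissant dataType from a strftime-style format string:
# date codes only -> sc:Date, time codes only -> sc:Time, both -> sc:DateTime.
def infer_croissant_datatype(fmt: str) -> str:
    has_date = any(code in fmt for code in ("%Y", "%m", "%d"))
    has_time = any(code in fmt for code in ("%H", "%M", "%S"))
    if has_date and has_time:
        return "sc:DateTime"
    if has_time:
        return "sc:Time"
    return "sc:Date"
```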
- **#3219 • Refactor: Replace get_empty_str_list with CONSTANT.copy** – Eliminated shared mutable default values in dataclass fields by replacing helper functions with explicit constant copies. Makes configuration behavior more explicit and prevents subtle bugs.
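The bug class this guards against is the classic shared-mutable-default problem. A minimal sketch of the `CONSTANT.copy` pattern (class and field names are illustrative):

```python
from dataclasses import dataclass, field

EMPTY_STR_LIST: list = []  # module-level constant

@dataclass
class Config:
    # default_factory=EMPTY_STR_LIST.copy gives every instance its own list.
    # A shared default list would let one instance's mutation leak into others.
    tags: list = field(default_factory=EMPTY_STR_LIST.copy)

a, b = Config(), Config()
a.tags.append("vision")   # must not affect b.tags
```

Using an explicit constant plus `.copy` also makes the intended default visible at the definition site, instead of hiding it inside a helper function.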
- **#3218 • Test: Add unit tests for get_previous_step_or_raise** – Implemented comprehensive unit tests for the cache retrieval function covering successful cache hits, missing cache scenarios, and error status handling. Improves code coverage and reliability.
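The three scenarios named here (hit, miss, error status) map onto a simple raise-on-failure lookup. The miniature below is a hypothetical stand-in for the real dataset-viewer function, just to show the shape of what gets tested:

```python
# Hypothetical miniature of the function under test: return a cached entry on
# success, raise on a miss or on an error status. Not the actual API.
_CACHE = {"step-a": {"status": "ok", "content": 42}}

def get_previous_step_or_raise(step: str):
    entry = _CACHE.get(step)
    if entry is None:
        raise LookupError(f"no cache entry for {step!r}")      # missing-cache case
    if entry["status"] != "ok":
        raise RuntimeError(f"step {step!r} has status {entry['status']}")  # error case
    return entry["content"]                                    # successful hit

hit = get_previous_step_or_raise("step-a")
```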
- **#3206 • Refactor: Use HfApi.update_repo_settings for gated datasets** – Removed redundant custom implementations in favor of `HfApi.update_repo_settings`.
**Retrieval-Augmented System with Dynamic Learning from Web Content**

Published research on RAG systems that dynamically learn from web content, combining retrieval mechanisms with adaptive learning strategies.
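The published system's retrieval and adaptive-learning components aren't reproduced here, but the "R" in RAG reduces to ranking documents by similarity to a query. A toy bag-of-words cosine retriever, purely for illustration:

```python
import math
from collections import Counter

# Toy retrieval core: rank documents by bag-of-words cosine similarity.
def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

docs = [
    "retrieval augmented generation grounds answers in documents",
    "convolutional networks excel at image classification",
]

def retrieve(query: str) -> str:
    q = Counter(query.lower().split())
    return max(docs, key=lambda d: cosine(q, Counter(d.lower().split())))

best = retrieve("how does retrieval augmented generation work")
```

A production system would swap the bag-of-words vectors for dense embeddings and feed the retrieved passage into a generator, with the dynamic-learning component updating the index as web content changes.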
Building something interesting? I'm always open to collaborating on ML research, open source contributions, or production ML systems.

