ArjunJagdale/README.md


Portfolio LinkedIn Email LeetCode


👋 About Me

Turning research into production-ready ML systems. I'm an AI engineer working at the intersection of deep learning research and production engineering, building everything from anti-spoofing CNNs to parameter-efficient transformers while actively contributing to core Hugging Face libraries.

class ArjunJagdale:
    def __init__(self):
        self.role = "AI Engineer & Open Source Contributor"
        self.focus_areas = ["Deep Learning", "Computer Vision", "NLP", "MLOps"]
        self.current_work = "Contributing to Hugging Face core libraries"
        self.interests = ["RAG Systems", "Model Compression", "Cloud-Native ML"]
    
    def get_expertise(self):
        return {
            "frameworks": ["PyTorch", "HuggingFace", "scikit-learn", "TensorFlow"],
            "specializations": ["Parameter-Efficient Fine-Tuning", "CNNs", "Transformers"],
            "cloud": ["IBM Cloud", "Google Cloud", "Docker", "Kubernetes"],
            "tools": ["LangChain", "LlamaIndex", "Gradio", "Git"]
        }

🔥 What I'm Working On

  • 🚀 Contributing to Hugging Face: datasets & dataset-viewer libraries (7 merged PRs)
  • 🧠 Research: Published paper on Retrieval-Augmented Systems with Dynamic Learning
  • 🛠️ Building: Production ML pipelines with real-time inference and GPU optimization
  • 📚 Learning: Parameter-efficient methods, vision-language models, cloud-native deployments

🛠️ Tech Arsenal

Languages & Core

Python • C++ • JavaScript

ML & AI Frameworks

PyTorch • HuggingFace • scikit-learn • TensorFlow • LangChain • LlamaIndex

Cloud & DevOps

IBM Cloud • Google Cloud • Docker • Kubernetes

Tools & Libraries

Git • GitHub Actions • Pandas • NumPy • OpenCV


🚀 Featured Projects

Custom CNN Architecture for Real vs. Spoof Detection

results = {
    "validation_accuracy": "99.39%",
    "test_accuracy": "99.29%",
    "dataset_size": "38K images",
    "parameters": "4.88M"
}
  • Built SpoofNet from scratch with BatchNorm, Dropout2D, AdaptiveAvgPooling
  • Real-time inference with MediaPipe + OpenCV on GPU
  • Automated face extraction pipeline with quality filters
  • Production-ready with only 2 false negatives, 40 false positives

Tech: PyTorch • OpenCV • MediaPipe • Gradio

Parameter-Efficient Transformer Fine-Tuning

efficiency = {
    "parameter_reduction": "99.87%",
    "trainable_params": "147K vs 110M",
    "accuracy": "91.74%",
    "speedup": "750x faster training"
}
  • Custom LoRA implementation with rank-4 low-rank matrices
  • Injected into Q/V projections across 12 BERT encoder layers
  • Built end-to-end pipeline with HuggingFace Transformers
  • Optimized with AdamW + gradient freezing

Tech: PyTorch • HuggingFace • LoRA • SST-2
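
The trainable-parameter figure above can be sanity-checked with a little arithmetic. This is a minimal sketch, assuming a BERT-base hidden size of 768 (not stated above), square Q/V projections, one pair of rank-4 matrices per adapted projection, and no trainable classification head. The 110M total is taken from the figures above.

```python
# Back-of-the-envelope LoRA parameter count.
# Assumptions (not stated in this README): hidden size 768 (BERT-base),
# square Q/V projections, adapters without bias terms, frozen classifier head.
HIDDEN = 768
RANK = 4
LAYERS = 12
PROJECTIONS_PER_LAYER = 2    # query and value
TOTAL_PARAMS = 110_000_000   # "110M" as quoted above


def lora_trainable_params(hidden, rank, layers, projections):
    """Count the parameters added by LoRA adapters."""
    # Each adapter adds A (rank x hidden) and B (hidden x rank).
    per_adapter = 2 * rank * hidden
    return per_adapter * projections * layers


trainable = lora_trainable_params(HIDDEN, RANK, LAYERS, PROJECTIONS_PER_LAYER)
reduction = 100 * (1 - trainable / TOTAL_PARAMS)
ratio = TOTAL_PARAMS / trainable

print(f"trainable adapter params: {trainable:,}")
print(f"parameter reduction: {reduction:.2f}%")
print(f"total / trainable: {ratio:.0f}x")
```

Under these assumptions the adapters come to 147,456 parameters, in line with the 147K and 99.87% quoted above, and the total-to-trainable ratio is roughly 746.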

Bidirectional Sequence Modeling for IMDB Reviews

performance = {
    "test_accuracy": "85.19%",
    "train_test_gap": "4%",
    "architecture": "2-layer BiLSTM",
    "embedding_dim": 128
}
  • Bidirectional LSTM with 50% dropout for regularization
  • Gradient clipping (max norm=1.0) for stable training
  • Gradio deployment for real-time inference
  • Handles sequences up to 256 tokens with OOV support

Tech: PyTorch • LSTM • Gradio • NLP
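
For readers unfamiliar with gradient clipping by max norm, here is a framework-free sketch of the idea. It assumes the common "global norm" variant, where all gradients are rescaled together. PyTorch provides `torch.nn.utils.clip_grad_norm_` for this; whether the project calls it directly is an assumption. The function name and sample values below are illustrative only.

```python
import math


def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Rescale a list of gradient vectors so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Only ever shrink: gradients already inside the norm ball get scale 1.0.
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in vec] for vec in grads]
    return clipped, total_norm


# Illustrative gradients for two parameter tensors (made-up values).
grads = [[3.0, 4.0], [0.0, 12.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))

print(f"norm before: {norm_before:.4f}, norm after: {norm_after:.4f}")
```

Because every gradient is multiplied by the same factor, the update direction is preserved and only its length is capped, which is what keeps recurrent training stable when a batch produces an unusually large gradient.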

Classical ML with Advanced Feature Engineering

model_comparison = {
    "Random_Forest": "97.2%",
    "Logistic_Regression": "96.5%",
    "SVM": "95.8%",
    "after_tuning": "+1-2%"
}
  • Comprehensive EDA with correlation heatmaps and PCA
  • GridSearchCV hyperparameter optimization
  • Multi-class classification on 178 samples, 13 features
  • Full evaluation with confusion matrices & classification reports

Tech: scikit-learn • Pandas • Matplotlib • NumPy


🌟 Open Source Contributions


Active contributor to Hugging Face, focusing on datasets infrastructure, compatibility fixes, and developer experience.

📦 huggingface/datasets

#7831 • Fix ValueError in train_test_split with NumPy 2.0+

Resolved compatibility issue with NumPy 2.0+ by wrapping stratify column array access with np.asarray(). Maintains backward compatibility with NumPy 1.x while fixing array copy errors.

bug-fix compatibility numpy

#7648 • Fix misleading docstring examples across multiple methods

Updated docstrings for add_column(), select_columns(), select(), filter(), shard(), and flatten() to clarify that these methods return new datasets instead of modifying in-place. Significantly improves API documentation clarity.

documentation api-improvement datasets

#7623 • Fix: Raise error when data_dir and data_files are missing

Added validation check in FolderBasedBuilder to prevent silent fallback to current directory when loading folder-based datasets without required parameters. Improves user experience by catching errors early.

bug-fix validation datasets
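
The kind of early check described in #7623 can be sketched as follows. This is not the code from the PR: `check_folder_inputs` is a hypothetical standalone helper, and the error message is invented for illustration.

```python
def check_folder_inputs(data_dir=None, data_files=None):
    """Hypothetical helper: fail fast when no data location is given.

    Without a check like this, a loader could quietly fall back to
    scanning the current working directory.
    """
    if data_dir is None and not data_files:
        raise ValueError(
            "Either data_dir or data_files must be provided "
            "to load a folder-based dataset."
        )
    return data_dir, data_files


# Passing either argument is accepted.
print(check_folder_inputs(data_dir="images"))

# Passing neither raises immediately instead of guessing a location.
try:
    check_folder_inputs()
except ValueError as err:
    print(f"rejected: {err}")
```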

🔍 huggingface/dataset-viewer

#3223 • Add support for Date features in Croissant schema

Implemented support for Date, UTCDate, and UTCTime features in Croissant schema generation. Automatically infers correct dataType (sc:Date, sc:Time, or sc:DateTime) based on format string.

feature croissant schema
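
To illustrate how a dataType might be inferred from a format string, here is a rough sketch. It is not the code from #3223. It assumes strftime-style format strings, and the function name, the directive sets, and the fallback to sc:Date are all choices made for this example.

```python
# Hypothetical directive sets: which strftime codes count as "date" or "time".
DATE_DIRECTIVES = set("YymdbBj")
TIME_DIRECTIVES = set("HIMSfp")


def infer_date_data_type(fmt):
    """Map a strftime-style format string to sc:Date, sc:Time, or sc:DateTime."""
    directives = set()
    i = 0
    while i < len(fmt) - 1:
        if fmt[i] == "%":
            # "%%" records "%", which belongs to neither set and is ignored.
            directives.add(fmt[i + 1])
            i += 2
        else:
            i += 1
    has_date = bool(directives & DATE_DIRECTIVES)
    has_time = bool(directives & TIME_DIRECTIVES)
    if has_date and has_time:
        return "sc:DateTime"
    if has_time:
        return "sc:Time"
    # Arbitrary choice for this sketch: anything else is treated as a date.
    return "sc:Date"


for fmt in ("%Y-%m-%d", "%H:%M:%S", "%Y-%m-%dT%H:%M:%S"):
    print(fmt, "->", infer_date_data_type(fmt))
```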

#3219 • Refactor: Replace get_empty_str_list with CONSTANT.copy

Eliminated shared mutable default values in dataclass fields by replacing helper functions with explicit constant copies. Makes configuration behavior more explicit and prevents subtle bugs.

refactor best-practices config
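
The `CONSTANT.copy` pattern named in #3219 can be sketched with the standard library alone. The constant and config class below are hypothetical stand-ins, not names from dataset-viewer. The point is that passing the bound `copy` method as `default_factory` gives every instance its own list.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical module-level default; the real constants live in dataset-viewer.
DEFAULT_EXTENSIONS: List[str] = [".parquet", ".csv"]


@dataclass
class ViewerConfig:  # hypothetical config class for illustration
    # default_factory calls DEFAULT_EXTENSIONS.copy() once per instance,
    # so no two instances (and not the constant) share the same list.
    extensions: List[str] = field(default_factory=DEFAULT_EXTENSIONS.copy)


a = ViewerConfig()
b = ViewerConfig()
a.extensions.append(".json")

print(a.extensions)
print(b.extensions)
print(DEFAULT_EXTENSIONS)
```

Mutating `a.extensions` leaves both `b.extensions` and the module-level constant untouched, and the default value is visible right at the field definition rather than hidden behind a helper function.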

#3218 • Test: Add unit tests for get_previous_step_or_raise

Implemented comprehensive unit tests for cache retrieval function covering successful cache hits, missing cache scenarios, and error status handling. Improves code coverage and reliability.

testing unit-tests coverage

#3206 • Refactor: Use HfApi.update_repo_settings for gated datasets

Removed redundant custom implementations of update_repo_settings() across test utilities by leveraging official huggingface_hub API. Cleaned up 222 lines of code while maintaining full functionality.

refactor code-cleanup testing

View All Contributions



📚 Research & Publications

Retrieval-Augmented System with Dynamic Learning from Web Content
Published research on RAG systems that dynamically learn from web content, combining retrieval mechanisms with adaptive learning strategies.


🎓 Certifications

IBM ML Specialist • PyTorch • Git & GitHub


💬 Let's Connect

Building something interesting? I'm always open to collaborating on ML research, open source contributions, or production ML systems.

LinkedIn Email Portfolio LeetCode


💡 Currently Exploring: RAG Systems • Model Compression • Cloud-Native ML Deployments
📍 Location: Pune, Maharashtra, India
🎓 Education: B.E. Electronics & Telecommunication @ Savitribai Phule Pune University


Pinned

  1. YTRAG (Public, Python)
  2. SPOOF (Public, Jupyter Notebook)
  3. CHAT (Public, JavaScript)
  4. URAG (Public, Python)
  5. NER (Public, Python)