A curated list of awesome frameworks, libraries, tools, tutorials, datasets, and research papers in machine learning. This list covers a wide array of topics, from foundational algorithms to modern techniques in supervised, unsupervised, and reinforcement learning.
- Frameworks and Libraries
- Tools and Utilities
- Algorithms and Techniques
- Model Evaluation and Tuning
- Feature Engineering
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- Datasets
- Research Papers
- Learning Resources
- Books
- Community
- Contribute
- License
- Scikit-learn - A comprehensive Python library for machine learning with efficient tools for data analysis.
- TensorFlow - An open-source platform for machine learning and deep learning by Google.
- PyTorch - An open-source machine learning framework popular for its dynamic computation graph.
- XGBoost - A scalable, efficient, and widely used gradient boosting library.
- LightGBM - A fast, distributed, high-performance gradient boosting framework.
- CatBoost - A gradient boosting library with built-in support for categorical features.
- MLflow - An open-source platform for managing the end-to-end machine learning lifecycle.
- Weights & Biases - A tool for experiment tracking, model monitoring, and hyperparameter optimization.
- DVC (Data Version Control) - A version control system for the datasets, models, and pipelines in machine learning projects, designed to work alongside Git.
- Optuna - An automatic hyperparameter optimization framework.
- Streamlit - A library for creating interactive machine learning web apps quickly.
- Linear Regression - A simple yet powerful supervised learning algorithm that fits a linear function to predict a continuous target.
- Logistic Regression - A classification algorithm based on the logistic function.
- Decision Trees - A non-parametric supervised learning algorithm used for classification and regression tasks.
- Random Forest - An ensemble learning method using multiple decision trees.
- Gradient Boosting - A technique for building predictive models through an ensemble of weak learners.
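
As a rough illustration of the simplest entry above, linear regression with a single feature can be fit in closed form by ordinary least squares. The sketch below uses only the standard library; the function name and the toy data (generated from y = 2x + 1) are assumptions made for this example, and it is not how any of the listed libraries implement regression.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing squared error for one feature."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and the co-deviation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data lying exactly on the line y = 2x + 1.
slope, intercept = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```
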
- Cross-Validation - A resampling method that estimates how well a model generalizes by training and testing it on different splits of the data.
- Confusion Matrix - A tool for evaluating the performance of classification algorithms.
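- Precision, Recall, F1 Score - Classification metrics that measure how many predicted positives are correct (precision), how many actual positives are found (recall), and their harmonic mean (F1).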
- Grid Search - A method for hyperparameter optimization through exhaustive search.
- Bayesian Optimization - A method for optimizing hyperparameters using probabilistic models.
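
The confusion matrix and the precision, recall, and F1 entries above are closely related: the three metrics are computed from the matrix's counts. The following is a minimal standard-library sketch for the binary case; the function name, the convention of returning 0.0 when a denominator is zero, and the label lists are assumptions chosen for illustration.

```python
def binary_classification_report(y_true, y_pred, positive=1):
    """Count confusion-matrix cells and derive precision, recall, and F1."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # true positive
        elif pred == positive:
            fp += 1  # false positive
        elif truth == positive:
            fn += 1  # false negative
        else:
            tn += 1  # true negative
    # Assumption: report 0.0 when a metric is undefined (zero denominator).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
report = binary_classification_report(y_true, y_pred)
print(report)
```
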
- Pandas - A Python library for data manipulation and analysis.
- FeatureTools - An open-source library for automated feature engineering.
- Missingno - A Python library for visualizing missing data.
- Category Encoders - A collection of scikit-learn compatible transformers for encoding categorical features.
- Principal Component Analysis (PCA) - A dimensionality reduction technique that projects features onto the orthogonal directions of greatest variance.
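
To make the idea of encoding categorical features concrete, here is a small one-hot encoding sketch in plain Python. It is not the Category Encoders or Pandas API; the function name and the example values are hypothetical, and the alphabetical column order is simply a choice made for this sketch.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 indicator row.

    Columns follow the sorted order of the distinct categories
    (a choice made for this sketch, not a library convention).
    """
    categories = sorted(set(values))
    column_of = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column_of[value]] = 1  # exactly one column is set per row
        rows.append(row)
    return categories, rows


# Hypothetical categorical column.
categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(categories)
print(encoded)
```
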
- Support Vector Machines (SVM) - A margin-based algorithm for classification and regression tasks.
- K-Nearest Neighbors (KNN) - A simple, instance-based learning algorithm.
- Naive Bayes - A family of probabilistic classifiers based on Bayes' theorem.
- Ensemble Methods - Techniques like bagging and boosting for improving model accuracy.
- Neural Networks - A class of models inspired by the human brain's structure.
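
K-Nearest Neighbors is simple enough to sketch in a few lines: classify a query point by majority vote among the k closest training points. The version below assumes Euclidean distance and Python 3.8+ (for `math.dist`); the function name and the two-cluster toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest points."""
    # Pair each training label with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in 2D.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
near_origin = knn_predict(points, labels, (0.5, 0.5))
near_far_group = knn_predict(points, labels, (5.5, 5.5))
print(near_origin, near_far_group)
```

A brute-force scan like this costs one distance computation per training point for every query; libraries typically offer tree-based or approximate indexes for larger datasets.
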
- K-Means Clustering - A popular clustering algorithm for partitioning data into K clusters.
- Hierarchical Clustering - A method of cluster analysis that builds a hierarchy of clusters.
- DBSCAN (Density-Based Spatial Clustering) - A clustering algorithm that identifies dense regions of data points.
- Gaussian Mixture Models (GMM) - A probabilistic model for representing normally distributed subpopulations within an overall population.
- Dimensionality Reduction - Techniques like PCA and t-SNE for reducing the number of features.
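
The K-Means entry above can be sketched as Lloyd's algorithm: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. This sketch assumes Euclidean distance and, for determinism, naively initializes centroids from the first k points; real implementations usually use smarter initialization such as k-means++. The function name and data are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster points with Lloyd's algorithm; returns (centroids, assignments)."""
    # Assumption: naive initialization from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, assignments


# Hypothetical toy data: two groups near (1, 1) and (9, 9).
data = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, assignments = kmeans(data, k=2)
print(centroids, assignments)
```
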
- Q-Learning - A value-based reinforcement learning algorithm.
- Deep Q-Network (DQN) - An extension of Q-learning that approximates the action-value function with a deep neural network.
- Proximal Policy Optimization (PPO) - A policy gradient method for reinforcement learning.
- Actor-Critic Methods - A family of reinforcement learning algorithms that use both policy and value functions.
- OpenAI Gym - A toolkit for developing and comparing reinforcement learning algorithms, now maintained as Gymnasium by the Farama Foundation.
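
A tabular version of the Q-Learning entry above fits in a short script. The sketch below assumes a hypothetical five-state chain environment (start at state 0, reward 1.0 on reaching state 4) rather than a Gym environment, and it explores with a uniformly random behavior policy for simplicity; because Q-learning is off-policy, the greedy policy can still be read from the learned table. All names and hyperparameter values are illustrative assumptions.

```python
import random

N_STATES = 5      # states 0..4; state 4 is the terminal goal (toy environment)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain dynamics: reward 1.0 only on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=200, seed=0):
    """Learn a Q-table with the one-step Q-learning update."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)  # uniform random behavior policy
            next_state, reward, done = step(state, action)
            # Bootstrap from the best next action unless the episode ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
# Greedy action per non-terminal state, read off the learned table.
greedy_policy = [
    max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)
]
print(greedy_policy)
```
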
- UCI Machine Learning Repository - A collection of datasets for machine learning research.
- Kaggle Datasets - A platform for accessing diverse datasets and participating in competitions.
- Google Dataset Search - A search engine for discovering datasets across the web.
- OpenML - An open platform for sharing datasets and machine learning experiments.
- Data.gov - A portal for accessing public datasets.
- A Few Useful Things to Know About Machine Learning (2012) - A paper by Pedro Domingos summarizing practical lessons and common pitfalls in machine learning.
- The Elements of Statistical Learning (2001) - A comprehensive book on statistical learning by Hastie, Tibshirani, and Friedman.
- Greedy Function Approximation: A Gradient Boosting Machine (2001) - The paper by Jerome Friedman that introduced gradient boosting machines.
- Coursera: Machine Learning by Andrew Ng - A comprehensive course on machine learning.
- Fast.ai - Free courses and resources for practical machine learning.
- Google Machine Learning Crash Course - A fast-paced introduction to machine learning.
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron - A practical guide to machine learning.
- Pattern Recognition and Machine Learning by Christopher Bishop - A book covering the fundamentals of machine learning.
- Machine Learning Yearning by Andrew Ng - A guide on structuring machine learning projects effectively.
- Reddit: r/MachineLearning - A subreddit for discussions on machine learning.
- Kaggle - A platform for data science competitions and community interaction.
- Scikit-learn Mailing List - A place to discuss issues and features in scikit-learn.
Contributions are welcome. Please ensure your submission fully follows the requirements outlined in CONTRIBUTING.md, including formatting, scope alignment, and category placement.
Pull requests that do not adhere to the contribution guidelines may be closed.