Auto Machine Learning Python Equivalent code explained

Automated Machine Learning (AutoML) simplifies the process of building machine learning models by automating tasks like feature engineering, model selection, and hyperparameter tuning. This tutorial demonstrates how to use Auto-sklearn, a Python library built on scikit-learn that searches for a well-performing model and hyperparameters for your dataset within a time budget you set.

What is Auto-sklearn?

Auto-sklearn is an open-source framework that automates machine learning pipeline creation. It uses Bayesian optimization and meta-learning to efficiently search through possible machine learning pipelines, automatically selecting the best combination of preprocessing steps, algorithms, and hyperparameters for your specific dataset.
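The meta-learning idea can be illustrated with a small, library-free sketch: describe each dataset by a few meta-features, find the previously seen datasets closest to the new one, and start the search from the configurations that worked well there. Everything below is hypothetical and chosen only for illustration (the meta-features, the past datasets, the configuration labels, and the use of L1 distance); it is not Auto-sklearn's internal code.

```python
# Hypothetical meta-features per past dataset:
# (log10 of sample count, log10 of feature count, number of classes),
# plus a placeholder label for a configuration that did well on it.
PAST_DATASETS = {
    "past_a": {"meta": (3.0, 1.5, 2), "best_config": "config_from_past_a"},
    "past_b": {"meta": (3.3, 1.8, 10), "best_config": "config_from_past_b"},
    "past_c": {"meta": (5.0, 2.5, 3), "best_config": "config_from_past_c"},
}


def l1_distance(a, b):
    """Sum of absolute differences between two meta-feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def warm_start_configs(new_meta, past, k=2):
    """Return the stored configurations of the k most similar past datasets."""
    ranked = sorted(past, key=lambda name: l1_distance(new_meta, past[name]["meta"]))
    return [past[name]["best_config"] for name in ranked[:k]]


# A digits-like dataset: about 1797 samples, 64 features, 10 classes
new_meta = (3.25, 1.81, 10)
initial_configs = warm_start_configs(new_meta, PAST_DATASETS, k=2)
print(initial_configs)
```

The returned configurations would be evaluated first, so the optimizer starts from plausible candidates rather than from random points in the search space.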

Key features include:

Automatic model selection and hyperparameter optimization
Built-in ensemble methods for improved performance
Support for both classification and regression tasks
Easy-to-use scikit-learn compatible API

Basic AutoML Example with Digits Dataset

Let's implement a complete AutoML solution using the handwritten digits dataset. This example demonstrates how Auto-sklearn automatically handles the entire machine learning pipeline:

import autosklearn.classification
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Load the digits dataset
X, y = load_digits(return_X_y=True)
print(f"Dataset shape: {X.shape}")
print(f"Number of classes: {len(set(y))}")

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Create AutoML classifier with time constraints
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=180,  # Total time budget in seconds
    per_run_time_limit=30,        # Time limit per model evaluation
    memory_limit=3072             # Memory limit in MB
)

# Fit the AutoML model
print("Training AutoML model...")
automl.fit(X_train, y_train)

# Make predictions
y_pred = automl.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Test Accuracy: {accuracy:.4f}")

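# Display model statistics
# leaderboard() lists only ensemble members by default;
# ensemble_only=False includes every model that was evaluated
print(f"\nModels evaluated: {len(automl.leaderboard(ensemble_only=False))}")
print("Models in the final ensemble:")
print(automl.show_models())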

Dataset shape: (1797, 64)
Number of classes: 10
Training AutoML model...
Test Accuracy: 0.9867
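To put the reported accuracy in perspective, the arithmetic below converts it into a count of misclassified test digits. It assumes scikit-learn rounds a fractional test_size up to a whole number of test samples, and it uses the accuracy value printed above as given; your own run will likely produce a slightly different number.

```python
import math

n_samples = 1797           # rows in the digits dataset
test_size = 0.25           # fraction passed to train_test_split
reported_accuracy = 0.9867

# A fractional test_size is rounded up to a whole number of samples
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# Convert the accuracy back into counts of right and wrong predictions
n_correct = round(reported_accuracy * n_test)
n_errors = n_test - n_correct

print(f"Train samples: {n_train}, test samples: {n_test}")
print(f"Correct: {n_correct}, misclassified: {n_errors}")
```

With a test set this small, a single extra error moves the accuracy by about 0.2 percentage points, so small differences between runs are not meaningful.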

How Auto-sklearn Works

Auto-sklearn follows this automated workflow:

Meta-learning: Uses knowledge from previous datasets to warm-start the optimization process
Bayesian Optimization: Efficiently searches the hyperparameter space
Ensemble Selection: Combines multiple models to create a robust final predictor
Automated Preprocessing: Handles feature scaling, encoding, and selection automatically
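The ensemble selection step can be sketched as a greedy loop: repeatedly add whichever model most improves the validation score of the averaged ensemble, allowing the same model to be picked more than once so that it receives a larger weight. The version below is a simplified, library-free illustration for binary classification; the model names, probabilities, and labels are made up, and Auto-sklearn's own implementation differs in its details.

```python
from collections import Counter


def ensemble_accuracy(members, probs, y_true):
    """Accuracy of the averaged class-1 probability of the chosen members."""
    correct = 0
    for i, target in enumerate(y_true):
        avg = sum(probs[m][i] for m in members) / len(members)
        correct += int((avg >= 0.5) == bool(target))
    return correct / len(y_true)


def greedy_ensemble_selection(probs, y_true, ensemble_size):
    """Greedily pick models (with replacement) that maximize ensemble accuracy."""
    members = []
    for _ in range(ensemble_size):
        # sorted() makes tie-breaking deterministic: the first best name wins
        best = max(
            sorted(probs),
            key=lambda name: ensemble_accuracy(members + [name], probs, y_true),
        )
        members.append(best)
    return members


# Made-up validation labels and per-model predicted probabilities of class 1
y_valid = [1, 0, 1, 1, 0, 0, 1, 0]
valid_probs = {
    "model_a": [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3],
    "model_b": [0.6, 0.4, 0.9, 0.3, 0.2, 0.1, 0.6, 0.4],
    "model_c": [0.3, 0.7, 0.8, 0.9, 0.6, 0.2, 0.4, 0.1],
}

members = greedy_ensemble_selection(valid_probs, y_valid, ensemble_size=3)
weights = {name: count / len(members) for name, count in Counter(members).items()}
print(members)
print(weights)
```

Here no single model classifies every validation sample correctly, but the averaged pair does, which is the effect ensemble selection is designed to exploit.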

Advanced Configuration Example

You can customize Auto-sklearn's behavior with various parameters:

import autosklearn.classification
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load a different dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Configure AutoML with custom settings
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,
    per_run_time_limit=60,
    memory_limit=4096,
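    ensemble_size=20,          # Number of models in final ensemble
    initial_configurations_via_metalearning=10,  # Warm-start configurations
    # auto-sklearn 0.14+ takes include/exclude dicts keyed by pipeline step;
    # older releases used include_estimators / exclude_preprocessors instead
    include={'classifier': ['random_forest', 'extra_trees', 'gradient_boosting']},
    exclude={'feature_preprocessor': ['kitchen_sinks']}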
)

# Train and evaluate
automl.fit(X_train, y_train)
accuracy = automl.score(X_test, y_test)

print(f"Breast Cancer Dataset Accuracy: {accuracy:.4f}")
print(f"Final ensemble contains {len(automl.get_models_with_weights())} models")

Key Parameters Explained

Parameter	Description	Default
`time_left_for_this_task`	Total time budget in seconds	3600
`per_run_time_limit`	Time limit per model evaluation	1/10 of total time
`memory_limit`	Memory limit in MB	3072
`ensemble_size`	Number of models in final ensemble	50
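The two time parameters interact, and a quick calculation shows how. The helper below is a hypothetical convenience function, not part of Auto-sklearn: it applies the default from the table (one tenth of the total budget, here assumed to use integer division) and estimates how many evaluations fit if every run consumed its full limit, ignoring all overhead.

```python
def budget_summary(time_left_for_this_task=3600, per_run_time_limit=None):
    """Summarize a time budget; all values are in seconds."""
    if per_run_time_limit is None:
        # Default from the table above: one tenth of the total budget
        per_run_time_limit = time_left_for_this_task // 10
    # Evaluations that fit if each one runs up to its limit, one after another
    worst_case_runs = time_left_for_this_task // per_run_time_limit
    return {
        "time_left_for_this_task": time_left_for_this_task,
        "per_run_time_limit": per_run_time_limit,
        "worst_case_runs": worst_case_runs,
    }


# Settings from the digits example and from the breast cancer example
print(budget_summary(180, 30))
print(budget_summary(300, 60))
# Library defaults as listed in the table
print(budget_summary())
```

Most evaluations finish well before the per-run limit, so the real number of models tried is usually higher than this worst-case figure. A small value here is still a sign that the total budget is tight for the chosen per-run limit.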

Best Practices

To use Auto-sklearn effectively:

Set appropriate time budgets: Allow sufficient time for model exploration (at least 5-10 minutes for small datasets)
Monitor memory usage: Increase memory_limit for large datasets
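Choose a resampling strategy: By default Auto-sklearn validates candidate models on a single holdout split; set resampling_strategy='cv' for cross-validation when you want more robust model selection and can afford the longer run time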
Consider data preprocessing: While Auto-sklearn handles preprocessing, clean data still produces better results
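Cross-validation can be requested explicitly through the resampling_strategy and resampling_strategy_arguments parameters. The sketch below is a minimal example under stated assumptions: the helper names cross_validation_settings and fit_with_cross_validation are hypothetical, the fold count and time budget are arbitrary, and the fitting function needs Auto-sklearn installed when it is called.

```python
def cross_validation_settings(folds=5):
    """Keyword arguments that switch Auto-sklearn from holdout to k-fold CV."""
    return {
        "resampling_strategy": "cv",
        "resampling_strategy_arguments": {"folds": folds},
    }


def fit_with_cross_validation(X_train, y_train, folds=5, time_budget=300):
    """Fit an AutoSklearnClassifier with k-fold cross-validation."""
    # Imported here so the settings helper works without auto-sklearn installed
    import autosklearn.classification

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=time_budget,
        **cross_validation_settings(folds),
    )
    automl.fit(X_train, y_train)
    # With cross-validation, models are trained on individual folds only;
    # refit() retrains the ensemble members on all of the training data
    automl.refit(X_train, y_train)
    return automl


print(cross_validation_settings(folds=5))
```

Each candidate is trained once per fold, so a cross-validated search evaluates fewer configurations in the same time budget than a holdout search does.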

Conclusion

Auto-sklearn democratizes machine learning by automating the complex process of model selection and hyperparameter tuning. It's particularly valuable for beginners and situations where you need quick, reliable results without extensive ML expertise. The library's Bayesian optimization and ensemble methods often produce competitive results with minimal manual intervention.

Premansh Sharma

Updated on: 2026-03-27T01:09:36+05:30
