Automated Machine Learning in Python with Auto-sklearn

Automated Machine Learning (AutoML) simplifies the process of building machine learning models by automating tasks like feature engineering, model selection, and hyperparameter tuning. This tutorial demonstrates how to use Auto-sklearn, a powerful Python library built on scikit-learn that automatically finds the best model and hyperparameters for your dataset.

What is Auto-sklearn?

Auto-sklearn is an open-source framework that automates machine learning pipeline creation. It uses Bayesian optimization and meta-learning to efficiently search through possible machine learning pipelines, automatically selecting the best combination of preprocessing steps, algorithms, and hyperparameters for your specific dataset.

Key features include:

  • Automatic model selection and hyperparameter optimization

  • Built-in ensemble methods for improved performance

  • Support for both classification and regression tasks

  • Easy-to-use scikit-learn compatible API
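To make the meta-learning idea concrete, here is a minimal sketch of warm-starting a search from similar datasets. It is not Auto-sklearn's actual implementation: the dataset names, meta-feature vectors, and configurations are hypothetical, and ranking by L1 distance over three hand-picked meta-features is a simplifying assumption.

```python
from typing import Dict, List


def l1_distance(a: List[float], b: List[float]) -> float:
    """Sum of absolute differences between two meta-feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def warm_start_configs(new_meta: List[float], known: Dict[str, dict], k: int) -> List[dict]:
    """Return the best-known configurations of the k most similar datasets."""
    ranked = sorted(
        known,
        key=lambda name: l1_distance(new_meta, known[name]["meta_features"]),
    )
    return [known[name]["best_config"] for name in ranked[:k]]


# Hypothetical meta-features: [log10(samples), log10(features), number of classes]
KNOWN_DATASETS = {
    "dataset_a": {"meta_features": [3.0, 1.5, 10.0],
                  "best_config": {"classifier": "random_forest"}},
    "dataset_b": {"meta_features": [5.0, 3.0, 2.0],
                  "best_config": {"classifier": "gradient_boosting"}},
    "dataset_c": {"meta_features": [3.5, 2.0, 8.0],
                  "best_config": {"classifier": "extra_trees"}},
}

# A new dataset roughly the size of the digits data (hypothetical values)
new_dataset = [3.25, 1.8, 10.0]
initial_configs = warm_start_configs(new_dataset, KNOWN_DATASETS, k=2)
print(initial_configs)
```

The returned configurations would be evaluated first, so the optimizer starts from pipelines that worked on similar data instead of from random guesses.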

Basic AutoML Example with Digits Dataset

Let's implement a complete AutoML solution using the handwritten digits dataset. This example demonstrates how Auto-sklearn automatically handles the entire machine learning pipeline:

import autosklearn.classification
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Load the digits dataset
X, y = load_digits(return_X_y=True)
print(f"Dataset shape: {X.shape}")
print(f"Number of classes: {len(set(y))}")

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Create AutoML classifier with time constraints
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=180,  # Total time budget in seconds
    per_run_time_limit=30,        # Time limit per model evaluation
    memory_limit=3072             # Memory limit in MB
)

# Fit the AutoML model
print("Training AutoML model...")
automl.fit(X_train, y_train)

# Make predictions
y_pred = automl.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Test Accuracy: {accuracy:.4f}")

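# Display ensemble statistics (leaderboard() lists ensemble members by default)
print(f"\nModels in the leaderboard: {len(automl.leaderboard())}")
print("Models in the final ensemble:")
print(automl.show_models())

The first lines of the output look similar to this (the accuracy varies between runs, and the leaderboard count and model listing are omitted here):

Dataset shape: (1797, 64)
Number of classes: 10
Training AutoML model...
Test Accuracy: 0.9867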
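To put the reported accuracy in perspective, the short calculation below works out how many test images it corresponds to. It relies on scikit-learn rounding a fractional test_size up to the next whole sample; the variable names are illustrative only.

```python
import math

n_samples = 1797   # rows in the digits dataset
test_size = 0.25   # fraction passed to train_test_split

# scikit-learn rounds the test share up and gives the rest to training
n_test = math.ceil(n_samples * test_size)
n_train = n_samples - n_test

reported_accuracy = 0.9867
n_correct = round(reported_accuracy * n_test)
n_wrong = n_test - n_correct

print(f"Train/test split: {n_train}/{n_test}")
print(f"Correct: {n_correct}, misclassified: {n_wrong}")
print(f"Accuracy check: {n_correct / n_test:.4f}")
```

With a test set this small, each misclassified image moves the accuracy by roughly 0.2 percentage points, so small differences between runs are expected.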

How Auto-sklearn Works

Auto-sklearn's search combines the following components:

  1. Meta-learning: Uses knowledge from previous datasets to warm-start the optimization process

  2. Bayesian Optimization: Efficiently searches the hyperparameter space

  3. Ensemble Selection: Combines multiple models to create a robust final predictor

  4. Automated Preprocessing: Handles feature scaling, encoding, and selection automatically
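The ensemble selection step can be illustrated with a small, library-free sketch of greedy selection with replacement: at each step, add the model that most improves validation accuracy of the averaged predictions. This is a simplified illustration, not Auto-sklearn's code; the model names and validation probabilities are made up.

```python
from typing import Dict, List

Probs = List[List[float]]  # one row of class probabilities per validation sample


def accuracy(probs: Probs, labels: List[int]) -> float:
    """Fraction of samples whose highest-probability class matches the label."""
    correct = 0
    for row, label in zip(probs, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += predicted == label
    return correct / len(labels)


def average_probabilities(members: List[Probs]) -> Probs:
    """Average the class probabilities of several models sample by sample."""
    n_members = len(members)
    n_samples = len(members[0])
    n_classes = len(members[0][0])
    return [
        [sum(m[i][c] for m in members) / n_members for c in range(n_classes)]
        for i in range(n_samples)
    ]


def greedy_ensemble_selection(
    model_probs: Dict[str, Probs], labels: List[int], ensemble_size: int
) -> Dict[str, float]:
    """Pick models one at a time (repeats allowed) and return their weights."""
    chosen: List[str] = []
    for _ in range(ensemble_size):
        best_name, best_score = "", -1.0
        for name in sorted(model_probs):
            candidate = [model_probs[n] for n in chosen + [name]]
            score = accuracy(average_probabilities(candidate), labels)
            if score > best_score:  # ties keep the earlier model
                best_name, best_score = name, score
        chosen.append(best_name)
    # A model's weight is the share of slots it occupies in the ensemble
    return {n: chosen.count(n) / len(chosen) for n in sorted(set(chosen))}


# Made-up validation predictions for three models on four samples
labels = [0, 1, 1, 0]
model_probs = {
    "A": [[0.9, 0.1], [0.4, 0.6], [0.65, 0.35], [0.7, 0.3]],
    "B": [[0.6, 0.4], [0.7, 0.3], [0.2, 0.8], [0.8, 0.2]],
    "C": [[0.4, 0.6], [0.3, 0.7], [0.3, 0.7], [0.45, 0.55]],
}
weights = greedy_ensemble_selection(model_probs, labels, ensemble_size=4)
print(weights)
```

In this toy data no single model classifies all four samples correctly, but the averaged ensemble does, which is the effect ensemble selection aims for.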

Advanced Configuration Example

You can customize Auto-sklearn's behavior with various parameters:

import autosklearn.classification
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load a different dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

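# Configure AutoML with custom settings
# Note: recent auto-sklearn releases take `include`/`exclude` dictionaries;
# older releases used include_estimators=[...] and exclude_preprocessors=[...]
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,   # Total time budget in seconds
    per_run_time_limit=60,         # Time limit per model evaluation in seconds
    memory_limit=4096,             # Memory limit in MB
    ensemble_size=20,              # Models added by ensemble selection (repeats allowed)
    initial_configurations_via_metalearning=10,  # Warm-start configurations
    include={'classifier': ['random_forest', 'extra_trees', 'gradient_boosting']},
    exclude={'feature_preprocessor': ['kitchen_sinks']}
)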

# Train and evaluate
automl.fit(X_train, y_train)
accuracy = automl.score(X_test, y_test)

print(f"Breast Cancer Dataset Accuracy: {accuracy:.4f}")
print(f"Final ensemble contains {len(automl.get_models_with_weights())} models")

Key Parameters Explained

| Parameter | Description | Default |
|---|---|---|
| time_left_for_this_task | Total time budget in seconds | 3600 |
| per_run_time_limit | Time limit per model evaluation in seconds | 1/10 of total time |
| memory_limit | Memory limit in MB | 3072 |
| ensemble_size | Number of models in final ensemble | 50 |
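The two time parameters interact, and a quick calculation shows how. The sketch below estimates how many model evaluations fit in a budget if every evaluation used its full limit. Both helper functions are hypothetical, and the estimate ignores overhead such as meta-learning and ensemble building.

```python
from typing import Optional


def resolve_per_run_limit(total_seconds: int, per_run_seconds: Optional[int] = None) -> int:
    """Use the given per-run limit, or fall back to one tenth of the total budget."""
    if per_run_seconds is None:
        return total_seconds // 10
    return per_run_seconds


def worst_case_runs(total_seconds: int, per_run_seconds: Optional[int] = None) -> int:
    """Number of sequential evaluations that fit if each one hits its time limit."""
    return total_seconds // resolve_per_run_limit(total_seconds, per_run_seconds)


# Settings from the two examples in this tutorial, plus the library defaults
for total, per_run in [(180, 30), (300, 60), (3600, None)]:
    limit = resolve_per_run_limit(total, per_run)
    print(f"budget={total}s, per-run limit={limit}s -> "
          f"at least about {worst_case_runs(total, per_run)} full-length runs")
```

Most evaluations on small datasets finish well before the limit, so the real number of evaluated pipelines is usually higher. A very small ratio between the two values, however, leaves the optimizer little room to explore.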

Best Practices

To use Auto-sklearn effectively:

  • Set appropriate time budgets: Allow sufficient time for model exploration (at least 5-10 minutes for small datasets)

  • Monitor memory usage: Increase memory_limit for large datasets

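  • Choose a validation strategy: Auto-sklearn validates candidate models on a single holdout split by default; set resampling_strategy='cv' for cross-validation when the time budget allows, and call refit() before predicting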

  • Consider data preprocessing: While Auto-sklearn handles preprocessing, clean data still produces better results
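Auto-sklearn's internal validation can be switched to cross-validation through constructor arguments. The following is a minimal sketch: build_cv_classifier is a hypothetical helper, the budget values are arbitrary, and the auto-sklearn import is deferred so the settings can be inspected without the library installed.

```python
# Settings that switch internal validation from a holdout split to 5-fold CV
CV_SETTINGS = {
    "time_left_for_this_task": 300,   # total budget in seconds (arbitrary choice)
    "per_run_time_limit": 30,         # per-evaluation limit in seconds (arbitrary choice)
    "resampling_strategy": "cv",
    "resampling_strategy_arguments": {"folds": 5},
}


def build_cv_classifier(**overrides):
    """Create an AutoSklearnClassifier that uses cross-validation internally."""
    # Imported lazily: this line requires auto-sklearn to be installed
    import autosklearn.classification

    settings = {**CV_SETTINGS, **overrides}
    return autosklearn.classification.AutoSklearnClassifier(**settings)


# Typical usage (requires auto-sklearn and the train/test split from earlier):
# automl = build_cv_classifier()
# automl.fit(X_train, y_train)
# automl.refit(X_train, y_train)   # retrain ensemble members on all training data
# y_pred = automl.predict(X_test)
```

Cross-validation fits every candidate once per fold, so it typically needs a larger time budget than the default holdout split for the same number of evaluated pipelines.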

Conclusion

Auto-sklearn democratizes machine learning by automating the complex process of model selection and hyperparameter tuning. It's particularly valuable for beginners and situations where you need quick, reliable results without extensive ML expertise. The library's Bayesian optimization and ensemble methods often produce competitive results with minimal manual intervention.

Updated on: 2026-03-27T01:09:36+05:30
