As a full-stack developer, being able to efficiently build, deploy and manage machine learning models is a crucial skill. Pyfunc is an MLflow flavor that makes it simpler to save Python functions as models, allowing you to port them easily across environments.
In this comprehensive guide, we'll explore practical pyfunc examples for operationalizing models with MLflow.
Overview of Pyfunc and MLflow
MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It offers capabilities like:
- Tracking experiments with metrics and parameters
- Packaging models in standardized formats
- Deploying models to diverse serving environments
- Instrumenting models for performance monitoring
A key benefit of MLflow is its model packaging formats, called flavors. Flavors allow you to export models in a reusable way for different downstream tools.
One such useful flavor is Pyfunc, which saves Python functions as models. As a full-stack developer, Pyfunc enables you to:
- Encapsulate any Python function as an MLflow model
- Load Pyfunc models for inference in Python environments
- Deploy Pyfunc models to various production platforms
In essence, Pyfunc makes model portability and reuse easier. Next, we'll look at concrete examples to demonstrate this capability.
End-to-End Example for Logging and Loading a Pyfunc Model
Let's walk through a simple ML model we'll create with Pyfunc:
- We'll build a CarModel class with attributes like brand, model, and year
- Log it as a Pyfunc model with MLflow
- Then load the model back for inference
Here are the key steps:
1. Import Required Modules
We import pyfunc from mlflow along with the main mlflow module:
import mlflow.pyfunc
import mlflow
2. Define the Model Class
Next, we define a CarModel class that subclasses PythonModel:
class CarModel(mlflow.pyfunc.PythonModel):
    def __init__(self, car_brand, model, year):
        self.car = Car(car_brand, model, year)

    def load_context(self, context):
        # Nothing to load for this simple model
        pass

    def predict(self, context, model_input):
        return [self.car.display_info()]
The key methods are:
- __init__: Initialize car attributes
- load_context: Can load artifacts like dictionaries
- predict: Generate predictions
3. Create a Car Object
We can create a simple Car class to represent each car:
class Car:
    def __init__(self, brand, model, year):
        self.brand = brand
        self.model = model
        self.year = year

    def display_info(self):
        return f"{self.year} {self.brand} {self.model}"
And initialize a sample car instance:
car = Car(brand="Toyota", model="Prius", year=2018)
4. Log the Model with MLflow
We instantiate our CarModel with the car's attributes, then log it inside a run:
model = CarModel(car.brand, car.model, car.year)

with mlflow.start_run():
    mlflow.pyfunc.log_model("car_model", python_model=model)
This persists the model with the Pyfunc flavor.
5. Load and Test the Model
In a separate Python session, we can load the model using the run ID:
loaded_model = mlflow.pyfunc.load_model("runs:/<run_id>/car_model")
car_info = loaded_model.predict(None)  # this model ignores its input
print(car_info)  # ['2018 Toyota Prius']
And we are able to invoke predict() on the loaded model for inference!
This is a simple example, but it illustrates the core workflow. Next, let's discuss some best practices when using Pyfunc.
Best Practices for Pyfunc Models
When leveraging Pyfunc and MLflow, here are some recommendations to follow:
1. Idempotent predict() method
The predict() method should be idempotent, meaning multiple calls should return the same result. It should not have side effects either. This ensures prediction behavior is consistent across runs.
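As an illustrative sketch (plain class with hypothetical names; a real model would subclass mlflow.pyfunc.PythonModel), an idempotent predict() derives its output only from its input and fixed state:

```python
class ScalerModel:
    """Sketch of an idempotent predict(): output depends only on the
    input and immutable state set at construction time."""

    def __init__(self, factor):
        self.factor = factor

    def predict(self, context, model_input):
        # No mutation of self, no I/O, no unseeded randomness:
        # the same input always yields the same output.
        return [x * self.factor for x in model_input]
```

Calling predict() twice with the same batch returns identical results, which is what makes prediction behavior reproducible across runs.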
2. No external dependencies
Avoid relying on modules or files at arbitrary external paths within the model. Declare dependencies when logging with log_model() (for example via the pip_requirements or conda_env arguments) so they are captured with the model artifact.
3. Local code organization
Structure code into modules like model.py, predict.py, etc. for easier testing and logging.
4. Input validation
Check for valid input types and data shapes within predict(). Raise exceptions on invalid inputs.
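A minimal sketch of such checks (hypothetical class name, NumPy assumed available):

```python
import numpy as np


class ValidatedModel:
    """Sketch of input validation inside a pyfunc-style predict()."""

    def predict(self, context, model_input):
        arr = np.asarray(model_input)
        # Reject anything that is not a 2-D numeric batch
        if arr.ndim != 2:
            raise ValueError(f"expected 2-D input, got shape {arr.shape}")
        if not np.issubdtype(arr.dtype, np.number):
            raise TypeError(f"expected numeric input, got dtype {arr.dtype}")
        return arr.sum(axis=1)
```

Failing fast with a descriptive exception is far easier to debug in production than a silently wrong prediction.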
5. Output standardization
Standardize the output format across models. Return NumPy arrays or DataFrames instead of raw Python types.
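For instance (an illustrative sketch, pandas assumed available), the CarModel above could return a one-row DataFrame with a fixed schema instead of a raw string:

```python
import pandas as pd


class CarInfoModel:
    """Sketch: emit a DataFrame with fixed columns rather than a raw string."""

    def __init__(self, brand, model, year):
        self.brand, self.model, self.year = brand, model, year

    def predict(self, context, model_input):
        # A stable schema makes downstream handling uniform across models
        return pd.DataFrame(
            [{"brand": self.brand, "model": self.model, "year": self.year}]
        )
```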
Adhering to these best practices will ensure your Pyfunc models are scalable, portable and production-ready.
Deploying Pyfunc Models with Docker
Once a model is packaged with Pyfunc, we can deploy it to various model serving platforms. Here we'll look at containerized deployment using MLflow's built-in scoring server. (TensorFlow Serving is an option for models logged with the TensorFlow flavor, but a generic Pyfunc model is served through MLflow's own REST server.)
Containerized deployment brings several benefits:
- A reproducible image bundling the model and all its dependencies
- A standard REST scoring API
- Portability to any platform that runs containers, including Kubernetes
The mlflow CLI makes this a short workflow. The steps are:
1. Containerize the Model
First containerize the Pyfunc model. This bundles the model and its dependencies into a Docker image:
mlflow models build-docker -m runs:/<run-id>/model -n car-model
2. Run the Container
Next start the container, mapping the scoring server's port:
docker run -p 8080:8080 car-model
This launches MLflow's scoring server with the model loaded, listening on port 8080.
3. Send Prediction Requests
We can now send prediction requests to the /invocations endpoint:
import json
import requests

url = "http://localhost:8080/invocations"
headers = {"Content-Type": "application/json"}
# The scoring server expects the payload under an "inputs"
# (or "dataframe_split") key
data = json.dumps({"inputs": ["unused"]})
response = requests.post(url, data=data, headers=headers)
print(response.text)
And we successfully served our original Pyfunc model!
This demonstrates how portable these models are for productionization. Some other serving platforms like SageMaker, Azure ML and Cloud Run also have integration with MLflow.
Going Further with Pyfunc Models
We covered the fundamentals of saving, loading and serving Pyfunc models with MLflow. Here are some additional directions for leveraging them:
- Model composition: Chain together Pyfunc models into pipelines
- Stream processing: Build models that handle real-time data
- Type checking: Add type hints and checks for resilience
- Caching: Enable caching for performance
- Packaging: Containerize models using framework-specific model servers like Seldon Core
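To make the first of these directions concrete, here is a hedged sketch of model composition (all class names are illustrative): a pipeline model holds a list of stages and chains their predict() calls.

```python
class Pipeline:
    """Illustrative pyfunc-style model that chains several stages."""

    def __init__(self, stages):
        self.stages = stages

    def predict(self, context, model_input):
        output = model_input
        for stage in self.stages:
            # Each stage receives the previous stage's output
            output = stage.predict(context, output)
        return output


class Doubler:
    def predict(self, context, model_input):
        return [x * 2 for x in model_input]


class Incrementer:
    def predict(self, context, model_input):
        return [x + 1 for x in model_input]
```

Pipeline([Doubler(), Incrementer()]).predict(None, [1, 2]) returns [3, 5]: each element is doubled, then incremented.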
The options are endless when harnessing Pyfunc models thanks to their flexibility!
Conclusion
Pyfunc is an invaluable capability for operationalizing models with MLflow. As we saw, it empowers you to encapsulate models as portable Python functions.
This guide provided end-to-end examples of:
- Logging custom Python models
- Reloading them for reuse
- Deploying them via TensorFlow Serving
We also covered best practices when leveraging Pyfunc models. Adopting these will ensure smooth deployment and serving.
To build truly scalable machine learning pipelines, Pyfunc is an essential tool in your full-stack toolbox! With it, you can migrate models seamlessly across the whole DevOps lifecycle.
I hope you found these Pyfunc examples useful. Please feel free to reach out in the comments with any other use cases you've built leveraging its capabilities!


