Clarifai Model Deployment

Clarifai provides an easy-to-use platform to serve AI/ML models in production.

This guide walks you through deploying models on Clarifai — from scaffolding a project to running predictions in production.

Quick Start

Three commands to go from zero to a deployed model:

pip install -U clarifai

# 1. Authenticate
clarifai login

# 2. Scaffold a model (auto-selects GPU based on model size)
clarifai model init --toolkit vllm --model-name Qwen/Qwen3-0.6B

# 3. Deploy to production (auto-creates all infrastructure)
clarifai model deploy ./Qwen3-0.6B

The CLI handles compute cluster creation, nodepool setup, image building, and deployment automatically. When it finishes you'll see:

── Ready ──────────────────────────────────────────────
  Model deployed successfully!

  Model:       https://clarifai.com/you/main/models/qwen3-0-6b
  Deployment:  deploy-qwen3-0-6b-abc123
  Instance:    g5.xlarge

── Next Steps ─────────────────────────────────────────
  Predict:     clarifai model predict you/main/models/qwen3-0-6b "Hello"
  Logs:        clarifai model logs --deployment "deploy-qwen3-0-6b-abc123"
  Undeploy:    clarifai model undeploy --deployment "deploy-qwen3-0-6b-abc123"

Which path is right for you?

I want to...	Do this
Deploy a HuggingFace LLM (Qwen, Llama, Gemma, etc.)	`clarifai model init --toolkit vllm --model-name <hf-repo>` then `clarifai model deploy`
Deploy an MCP tool server	`clarifai model init --toolkit mcp` then write your tools
Wrap an existing API or local server	`clarifai model init --toolkit openai` or `--toolkit ollama`
Build a fully custom model (vision, embeddings, etc.)	`clarifai model init` then edit model.py — see Writing a Custom Model

Installation & Authentication

Install the SDK

pip install -U clarifai

Authenticate

# Interactive (prompts for your PAT)
clarifai login

# Non-interactive
clarifai login --pat $MY_PAT

# Org account
clarifai login --pat $PAT --user-id my-org

# Dev environment
clarifai login https://api-dev.clarifai.com

The CLI saves your credentials locally. Verify with:

clarifai whoami

You can manage multiple environments (prod, staging, org accounts) using named contexts:

clarifai config ls                  # List all contexts
clarifai config use <name>          # Switch active context

Note: You can generate a PAT in your Clarifai account under Personal Settings → Security.

Model Development Workflow

init  →  serve (optional)  →  deploy  →  predict  →  manage
scaffold   test locally       push to      run         status / logs /
project    before deploying   production   inference   undeploy

Step 1: Scaffold a Model (`clarifai model init`)

Generate a ready-to-deploy model project with a single command:

# Deploy a HuggingFace LLM with vLLM
clarifai model init --toolkit vllm --model-name Qwen/Qwen3-0.6B

# Deploy with SGLang
clarifai model init --toolkit sglang --model-name Qwen/Qwen2-7B

# Deploy a HuggingFace model directly
clarifai model init --toolkit huggingface --model-name google/gemma-2b

# Wrap a local Ollama model
clarifai model init --toolkit ollama --model-name llama3.1

# Create an MCP tool server
clarifai model init --toolkit mcp my-mcp-server

# Wrap an OpenAI-compatible API
clarifai model init --toolkit openai my-wrapper

# Blank Python model (full control)
clarifai model init my-model

Available toolkits:

Toolkit	Use Case
`vllm`	High-throughput LLM serving with vLLM
`sglang`	Fast LLM serving with SGLang
`huggingface`	HuggingFace Transformers (direct inference)
`ollama`	Wrap a local Ollama model
`lmstudio`	Wrap a local LM Studio model
`mcp`	MCP tool server (FastMCP)
`openai`	OpenAI-compatible API wrapper
`python`	Blank Python model (default)

What it creates:

my-model/
├── config.yaml          # Simplified config (auto-filled at deploy time)
├── requirements.txt     # Dependencies
└── 1/
    └── model.py         # Model implementation

Smart GPU selection: For HuggingFace models, the CLI queries the model's metadata (parameter count, quantization, architecture) and auto-selects the smallest GPU instance that fits:

$ clarifai model init --toolkit vllm --model-name Qwen/Qwen3-4B
  Instance: g5.xlarge (Estimated 15.9 GiB VRAM, fits g5.xlarge (22 GiB))

Step 2: Test Locally (`clarifai model serve`)

Before deploying, test your model locally:

# Run in current Python environment (fastest)
clarifai model serve ./my-model

# Auto-create virtualenv and install deps
clarifai model serve ./my-model --mode env

# Build and run inside Docker (recommended for production parity)
clarifai model serve ./my-model --mode container

# Standalone gRPC server (no login required, offline development)
clarifai model serve ./my-model --grpc --port 9000

What happens (default API-connected mode):

Validates your config.yaml
Starts the model server locally
Registers with Clarifai so you can test via the Playground UI
Displays a predict command and Playground URL
Cleans up on Ctrl+C

Option	Default	Description
`MODEL_PATH`	`.`	Model directory containing config.yaml
`--mode`	`none`	`none` (current env), `env` (auto-venv), `container` (Docker)
`--grpc`	off	Standalone gRPC server, no API connection needed
`--port`	8000	Server port (with `--grpc`)
`--concurrency`	32	Max concurrent requests
`--keep-image`	off	Keep Docker image after exit (container mode)

Step 3: Deploy to Production (`clarifai model deploy`)

Deploy your model to Clarifai cloud compute in one command. All infrastructure (compute cluster, nodepool, deployment) is created automatically:

# Deploy (uses instance from config.yaml)
clarifai model deploy ./my-model

# Override instance at deploy time
clarifai model deploy ./my-model --instance g6e.2xlarge

# GPU shorthands work too
clarifai model deploy ./my-model --instance a10g

# Specific cloud and region
clarifai model deploy ./my-model --instance g5.xlarge --cloud aws --region us-west-2

# Autoscaling
clarifai model deploy ./my-model --min-replicas 2 --max-replicas 10

# Deploy an already-uploaded model by URL
clarifai model deploy --model-url https://clarifai.com/user/app/models/id --instance g5.xlarge

Browse available instances:

$ clarifai list-instances

  Instance          Cloud   Region       GPUs  Accelerator  GPU Mem
  ──────────────────────────────────────────────────────────────────
  g5.xlarge         AWS     us-east-1    1     NVIDIA-A10G  24 GiB
  g6e.2xlarge       AWS     us-east-1    1     NVIDIA-L40S  48 GiB
  t3a.2xlarge       AWS     us-east-1    0     CPU          -
  n1-standard-4-t4  GCP     us-central1  1     NVIDIA-T4    16 GiB
  ...

Option	Default	Description
`MODEL_PATH`	`.`	Local model directory
`--instance`	auto	Hardware instance (e.g., `g5.xlarge`, `a10g`)
`--model-url`	-	Deploy an already-uploaded model (skip upload)
`--min-replicas`	1	Minimum running replicas
`--max-replicas`	5	Maximum replicas for autoscaling
`--cloud`	auto	Cloud provider (`aws`, `gcp`, `vultr`)
`--region`	auto	Cloud region (e.g., `us-east-1`)

Deploy phases:

── Validate ──  Config validation, HuggingFace repo access check
── Upload ────  Build and push model image
── Deploy ────  Create compute cluster, nodepool, deployment
── Monitor ───  Stream pod events until ready
── Ready ─────  Show model URL, predict command, status/log commands

Step 4: Run Predictions (`clarifai model predict`)

Once deployed, run predictions directly from the CLI:

# Text input
clarifai model predict user/app/models/my-model "Hello world"

# Pipe from stdin
echo "Explain quantum computing" | clarifai model predict user/app/models/my-model

# Image/video/audio file
clarifai model predict user/app/models/detector --file photo.jpg

# Image URL
clarifai model predict user/app/models/detector --url https://example.com/img.jpg

# Named parameters
clarifai model predict user/app/models/llm -i prompt="Hello" -i max_tokens=200

# OpenAI chat mode
clarifai model predict openai/chat-completion/models/GPT-4 --chat "What is AI?"

# JSON output
clarifai model predict user/app/models/my-model "Hello" -o json

# Inspect available methods and their parameters
clarifai model predict user/app/models/my-model --info

Option	Description
`MODEL`	Model as `user/app/models/id` or full URL
`INPUT`	Text input (positional)
`--file PATH`	Local file (image, audio, video)
`--url URL`	Remote media URL
`-i KEY=VALUE`	Named parameter (repeatable)
`--chat TEXT`	OpenAI chat mode
`--info`	Show available methods, then exit
`-o text\|json`	Output format (default: `text`)
`--deployment ID`	Route to a specific deployment

Step 5: Manage Your Deployment

Check status:

clarifai model status --deployment deploy-abc123
clarifai model status user/app/models/my-model

Stream logs:

# Model stdout/stderr
clarifai model logs --deployment deploy-abc123

# Kubernetes scheduling/scaling events
clarifai model logs --deployment deploy-abc123 --log-type events

# Print current logs and exit
clarifai model logs --deployment deploy-abc123 --no-follow

Remove a deployment:

clarifai model undeploy --deployment deploy-abc123

Writing a Custom Model

Most users don't need this section. If you're deploying a HuggingFace model, clarifai model init --toolkit vllm generates everything for you. Read on if you want to customize the generated code or build a model from scratch.

Model Folder Structure

my-model/
├── config.yaml          # Model configuration
├── requirements.txt     # Python dependencies
└── 1/
    └── model.py         # Model implementation

config.yaml

The simplified config requires only a few fields. Everything else is auto-filled at deploy time:

model:
  id: "my-model"

build_info:
  python_version: "3.11"

compute:
  instance: g5.xlarge

checkpoints:
  repo_id: Qwen/Qwen3-0.6B
  type: huggingface
  when: runtime

Key fields:

Field	Required	Description
`model.id`	Yes	Model identifier
`model.model_type_id`	No	Defaults to `any-to-any`
`compute.instance`	Yes (for GPU)	Instance type (run `clarifai list-instances` to see options)
`build_info.python_version`	No	Defaults to `3.12`
`checkpoints.repo_id`	No	HuggingFace repo for auto-download
`checkpoints.when`	No	`runtime` (default), `build`, or `upload`

Tip: user_id and app_id are resolved automatically from your CLI context — you don't need to put them in config.yaml.

Tip: For large models, use when: runtime to download checkpoints at runtime. This keeps Docker images small and speeds up builds.

Using a custom Docker base image (for vLLM, SGLang, etc.):

build_info:
  image: "lmsysorg/sglang:latest"

checkpoints:
  repo_id: Qwen/Qwen3-4B
  type: huggingface
  when: runtime

compute:
  instance: g5.xlarge

model:
  id: qwen3-4b

model.py

Your model must define a class that inherits from one of the base classes (ModelClass, OpenAIModelClass, or MCPModelClass) and expose at least one method decorated with @<BaseClass>.method.

from typing import Iterator
from clarifai.runners.models.model_class import ModelClass

class MyModel(ModelClass):

    def load_model(self):
        """Called once at startup. Load weights, initialize pipelines, etc."""
        self.model = ...

    @ModelClass.method
    def predict(self, prompt: str = "", max_tokens: int = 256) -> str:
        """Single request → single response."""
        return self.model(prompt, max_tokens=max_tokens)

    @ModelClass.method
    def generate(self, prompt: str = "") -> Iterator[str]:
        """Single request → streamed response."""
        for token in self.model.stream(prompt):
            yield token

Method types (determined by type hints):

Pattern	Signature	Use Case
Unary	`def predict(self, ...) -> str`	Standard request-response
Server streaming	`def generate(self, ...) -> Iterator[str]`	LLM token streaming
Bidirectional streaming	`def stream(self, input: Iterator[str]) -> Iterator[str]`	Real-time processing

load_model() (optional): Use this for expensive one-time initialization — loading model weights, creating pipelines, downloading checkpoints. Called once when the container starts.

Testing: When you run clarifai model serve, it loads your model and you can test via the Playground UI or CLI predict.

requirements.txt

List all Python dependencies your model needs:

torch==2.5.1
transformers
accelerate

Tip: If your model requires PyTorch, specify the version explicitly (e.g., torch==2.5.1 or torch==2.4.1). Clarifai provides optimized pre-built images for these versions that significantly speed up builds and reduce cold start times.

Model Types

Clarifai provides three base classes. Pick the one that matches your use case:

Base Class	When to Use
`ModelClass`	Custom Python inference — classification, detection, embeddings, any logic
`OpenAIModelClass`	LLM serving via OpenAI-compatible API (vLLM, SGLang, LMDeploy)
`MCPModelClass`	MCP tool servers for AI agents (FastMCP)

Python Model (ModelClass)

For custom inference logic — classification, detection, embeddings, or any Python code.

from clarifai.runners.models.model_class import ModelClass
from clarifai.runners.utils.data_types import Image, Text

class ImageCaptioner(ModelClass):

    def load_model(self):
        from transformers import pipeline
        self.pipe = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

    @ModelClass.method
    def predict(self, image: Image) -> str:
        pil_image = image.to_pil()
        result = self.pipe(pil_image)
        return result[0]["generated_text"]

See: hello-world/ for a minimal example.

LLM / OpenAI-Compatible Model (OpenAIModelClass)

For models served via an OpenAI-compatible API (vLLM, SGLang, LMDeploy, etc.):

from typing import Iterator, List
from openai import OpenAI
from clarifai.runners.models.openai_class import OpenAIModelClass
from clarifai.runners.utils.data_utils import Param
from clarifai.runners.utils.openai_convertor import build_openai_messages

class MyLLM(OpenAIModelClass):
    client = OpenAI(api_key="local-key", base_url="http://localhost:8000/v1")
    model = client.models.list().data[0].id

    def load_model(self):
        """Start the inference server (vLLM, SGLang, etc.)."""
        pass

    @OpenAIModelClass.method
    def predict(
        self,
        prompt: str = "",
        chat_history: List[dict] = None,
        max_tokens: int = Param(default=256, description="Max tokens to generate."),
        temperature: float = Param(default=1.0, description="Sampling temperature."),
    ) -> str:
        messages = build_openai_messages(prompt, chat_history)
        response = self.client.chat.completions.create(
            model=self.model, messages=messages,
            max_completion_tokens=max_tokens, temperature=temperature,
        )
        return response.choices[0].message.content

    @OpenAIModelClass.method
    def generate(
        self,
        prompt: str = "",
        chat_history: List[dict] = None,
        max_tokens: int = Param(default=256, description="Max tokens to generate."),
        temperature: float = Param(default=1.0, description="Sampling temperature."),
    ) -> Iterator[str]:
        messages = build_openai_messages(prompt, chat_history)
        for chunk in self.client.chat.completions.create(
            model=self.model, messages=messages,
            max_completion_tokens=max_tokens, temperature=temperature, stream=True,
        ):
            if chunk.choices:
                yield chunk.choices[0].delta.content or ""

See: llm/vllm-qwen3-0.6B/ for a vLLM example, or llm/sglang-qwen3.5-4b/ for SGLang.

Tip: The fastest way to deploy an LLM is clarifai model init --toolkit vllm --model-name <hf-repo> — it generates the full model.py, config.yaml, and requirements.txt with the right server setup.

MCP Tool Server (MCPModelClass)

For deploying MCP tool servers that AI agents can use:

from fastmcp import FastMCP
from pydantic import Field
from clarifai.runners.models.mcp_class import MCPModelClass

server = FastMCP("my-tools", instructions="Useful tools for agents.", stateless_http=True)

@server.tool("hello", description="Say hello to someone")
def hello(name: str = Field(description="Name to greet")) -> str:
    return f"Hello, {name}!"

@server.resource("config://version")
def get_version():
    return "1.0.0"

class MyModel(MCPModelClass):
    def get_server(self) -> FastMCP:
        return server

See: mcp/math/ for a minimal example, or the mcp/ directory for real-world MCP servers (GitHub, Postgres, browser tools, code execution, etc.).

Using the Python SDK Client

In addition to the CLI (clarifai model predict), you can call your deployed model from Python code. The SDK client mirrors the exact method signatures defined in your model.py:

from clarifai.client.model import Model

# Initialize by model URL
model = Model(model_url="https://clarifai.com/user/app/models/my-model")

# Or by IDs
model = Model(user_id="user", app_id="app", model_id="my-model")

Discover methods and signatures:

# List available methods
print(model.available_methods())

# Get method signature (parameter names, types, defaults)
print(model.method_signature(method_name="predict"))

# Generate a ready-to-use code snippet
print(model.generate_client_script())

Unary prediction (request → response):

result = model.predict(prompt="Hello world")
print(result)

# Batch prediction (multiple inputs)
results = model.predict([
    {"prompt": "Hello"},
    {"prompt": "World"},
])

Streaming prediction (request → stream of responses):

for chunk in model.generate(prompt="Explain quantum computing"):
    print(chunk, end="", flush=True)

Bidirectional streaming:

from clarifai.runners.utils.data_types import Audio

for transcript in model.transcribe(audio=iter(Audio(bytes=b"..."))):
    print(transcript.text)

Available Examples

LLM Models (`llm/`)

Example	Toolkit	Description
`vllm-qwen3-0.6B`	vLLM	Qwen3 0.6B with vLLM
`sglang-qwen3.5-4b`	SGLang	Qwen3.5 4B with SGLang
`lmdeploy-qwen3-0.6b`	LMDeploy	Qwen3 0.6B with LMDeploy
`hf-llama-3_2-1b-instruct`	HuggingFace	Llama 3.2 1B with Transformers
`agentic-gpt-5_1`	Agentic	Agentic model with tool use
`agentic-gpt-oss-20b`	Agentic	Open-source agentic model

MCP Tool Servers (`mcp/`)

Example	Description
`browser-tools`	Web browsing, scraping, and search
`github-mcp-server`	GitHub operations (issues, PRs, repos)
`postgres`	PostgreSQL database operations
`google-drive`	Google Drive file management
`code-execution-docker-version`	Sandboxed code execution
`slack-tools-server`	Slack integration
`web-search`	Web search functionality

Multimodal Models (`multimodal-models/`)

Example	Description
`qwen2_5-VL-3B-Instruct-vllm`	Qwen2.5 Vision 3B with vLLM
`sglang-deepseek-ocr`	DeepSeek OCR with SGLang

Other Models

Example	Description
`hello-world`	Minimal starter (text in → text out)
`image-classifier`	Image classification
`image-detector`	Object detection
`image-segmenter`	Image segmentation
`text-embedder`	Text embeddings
`ocr`	Optical character recognition
`multimodal-models`	Multi-modal examples

Supported Data Types

Clarifai's model framework supports rich data typing for both inputs and outputs. Each method parameter and return type must be annotated.

Common types:

Type	Import	Example
`str`, `int`, `float`, `bool`	built-in	`prompt: str = ""`
`Text`	`clarifai.runners.utils.data_types`	`Text("hello")`, `Text(url="...")`
`Image`	`clarifai.runners.utils.data_types`	`Image(url="...")`, `Image.from_pil(img)`
`Audio`	`clarifai.runners.utils.data_types`	`Audio(bytes=b"...")`
`Video`	`clarifai.runners.utils.data_types`	`Video(url="...")`
`List[T]`, `Dict[K, V]`	`typing`	`List[Image]`, `Dict[str, float]`
`Iterator[T]`	`typing`	`Iterator[str]` (for streaming)

For the full reference including Concept, Region, Frame, NamedFields, and container types, see SUPPORTED_DATATYPE.md.

Advanced Topics

HuggingFace Token for Gated Models

Models like Llama, Gemma, and other gated repos require an HuggingFace token:

# Set in environment (recommended)
export HF_TOKEN=hf_...

# Or add to config.yaml
checkpoints:
  repo_id: meta-llama/Llama-3.1-8B-Instruct
  hf_token: hf_...

The CLI validates HuggingFace access early (before building the Docker image) and provides actionable error messages if access is denied.

Note: When HF_TOKEN is in your environment, the CLI automatically persists it to config.yaml during deploy so the container can access it.

Compute Instance Discovery

Browse all available GPU instances across cloud providers:

clarifai list-instances                      # All instances
clarifai list-instances --cloud aws          # AWS only
clarifai list-instances --gpu H100           # H100 instances
clarifai list-instances --min-gpus 2         # Multi-GPU
clarifai list-instances --min-gpu-mem 48Gi   # 48+ GiB GPU memory

Upload Without Deploying

If you want to upload a model image without immediately deploying it:

clarifai model upload ./my-model

You can deploy it later with:

clarifai model deploy --model-url https://clarifai.com/user/app/models/my-model --instance g5.xlarge

Model Concepts / Labels

For classification or detection models that output concepts/labels, define them in config.yaml:

concepts:
  - id: "0"
    name: bus
  - id: "1"
    name: person
  - id: "2"
    name: car

Note: If your model loads from HuggingFace with checkpoints defined, concepts are inferred automatically.

Per-Command Context Override

Any command can use a different authentication context for a single invocation:

clarifai --context staging model deploy ./my-model
clarifai --context prod-openai model predict openai/chat-completion/models/GPT-4 "Hello"

Name		Name	Last commit message	Last commit date
Latest commit History 218 Commits
hello-world		hello-world
image-classifier/nsfw-image-classifier		image-classifier/nsfw-image-classifier
image-detector		image-detector
image-segmenter/mask2former-ade		image-segmenter/mask2former-ade
image-text-to-image/stable-diffusion-2-depth		image-text-to-image/stable-diffusion-2-depth
llm		llm
local-runners		local-runners
mcp		mcp
multimodal-models		multimodal-models
ocr/nanonets-ocr-s		ocr/nanonets-ocr-s
text-embedder		text-embedder
text-to-image/flux-schnell		text-to-image/flux-schnell
.gitignore		.gitignore
.ruff.toml		.ruff.toml
README.md		README.md
SUPPORTED_DATATYPE.md		SUPPORTED_DATATYPE.md

Folders and files

Latest commit

History

Repository files navigation

Clarifai Model Deployment

Quick Start

Which path is right for you?

Contents

Installation & Authentication

Install the SDK

Authenticate

Model Development Workflow

Step 1: Scaffold a Model (clarifai model init)

Step 2: Test Locally (clarifai model serve)

Step 3: Deploy to Production (clarifai model deploy)

Step 4: Run Predictions (clarifai model predict)

Step 5: Manage Your Deployment

Writing a Custom Model

Model Folder Structure

config.yaml

model.py

requirements.txt

Model Types

Python Model (ModelClass)

LLM / OpenAI-Compatible Model (OpenAIModelClass)

MCP Tool Server (MCPModelClass)

Using the Python SDK Client

Available Examples

LLM Models (llm/)

MCP Tool Servers (mcp/)

Multimodal Models (multimodal-models/)

Other Models

Supported Data Types

Advanced Topics

HuggingFace Token for Gated Models

Compute Instance Discovery

Upload Without Deploying

Model Concepts / Labels

Per-Command Context Override

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Step 1: Scaffold a Model (`clarifai model init`)

Step 2: Test Locally (`clarifai model serve`)

Step 3: Deploy to Production (`clarifai model deploy`)

Step 4: Run Predictions (`clarifai model predict`)

LLM Models (`llm/`)

MCP Tool Servers (`mcp/`)

Multimodal Models (`multimodal-models/`)

Packages