LLM Gateway

One API for 200+ language models from every major provider. OpenAI-compatible interface with load balancing, caching, rate limiting, fallbacks, and full observability.

curl https://llm.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Why Hanzo LLM Gateway?

  • 200+ Models — Claude, GPT, Gemini, Llama, Mistral, and more
  • OpenAI Compatible — Drop-in replacement, use any OpenAI SDK
  • Smart Routing — Automatic fallbacks, load balancing, model selection
  • Cost Control — Per-key budgets, rate limits, usage analytics
  • Caching — Semantic and exact caching to reduce costs up to 90%
  • Observability — Full request logging, latency tracking, token analytics
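
Because the interface is OpenAI-compatible, switching providers is just a different model string in an otherwise identical request. A minimal stdlib sketch (the endpoint and header shape follow the curl example above; the API key is a placeholder):

```python
import json
import urllib.request

GATEWAY = "https://llm.hanzo.ai/v1/chat/completions"

def chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same OpenAI-style request for any routed model."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        GATEWAY,
        data=body,  # data present, so the request method defaults to POST
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

# Only the model string changes between providers:
# urllib.request.urlopen(chat_request(key, "claude-sonnet-4-5-20250929", "Hello"))
# urllib.request.urlopen(chat_request(key, "gpt-4o", "Hello"))
```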

Supported Providers

Provider     Models                      Features
Anthropic    Claude 4.x, Claude 3.x      Vision, tool use, extended context
OpenAI       GPT-4o, o3, o4-mini         Function calling, vision, DALL-E
Google       Gemini 2.x, PaLM            Multimodal, grounding
Meta         Llama 3.x, Llama 4          Open source, self-hosted
Mistral      Mistral Large, Codestral    European, code generation
Together AI  50+ open models             Fast inference, fine-tuning
Groq         Llama, Mixtral              Fastest inference
Zen LM       600M-480B                   Frontier open-weight models

SDK Usage

Python

from openai import OpenAI

client = OpenAI(
    api_key="your-hanzo-api-key",
    base_url="https://llm.hanzo.ai/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)
print(response.choices[0].message.content)

TypeScript

import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: process.env.HANZO_API_KEY,
  baseURL: 'https://llm.hanzo.ai/v1',
})

const completion = await client.chat.completions.create({
  model: 'claude-sonnet-4-5-20250929',
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
})
console.log(completion.choices[0].message.content)
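
Both SDKs can also stream responses (the chat completions endpoint supports streaming). On the wire the gateway returns OpenAI-style server-sent events; a hedged sketch of reassembling the content deltas with the stdlib only, assuming the standard chat.completion.chunk payload shape:

```python
import json

def deltas(sse_lines):
    """Yield content deltas from OpenAI-style streaming 'data:' lines."""
    for line in sse_lines:
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# With the chunk shape the gateway would send:
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(deltas(sample)))  # → Hello
```

With the SDKs themselves, passing stream=True (Python) or stream: true (TypeScript) to chat.completions.create does this reassembly for you.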

Key Features

Smart Routing

Automatic fallbacks between providers when one is down

Cost Management

Per-key budgets, rate limits, and usage analytics

Semantic Caching

Cache similar requests to reduce costs up to 90%

Guardrails

Content filtering, PII detection, and safety controls

Observability

Full request logging with latency and token analytics

Fine-tuning

Custom model training via Together AI and Zen LM
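
Per-key budgets are provisioned through the key endpoints listed under API Endpoints (POST /key/generate). A sketch of building that request; the max_budget and duration field names follow LiteLLM's key-generation conventions and are assumptions here:

```python
import json
import urllib.request

def generate_key_request(admin_key: str, max_budget_usd: float,
                         duration: str) -> urllib.request.Request:
    """Sketch of POST /key/generate with a spend cap and expiry.

    Field names are assumed from LiteLLM conventions, not confirmed here.
    """
    body = json.dumps({"max_budget": max_budget_usd,
                       "duration": duration}).encode()
    return urllib.request.Request(
        "https://llm.hanzo.ai/key/generate",
        data=body,
        headers={"Authorization": f"Bearer {admin_key}",
                 "Content-Type": "application/json"},
    )

# e.g. a key capped at $25 that expires in 30 days:
# urllib.request.urlopen(generate_key_request(admin_key, 25.0, "30d"))
```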

Configuration

model_list:
  # Two entries sharing the same model_name are load-balanced across providers
  - model_name: "default"
    litellm_params:
      model: "anthropic/claude-sonnet-4-5-20250929"
      api_key: "os.environ/ANTHROPIC_API_KEY"

  - model_name: "default"
    litellm_params:
      model: "openai/gpt-4o"
      api_key: "os.environ/OPENAI_API_KEY"

  - model_name: "fast"
    litellm_params:
      model: "groq/llama-3.1-70b"
      api_key: "os.environ/GROQ_API_KEY"

router_settings:
  routing_strategy: "latency-based-routing"
  fallbacks:
    # If every "default" deployment fails, retry on "fast"
    - default: ["fast"]
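
The semantic cache mentioned above is configured in the same file. A hedged fragment using LiteLLM-style cache settings; the redis-semantic type and similarity_threshold field are LiteLLM conventions, assumed to carry over to the gateway:

```yaml
litellm_settings:
  cache: true
  cache_params:
    type: "redis-semantic"        # exact-match caching: type "redis"
    similarity_threshold: 0.8     # how close a prompt must be to reuse a cached reply
```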

API Endpoints

Endpoint                     Description
POST /v1/chat/completions    Chat completions (streaming supported)
POST /v1/completions         Text completions
POST /v1/embeddings          Text embeddings
POST /v1/images/generations  Image generation
GET /v1/models               List available models
POST /key/generate           Create API keys with budgets
GET /key/info                Key usage and budget info
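
A quick way to see which models a key can reach is GET /v1/models, which returns the OpenAI-style list payload. A stdlib sketch (endpoint and auth as in the curl example above):

```python
import json
import urllib.request

def model_ids(models_response: dict) -> list[str]:
    """Extract ids from the OpenAI-style {"object": "list", "data": [...]} payload."""
    return [m["id"] for m in models_response["data"]]

def list_models(api_key: str) -> list[str]:
    """Fetch GET /v1/models and return the available model ids."""
    req = urllib.request.Request(
        "https://llm.hanzo.ai/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return model_ids(json.load(resp))
```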

Getting Started

Create an account, get an API key, and make your first Hanzo API call in under five minutes.

Authentication

Single sign-on, OAuth 2.0 + OIDC flows, session management, and account linking across all Hanzo services.

API Keys

Key types, scopes, creation, rotation, and production best practices for Hanzo API keys.

Organizations

Multi-org setup, resource scoping, team management, and white-label login for Hanzo organizations.

Platform

Hanzo Cloud Platform — 33 unified services for AI, automation, infrastructure, and operations.

API Reference

Complete API reference for all Hanzo Cloud services. Generated from OpenAPI 3.1.0 specifications.

SDKs

Official Hanzo SDKs for Python, TypeScript, Go, Rust, and C/C++.
