5M+ monthly downloads · Open weights · commercial use

The production-ready vision language model.

Production VLMs need more than just accuracy. They need to be fast enough for real-time decisions, and run anywhere you deploy. That's what Moondream is built for.

Got questions?
> model.detect("Misoriented box")
{label: "Misoriented box", conf: 0.94}
Moondream 3 Preview · Photon · Fine-tuned with Lens
01
Step 1 · Try it

Try the open models. You might already be done.

Moondream might already nail your use case out of the box. The open models are commercially friendly and can run anywhere. Use our playground to try it out or download it and run it yourself.

Open playground
$ pip install moondream

No credit card required. $5 in credits added monthly.

python
# Caption an image in a few lines
import moondream as md
from PIL import Image

model = md.vl(model="moondream-2b")
image = Image.open("shelf.jpg")
print(model.caption(image)["caption"])
# → 'A warehouse shelf with six cardboard cartons…'
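The hero demo near the top shows a `detect` call returning a label and a confidence score. Before acting on detections, you typically want to filter out low-confidence hits. A minimal post-processing sketch, assuming results shaped like that demo payload (the real client's return format may differ):

```python
# Hypothetical payload shaped like the hero demo's output;
# the real client's detect() return format may differ.
detections = [
    {"label": "Misoriented box", "conf": 0.94},
    {"label": "Misoriented box", "conf": 0.41},
]

def keep_confident(dets, threshold=0.5):
    """Keep only detections confident enough to act on."""
    return [d for d in dets if d["conf"] >= threshold]

print(keep_confident(detections))
# → [{'label': 'Misoriented box', 'conf': 0.94}]
```

Tune the threshold to your tolerance for false positives: a conveyor alert can afford a high bar, a safety stop cannot.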
02
Step 2 · Fine-tune it

Need more? Lens gets you to production-grade accuracy.

Your data is specific, so the model has to be. Lens is a fine-tuning platform with a simple API. No dataset uploads, no infrastructure, no ML team required.

Self-serve API

A simple hosted API — no hardware to rent or manage. Supports SFT and RL. Vibe-code your fine-tune script in minutes.

Tune and go

Your fine-tuned model is instantly ready to run on Moondream Cloud or locally with Photon. No cumbersome download or install step.

White-glove option

Our team handles the labeling protocol, loss design, and evaluation. You keep the weights, the training code, and the data. Unlike ML consulting, you walk away self-sufficient.

No massive dataset required

Our reinforcement-learning fine-tune API can dramatically improve accuracy with as few as 20 labeled images — not thousands.

03
Step 3 · Run it anywhere

Fast, efficient, runs everywhere you need it.

Once your model is accurate, performance and cost become the next wall. Photon is the inference engine we built to run Moondream in production. Moondream Cloud and partner clouds give you a hosted path if you want one.

Speed

Under 500 ms is the difference between a useful answer and a late one. Photon runs Moondream in roughly half the time vLLM does on the same hardware.

~2× vs. vLLM · H100
Cost

A VLM running inefficiently across a fleet of cameras can cost thousands a day. Moondream is the lowest-cost VLM we have measured across the inference providers we tested.

$0.06 per 1K images, cloud
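The quoted $0.06 per 1K images makes fleet costs easy to estimate. A back-of-envelope sketch, where fleet size and sampling rate are illustrative assumptions:

```python
# Back-of-envelope fleet cost at the quoted $0.06 per 1K images.
# Fleet size and frame rate are illustrative assumptions.
cameras = 50
frames_per_camera_per_day = 86_400 // 5  # one frame every 5 seconds
images_per_day = cameras * frames_per_camera_per_day
cost_per_day = images_per_day / 1_000 * 0.06
print(f"{images_per_day} images/day → ${cost_per_day:.2f}/day")
# → 864000 images/day → $51.84/day
```

Double the sampling rate and the cost doubles with it, so the per-image price is the lever that matters at fleet scale.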
Flexibility

Your deployment story will change. Start in the cloud, move to the edge, or run air-gapped. You pick the hardware. The model and APIs stay the same.

8 supported hardware tiers
Internal benchmark
Time per request
median of 200 runs
Moondream 3 Preview + Photon (H100 · batch 1): 34 ms · baseline
Qwen 3.5 4B + vLLM (H100 · batch 1): 73 ms · 2.1× slower
GPT-5.4 Mini (OpenAI API): 2.78 s · 82× slower
Gemini 2.5 Flash (Google API): 3.79 s · 111× slower
1920×1080 input · single-turn detect · NVIDIA H100 80GB · 2026-02 build
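The benchmark reports the median of 200 runs. Reproducing that kind of measurement for your own workload takes only a few lines; a minimal timing sketch (the stand-in workload below is a placeholder, not a model call):

```python
import statistics
import time

def p50_latency_ms(fn, runs=200):
    """Median wall-clock time of fn over `runs` calls, in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1_000)
    return statistics.median(samples)

# Stand-in workload; swap in a real inference call to benchmark it.
print(f"p50: {p50_latency_ms(lambda: sum(range(1_000))):.3f} ms")
```

Median (P50) is less noisy than mean for latency, since it ignores occasional outlier runs from warm-up or scheduling jitter.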
Photon · hardware
Same model, every tier.

Measured on the ChartQA test split with prefix caching enabled. Latency is the P50 of a single direct-answer query call; throughput is sustained requests per second at batch 64.

Jetson AGX Orin (Edge · 32 GB · Moondream 2): 543 ms P50 at batch 1 · 3.66 req/s sustained at batch 64
NVIDIA L4 (Workstation · Moondream 3): 358 ms P50 at batch 1 · 4.85 req/s sustained at batch 64
NVIDIA A10 (Cloud · Ampere · Moondream 2): 223 ms P50 at batch 1 · 6.83 req/s sustained at batch 64
NVIDIA A100 80GB (Cloud · Ampere · Moondream 2): 104 ms P50 at batch 1 · 21.36 req/s sustained at batch 64
NVIDIA L40S (Server · Moondream 3): 121 ms P50 at batch 1 · 18.81 req/s sustained at batch 64
Source: kestrel/PERFORMANCE.md · Moondream 3 requires sm89+, so Ampere parts (A100, A10, Jetson AGX Orin) report Moondream 2.
Available on
FAL
Self-hosted
Moondream Cloud
Photon
Same code. Edge, workstation, server.
Read the Photon docs
python
import moondream as md
from PIL import Image

# Initialize with local GPU inference
model = md.vl(api_key="YOUR_API_KEY", local=True)

# Load an image
image = Image.open("path/to/image.jpg")

# Generate a caption
caption = model.caption(image)["caption"]
print("Caption:", caption)
04
Step 4 · Keep it running

Launch is just the start

One vendor for the full stack. Models drift. Engineers leave. New use cases appear. With stitched-together vendors, nobody owns the outage. With Moondream, we do.

Competitor stack
  • Model vendor (weights only)
  • Fine-tuning vendor (your data goes elsewhere)
  • Inference provider (different SLA)
  • Your on-call engineer (owns everything)
Moondream
  • Model, weights, and roadmap
  • Lens fine-tuning and evals
  • Photon and Moondream Cloud
  • One team on call, 24/7 on enterprise plans
See Plans
Two ways to start

Try the open model. Or talk to us about production.

The model is free, open, and the fastest way to see if Moondream fits. If you already know it does, we can skip ahead and talk about fine-tuning, inference, and a support plan.

Try in the playground

Moondream is trusted by

CalPoly