Launch bonus

Open-source inference,20% below the rest.

The 9 most popular open-source models — DeepSeek V4 Flash & Pro, V3, R1, Qwen 3.7 Max + 3.6 Plus + 3.6 + 3.5, Kimi K2.6 — through an OpenAI-compatible API. Cheaper than every other reseller. Change one line of code.

or try the models live on HuggingFace — no signup required.

  • No subscription
  • OpenAI compatible
  • Pay as you go
  • Text + images
Drop-in with
  • OpenAI SDK
  • Aider
  • Cursor
  • Cline
  • Continue.dev
  • LangChain
  • Vercel AI SDK
python
1# One line change. That's it.
2from openai import OpenAI
3 
4client = OpenAI(
5 base_url="https://api.quicksilverpro.io/v1",
6 api_key="your-api-key",
7)
Pricing

Cheapest open-source inference

Per 1M tokens for text models · per image or per audio minute where noted.

Model
Context
Input
Output
Savings
deepseek-v4-flash
fast chat & coding, 1M context, thinking on by default
1M
$0.08$0.10
$0.16$0.20
−20%
deepseek-v4-pro
premium reasoning, 1M context
1M
$0.348$0.435
$0.696$0.87
−20%
deepseek-v3
chat, coding, structured output
128K
$0.16$0.20
$0.616$0.77
−20%
DeepSeek R1Reasoning
deepseek-r1
math, multi-step reasoning, logic
128K
$0.56$0.70
$2.00$2.50
−20%
qwen3.7-max
Qwen 3.7 flagship, agent / coding, 1M context
1M
$2.00$2.50
$6.00$7.50
−20%
qwen3.6-plus
thinks-by-default flagship, 1M context
1M
$0.26$0.325
$1.56$1.95
−20%
qwen3.6-35b
long-context RAG, drop-in 3.5 upgrade
262K
$0.12$0.15
$0.80$1.00
−20%
qwen3.5-35b
long-context RAG, summarization
262K
$0.111$0.139
$0.80$1.00
−20%
kimi-k2.6
Opus-class agentic / planning
256K
$0.584$0.73
$2.79$3.49
−20%
Claude Opus 4.8New
claude-opus-4-8
top-tier reasoning, coding, agentic
200K
$4.00$5.00
$20.00$25.00
−20%
Claude Opus 4.6New
claude-opus-4-6
deep reasoning and coding
200K
$4.00$5.00
$20.00$25.00
−20%
Claude Sonnet 4.6New
claude-sonnet-4-6
balanced mid-tier, fast & capable
200K
$2.40$3.00
$12.00$15.00
−20%
Claude Haiku 4.5New
claude-haiku-4-5
fast, low-cost, high-volume tasks
200K
$0.80$1.00
$4.00$5.00
−20%
Whisper Large V3 TurboNew
whisper-large-v3-turbo
fast audio transcription via /v1/audio/transcriptions
Audio
$0.0004/min$0.000667/min
−40%
flux.2-pro
flagship image generation
$0.027/img$0.031/img
−13%
flux.1-schnell
fast, high-volume image generation
$0.0025/img$0.003/img
−17%
gemini-2.5-flash
multimodal chat, 1M context
1M
$0.255$0.30
$2.125$2.50
−15%
Gemini 2.5 Flash ImageNew
gemini-2.5-flash-image
image generation
1M
$0.255$0.30
$25.50$30.00
−15%
gemini-2.5-flash-lite
high-volume cheap tasks
1M
$0.085$0.10
$0.34$0.40
−15%
gemini-3-flash-preview
next-gen flash reasoning
1M
$0.425$0.50
$2.55$3.00
−15%
Gemini 3 Pro Image PreviewNew
gemini-3-pro-image-preview
pro-grade image generation
1M
$1.70$2.00
$102.00$120.00
−15%
gemini-3.1-pro-preview
flagship reasoning, 1M context
1M
$1.70$2.00
$10.20$12.00
−15%
gemini-3.5-flash
next-gen Flash GA, 1M context
1M
$1.275$1.50
$7.65$9.00
−15%
gemini-2.5-pro
pro mid-tier reasoning, 1M context
1M
$1.0625$1.25
$8.50$10.00
−15%
gemini-3.1-flash-lite
newest low-cost workhorse, high volume
1M
$0.2125$0.25
$1.275$1.50
−15%

Text-model comparisons use OpenRouter, Together AI, and Fireworks AI. Image and audio rows show QSP list pricing as of May 2026.

Calculator

How much would you save?

Plug in your monthly usage — see the cost on QSP vs every competitor.

Common totals (10:1 input/output):
Thinking model — output token counts include the reasoning trace, which is typically 3-10× the visible reply.
1M
300K
QuickSilver Pro
$0.13cheapest
OpenRouter
$0.16+25%
OpenAIclosed model analog
$0.33+2.6×
QSP saves 3¢/month vs OpenRouter (20% cheaper).
CLIqsp

Built for terminals and AI agents. --json output with stable exit codes — Claude Code, Cursor, Aider can call it without parsing HTML.

FAQ

Common questions

An OpenAI-compatible HTTP API for 9 top open-source LLMs — DeepSeek V4 Flash & Pro, V3, R1, Qwen 3.7 Max, 3.6 Plus, 3.6 + 3.5-35B-A3B, and Kimi K2.6. Point the official OpenAI SDK at our base URL and get the same chat-completions interface, 20% below competing resellers.

V4 Flash is DeepSeek's newest model (released April 2026): ~74% cheaper output than V3, 1M context vs 128K, and thinks by default (chain-of-thought reasoning) — so a one-token "Hi" can return ~175 reasoning tokens. For V3-style cheap chat without the thinking overhead, pass `reasoning: { enabled: false }` in the request body. Existing V3 keeps working unchanged.

20% below the public per-token rates at OpenRouter, Together AI, Fireworks AI, and DeepInfra on the same open-source models. V4 Flash: $0.08 / $0.16. V4 Pro: $0.348 / $0.696. V3: $0.16 / $0.616. R1: $0.56 / $2.00. Qwen 3.7 Max: $2.00 / $6.00. Qwen 3.6 Plus: $0.26 / $1.56. Qwen 3.6: $0.12 / $0.80. Qwen 3.5: $0.111 / $0.80. Kimi K2.6: $0.584 / $2.79. We don't serve closed models (GPT-4, Claude).

Yes. Change base_url to https://api.quicksilverpro.io/v1 in the official openai Python / Node / Swift SDKs. Streaming, tool calling, json_schema strict mode, and usage.cost accounting all work out of the box.

Launch bonus: any first purchase between $5 and $50 doubles. Pay $5 and get $10. Pay $50 and get $100. One-time bonus on your first credit purchase. After that it's standard pay-as-you-go.

Start saving on inference today

Create an account, buy credits, get your API key in 30 seconds.

Get API Key