QuickSilver Pro Docs
OpenAI-compatible inference API. Open-source LLMs (DeepSeek V4 Flash & Pro, V3, R1, Qwen 3.6 & 3.5-35B-A3B, Kimi K2.6), Whisper 1 for audio transcription, plus Google's Gemini 2.5 / 3 family for multimodal chat and image generation. Listed 20% below OpenRouter, Together AI, Fireworks, and DeepInfra on every shared open-source model; 15% below Google list on Gemini.
What is QuickSilver Pro?
QSP exposes an OpenAI-compatible chat-completions endpoint. Point the official openai SDK (Python, Node, Swift, or any compatible client) at https://api.quicksilverpro.io/v1, supply your QSP API key, and the chat-completions surface works unchanged — streaming, tool calling, json_schema strict mode, and usage accounting are all preserved. Other SDK surfaces (the Responses API, client.embeddings.create, client.images.generate, the Assistants API) are not in scope; image-generation Gemini models still go through standard chat-completions calls.
The catalog focuses on open-source LLMs plus Google's Gemini family — chat, coding, reasoning, multimodal, and image generation. We do not currently serve OpenAI's closed models or Anthropic Claude; for those, stay on your existing provider.
Start here
- Quickstart — first call in Python, Node, Swift, and curl.
- Models — IDs, context windows, pricing, when to use which.
- Audio transcription — Whisper 1 through the OpenAI-compatible
/v1/audio/transcriptionsendpoint. - Rate limits — default per-key throughput and how to request more.
- Streaming — SSE chunk format and client-side handling.
- Tool calling — function calling, parallel tool calls, JSON-arg streaming.
- Structured output —
json_schemastrict mode for typed responses. - Errors — status codes, common gotchas, and how to retry safely.
Conventions used in these docs
- Code examples use the official OpenAI SDK for each language. The QSP-specific change is always the
base_urland the API key — model IDs are listed in Models. - Pricing is per 1M tokens, USD. See the homepage pricing table for the current rates.
- Anything not documented behaves like the OpenAI chat-completions endpoint. If you find a divergence from OpenAI semantics that isn't documented here, it's a bug — report it.