Introduction

QuickSilver Pro Docs

OpenAI-compatible inference API. Open-source LLMs (DeepSeek V4 Flash & Pro, V3, R1, Qwen 3.6 & 3.5-35B-A3B, Kimi K2.6), Whisper 1 for audio transcription, plus Google's Gemini 2.5 / 3 family for multimodal chat and image generation. Listed 20% below OpenRouter, Together AI, Fireworks, and DeepInfra on every shared open-source model; 15% below Google list on Gemini.

What is QuickSilver Pro?

QSP exposes an OpenAI-compatible chat-completions endpoint. Point the official openai SDK (Python, Node, Swift, or any compatible client) at https://api.quicksilverpro.io/v1, supply your QSP API key, and the chat-completions surface works unchanged — streaming, tool calling, json_schema strict mode, and usage accounting are all preserved. Other SDK surfaces (the Responses API, client.embeddings.create, client.images.generate, the Assistants API) are not in scope; image-generation Gemini models still go through standard chat-completions calls.

The catalog focuses on open-source LLMs plus Google's Gemini family — chat, coding, reasoning, multimodal, and image generation. We do not currently serve OpenAI's closed models or Anthropic Claude; for those, stay on your existing provider.

Start here

  • Quickstart — first call in Python, Node, Swift, and curl.
  • Models — IDs, context windows, pricing, when to use which.
  • Audio transcription — Whisper 1 through the OpenAI-compatible /v1/audio/transcriptions endpoint.
  • Rate limits — default per-key throughput and how to request more.
  • Streaming — SSE chunk format and client-side handling.
  • Tool calling — function calling, parallel tool calls, JSON-arg streaming.
  • Structured output json_schema strict mode for typed responses.
  • Errors — status codes, common gotchas, and how to retry safely.

Conventions used in these docs

  • Code examples use the official OpenAI SDK for each language. The QSP-specific change is always the base_url and the API key — model IDs are listed in Models.
  • Pricing is per 1M tokens, USD. See the homepage pricing table for the current rates.
  • Anything not documented behaves like the OpenAI chat-completions endpoint. If you find a divergence from OpenAI semantics that isn't documented here, it's a bug — report it.