QuickSilver Pro system status
Services
Model availability
deepseek-v4-flash
deepseek-v4-pro
deepseek-v3
deepseek-r1
qwen3.7-max
qwen3.6-plus
qwen3.6-35b
qwen3.5-35b
kimi-k2.6
Roadmap - how we become a real inference company
Now - launched on a curated catalog
Live: Customers save 20% today on a curated catalog at low list prices. The narrow operational surface is what keeps the gap honest, and the same surface scales straight into Phase 2.
Q2 2026 - our own inference stack on H100/H200
Planned: Self-hosted serving on dedicated GPUs using SGLang with continuous batching, EAGLE-3 speculative decoding, FP8 quantization via DeepGEMM, and SageAttention / ThunderMLA custom kernels. At that point system_fingerprint becomes stable (it changes only when we rev the stack), and seeded, repeatable workflows start working properly. Target: 30-50% below current prices on DeepSeek V3.
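A stable system_fingerprint lets a client tell apart "the stack changed" from "the same stack gave a different answer". The sketch below is one way to check that on the client side. It assumes an OpenAI-style response shape (a top-level system_fingerprint field and choices[0].message.content), which this page does not confirm; the function name is hypothetical and not part of any QuickSilver Pro API.

```python
from typing import Any, Mapping, Sequence


def seeded_runs_are_repeatable(responses: Sequence[Mapping[str, Any]]) -> bool:
    """Return True if all responses share one fingerprint and identical text.

    ASSUMPTION: each response looks like an OpenAI-style chat completion:
    {"system_fingerprint": str, "choices": [{"message": {"content": str}}]}
    """
    if not responses:
        return False  # nothing to compare

    fingerprints = {r.get("system_fingerprint") for r in responses}
    texts = {r["choices"][0]["message"]["content"] for r in responses}

    # A missing fingerprint (None) or more than one distinct fingerprint
    # means differences cannot be attributed to sampling alone.
    if None in fingerprints or len(fingerprints) != 1:
        return False

    # Same stack for every run: repeatable only if the outputs also match.
    return len(texts) == 1
```

In use, you would send the same prompt several times with the same seed and sampling settings, collect the parsed JSON responses, and pass them to this function. A False result with differing fingerprints points at a stack revision rather than at nondeterminism.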
H2 2026 - colocated data center + AIDC partnerships
Future: Move from rented GPUs (Vast.ai) to self-owned or colocated racks. Partner with AI-datacenter operators where that makes sense. The goal is the cheapest reliable inference for open-source models on the planet - full stack, our engineering.
About this page
Service rows run client-side probes from your browser. Model rows reflect a real 1-token probe sent server-side every 3 minutes from our backend. Historical bars show the results of recent probes stored in this browser's localStorage; that history is per-browser, so it does not carry over when you switch devices or clear site data.
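As a rough illustration of how a bounded probe history turns into an uptime figure, here is a minimal sketch. The Probe record, the ProbeHistory class, and the one-day window are all hypothetical; the only value taken from this page is the 3-minute probe interval, and the page's real history lives in the browser's localStorage rather than in Python.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional

PROBE_INTERVAL_MIN = 3                            # from this page: one probe every 3 minutes
PROBES_PER_DAY = 24 * 60 // PROBE_INTERVAL_MIN    # 480 probes cover one day


@dataclass(frozen=True)
class Probe:
    """One probe result (hypothetical record shape)."""
    timestamp: float  # Unix seconds
    ok: bool          # True if the 1-token probe succeeded


class ProbeHistory:
    """Keeps only the most recent probes, like a capped local history."""

    def __init__(self, max_probes: int = PROBES_PER_DAY) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._probes: Deque[Probe] = deque(maxlen=max_probes)

    def record(self, probe: Probe) -> None:
        self._probes.append(probe)

    def uptime_percent(self) -> Optional[float]:
        """Share of successful probes in the window, or None if empty."""
        if not self._probes:
            return None
        succeeded = sum(1 for p in self._probes if p.ok)
        return 100.0 * succeeded / len(self._probes)
```

Returning None for an empty history, rather than 0 or 100, keeps "no data yet" distinct from "always down" or "always up", which matters on a fresh browser with nothing stored.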
Public uptime tracking began 2026-04-16. For a contractual SLA and third-party-monitored history, contact us.