Prices in USD per 1M tokens.

Model / Window                                          Input   Cached   Output

Kimi-K2.6 (moonshotai/Kimi-K2.6)
  Priority                                               0.45     0.20     3.00
  ASAP                                                   1.00     0.20     4.00

GLM-5.1 (zai-org/GLM-5.1-FP8)
  Standard                                               0.50     0.12     2.50
  Priority                                               0.70     0.18     3.00
  Flex                                                   0.40     0.08     1.80
  ASAP                                                   1.40     0.26     4.40

gpt-oss-120b (openai/gpt-oss-120b)
  Priority                                               0.04     0.02     0.30
  ASAP                                                   0.06     0.03     0.40

Gemma 4 31B IT (google/gemma-4-31B-it)
  Standard                                               0.12     0.08     0.60
  Flex                                                   0.06     0.02     0.30
  ASAP                                                   0.40     0.20     0.60

Gemma 4 31B IT (NVFP4) (nvidia/Gemma-4-31B-IT-NVFP4)
  Standard                                               0.07     0.05     0.40
  ASAP                                                   0.14     0.07     0.40

DeepSeek V4 Pro (deepseek-ai/DeepSeek-V4-Pro)
  ASAP                                                   1.75     0.15     4.50

MiniMax M2.7 (MiniMaxAI/MiniMax-M2.7)
  ASAP                                                   0.30     0.06     1.20
  • Sail supports four completion windows: standard, priority, flex, and asap. See Completion Windows for details.
    • Not all models support all windows. We regularly bring up new models and expand completion window support for existing ones based on demand. If you have a need that’s not represented above, get in touch.
  • Prompt caching is implicit, based on prefix matching. Optionally, you may use prompt_cache_key as a routing hint to help maximize cache hit rates.
  • See Models for capabilities and other details on supported models.
  • To see what these rates add up to on a full agent workload, use the agent cost calculator.
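
As a rough illustration of how the per-1M-token rates combine, the following Python sketch estimates the cost of a single workload. It is not the agent cost calculator and not part of any Sail SDK: the `Rate` class, the `RATES` dictionary, and the `estimate_cost` function are hypothetical names, and only three rows of the table are copied in. It also assumes that cached tokens are a subset of the input tokens and are billed at the Cached rate instead of the Input rate, which this page does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """USD per 1M tokens, as listed in the pricing table."""
    input: float
    cached: float
    output: float


# A few rows copied from the table above; extend as needed.
RATES = {
    ("zai-org/GLM-5.1-FP8", "standard"): Rate(0.50, 0.12, 2.50),
    ("zai-org/GLM-5.1-FP8", "flex"): Rate(0.40, 0.08, 1.80),
    ("openai/gpt-oss-120b", "priority"): Rate(0.04, 0.02, 0.30),
}


def estimate_cost(model: str, window: str, input_tokens: int,
                  cached_tokens: int, output_tokens: int) -> float:
    """Estimate cost in USD.

    Assumption (not confirmed by the pricing page): cached_tokens is the
    part of input_tokens served from the prompt cache, billed at the
    cached rate; the remainder is billed at the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    rate = RATES[(model, window)]
    uncached_tokens = input_tokens - cached_tokens
    total = (uncached_tokens * rate.input
             + cached_tokens * rate.cached
             + output_tokens * rate.output)
    return total / 1_000_000


if __name__ == "__main__":
    cost = estimate_cost("zai-org/GLM-5.1-FP8", "standard",
                         input_tokens=1_000_000,
                         cached_tokens=400_000,
                         output_tokens=200_000)
    print(f"Estimated cost: ${cost:.3f}")
```

Under these assumptions, the same token counts can be passed with a different window (for example `"flex"`) to compare what a workload would cost at each listed rate.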