Pricing - Sail Research

USDper 1M tokens

Model	Window	Input	Cached	Output
Kimi-K2.6 `moonshotai/Kimi-K2.6`	Priority	0.45	0.20	3.00
Kimi-K2.6 `moonshotai/Kimi-K2.6`	ASAP	1.00	0.20	4.00
GLM-5.1 `zai-org/GLM-5.1-FP8`	Standard	0.50	0.12	2.50
	Priority	0.70	0.18	3.00
	Flex	0.40	0.08	1.80
	ASAP	1.40	0.26	4.40
gpt-oss-120b `openai/gpt-oss-120b`	Priority	0.04	0.02	0.30
gpt-oss-120b `openai/gpt-oss-120b`	ASAP	0.06	0.03	0.40
Gemma 4 31B IT `google/gemma-4-31B-it`	Standard	0.12	0.08	0.60
	Flex	0.06	0.02	0.30
	ASAP	0.40	0.20	0.60
Gemma 4 31B IT (NVFP4) `nvidia/Gemma-4-31B-IT-NVFP4`	Standard	0.07	0.05	0.40
Gemma 4 31B IT (NVFP4) `nvidia/Gemma-4-31B-IT-NVFP4`	ASAP	0.14	0.07	0.40
DeepSeek V4 Pro `deepseek-ai/DeepSeek-V4-Pro`	ASAP	1.75	0.15	4.50
MiniMax M2.7 `MiniMaxAI/MiniMax-M2.7`	ASAP	0.30	0.06	1.20

Sail supports four completion windows: standard, priority, flex, and asap. See Completion Windows for details.
- Not all models support all windows. We regularly bring up new models and expand completion window support for existing ones based on demand. If you have a need that’s not represented above, get in touch.
Prompt caching is implicit, based on prefix matching. Optionally, you may use prompt_cache_key as a routing hint to help maximize cache hit rates.
See Models for capabilities and other details on supported models.
To see what these rates add up to on a full agent workload, use the agent cost calculator.