DeepInfra (@DeepInfra) / X

648 posts

DeepInfra

@DeepInfra

Fast ML inference. Run top AI models using a simple API.

Palo Alto

Joined February 2023

DeepInfra
@DeepInfra
Jul 24, 2025
🔥Price drop Qwen3‑Coder‑480B‑A35B‑Instruct: • $0.40/M input tokens • $1.60/M output tokens One of the best open coding models — now more accessible via DeepInfra. (Best prices, as always.)
29K
DeepInfra
@DeepInfra
Dec 15, 2023
Just got some new GPUs so we are lowering our prices for 7b models to $0.13 / 1M tokens. We also offer the lower price for the latest mixtral-8x7b of $0.27 / 1M tokens. Deep Infra will always provide the most cost-effective inference service.
281K
DeepInfra
@DeepInfra
Jul 28, 2025
GLM-4.5 is here — latest drop from @Zai_org 🚀 Built for agentic workflows: reasoning, coding, tools. ✅ GLM-4.5 → 355B total / 32B active → $0.60 / $2.20 per Mtoken ✅ GLM-4.5-Air → 106B total / 12B active → $0.20 / $1.10 Smart models, smart prices. Cheapest at DeepInfra!
16K
DeepInfra
@DeepInfra
Aug 3, 2025
🚀OlmOCR on DeepInfra🚀 🔥 New LLM-based OCR model by @allen_ai 💸 Scrape 1000-page PDFs for just $0.15 📊 300x cheaper than competitor price
9.6K
DeepInfra
@DeepInfra
Jul 15, 2025
Moonshot AI's Kimi K2 is now live on DeepInfra, as always at the best price of $0.55/$2.20, with full tool call and context support. Best open source non-reasoning model available according to multiple benchmarks. Running on Nvidia Blackwell 🇺🇸.
15K
DeepInfra
@DeepInfra
Jul 16, 2025
Up to 100 TPS on Moonshot AI's Kimi K2, as always at the best price of $0.55/$2.20. Zero data retention, generous rate limits.
12K
DeepInfra
@DeepInfra
Aug 14, 2025
Qwen3‑Coder now 200 TPS on DeepInfra at the best prices of $0.30/M input and $1.20/M output
35K
DeepInfra
@DeepInfra
Jan 28, 2025
DeepSeek R1 is now live on the DeepInfra inference platform. 🌎 Hosted in the US with zero data retention. 💸 Always the best price: $0.85/$2.50 per 1M in/out tokens. Get started now!
21K
DeepInfra
@DeepInfra
Jun 16, 2025
Claude 4 Opus & Sonnet now live on DeepInfra. Run them via our OpenAI-compatible API — fast, scalable, and infra-friendly.
36K
DeepInfra
@DeepInfra
Jul 26, 2025
🚀 We now have a Turbo version of Qwen3‑Coder at $0.30/M input tokens and $1.20/M output tokens. ⚡️Same accuracy (within 1% of original) ⚡️2× faster & cheaper One of the best open coding models - now faster & more affordable on DeepInfra 👇
6.4K
DeepInfra
@DeepInfra
Jun 17, 2025
Gemini 2.5 Pro & Flash are now live on DeepInfra. OpenAI-compatible API. Full control over reasoning. ⚡ Flash: $0.105 / $2.45 🚀 Pro: $0.875 / $7.00 Cheapest on the market (prove us wrong).
9.8K
DeepInfra
@DeepInfra
Jul 23, 2025
Qwen3-235B-A22B-Instruct-2507 is now live on DeepInfra. 🔧 Upgraded version of the original 235B “non-thinking” model 🧠 Better at reasoning, math, comprehension, tool use 💰 $0.13 / $0.60 per Mtoken (in/out) The scale is real. #Qwen3 #LLMs #InferenceInfra #DeepInfra
7.5K
DeepInfra
@DeepInfra
Jun 2, 2025
We just broke 1000 TPS on our Llama 4 Maverick Turbo API endpoint! Hosted in the US on @nvidia Blackwell, delivering blazing-fast performance for your AI needs. Ready to scale?
7.4K
DeepInfra
@DeepInfra
Jan 31, 2025
We have the best prices for DeepSeek R1, the best AI model on the market, at just $0.75/$2.40 per 1M tokens. 🎉 🔥More capacity 🚀Higher rate limits 🇺🇸Hosted on H200 in the US and EU 🇪🇺 Unleash the Deep AI Infrastructure at scale.
6.2K