LLM + Embeddings
33 LLM models across 6 backends. DeepSeek, Qwen3, Llama 4, Kimi K2.5, GLM-5, MiniMax. Embeddings via BGE-M3, Qwen3-Embedding. Sub-second responses with smart routing.
Call one API for LLM, image, video, audio, vision, OCR, embeddings, reranking, and docs. GPU-Bridge routes across 6 backends with automatic failover. AI agents pay per request with x402 on Base.
Direct providers are great for one backend. GPU-Bridge is for teams that want one integration, automatic failover, and agent-native payments.
| | GPU-Bridge | Direct provider |
|---|---|---|
| One API across multiple backends | Yes | No |
| Automatic failover when a provider degrades | Yes | You build it |
| Hide provider choice from your users | Yes | No |
| x402 pay-per-request for AI agents | Yes | No |
| MCP tools for model-native integration | Yes | Varies |
| Upfront pricing + auto refund for credits | Yes | Varies |
If your agent can send HTTP and hold USDC on Base, it can pay for inference directly with x402. No signup, no API key. Or connect through MCP.
Send USDC on Base, include tx proof as X-Payment header. Instant on-chain settlement.
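As a rough illustration of the flow above, here is a minimal Python sketch that builds (but does not send) a `POST /run` request carrying the transaction proof in the `X-Payment` header. The base URL, the JSON body fields (`service`, `input`), and the proof string are placeholders assumed for this example; only the `/run` path and the `X-Payment` header name come from the text above.

```python
import json
import urllib.request

# Hypothetical base URL, used only for illustration.
BASE_URL = "https://api.gpu-bridge.example"


def build_x402_request(tx_proof: str, service: str, job_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /run request paid with x402."""
    # The body field names "service" and "input" are assumptions, not a documented schema.
    body = json.dumps({"service": service, "input": job_input}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/run",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Proof of the USDC transfer on Base, sent as the X-Payment header.
            "X-Payment": tx_proof,
        },
    )


req = build_x402_request("0xTX_PROOF_PLACEHOLDER", "llm", {"prompt": "Hello"})
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the exact proof format and response shape should be taken from the GPU-Bridge documentation.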
POST /run endpoint
Connect GPU-Bridge as an MCP server. Your LLM gets native tools for AI compute.
gpu_run
gpu_catalog
gpu_status
gpu_balance
gpu_estimate
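An MCP client normally issues tool calls for you, but it can help to see the shape of one. The sketch below builds the JSON-RPC `tools/call` message that MCP uses to invoke a tool by name. The tool name is taken from the list above; the `arguments` object is a made-up placeholder, since the real tool schemas are not shown here.

```python
import json


def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# The argument names below are hypothetical; check the tool's published schema.
print(mcp_tool_call(1, "gpu_estimate", {"service": "llm"}))
```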
Same POST /run endpoint, same auth, same status flow — across every modality
FLUX 2, Imagen 4, Seedream, SD 3.5. Video from text or image. ControlNet, inpainting, outpainting, portraits, stickers, and product ads.
Whisper STT in under 1 second. TTS with 40+ voices. Music generation. Voice cloning. Speaker diarization.
Image captioning, visual Q&A, CLIP interrogation, OCR, segmentation (SAM 2), and NSFW detection.
PDF/DOCX/PPTX to structured markdown. Jina reranking for RAG. Background removal, face restoration, video upscale to 4K.
Multi-backend failover with circuit breakers. X-Priority: fast|cheap header. Idempotency keys. Webhook delivery with 3x retry.
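A small sketch of how a client might set these request options. `X-Priority` and its `fast` / `cheap` values come from the line above; the `Idempotency-Key` header name is an assumption made for this example, so confirm the actual name before relying on it.

```python
import uuid
from typing import Optional

ALLOWED_PRIORITIES = {"fast", "cheap"}


def build_run_headers(priority: str, idempotency_key: Optional[str] = None) -> dict:
    """Return headers for a POST /run call with routing priority and an idempotency key."""
    if priority not in ALLOWED_PRIORITIES:
        raise ValueError(f"priority must be one of {sorted(ALLOWED_PRIORITIES)}")
    return {
        "Content-Type": "application/json",
        "X-Priority": priority,
        # Assumed header name. Generate a key once per logical job.
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }


headers = build_run_headers("cheap")
# On a retry of the same job, pass the same key again so it is not run twice.
retry_headers = build_run_headers("cheap", headers["Idempotency-Key"])
```

Reusing the key on retries is the point of idempotency: the server can recognize the repeat and avoid charging or executing the job a second time.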
Estimate cost before you send the request. No idle cost. No hidden fees.
Volume discounts: 5% at $100, 10% at $500, 15% at $1,000+. The full catalog lists all 30 services.
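To make the tiers concrete, here is a short sketch of the arithmetic. It assumes the thresholds are inclusive and apply to a single USD amount; the page does not say whether that amount is one purchase or cumulative spend, so treat the function as illustrative only.

```python
from decimal import Decimal

# (threshold in USD, discount rate), highest tier first.
TIERS = [
    (Decimal("1000"), Decimal("0.15")),
    (Decimal("500"), Decimal("0.10")),
    (Decimal("100"), Decimal("0.05")),
]


def volume_discount_rate(amount_usd: Decimal) -> Decimal:
    """Return the discount rate for an amount, assuming inclusive thresholds."""
    for threshold, rate in TIERS:
        if amount_usd >= threshold:
            return rate
    return Decimal("0")


def discounted_total(amount_usd: Decimal) -> Decimal:
    """Apply the tiered discount to the amount."""
    return amount_usd * (Decimal("1") - volume_discount_rate(amount_usd))


for amount in ("50", "100", "500", "1000"):
    print(amount, discounted_total(Decimal(amount)))
```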
Choose the path that matches how you build.
For teams integrating GPU-Bridge into their apps.
For autonomous agents that can hold USDC.
GPU-Bridge routes every job across multiple independent GPU backends. If the primary backend fails or times out, your request is automatically retried on the next available backend. Circuit breakers prevent routing to degraded backends. Credit-based accounts receive automatic refunds on failure. x402 requests are pre-validated before payment is consumed. Zero manual intervention needed.
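The failover pattern described above can be sketched in a few lines of Python. This is a simplified, client-side illustration of failover with circuit breakers, not GPU-Bridge's actual router; the class, function names, failure threshold, and cooldown are all assumptions chosen for the example.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class CircuitBreaker:
    """Opens after max_failures consecutive failures; allows a trial call after cooldown_s."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0) -> None:
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allows_request(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        # Once the cooldown has elapsed, let one trial request through.
        return now - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = now


def run_with_failover(
    backends: List[Tuple[str, Callable[[str], str]]],
    breakers: Dict[str, CircuitBreaker],
    job: str,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, str]:
    """Try backends in order, skipping any whose breaker is open."""
    for name, call in backends:
        breaker = breakers[name]
        if not breaker.allows_request(clock()):
            continue  # Degraded backend: do not route to it.
        try:
            result = call(job)
        except Exception:
            breaker.record_failure(clock())
            continue  # Fall through to the next backend.
        breaker.record_success()
        return name, result
    raise RuntimeError("all backends failed or are circuit-open")
```

With two backends where the first always raises, the first call falls through to the second backend, and once the first backend's breaker opens, later calls skip it until the cooldown passes.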
Three options: (1) x402 USDC direct on Base — for AI agents, no account needed, pay per request with on-chain tx proof; (2) Crypto top-up — register an account, add credits with USDC (0.5% fee); (3) Credit card via Stripe — register with email, buy credits from $10 with bonus on larger packages. Volume discounts of 5-15% apply automatically.
LLM inference completes in under 1 second. Image generation takes 2-15 seconds. Video generation takes 60-300 seconds. Speech-to-text is faster than real-time. Most utility services complete in 2-10 seconds.
AI agents can use GPU-Bridge without an account. Any agent with USDC on Base can call any GPU-Bridge endpoint: include an x402 payment header and payment settles instantly on-chain. MCP tools are also available for model-native integration.
Spending limits are built in. Every account has a configurable daily spending limit (default $50/day). Adjust it via POST /account/spending-limit, within a range of $1 to $10,000 per day. This prevents runaway costs from automated workflows.
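As a sketch of what adjusting the limit could look like, the snippet below validates the stated $1 to $10,000 range and builds (without sending) a `POST /account/spending-limit` request. The base URL, the `daily_limit_usd` field name, and the bearer-token auth scheme are assumptions for illustration; only the path and the range come from the text above.

```python
import json
import urllib.request

BASE_URL = "https://api.gpu-bridge.example"  # hypothetical
MIN_LIMIT_USD = 1
MAX_LIMIT_USD = 10_000


def build_spending_limit_request(api_key: str, daily_limit_usd: int) -> urllib.request.Request:
    """Build (but do not send) a request that sets the daily spending limit."""
    if not MIN_LIMIT_USD <= daily_limit_usd <= MAX_LIMIT_USD:
        raise ValueError(f"limit must be between ${MIN_LIMIT_USD} and ${MAX_LIMIT_USD} per day")
    # "daily_limit_usd" is an assumed field name, not a documented one.
    body = json.dumps({"daily_limit_usd": daily_limit_usd}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/account/spending-limit",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; use whatever your account credentials require.
            "Authorization": f"Bearer {api_key}",
        },
    )


limit_req = build_spending_limit_request("YOUR_API_KEY", 200)
print(limit_req.get_method(), limit_req.full_url)
```

Validating the range on the client side gives an early, readable error instead of a rejected request.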
Start free with an API key, or call GPU-Bridge today with x402 if you're building autonomous agents.