GPU-Bridge API Documentation

GPU-Bridge is an orchestration layer for AI workloads: one API for 30 services and 99 models across 6 GPU backends, with automatic failover, upfront pricing, and x402 payments for autonomous agents. Base URL: https://api.gpubridge.io

Start here: Use POST /run for the universal multi-modal contract, POST /inference for dedicated LLM requests, and /mcp if your client prefers MCP tools over raw HTTP.

Quick Start

First successful request in under 2 minutes. Choose the flow that matches how you build.

Developers: API key + credits

# 1. Register (free, instant)
curl -X POST https://api.gpubridge.io/account/register \
  -H "Content-Type: application/json" \
  -d '{"email":"you@example.com"}'
# Returns: {"api_key":"gpub_...", ...}

# 2. Add credits ($10 minimum)
curl -X POST https://api.gpubridge.io/account/topup \
  -H "Authorization: Bearer gpub_your_key" \
  -H "Content-Type: application/json" \
  -d '{"package":"credits_25"}'

# 3. Run any service
curl -X POST https://api.gpubridge.io/run \
  -H "Authorization: Bearer gpub_your_key" \
  -H "Content-Type: application/json" \
  -H "X-Priority: fast" \
  -H "Idempotency-Key: req_001" \
  -d '{"service":"llm-4090","input":{"prompt":"Hello world"}}'

# 4. Retrieve output (pass your API key for the full response).
#    Set JOB_ID to the job_id returned in step 3.
curl "https://api.gpubridge.io/status/$JOB_ID" \
  -H "Authorization: Bearer gpub_your_key"

AI agents: x402 on Base

# 1. Call without auth to receive HTTP 402 payment details
curl -X POST https://api.gpubridge.io/run \
  -H "Content-Type: application/json" \
  -d '{"service":"ocr","input":{"image_url":"https://example.com/invoice.png"}}'

# 2. Send USDC on Base, then retry with X-Payment (base64-encoded JSON proof)
PAYMENT=$(printf '%s' '{"txHash":"0x...","from":"0xYourWallet"}' | base64 | tr -d '\n')
curl -X POST https://api.gpubridge.io/run \
  -H "X-Payment: $PAYMENT" \
  -H "Content-Type: application/json" \
  -d '{"service":"ocr","input":{"image_url":"https://example.com/invoice.png"}}'

TypeScript example

const res = await fetch("https://api.gpubridge.io/run", {
  method: "POST",
  headers: {
    "Authorization": "Bearer gpub_your_key",
    "Content-Type": "application/json",
    "X-Priority": "fast",
    "Idempotency-Key": "req_001",
  },
  body: JSON.stringify({
    service: "llm-4090",
    input: { prompt: "Summarize this changelog" },
  }),
});

const job = await res.json();
console.log(job);

Python example

import requests

res = requests.post(
    "https://api.gpubridge.io/run",
    headers={
        "Authorization": "Bearer gpub_your_key",
        "Content-Type": "application/json",
        "X-Priority": "fast",
        "Idempotency-Key": "req_001",
    },
    json={
        "service": "llm-4090",
        "input": {"prompt": "Summarize this changelog"},
    },
    timeout=30,
)

print(res.json())

Authentication & Payments

GPU-Bridge supports two request-time access patterns and one account funding option:

Option A: API key + credits (developers)

Register once, top up credits, then send Authorization: Bearer gpub_... on each request. Best for apps, backends, and teams that want account balance, job history, refunds, and spending limits.

Authorization: Bearer gpub_your_api_key

Option B: x402 on Base (AI agents)

No account or API key required. Call the endpoint, receive HTTP 402 payment details, send USDC on Base, then retry with X-Payment. GPU-Bridge pre-validates the target service before consuming the payment proof. See x402 Protocol for the full flow.

X-Payment: base64({"txHash":"0x...","from":"0xYourWallet"})

Option C: Crypto top-up (account funding)

If you want account-based usage but prefer crypto, top up your credit balance with USDC on Base via POST /account/topup-crypto. This is a funding method for API-key usage, not a separate per-request auth flow. The fee is 0.5%, compared with 2.9% for card payments.

Media URLs: When providing audio_url or image_url, use direct CDN links (e.g. soundhelix.com, imgbb.com). GitHub raw and Wikimedia URLs often return 403 from compute nodes.

MCP Server

GPU-Bridge exposes a remote MCP endpoint at /mcp for Smithery and other MCP-compatible clients. Use MCP when you want tool-native AI compute instead of hand-writing HTTP calls.

POST /mcp
Remote MCP endpoint for tool discovery and tool execution
Tool | Description
gpu_run | Run any GPU-Bridge service
gpu_catalog | Browse the live service catalog with pricing and model info
gpu_status | Check job status and retrieve results
gpu_balance | Check balance, daily spend, and volume discount tier
gpu_estimate | Estimate the cost of a service before running it
Auth model: Tool discovery is open. Running account-backed tools uses the same API key you use for HTTP requests. If you need permissionless agent payments, use the x402 HTTP flow instead.
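
For clients that talk to /mcp without an MCP library, tool calls travel as JSON-RPC 2.0 messages. The sketch below only builds the request bodies for tool discovery and for one tool call. The tools/list and tools/call method names come from the MCP specification; the argument names passed to gpu_estimate are an assumption that mirrors the GET /catalog/estimate query parameters, so check the schema returned by tool discovery before relying on them.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing JSON-RPC request ids


def mcp_request(method, params=None):
    """Build a JSON-RPC 2.0 request body for an MCP endpoint."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Tool discovery, which the auth model describes as open.
list_tools = mcp_request("tools/list")

# Tool execution. The argument names are hypothetical: they mirror the
# GET /catalog/estimate query string and may differ in the real tool schema.
call_estimate = mcp_request(
    "tools/call",
    {"name": "gpu_estimate", "arguments": {"service": "llm-4090", "seconds": 30}},
)

print(json.dumps(call_estimate, indent=2))
```

A real session would also perform the MCP initialize handshake before these calls; an MCP client library normally handles that step.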

POST /run

Universal orchestration endpoint. Use it when you want one contract across text, image, video, audio, vision, OCR, embeddings, reranking, and document parsing. GPU-Bridge selects the best available backend for the service and can reroute automatically when a backend degrades.

POST /run
Requires API key + credits or x402

Request body

The webhook_url field is optional.

{
  "service": "llm-4090",
  "input": { "prompt": "Explain quantum computing", "max_tokens": 512 },
  "webhook_url": "https://your-server.com/callback"
}

Optional headers

Header | Values | Description
X-Priority | fast, cheap | Routing hint. fast prefers the lowest-latency healthy backend. cheap prefers the lowest-cost healthy backend.
Idempotency-Key | Any unique string | Prevents duplicate jobs for API-key / credit-based requests. Reuse the same key when retrying after a network error.
Routing: GPU-Bridge abstracts the provider layer. The same service key may route to different healthy backends over time. Circuit breakers remove degraded backends automatically.
Idempotency note: Idempotency-Key currently applies to API-key / credit-based requests. x402 clients should follow the x402 retry flow using the same request body and payment proof semantics.
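
The retry rule for Idempotency-Key can be sketched as follows: generate the key once per logical request and resend the same key on every attempt. This is a minimal illustration, not GPU-Bridge client code. The run_with_retry helper, the send callable, and the retry count are assumptions; flaky_send is a stand-in transport that simulates one dropped connection.

```python
import uuid


def run_with_retry(send, body, max_attempts=3):
    """Call send(body, headers), retrying network errors with the SAME key."""
    key = f"req_{uuid.uuid4().hex}"  # generated once per logical request
    headers = {"Idempotency-Key": key, "X-Priority": "fast"}
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(body, headers)
        except ConnectionError as exc:  # network failure: retry with same key
            last_error = exc
    raise last_error


# Stand-in transport for illustration: fails once, then succeeds.
seen_keys = []


def flaky_send(body, headers):
    seen_keys.append(headers["Idempotency-Key"])
    if len(seen_keys) == 1:
        raise ConnectionError("simulated network drop")
    return {"job_id": "a1b2c3d4-...", "status": "pending"}


job = run_with_retry(
    flaky_send, {"service": "llm-4090", "input": {"prompt": "Hello world"}}
)
print(job["status"], "distinct keys sent:", len(set(seen_keys)))
```

In a real client, send would wrap an HTTP POST to /run. With the requests library you would catch requests.exceptions.ConnectionError and requests.exceptions.Timeout, which are not subclasses of the built-in ConnectionError.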

Response (202)

{
  "job_id": "a1b2c3d4-...",
  "service": "llm-4090",
  "status": "pending",
  "status_url": "/status/a1b2c3d4-...",
  "estimated_cost_usd": 0.003
}

POST /inference

Dedicated LLM route with a simpler body than POST /run. Use this when you only need text generation and want a prompt-centric schema. GPU-Bridge still routes across available LLM backends behind the scenes.

POST /inference
Requires API key + credits or x402

Request body

{
  "model": "deepseek-ai/DeepSeek-V3.2",
  "prompt": "Summarize this changelog in 5 bullets",
  "system": "You are a concise release assistant.",
  "max_tokens": 512,
  "temperature": 0.2,
  "gpu": "4090"
}

Use GET /catalog for the live list of supported model IDs. The gpu field selects the GPU tier, not the provider.

Response

{
  "job_id": "a1b2c3d4-...",
  "status": "completed",
  "status_url": "/status/a1b2c3d4-...",
  "output": { "text": "..." },
  "estimated_cost_usd": 0.0032,
  "execution_time_seconds": 0.42
}
Behavior: Low-latency LLM backends often return inline with status: "completed". If GPU-Bridge routes to a non-inline path, you may receive pending and poll GET /status/:job_id.
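
Because the same call can return either an inline result or a pending job, a client needs one branch for each case. A minimal sketch, assuming the response fields shown in the samples (status, output, status_url); the next_step helper and its return convention are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "https://api.gpubridge.io"


def next_step(resp):
    """Decide what to do with a /inference (or /run) response body."""
    status = resp.get("status")
    if status == "completed":
        return ("done", resp.get("output"))  # inline result, no polling needed
    if status == "failed":
        return ("failed", None)
    # Anything else (for example "pending"): poll the absolute status URL.
    return ("poll", urljoin(BASE_URL, resp["status_url"]))


inline = {
    "job_id": "a1b2c3d4-...",
    "status": "completed",
    "status_url": "/status/a1b2c3d4-...",
    "output": {"text": "..."},
}
queued = {
    "job_id": "a1b2c3d4-...",
    "status": "pending",
    "status_url": "/status/a1b2c3d4-...",
}

print(next_step(inline))
print(next_step(queued))
```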

GET /status/:job_id

GET /status/:job_id
Status endpoint for submitted jobs. For credit-based jobs, pass your API key to retrieve full output. For x402 jobs, the job_id itself acts as the retrieval token; treat it like a secret. Without the matching credential/token, you receive status, timing, and hints only.

Poll until status is completed or failed. Some low-latency routes return inline in the initial response and may not require polling at all.

{
  "id": "a1b2c3d4-...",
  "status": "completed",
  "output": { "text": "Quantum computing leverages..." },
  "execution_time_seconds": 0.45
}
Security note: For x402 jobs, anyone holding the job_id can retrieve the result. Do not log or expose x402 job IDs publicly.
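
The polling loop described in this section might look like the sketch below. The interval and timeout values are assumptions, since no polling cadence is specified here, and fetch_status stands in for an authenticated GET /status/:job_id call. The error message deliberately omits the job ID, in line with the security note.

```python
import time


def poll_job(fetch_status, job_id, interval=1.0, timeout=120.0, sleep=time.sleep):
    """Poll fetch_status(job_id) until the job is completed or failed."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(job_id)
        if job.get("status") in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout}s")
        sleep(interval)


# Stand-in for GET /status/:job_id: pending twice, then completed.
responses = iter([
    {"id": "a1b2c3d4-...", "status": "pending"},
    {"id": "a1b2c3d4-...", "status": "pending"},
    {"id": "a1b2c3d4-...", "status": "completed", "output": {"text": "..."}},
])

result = poll_job(
    lambda job_id: next(responses),
    "a1b2c3d4-...",
    sleep=lambda seconds: None,  # skip real sleeping in this demo
)
print(result["status"])
```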

GET /catalog

GET /catalog
Public — live source of truth for services, model availability, pricing, input schemas, dedicated routes, and payment metadata

GET /catalog/estimate

GET /catalog/estimate?service=llm-4090&seconds=25
Public — pre-flight cost estimator. For credit-based requests, the final net charge can be lower after reconciliation.
{
  "service": "llm-4090",
  "estimated_seconds": 25,
  "price_per_second": 0.0024,
  "estimated_cost_usd": 0.06
}

Use this before submission when you want an upfront cost reference. For credit-based requests, final net cost may be lower after execution is reconciled. For x402, request validity is checked before payment is consumed.
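
The sample response is consistent with estimated_cost_usd = estimated_seconds × price_per_second (25 × 0.0024 = 0.06). The sketch below uses that relationship for a pre-flight budget check. The formula is inferred from the sample values rather than stated as a guarantee, and within_budget is a hypothetical helper.

```python
from decimal import Decimal


def estimated_cost(estimate):
    """Recompute cost as seconds x price, using Decimal to avoid float drift."""
    seconds = Decimal(str(estimate["estimated_seconds"]))
    price = Decimal(str(estimate["price_per_second"]))
    return seconds * price


def within_budget(estimate, budget_usd):
    """True when the recomputed estimate does not exceed the budget."""
    return estimated_cost(estimate) <= Decimal(str(budget_usd))


sample = {
    "service": "llm-4090",
    "estimated_seconds": 25,
    "price_per_second": 0.0024,
    "estimated_cost_usd": 0.06,
}

print(estimated_cost(sample), within_budget(sample, 0.10))
```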

GET /account/balance

GET /account/balance
Returns balance, daily spend, volume discount tier
curl https://api.gpubridge.io/account/balance \
  -H "Authorization: Bearer gpub_your_key"
{
  "balance": 8.50,
  "email": "you@example.com",
  "daily_spend": 1.50,
  "daily_limit": 50,
  "volume_discount": { "tier": "Standard", "discount_percent": 0 }
}

POST /account/topup

POST /account/topup
Create Stripe checkout session for credit card top-up
Package | Price | Credits | Bonus
credits_10 | $10 | $10.00 | none
credits_25 | $25 | $26.25 | +5%
credits_50 | $50 | $55.00 | +10%
credits_100 | $100 | $115.00 | +15%
curl -X POST https://api.gpubridge.io/account/topup \
  -H "Authorization: Bearer gpub_your_key" \
  -H "Content-Type: application/json" \
  -d '{"package":"credits_25"}'
# Returns: {"checkout_url":"https://checkout.stripe.com/..."}
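
The credit amounts in the package table follow from price × (1 + bonus). As a quick worked check, with the prices and bonus percentages copied from that table and credits_for as a hypothetical helper:

```python
from decimal import Decimal

# (price in USD, bonus percent) per package, copied from the package table.
PACKAGES = {
    "credits_10": (Decimal("10"), Decimal("0")),
    "credits_25": (Decimal("25"), Decimal("5")),
    "credits_50": (Decimal("50"), Decimal("10")),
    "credits_100": (Decimal("100"), Decimal("15")),
}


def credits_for(package):
    """Credits received for a package: price plus the percentage bonus."""
    price, bonus_pct = PACKAGES[package]
    credits = price * (Decimal("100") + bonus_pct) / Decimal("100")
    return credits.quantize(Decimal("0.01"))


for name in PACKAGES:
    print(name, credits_for(name))
```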

POST /account/topup-crypto

POST /account/topup-crypto
Top up with USDC on Base. Same packages, 0.5% fee
curl -X POST https://api.gpubridge.io/account/topup-crypto \
  -H "Authorization: Bearer gpub_your_key" \
  -H "Content-Type: application/json" \
  -d '{"package":"credits_25"}'
# Returns payment address — send USDC, credits added automatically

GET /account/jobs

GET /account/jobs?limit=50&offset=0
Job history with costs, refunds, execution times

POST /account/spending-limit

POST /account/spending-limit
Set daily spending limit ($1–$10,000). Default: $50/day
curl -X POST https://api.gpubridge.io/account/spending-limit \
  -H "Authorization: Bearer gpub_your_key" \
  -H "Content-Type: application/json" \
  -d '{"daily_limit":100}'

POST /account/auto-topup

POST /account/auto-topup
Automatically buy credits when balance drops below threshold
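curl -X POST https://api.gpubridge.io/account/auto-topup -H "Authorization: Bearer gpub_your_key" \
  -H "Content-Type: application/json" -d '{"enabled":true,"threshold":1.00,"package":"credits_10"}'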

API Key Management

Self-service key recovery is intentionally limited for security. If you still have a valid key, rotate it with POST /account/regenerate-key. If you lost your key, use POST /account/recover to get the support-assisted recovery flow.

POST /account/recover
Support-assisted recovery. Always returns a generic response to prevent account enumeration.
POST /account/regenerate-key
Rotate your API key using the current valid key. The old key stops working immediately.
GET /account/dashboard
Web dashboard for balance, top-ups, job history, and key rotation. Sign in with email + API key.
Tip: Bookmark api.gpubridge.io/account/dashboard for quick access to your balance, top-ups, and job history.
Live catalog: The tables below show representative services. For the current live list of service keys, model IDs, pricing, and availability, use GET /catalog.

Text & Intelligence

Representative text, embedding, vision-language, and reranking services
Service | Key | Input | From
LLM Inference (33 models across 6 backends) | llm-4090 | {"prompt":"...","model":"deepseek-ai/DeepSeek-V3.2","max_tokens":512} | $0.003
Text Embeddings (1024-dim vectors) | embedding-l4 | {"text":"..."} | $0.01
Visual Q&A (Moondream2) | llava-4090 | {"image_url":"...","prompt":"..."} | $0.05
Image Captioning (BLIP) | caption | {"image_url":"..."} | $0.01
CLIP Interrogator (image to text prompt) | clip | {"image_url":"..."} | $0.02
Document Reranking (Jina, for RAG) | rerank | {"query":"...","documents":["..."],"top_n":3} | $0.001
LLM models: 33 models across Groq, Together, DeepInfra, Fireworks, and Replicate fallback. Representative families include DeepSeek V3.2 / V3.1 / R1, Qwen3 32B / 235B / Coder 480B, Llama 4 Scout / Maverick, Kimi K2 / K2.5, GLM-5, and GPT-OSS 120B / 20B. Use GET /catalog for the live list.

Image & Video

Representative image generation, editing, and video services
Service | Key | Input | From
Image Generation (13 models across Together + Replicate) | image-4090 | {"prompt":"...","model":"black-forest-labs/FLUX.2-dev"} | $0.003
Video Generation | video | {"prompt":"...","image_url":"..."} | $0.30
Inpainting | inpaint | {"image_url":"...","mask_url":"...","prompt":"..."} | $0.04
ControlNet | controlnet | {"image_url":"...","prompt":"..."} | $0.05
Image-to-Image | img2img | {"image_url":"...","prompt":"..."} | $0.04
Image Variations | image-variation | {"image_url":"..."} | $0.04
AI Portraits | photomaker | {"image_url":"...","prompt":"..."} | $0.05
Sticker Maker | sticker | {"image_url":"...","prompt":"..."} | $0.02
Product Ads | ad-inpaint | {"image_url":"...","prompt":"..."} | $0.05
Image Animation | animate | {"image_url":"..."} | $0.10
Video Enhancement (up to 4K) | video-enhance | {"video_url":"...","resolution":"1080p","fps":60} | $0.50
Image models: 13 models across Together and Replicate. Representative models include FLUX 2 Dev / Pro, FLUX Schnell, Imagen 4 Fast / Ultra, Seedream 3.0 / 4.0, FLUX 1.1 Pro, SD 3.5 Large, SDXL Lightning, and Playground v2.5.

Audio & Speech

Representative speech, voice, and music services
Service | Key | Input | From
Speech-to-Text (Whisper, sub-second) | whisper-l4 | {"audio_url":"..."} | $0.05
Diarized STT (speaker separation) | whisperx | {"audio_url":"..."} | $0.05
Text-to-Speech (40+ voices) | tts-l4 | {"text":"...","voice":"af_alloy"} | $0.02
Expressive TTS (XTTS v2, sound effects) | bark | {"text":"Hello [laughter]"} | $0.03
Music Generation | musicgen-l4 | {"prompt":"...","duration_seconds":10} | $0.05
Voice Cloning | voice-clone | {"audio_url":"..."} | $0.10
TTS voices: af_alloy, af_nova, af_sky, am_adam, am_echo, am_onyx, bf_emma, bm_george, and 30+ more. See catalog for full list.

Utilities & Document

Representative OCR, document, moderation, and image utility services
Service | Key | Input | From
Background Removal | rembg-l4 | {"image_url":"..."} | $0.01
Image Upscale (2x or 4x) | upscale-l4 | {"image_url":"...","scale":4} | $0.04
Face Restoration (CodeFormer) | face-restore | {"image_url":"..."} | $0.02
OCR (Florence-2) | ocr | {"image_url":"..."} | $0.01
Segmentation (SAM-2) | segmentation | {"image_url":"..."} | $0.02
PDF/Doc Parsing (Marker) | pdf-parse | {"file_url":"...","mode":"fast"} | $0.05
NSFW Detection (content moderation) | nsfw-detect | {"image_url":"..."} | $0.005

Pricing & Billing

GPU-Bridge presents one pricing surface even when requests route across different backends behind the scenes.

x402 Protocol

For AI agent developers building autonomous clients. Spec-compliant with x402.org.

Pre-validation: GPU-Bridge validates the target service before consuming the payment proof. If the service key is invalid, fix the request and retry without sending a new payment.
Parameter | Value
Network | Base (Chain ID 8453)
Asset | USDC (6 decimals)
Contract | 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
Recipient | 0xB0FdC6030B9f30652e8B221B8090d443Dd3C6381
Payment window | 300 seconds (5 minutes)
Confirmations | 1 (≤$10), 5 (>$10)
Anti-replay | Each tx hash used once
OFAC screening | Chainalysis Sanctions Oracle

Flow

  1. Call any endpoint without auth → receive HTTP 402
  2. Parse accepts[0].maxAmountRequired (USDC units) and accepts[0].payTo
  3. Send USDC on Base to payTo (amount ≥ maxAmountRequired)
  4. Base64-encode {"txHash":"0x...","from":"0x..."} as X-Payment header
  5. Retry the same request → job executes
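
Step 4 of the flow can be sketched in Python as follows. This assumes standard (not URL-safe) base64 over compact UTF-8 JSON, which matches the base64({...}) notation used here but is not spelled out in detail; build_x_payment is a hypothetical helper and the transaction hash is a placeholder.

```python
import base64
import json


def build_x_payment(tx_hash, from_address):
    """Base64-encode the payment proof JSON for the X-Payment header."""
    proof = {"txHash": tx_hash, "from": from_address}
    raw = json.dumps(proof, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


header_value = build_x_payment("0xabc123", "0xYourWallet")
headers = {"X-Payment": header_value, "Content-Type": "application/json"}

# Round-trip check: decoding the header yields the original proof.
decoded = json.loads(base64.b64decode(header_value))
print(decoded)
```

The headers dictionary is then sent with the same request body that triggered the 402 response.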

402 response format

The value of maxAmountRequired varies per service.

{
  "x402Version": 1,
  "accepts": [{
    "scheme": "exact",
    "network": "base",
    "maxAmountRequired": "10000",
    "payTo": "0xB0FdC6030B9f30652e8B221B8090d443Dd3C6381",
    "asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
    "maxTimeoutSeconds": 300
  }]
}
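
A client has to turn this response into a transfer amount and a recipient. The sketch below assumes maxAmountRequired is expressed in the smallest USDC unit (6 decimals, per the parameter list), so "10000" corresponds to 0.01 USDC. The parse_payment_requirements helper and its output keys are hypothetical.

```python
from decimal import Decimal

USDC_DECIMALS = 6  # USDC on Base uses 6 decimals


def parse_payment_requirements(body):
    """Extract recipient, asset, and amount from the first accepts entry."""
    offer = body["accepts"][0]
    units = int(offer["maxAmountRequired"])
    return {
        "pay_to": offer["payTo"],
        "asset": offer["asset"],
        "amount_units": units,
        "amount_usdc": Decimal(units) / (Decimal(10) ** USDC_DECIMALS),
        "timeout_seconds": offer["maxTimeoutSeconds"],
    }


sample_402 = {
    "x402Version": 1,
    "accepts": [{
        "scheme": "exact",
        "network": "base",
        "maxAmountRequired": "10000",
        "payTo": "0xB0FdC6030B9f30652e8B221B8090d443Dd3C6381",
        "asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
        "maxTimeoutSeconds": 300,
    }],
}

req = parse_payment_requirements(sample_402)
print(req["amount_usdc"], "USDC to", req["pay_to"])
```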

Error Format

GPU-Bridge does not yet use a single rigid error envelope for every endpoint. Treat error responses as a common base shape with optional helper fields.

{
  "error": "Unknown service: foo",
  "hint": "Use GET /catalog to see all available services.",
  "details": [],
  "available_services": ["llm-4090", "image-4090", "..."]
}
Compatibility note: Clients should key off HTTP status first, then parse error as required and treat all other fields as optional.
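
One way to apply the compatibility note in a client is sketched below: branch on the HTTP status first, read error as the one expected field, and default everything else. The parse_error helper, the fallback message, and the 400 status used in the demo are assumptions; this section does not say which status code accompanies an unknown service.

```python
def parse_error(status_code, body):
    """Normalize an error response: status first, error required, rest optional."""
    if status_code < 400:
        return None  # not an error
    body = body if isinstance(body, dict) else {}
    return {
        "status": status_code,
        "error": body.get("error", "unknown error"),
        "hint": body.get("hint"),
        "details": body.get("details", []),
        "available_services": body.get("available_services", []),
    }


sample = {
    "error": "Unknown service: foo",
    "hint": "Use GET /catalog to see all available services.",
    "details": [],
    "available_services": ["llm-4090", "image-4090", "..."],
}

err = parse_error(400, sample)  # 400 is an assumed status for this demo
print(err["status"], err["error"])
```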

Important Notes