GPU-Bridge API Documentation

GPU-Bridge is an orchestration layer for AI workloads: one API for 30 services and 99 models across 6 GPU backends, with automatic failover, upfront pricing, and x402 payments for autonomous agents. Base URL: https://api.gpubridge.io

Start here: Use POST /run for the universal multi-modal contract, POST /inference for dedicated LLM requests, and /mcp if your client prefers MCP tools over raw HTTP.

Quick Start

First successful request in under 2 minutes. Choose the flow that matches how you build.

Developers: API key + credits

# 1. Register (free, instant)
curl -X POST https://api.gpubridge.io/account/register \
  -H "Content-Type: application/json" \
  -d '{"email":"you@example.com"}'
# Returns: {"api_key":"gpub_...", ...}

# 2. Add credits ($10 minimum)
curl -X POST https://api.gpubridge.io/account/topup \
  -H "Authorization: Bearer gpub_your_key" \
  -H "Content-Type: application/json" \
  -d '{"package":"credits_25"}'

# 3. Run any service
curl -X POST https://api.gpubridge.io/run \
  -H "Authorization: Bearer gpub_your_key" \
  -H "Content-Type: application/json" \
  -H "X-Priority: fast" \
  -H "Idempotency-Key: req_001" \
  -d '{"service":"llm-4090","input":{"prompt":"Hello world"}}'

# 4. Retrieve output (pass API key for full response)
curl https://api.gpubridge.io/status/{job_id} \
  -H "Authorization: Bearer gpub_your_key"

AI agents: x402 on Base

# 1. Call without auth to receive HTTP 402 payment details
curl -X POST https://api.gpubridge.io/run \
  -H "Content-Type: application/json" \
  -d '{"service":"ocr","input":{"image_url":"https://example.com/invoice.png"}}'

# 2. Send USDC on Base, then retry with X-Payment
curl -X POST https://api.gpubridge.io/run \
  -H "X-Payment: base64({\"txHash\":\"0x...\",\"from\":\"0xYourWallet\"})" \
  -H "Content-Type: application/json" \
  -d '{"service":"ocr","input":{"image_url":"https://example.com/invoice.png"}}'

TypeScript example

const res = await fetch("https://api.gpubridge.io/run", {
  method: "POST",
  headers: {
    "Authorization": "Bearer gpub_your_key",
    "Content-Type": "application/json",
    "X-Priority": "fast",
    "Idempotency-Key": "req_001",
  },
  body: JSON.stringify({
    service: "llm-4090",
    input: { prompt: "Summarize this changelog" },
  }),
});

const job = await res.json();
console.log(job);

Python example

import requests

res = requests.post(
    "https://api.gpubridge.io/run",
    headers={
        "Authorization": "Bearer gpub_your_key",
        "Content-Type": "application/json",
        "X-Priority": "fast",
        "Idempotency-Key": "req_001",
    },
    json={
        "service": "llm-4090",
        "input": {"prompt": "Summarize this changelog"},
    },
    timeout=30,
)

print(res.json())

Authentication & Payments

GPU-Bridge supports two request-time access patterns and one account funding option:

Option A: API key + credits (developers)

Register once, top up credits, then send Authorization: Bearer gpub_... on each request. Best for apps, backends, and teams that want account balance, job history, refunds, and spending limits.

Authorization: Bearer gpub_your_api_key

Option B: x402 on Base (AI agents)

No account or API key required. Call the endpoint, receive HTTP 402 payment details, send USDC on Base, then retry with X-Payment. GPU-Bridge pre-validates the target service before consuming the payment proof. See x402 Protocol for the full flow.

X-Payment: base64({"txHash":"0x...","from":"0xYourWallet"})

Option C: Crypto top-up (account funding)

If you want account-based usage but prefer crypto, top up your credit balance with USDC on Base via POST /account/topup-crypto. This is a funding method for API-key usage, not a separate per-request auth flow. The crypto top-up fee is 0.5%, compared with 2.9% for card payments.

Media URLs: When providing audio_url or image_url, use direct CDN links (e.g. soundhelix.com, imgbb.com). GitHub raw and Wikimedia URLs often return 403 from compute nodes.

MCP Server

GPU-Bridge exposes a remote MCP endpoint at /mcp for Smithery and other MCP-compatible clients. Use MCP when you want tool-native AI compute instead of hand-writing HTTP calls.

POST /mcp

Remote MCP endpoint for tool discovery and tool execution

Tool	Description
`gpu_run`	Run any GPU-Bridge service
`gpu_catalog`	Browse the live service catalog with pricing and model info
`gpu_status`	Check job status and retrieve results
`gpu_balance`	Check balance, daily spend, and volume discount tier
`gpu_estimate`	Estimate the cost of a service before running it

Auth model: Tool discovery is open. Running account-backed tools uses the same API key you use for HTTP requests. If you need permissionless agent payments, use the x402 HTTP flow instead.
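
MCP clients exchange JSON-RPC 2.0 messages, so tool discovery and tool execution reduce to two request shapes. The sketch below only builds those payloads. `tools/list` and `tools/call` are the standard MCP method names; the argument names passed to `gpu_estimate` are hypothetical, since this page does not document tool input schemas, and a real client would also perform the MCP initialization handshake first.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # each JSON-RPC request in a session needs a distinct id


def mcp_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope as used by MCP."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def list_tools() -> dict:
    """Payload for tool discovery."""
    return mcp_request("tools/list")


def call_tool(name: str, arguments: dict) -> dict:
    """Payload for executing one tool by name."""
    return mcp_request("tools/call", {"name": name, "arguments": arguments})


print(json.dumps(list_tools()))
# Argument names are assumptions; read the real schema from the tools/list result.
print(json.dumps(call_tool("gpu_estimate", {"service": "llm-4090"})))
```

Reading each tool's input schema from the `tools/list` result before calling it avoids hardcoding argument names that may change.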

POST /run

Universal orchestration endpoint. Use it when you want one contract across text, image, video, audio, vision, OCR, embeddings, reranking, and document parsing. GPU-Bridge selects the best available backend for the service and can reroute automatically when a backend degrades.

POST /run

Requires API key + credits or x402

Request body

{
  "service": "llm-4090",
  "input": { "prompt": "Explain quantum computing", "max_tokens": 512 },
  "webhook_url": "https://your-server.com/callback"  // optional
}

Optional headers

Header	Values	Description
`X-Priority`	`fast` | `cheap`	Routing hint. `fast` prefers the lowest-latency healthy backend. `cheap` prefers the lowest-cost healthy backend.
`Idempotency-Key`	Any unique string	Prevents duplicate jobs for API-key / credit-based requests. Reuse the same key when retrying after a network error.

Routing: GPU-Bridge abstracts the provider layer. The same service key may route to different healthy backends over time. Circuit breakers remove degraded backends automatically.

Idempotency note: Idempotency-Key currently applies to API-key / credit-based requests. x402 clients should follow the x402 retry flow using the same request body and payment proof semantics.
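
A minimal sketch of retrying a credit-based request with one Idempotency-Key, so that a retry after a network error cannot create a second job. The `send` callable is a stand-in for your own HTTP call to POST /run. The set of retryable status codes, the attempt count, and the delays are assumptions, not documented GPU-Bridge behavior.

```python
import time
import uuid
from typing import Callable, Tuple

# Assumption: these statuses are treated as transient by this sketch.
RETRYABLE_STATUSES = {429, 502, 503, 504}


def submit_with_retry(
    send: Callable[[dict], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send(extra_headers) until it succeeds, reusing one key.

    send performs POST /run with the given extra headers and returns
    (http_status, parsed_json). It is supplied by the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    headers = {"Idempotency-Key": f"req_{uuid.uuid4().hex}"}
    for attempt in range(max_attempts):
        is_last = attempt == max_attempts - 1
        try:
            status, body = send(headers)
        except OSError:
            # The job may or may not exist server-side, so the retry must
            # carry the SAME key to avoid a duplicate job.
            if is_last:
                raise
        else:
            if status not in RETRYABLE_STATUSES or is_last:
                return status, body
        sleep(base_delay * 2 ** attempt)  # exponential backoff: 0.5s, 1s, 2s
    raise AssertionError("loop always returns or raises")
```

Generating the key once, outside the loop, is the important design choice: a fresh key per attempt would defeat deduplication.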

Response (202)

{
  "job_id": "a1b2c3d4-...",
  "service": "llm-4090",
  "status": "pending",
  "status_url": "/status/a1b2c3d4-...",
  "estimated_cost_usd": 0.003
}

POST /inference

Dedicated LLM route with a simpler body than POST /run. Use this when you only need text generation and want a prompt-centric schema. GPU-Bridge still routes across available LLM backends behind the scenes.

POST /inference

Requires API key + credits or x402

Request body

{
  "model": "deepseek-ai/DeepSeek-V3.2",
  "prompt": "Summarize this changelog in 5 bullets",
  "system": "You are a concise release assistant.",
  "max_tokens": 512,
  "temperature": 0.2,
  "gpu": "4090"
}

Use GET /catalog for the live list of supported model IDs. The gpu field selects the GPU tier, not the provider.

Response

{
  "job_id": "a1b2c3d4-...",
  "status": "completed",
  "status_url": "/status/a1b2c3d4-...",
  "output": { "text": "..." },
  "estimated_cost_usd": 0.0032,
  "execution_time_seconds": 0.42
}

Behavior: Low-latency LLM backends often return inline with status: "completed". If GPU-Bridge routes to a non-inline path, you may receive pending and poll GET /status/:job_id.

GET /status/:job_id

Status endpoint for submitted jobs. For credit-based jobs, pass your API key to retrieve full output. For x402 jobs, the job_id itself acts as the retrieval token; treat it like a secret. Without the matching credential/token, you receive status, timing, and hints only.

Poll until status is completed or failed. Some low-latency routes return inline in the initial response and may not require polling at all.

{
  "id": "a1b2c3d4-...",
  "status": "completed",
  "output": { "text": "Quantum computing leverages..." },
  "execution_time_seconds": 0.45
}

Security note: For x402 jobs, anyone holding the job_id can retrieve the result. Do not log or expose x402 job IDs publicly.
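
The polling rule above can be sketched as a small helper that returns immediately for inline results and polls otherwise. `fetch_status` is a caller-supplied function that performs GET /status/:job_id; the polling interval and timeout are illustrative values, not documented limits.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_result(
    initial: dict,
    fetch_status: Callable[[str], dict],
    interval: float = 1.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Return the final job document, polling only when needed.

    initial is the JSON returned by POST /run or POST /inference.
    """
    if initial.get("status") in TERMINAL_STATUSES:
        return initial  # inline result, no polling required
    job_id = initial["job_id"]
    deadline = clock() + timeout
    while clock() < deadline:
        job = fetch_status(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        sleep(interval)
    # The message omits the job_id on purpose: for x402 jobs it doubles
    # as the retrieval token and should not end up in logs.
    raise TimeoutError("job did not reach a terminal status in time")
```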

GET /catalog

Public — live source of truth for services, model availability, pricing, input schemas, dedicated routes, and payment metadata

GET /catalog/estimate

GET /catalog/estimate?service=llm-4090&seconds=25

Public — pre-flight cost estimator. For credit-based requests, the final net charge can be lower after reconciliation.

{
  "service": "llm-4090",
  "estimated_seconds": 25,
  "price_per_second": 0.0024,
  "estimated_cost_usd": 0.06
}

Use this before submission when you want an upfront cost reference. For credit-based requests, final net cost may be lower after execution is reconciled. For x402, request validity is checked before payment is consumed.
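
In the sample response, the estimated cost equals the estimated duration multiplied by the per-second price. The snippet below reproduces that arithmetic with exact decimals. That the server always computes the estimate as a plain product is an inference from this one sample, not a documented rule.

```python
from decimal import Decimal


def estimated_cost_usd(estimated_seconds: int, price_per_second: str) -> Decimal:
    """Multiply duration by per-second price without float rounding."""
    return Decimal(estimated_seconds) * Decimal(price_per_second)


# Values taken from the sample response above.
cost = estimated_cost_usd(25, "0.0024")
print(cost)  # 0.0600
```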

GET /account/balance

Returns balance, daily spend, volume discount tier

curl https://api.gpubridge.io/account/balance \
  -H "Authorization: Bearer gpub_your_key"

{
  "balance": 8.50,
  "email": "you@example.com",
  "daily_spend": 1.50,
  "daily_limit": 50,
  "volume_discount": { "tier": "Standard", "discount_percent": 0 }
}

POST /account/topup

Create Stripe checkout session for credit card top-up

Package	Price	Credits	Bonus
`credits_10`	$10	$10.00	—
`credits_25`	$25	$26.25	+5%
`credits_50`	$50	$55.00	+10%
`credits_100`	$100	$115.00	+15%
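
The credits column follows from price and bonus: credits = price × (1 + bonus). A small sketch that encodes the table and picks the cheapest package covering a target balance; the helper names are illustrative and not part of the API.

```python
from decimal import Decimal
from typing import Optional

# package key -> (price in USD, bonus percent), copied from the table above
PACKAGES = {
    "credits_10": (Decimal("10"), 0),
    "credits_25": (Decimal("25"), 5),
    "credits_50": (Decimal("50"), 10),
    "credits_100": (Decimal("100"), 15),
}


def credits_for(package: str) -> Decimal:
    """Credits granted for a package, rounded to cents."""
    price, bonus_pct = PACKAGES[package]
    return (price * (100 + bonus_pct) / 100).quantize(Decimal("0.01"))


def smallest_package_for(needed_usd: Decimal) -> Optional[str]:
    """Cheapest package whose credits cover needed_usd, or None."""
    for name, _ in sorted(PACKAGES.items(), key=lambda kv: kv[1][0]):
        if credits_for(name) >= needed_usd:
            return name
    return None


print(credits_for("credits_25"))  # 26.25
```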

curl -X POST https://api.gpubridge.io/account/topup \
  -H "Authorization: Bearer gpub_your_key" \
  -H "Content-Type: application/json" \
  -d '{"package":"credits_25"}'
# Returns: {"checkout_url":"https://checkout.stripe.com/..."}

POST /account/topup-crypto

Top up with USDC on Base. Uses the same packages as card top-ups, with a 0.5% fee.

curl -X POST https://api.gpubridge.io/account/topup-crypto \
  -H "Authorization: Bearer gpub_your_key" \
  -H "Content-Type: application/json" \
  -d '{"package":"credits_25"}'
# Returns payment address — send USDC, credits added automatically

GET /account/jobs

GET /account/jobs?limit=50&offset=0

Job history with costs, refunds, execution times
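
A sketch of walking the full job history with the limit and offset parameters. The response shape is not documented on this page, so `fetch_page` is a caller-supplied function assumed to return one page of jobs as a list; adapt it to the actual JSON you receive.

```python
from typing import Callable, Iterator, List


def iter_jobs(
    fetch_page: Callable[[int, int], List[dict]],
    limit: int = 50,
) -> Iterator[dict]:
    """Yield every job by requesting pages until a short page appears.

    fetch_page(limit, offset) performs
    GET /account/jobs?limit=<limit>&offset=<offset>.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            return  # a short (or empty) page means there is nothing left
        offset += limit
```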

POST /account/spending-limit

Set daily spending limit ($1–$10,000). Default: $50/day

curl -X POST https://api.gpubridge.io/account/spending-limit \
  -H "Authorization: Bearer gpub_your_key" \
  -H "Content-Type: application/json" \
  -d '{"daily_limit":100}'

POST /account/auto-topup

Automatically buy credits when balance drops below threshold

Request body

{"enabled":true,"threshold":1.00,"package":"credits_10"}

API Key Management

Self-service key recovery is intentionally limited for security. If you still have a valid key, rotate it with POST /account/regenerate-key. If you lost your key, use POST /account/recover to start the support-assisted recovery flow.

POST /account/recover

Support-assisted recovery. Always returns a generic response to prevent account enumeration.

POST /account/regenerate-key

Rotate your API key using the current valid key. The old key stops working immediately.

GET /account/dashboard

Web dashboard for balance, top-ups, job history, and key rotation. Sign in with email + API key.

Tip: Bookmark api.gpubridge.io/account/dashboard for quick access to your balance, top-ups, and job history.

Live catalog: The tables below show representative services. For the current live list of service keys, model IDs, pricing, and availability, use GET /catalog.

Text & Intelligence

Representative text, embedding, vision-language, and reranking services

Service	Key	Input	From
LLM Inference 33 models across 6 backends	`llm-4090`	`{"prompt":"...","model":"deepseek-ai/DeepSeek-V3.2","max_tokens":512}`	$0.003
Text Embeddings 1024-dim vectors	`embedding-l4`	`{"text":"..."}`	$0.01
Visual Q&A Moondream2	`llava-4090`	`{"image_url":"...","prompt":"..."}`	$0.05
Image Captioning BLIP	`caption`	`{"image_url":"..."}`	$0.01
CLIP Interrogator Image to text prompt	`clip`	`{"image_url":"..."}`	$0.02
Document Reranking Jina, for RAG	`rerank`	`{"query":"...","documents":["..."],"top_n":3}`	$0.001

LLM models: 33 models across Groq, Together, DeepInfra, and Fireworks, with Replicate as fallback. Representative families include DeepSeek V3.2 / V3.1 / R1, Qwen3 32B / 235B / Coder 480B, Llama 4 Scout / Maverick, Kimi K2 / K2.5, GLM-5, and GPT-OSS 120B / 20B. Use GET /catalog for the live list.

Image & Video

Representative image generation, editing, and video services

Service	Key	Input	From
Image Generation 13 models across Together + Replicate	`image-4090`	`{"prompt":"...","model":"black-forest-labs/FLUX.2-dev"}`	$0.003
Video Generation	`video`	`{"prompt":"...","image_url":"..."}`	$0.30
Inpainting	`inpaint`	`{"image_url":"...","mask_url":"...","prompt":"..."}`	$0.04
ControlNet	`controlnet`	`{"image_url":"...","prompt":"..."}`	$0.05
Image-to-Image	`img2img`	`{"image_url":"...","prompt":"..."}`	$0.04
Image Variations	`image-variation`	`{"image_url":"..."}`	$0.04
AI Portraits	`photomaker`	`{"image_url":"...","prompt":"..."}`	$0.05
Sticker Maker	`sticker`	`{"image_url":"...","prompt":"..."}`	$0.02
Product Ads	`ad-inpaint`	`{"image_url":"...","prompt":"..."}`	$0.05
Image Animation	`animate`	`{"image_url":"..."}`	$0.10
Video Enhancement Up to 4K	`video-enhance`	`{"video_url":"...","resolution":"1080p","fps":60}`	$0.50

Image models: 13 models across Together and Replicate. Representative models include FLUX 2 Dev / Pro, FLUX Schnell, Imagen 4 Fast / Ultra, Seedream 3.0 / 4.0, FLUX 1.1 Pro, SD 3.5 Large, SDXL Lightning, and Playground v2.5.

Audio & Speech

Representative speech, voice, and music services

Service	Key	Input	From
Speech-to-Text Whisper, sub-second	`whisper-l4`	`{"audio_url":"..."}`	$0.05
Diarized STT Speaker separation	`whisperx`	`{"audio_url":"..."}`	$0.05
Text-to-Speech 40+ voices	`tts-l4`	`{"text":"...","voice":"af_alloy"}`	$0.02
Expressive TTS XTTS v2, sound effects	`bark`	`{"text":"Hello [laughter]"}`	$0.03
Music Generation	`musicgen-l4`	`{"prompt":"...","duration_seconds":10}`	$0.05
Voice Cloning	`voice-clone`	`{"audio_url":"..."}`	$0.10

TTS voices: af_alloy, af_nova, af_sky, am_adam, am_echo, am_onyx, bf_emma, bm_george, and 30+ more. See catalog for full list.

Utilities & Document

Representative OCR, document, moderation, and image utility services

Service	Key	Input	From
Background Removal	`rembg-l4`	`{"image_url":"..."}`	$0.01
Image Upscale 2x or 4x	`upscale-l4`	`{"image_url":"...","scale":4}`	$0.04
Face Restoration CodeFormer	`face-restore`	`{"image_url":"..."}`	$0.02
OCR Florence-2	`ocr`	`{"image_url":"..."}`	$0.01
Segmentation SAM-2	`segmentation`	`{"image_url":"..."}`	$0.02
PDF/Doc Parsing Marker	`pdf-parse`	`{"file_url":"...","mode":"fast"}`	$0.05
NSFW Detection Content moderation	`nsfw-detect`	`{"image_url":"..."}`	$0.005

Pricing & Billing

Pay per request. No monthly minimums and zero cost when idle.
Estimate before you run: call GET /catalog/estimate or inspect GET /catalog for live pricing.
Credit-based requests: some services are pre-charged from an estimate and reconciled after completion. Excess is returned automatically to your credit balance.
Failure behavior: credit-based requests are refunded automatically on failure. x402 requests are pre-validated before the payment proof is consumed.
Volume discounts: 5% at $100 spent, 10% at $500, 15% at $1,000+.
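
The volume discount tiers can be expressed as a lookup. This sketch assumes the thresholds are inclusive, that "spent" means cumulative account spend, and that the discount is applied as a simple percentage off the list price; none of those details are stated on this page, and GET /account/balance reports the tier that actually applies.

```python
from decimal import Decimal

# (minimum cumulative spend in USD, discount percent), highest tier first
TIERS = [
    (Decimal("1000"), 15),
    (Decimal("500"), 10),
    (Decimal("100"), 5),
]


def discount_percent(total_spent_usd: Decimal) -> int:
    """Discount tier for a given cumulative spend."""
    for threshold, percent in TIERS:
        if total_spent_usd >= threshold:
            return percent
    return 0


def discounted_price(list_price_usd: Decimal, total_spent_usd: Decimal) -> Decimal:
    """List price after the tier discount."""
    percent = discount_percent(total_spent_usd)
    return list_price_usd * (100 - percent) / 100


print(discount_percent(Decimal("600")))  # 10
```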

GPU-Bridge presents one pricing surface even when requests route across different backends behind the scenes.

x402 Protocol

For AI agent developers building autonomous clients. Follows the x402 specification published at x402.org.

Pre-validation: GPU-Bridge validates the target service before consuming the payment proof. If the service key is invalid, fix the request and retry without sending a new payment.

Parameter	Value
Network	Base (Chain ID 8453)
Asset	USDC (6 decimals)
Contract	`0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913`
Recipient	`0xB0FdC6030B9f30652e8B221B8090d443Dd3C6381`
Payment window	300 seconds (5 minutes)
Confirmations	1 (≤$10), 5 (>$10)
Anti-replay	Each tx hash used once
OFAC screening	Chainalysis Sanctions Oracle
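
Two rows of the table translate directly into client-side helpers: USDC's 6 decimals determine how a dollar amount maps to integer token units, and the confirmations row determines how long to wait before retrying. A minimal sketch; the function names are illustrative, and treating 1 USDC as 1 USD for the confirmation threshold is an assumption.

```python
from decimal import Decimal

USDC_DECIMALS = 6  # from the Asset row above


def to_base_units(amount_usdc: str) -> int:
    """Convert a decimal USDC amount to integer token units."""
    units = Decimal(amount_usdc).scaleb(USDC_DECIMALS)
    if units != units.to_integral_value():
        raise ValueError("USDC supports at most 6 decimal places")
    return int(units)


def from_base_units(units: str) -> Decimal:
    """Convert integer token units (as sent in a 402 response) to USDC."""
    return Decimal(units).scaleb(-USDC_DECIMALS)


def required_confirmations(amount_usd: Decimal) -> int:
    """1 confirmation up to and including $10, 5 above that."""
    return 1 if amount_usd <= Decimal("10") else 5


print(to_base_units("0.01"))  # 10000
```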

Flow

1. Call any endpoint without auth → receive HTTP 402
2. Parse accepts[0].maxAmountRequired (USDC base units, 6 decimals) and accepts[0].payTo
3. Send USDC on Base to payTo (amount ≥ maxAmountRequired)
4. Base64-encode {"txHash":"0x...","from":"0x..."} as the X-Payment header
5. Retry the same request → job executes
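
The flow above can be sketched end to end with the HTTP call and the on-chain transfer injected as functions, which keeps the example independent of any particular HTTP or wallet library. `post` and `send_usdc` are hypothetical callables you would implement yourself. Using standard (not URL-safe) base64 over compact JSON is an assumption about the expected encoding.

```python
import base64
import json
from typing import Callable, Tuple


def encode_x_payment(tx_hash: str, sender: str) -> str:
    """Base64-encode the payment proof for the X-Payment header."""
    proof = json.dumps({"txHash": tx_hash, "from": sender}, separators=(",", ":"))
    return base64.b64encode(proof.encode("utf-8")).decode("ascii")


def run_with_x402(
    post: Callable[[dict], Tuple[int, dict]],
    send_usdc: Callable[[str, int], str],
    wallet: str,
) -> Tuple[int, dict]:
    """Execute one request through the x402 flow.

    post(extra_headers) sends the same request body each time and returns
    (http_status, parsed_json). send_usdc(pay_to, units) transfers USDC on
    Base and returns the transaction hash.
    """
    status, body = post({})                      # step 1: call without auth
    if status != 402:
        return status, body                      # no payment requested
    offer = body["accepts"][0]                   # step 2: read the terms
    units = int(offer["maxAmountRequired"])
    tx_hash = send_usdc(offer["payTo"], units)   # step 3: pay
    header = encode_x_payment(tx_hash, wallet)   # step 4: build the proof
    return post({"X-Payment": header})           # step 5: retry, same body
```

Because each transaction hash is accepted only once, a caller that needs to retry step 5 after a network error should resend the same header rather than pay again.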

402 response format

{
  "x402Version": 1,
  "accepts": [{
    "scheme": "exact",
    "network": "base",
    "maxAmountRequired": "10000",  // varies per service
    "payTo": "0xB0FdC6030B9f30652e8B221B8090d443Dd3C6381",
    "asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
    "maxTimeoutSeconds": 300
  }]
}
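
Before an autonomous agent sends funds, it may want to check the 402 body against the parameters documented in this section. The sketch below compares network, asset, and recipient, and enforces a spending cap. `max_units` is a hypothetical budget chosen by the caller, and whether these checks suit your threat model is your call.

```python
# Expected values copied from the x402 parameter table in this section.
EXPECTED = {
    "network": "base",
    "asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
    "payTo": "0xB0FdC6030B9f30652e8B221B8090d443Dd3C6381",
}


def check_offer(body: dict, max_units: int) -> dict:
    """Return accepts[0] if it matches the documented terms, else raise."""
    offer = body["accepts"][0]
    for field, expected in EXPECTED.items():
        actual = str(offer.get(field, ""))
        # Addresses are compared case-insensitively; only their checksum
        # encoding depends on letter case.
        if actual.lower() != expected.lower():
            raise ValueError(f"unexpected {field}: {actual!r}")
    if int(offer["maxAmountRequired"]) > max_units:
        raise ValueError("requested amount exceeds the caller's budget")
    return offer
```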

Error Format

GPU-Bridge does not yet use a single rigid error envelope for every endpoint. Treat error responses as a common base shape with optional helper fields.

{
  "error": "Unknown service: foo",
  "hint": "Use GET /catalog to see all available services.",
  "details": [],
  "available_services": ["llm-4090", "image-4090", "..."]
}

error — always present, human-readable message
hint — optional remediation text
details — optional validation issues, common on 400 responses
available_services — optional list when a service key is invalid

Compatibility note: Clients should key off HTTP status first, then parse error as required and treat all other fields as optional.
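
A minimal sketch of the parsing rule above: decide by HTTP status first, require only `error`, and default every other field. The `ApiErrorInfo` type is illustrative and not part of any GPU-Bridge SDK. Callers using x402 will want to handle status 402 as a payment request before reaching this helper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ApiErrorInfo:
    status: int
    message: str
    hint: Optional[str] = None
    details: list = field(default_factory=list)
    available_services: List[str] = field(default_factory=list)


def parse_error(status: int, body: object) -> Optional[ApiErrorInfo]:
    """Return None for 2xx responses, otherwise a normalized error."""
    if 200 <= status < 300:
        return None
    # Tolerate non-JSON or non-object bodies, e.g. from a proxy.
    data = body if isinstance(body, dict) else {}
    return ApiErrorInfo(
        status=status,
        message=str(data.get("error") or f"HTTP {status}"),
        hint=data.get("hint"),
        details=list(data.get("details") or []),
        available_services=list(data.get("available_services") or []),
    )
```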

Important Notes

Media URLs must be publicly accessible. Use CDN links (soundhelix.com, imgbb.com). GitHub raw and Wikimedia URLs often return 403 from compute nodes.
Webhooks are delivered via POST with 3 retries and exponential backoff. URL must be HTTPS and must not point to private networks.
Job results are retained for 72 hours after completion. Older jobs keep billing metadata, but their input/output payloads are purged.
Rate limits apply at both the API key and IP layers. Build retries with backoff and use Idempotency-Key for credit-based requests.
Failover is automatic. GPU-Bridge can reroute across healthy backends and open circuit breakers when a backend degrades.
x402 job IDs act as retrieval tokens for x402 results. Treat them like secrets and do not expose them publicly.
Live availability changes over time. Use GET /catalog instead of hardcoding service or model lists from this page.
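
The webhook rule in the notes above (HTTPS only, no private networks) can be pre-checked on the client before submitting a job. This is a sketch of one plausible check, not the server's actual validation: it inspects literal IP addresses and localhost names only and does not resolve DNS.

```python
import ipaddress
from urllib.parse import urlsplit


def looks_like_valid_webhook_url(url: str) -> bool:
    """Client-side sanity check for a webhook_url value."""
    parts = urlsplit(url)
    host = parts.hostname
    if parts.scheme != "https" or not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; this sketch does not resolve it
    return ip.is_global  # rejects private, loopback, and link-local ranges


print(looks_like_valid_webhook_url("https://your-server.com/callback"))  # True
print(looks_like_valid_webhook_url("https://10.0.0.1/hook"))  # False
```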