Skip to content

Latest commit

 

History

History
719 lines (571 loc) · 36.3 KB

File metadata and controls

719 lines (571 loc) · 36.3 KB

API Reference

This document lists the local HTTP API exposed by ima2 serve.

Base URL:

http://localhost:3333

Provider Policy

Image generation supports OAuth, API-key, Grok, and Gemini (agy and gemini-api) providers.

  • provider: "oauth" uses the local Codex OAuth proxy.
  • provider: "api" uses the OpenAI Responses API with the hosted image_generation tool.
  • provider: "grok" uses the bundled progrok xAI proxy. Classic, Node, and Agent generation run mandatory xAI Web Search through /v1/responses, then run a grok-4.3 planner call with a forced local generate_image function, then ima2 executes xAI /v1/images/generations. If reference images, a Node parent image, or an Agent current image are attached, the final step switches to xAI /v1/images/edits so image-to-image context is preserved.
  • provider: "agy" spawns the Antigravity CLI (agy -p) to generate images via Google Gemini's default_api:generate_image tool. Model is nano-banana-2. Output is fixed at 1024×1024 JPEG. Max 3 reference images (i2i). No web search, quality, size, or mask controls. Multimode returns a single image. Video is unsupported (AGY_VIDEO_UNSUPPORTED).
  • provider: "grok-api" uses a direct xAI API key instead of the bundled progrok OAuth proxy. Same pipeline as grok (Web Search → planner → /v1/images/generations), same aspect ratio and resolution options. Requires an xAI API key configured via the web UI key management or XAI_API_KEY env var. Also supports video generation.
  • provider: "gemini-api" calls the Google Generative Language API directly (or Vertex AI with a service account JSON). Supports models nano-banana-2 (Gemini 3.1 Flash Image) and nano-banana-pro (Gemini 3 Pro Image). Supports variable aspect ratios (1:1 through 21:9) and four resolution tiers (512px, 1K, 2K, 4K) on both auth paths — the direct API path sends generation_config.response_format.image (snake_case) while the Vertex AI endpoint (aiplatform.googleapis.com) sends generationConfig.imageConfig (camelCase). With size: "auto" the image config is omitted entirely and the model decides ratio/size. Auth: GEMINI_API_KEY env var, web UI key management (/api/keys/gemini), or a Vertex AI service account JSON (VERTEX_SERVICE_ACCOUNT_JSON or /api/keys/vertex). When both Vertex credentials and an API key are configured, Vertex takes priority. The chosen auth mode (apikey or vertex) persists to ~/.ima2/config.json as geminiAuthMode and is restored on server startup. Per-model cost: nano-banana-2 (Flash): 512=$0.001, 1K=$0.003, 2K=$0.004, 4K=$0.006; nano-banana-pro: 1K=$0.007, 2K=$0.007, 4K=$0.013. No web search or mask controls.
  • API-key generation covers classic generate, edit, mask-guided edit, multimode, and node generation.
  • If provider: "api" is requested without an API key, routes fail before upstream with 401 and API_KEY_REQUIRED.
  • Grok generation maps size to xAI aspect_ratio and resolution; it does not send an OpenAI-style size field upstream. Grok edit uses xAI /v1/images/edits; Grok mask edit remains unsupported and returns GROK_MASK_UNSUPPORTED.
  • Mask edits are mask/selection guided edits, not pixel-perfect inpaint guarantees.

Grok video generation uses POST /api/video/generate (SSE). See the Video Generation section below for the full endpoint specification.

Health And Status

Method Path Notes
GET /api/health Server health, version, paths, provider policy
GET /api/providers Provider availability and runtime ports
GET /api/oauth/status OAuth proxy status and visible models
GET /api/grok/status Bundled progrok status and visible xAI image models
GET /api/billing Billing/status probe, including API key source when configured
GET /api/quota Provider quota: returns { codex, grok }. Grok result includes billing: { usedUsd, limitUsd } and a monthly percent window drawn from the xAI billing API.

Account Switching

Method Path Notes
POST /api/auth/switch Start a device-code OAuth flow. Body: { "provider": "grok" | "codex" }. Returns { sessionId, userCode, verificationUrl }.
GET /api/auth/switch/:sessionId Poll switch-account session status. Returns { status } where status is pending, complete, error, or expired.

The Switch Account flow opens a browser verification URL. Once the user completes the device-code step, the server saves the new credentials (Grok: ~/.progrok/auth.json; Codex: via codex login --device-auth) and the session transitions to complete. This endpoint is surfaced as a Switch Account button in the Settings QuotaCard for Grok and Codex providers.

Storage

Method Path Notes
GET /api/storage/status Summarized gallery storage status for support UI
POST /api/storage/open-generated-dir Ask the server process to open the generated image folder

GET /api/storage/status returns a support-safe summary, not raw legacy path arrays by default.

{
  "ok": true,
  "data": {
    "generatedDirLabel": "~/.ima2/generated",
    "generatedCount": 0,
    "legacyCandidatesScanned": 18,
    "legacySourcesFound": 0,
    "legacyFilesFound": 0,
    "state": "not_found",
    "messageKind": "apology",
    "recoveryDocsPath": "docs/RECOVER_OLD_IMAGES.md",
    "doctorCommand": "ima2 doctor",
    "overrides": {
      "generatedDir": false,
      "configDir": false
    }
  }
}

Storage state values:

State Meaning
ok Current gallery has files or no recovery notice is needed
recoverable Legacy folders/files are still present and may be recoverable
not_found Current gallery is empty and no legacy folder was found
unknown Storage status inspection failed or was incomplete

POST /api/storage/open-generated-dir opens the generated image folder on the machine running ima2 serve. If the browser is connected to a remote server, VM, container, WSL instance, or another computer on the network, this action targets that server machine, not necessarily the browser device.

In-Flight Jobs

Method Path Notes
GET /api/inflight Active jobs only by default
GET /api/inflight?includeTerminal=1 Includes recent terminal jobs for debugging
DELETE /api/inflight/:requestId Cancel or forget an active job
GET /api/events Persistent SSE multiplex channel for all async generation progress (see below)

In-flight logs and responses use requestId for correlation. Logs should not include raw prompts, reference data URLs, generated base64, tokens, cookies, auth headers, or raw upstream bodies.

Events (SSE Multiplexing)

GET /api/events (SSE Multiplexing)

Single persistent Server-Sent Events channel that carries progress for all async generation jobs. The browser UI opens one EventSource here instead of holding a per-request SSE connection for each job, avoiding browser per-origin connection limits.

Query Notes
lastEventId Optional. Reconnect cursor; also accepted via the Last-Event-ID request header

Response: text/event-stream (persistent). Each frame uses standard SSE fields id, event, and data (JSON).

Connection limits: When active listeners reach 512, the server returns 503 with SSE_CAPACITY before opening the stream.

Heartbeat: Every 15 seconds the server writes a comment frame:

: ping

Replay: On reconnect, the server replays events from an in-memory ring buffer (size 2000) for IDs newer than lastEventId. Large image payloads (>1000 characters) are omitted from replay with _imageOmitted: true in the data payload. If the requested ID is older than the oldest buffered event, the server emits a replay-gap event before live fan-out:

Event Data Description
replay-gap { lastEventId, oldestAvailableId } Client should reconcile inflight state (for example via GET /api/inflight)

Job routing: Every data payload includes jobId (same value as the job's requestId). Event bodies also carry requestId where applicable. Clients filter events by matching data.jobId or data.requestId to the job they started.

Event types (fan-out to all connected clients):

Event Emitted by Description
phase node, multimode, video Lifecycle phase change
partial node, multimode Progressive preview image (base64 data URL)
image multimode Final saved GenerateItem for one sequence image
done node, multimode, video Terminal success payload (route-specific shape)
error all generation routes Terminal failure
submitted video Job submitted to xAI
progress video Progress fraction 0.0–1.0
planning video Video planner running

Example SSE frame:

id: 42
event: phase
data: {"requestId":"req_abc","jobId":"req_abc","phase":"streaming"}

Async generation mode

POST /api/node/generate, POST /api/generate/multimode, and POST /api/video/generate support an async POST mode for clients that already hold GET /api/events:

{
  "async": true,
  "requestId": "req_xxx",
  "...": "other route fields"
}
Outcome HTTP Body
Accepted 202 { "requestId": "req_xxx" }
Duplicate active requestId 409 REQUEST_ID_IN_USE
More than the configured concurrent active job limit 429 TOO_MANY_JOBS with Retry-After: 5; default limit is 24 via IMA2_MAX_PARALLEL

Progress events are published on GET /api/events. The POST response returns immediately; clients must not expect SSE on the POST connection when async: true.

CLI and legacy clients omit async and keep the original behavior: per-request SSE on the same POST response (Accept: text/event-stream where applicable). The server dual-emits in that mode — it writes SSE to the POST response and also publishes the same events on GET /api/events.

Generation

POST /api/generate

Text-to-image and reference-guided root generation.

{
  "prompt": "a shiba in space",
  "quality": "medium",
  "size": "1024x1024",
  "format": "png",
  "moderation": "low",
  "provider": "oauth",
  "model": "gpt-5.4",
  "references": [],
  "requestId": "optional-client-id",
  "storyboard": false
}

Supported quality values: low, medium, high.

Supported moderation values: auto, low.

When storyboard is true, the server prepends storyboard keyframe instructions so image generations maintain character and scene continuity for multi-shot video production.

Recommended model: gpt-5.4. Current app default: gpt-5.4-mini. gpt-5.5 is the strongest quality option when supported, but callers should expect higher quota pressure and possible Codex CLI/backend capability requirements.

When provider is "grok", supported models are grok-imagine-image and grok-imagine-image-quality. The server uses grok-4.3 as the search/planner model by default (IMA2_GROK_PLANNER_MODEL) and times the mandatory search and planner steps separately from the image call (IMA2_GROK_PLANNER_TIMEOUT_MS). For n > 1, search and planning run once and the planned prompt is reused for the image requests. Successful Grok classic generations report one mandatory web-search call in metadata.

If references are present on a Grok classic request, ima2 still performs the mandatory search and grok-4.3 planning phases. The planner receives the reference images as multimodal image_url inputs, and its forced generate_image.prompt argument is instructed to be English-only except for exact visible text requested by the user. The final image call then uses xAI /v1/images/edits with the same reference images instead of /v1/images/generations. This keeps image-to-image/reference context alive through the three-phase pipeline. xAI currently documents up to three source images for image editing, so Grok classic requests with more than three references return GROK_REF_TOO_MANY.

Grok size mapping:

Requested size xAI aspect_ratio xAI resolution
1024x1024 1:1 1k
1536x1024 3:2 1k
1024x1536 2:3 1k
1360x1024 4:3 1k
1024x1360 3:4 1k
1824x1024 16:9 1k
1024x1824 9:16 1k
2048x2048 1:1 2k
2048x1152 16:9 2k
1152x2048 9:16 2k
3840x2160 16:9 2k
2160x3840 9:16 2k
auto auto omitted

Custom sizes are reduced to the closest xAI-supported aspect ratio and use 2k when the requested longest edge or pixel budget is closer to a 2K image.

POST /api/edit

Image edit / image-to-image generation.

The request includes a prompt and image payload. provider: "api" sends the prompt and image through the shared Responses image adapter. Optional masks are forwarded as mask guidance, not a pixel-perfect edit guarantee.

With provider: "grok", edit requests are sent to xAI /v1/images/edits through the bundled progrok proxy. Masked Grok edits are rejected before upstream with GROK_MASK_UNSUPPORTED.

Grok multimode currently sends each image request directly to xAI Images API with the mapped aspect_ratio/resolution; the mandatory search + planner pipeline is limited to classic /api/generate.

POST /api/node/generate

Node-mode generation and child edits.

Body fields:

{
  "parentNodeId": "optional-server-node-id",
  "prompt": "continue this image",
  "quality": "medium",
  "size": "1024x1024",
  "format": "png",
  "moderation": "low",
  "model": "grok-imagine-image",
  "references": [],
  "externalSrc": "optional-history-url",
  "sessionId": "session-id",
  "clientNodeId": "client-node-id",
  "requestId": "request-id",
  "provider": "grok"
}

When parentNodeId is present, the server loads the stored parent node image and uses the edit path. Node-local references are allowed on both root and child/edit nodes; for child/edit nodes the parent image is sent first, then references, then the text prompt.

With provider: "grok", Node Mode uses the same xAI search + grok-4.3 planner + Images API pipeline as classic generation. A parent node image, externalSrc, or extra references are passed to the planner and then to xAI /v1/images/edits; otherwise the final call uses /v1/images/generations. Grok Node requests are capped at three total input images, counting the parent/current image plus references, and return GROK_REF_TOO_MANY before upstream when that limit is exceeded. quality: "high" promotes the final image model to grok-imagine-image-quality.

The route can stream Server-Sent Events when the client sends Accept: text/event-stream. Possible events include phase, partial, done, and error. Alternatively, send { "async": true, "requestId": "req_xxx" } in the body to receive 202 { requestId } immediately and follow progress on GET /api/events (see Events section).

Grok Node SSE responses do not include Responses API partial image events because the xAI Images API call is synchronous JSON. They still emit phase and done/error events so the Node UI can use the same in-flight lifecycle.

POST /api/generate/multimode (SSE)

Multi-image sequence generation. SSE-only on the POST response unless async mode is used.

{
  "prompt": "a story in four panels",
  "maxImages": 4,
  "quality": "medium",
  "size": "1024x1024",
  "format": "png",
  "moderation": "low",
  "model": "gpt-5.4",
  "provider": "oauth",
  "references": [],
  "requestId": "optional-client-id",
  "async": false
}

Send Accept: text/event-stream for per-request SSE on the POST connection. Or set "async": true with a client requestId to get 202 { requestId } and receive events on GET /api/events.

SSE events:

Event Data Description
phase { requestId, phase, sequenceId?, maxImages? } Lifecycle phase
partial { requestId, image, index } Progressive preview
image full GenerateItem One saved sequence image
done route-specific summary; may include status: "partial" after timeout if at least one image was saved Sequence complete
error { requestId, error, code?, status? } Generation failed

GET /api/node/:nodeId

Fetch stored node metadata and asset URL.

Reference Images

Reference uploads are capped at 5 items. The frontend compresses large JPEG/PNG files before sending them. HEIC/HEIF files are rejected with a user-facing conversion hint.

Server-side validation may return these reference codes:

Code Meaning
REF_NOT_ARRAY references was not an array
REF_TOO_MANY More than the configured reference count
REF_NOT_STRING A reference item was not a string
REF_EMPTY A reference item was empty
REF_TOO_LARGE A reference exceeded the configured base64 size
REF_NOT_BASE64 A reference was not valid base64
GROK_REF_TOO_MANY Grok classic generation received more than three reference images
GROK_MASK_UNSUPPORTED Grok edit was requested with a mask; xAI mask edit is not wired in this release

Video Generation

POST /api/video/generate (SSE)

Generate a video via the Grok video provider. Returns Server-Sent Events on the POST connection, or accepts async mode ({ "async": true, "requestId": "req_xxx" }) for 202 { requestId } with progress on GET /api/events (see Events section).

{
  "prompt": "a cat playing piano",
  "provider": "grok",
  "model": "grok-imagine-video",
  "duration": 5,
  "resolution": "480p",
  "aspectRatio": "auto",
  "sourceImage": "<base64>",
  "referenceImages": ["<base64>", "<base64>"],
  "referenceFilenames": ["existing-file.png"],
  "continueFromVideo": "1780226256355_50252101.mp4",
  "continuityLineage": { "lineageId": "optional-client-hint", "entries": [] },
  "sessionId": "optional",
  "requestId": "optional-client-id"
}

Models: grok-imagine-video (default), grok-imagine-video-1.5-preview.

Mode is auto-detected from reference inputs:

Inputs Mode Duration cap
No images text-to-video 1–15s
1 image (sourceImage or sourceFilename) image-to-video 1–15s
2–7 images (referenceImages / referenceFilenames) reference-to-video 1–10s

Parameters:

Field Type Default Notes
prompt string Required
provider string "grok" "grok" or "grok-api"
model string grok-imagine-video Video model
duration integer 5 1–15 seconds (clamped to 10 for reference-to-video)
resolution string "480p" 480p or 720p
aspectRatio string "auto" 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, auto
sourceImage string Base64 image for image-to-video
sourceFilename string Existing generated file for image-to-video
referenceImages string[] Base64 images for reference-to-video
referenceFilenames string[] Existing generated files for reference-to-video
continueFromVideo string Generated .mp4 parent; server extracts its last frame and rebuilds lineage from sidecar
continuityLineage object Optional client hint; used only when continueFromVideo is absent
plannerModel string grok-4.3 Grok video planner model override (also via settings UI or IMA2_GROK_PLANNER_MODEL)
storyboard boolean false Enable storyboard mode — maintains character/scene continuity across sequential clips

Blank prompts return PROMPT_REQUIRED with a guidance string. The active prompt should describe visual flow, motion flow, sound/music/no-music, dialogue/no-dialogue, ending frame, and duration pacing. The video planner uses the selected duration as the full clip runtime and expands short requests into a production-level sequence with opening composition, connected motion/emotion change, and a stable ending frame suitable for continuation. For multi-character scenes, the planner identifies speakers by visual appearance (clothing, physique, position, props) rather than names, and attributes each dialogue line accordingly.

When continueFromVideo is present, the server treats the generated .mp4 sidecar as authoritative. Client continuityLineage cannot override it. The saved child sidecar includes videoContinuity, a branch-local max-4 stack using keep-start-plus-latest-3 retention.

videoContinuity shape:

{
  "lineageId": "lineage:parent",
  "parentFilename": "parent.mp4",
  "sourceFrame": "last",
  "maxEntries": 4,
  "retention": "keep-start-plus-latest-3",
  "entries": [
    {
      "id": "clip:parent.mp4",
      "ordinal": 1,
      "role": "start",
      "filename": "parent.mp4",
      "userPrompt": "original user prompt",
      "revisedPrompt": "planner prompt actually sent to Grok video",
      "createdAt": 1780300000000
    }
  ]
}

Entry role is start, ancestor, parent, or current. The first clip is kept as the start anchor; later generations keep only the latest three entries. lineageId uses the generated video basename without the .mp4 extension. This metadata is stored in the generated .mp4.json sidecar and returned in history rows and video done events; /generated/*.json remains private.

Grok prompt surfaces used by video APIs:

Surface Model Responsibility
Video planner grok-4.3 (override via plannerModel) Converts user prompt, search context, refs, and optional continuity lineage into the final English video prompt. It must structure core subject, action/motion, camera/composition, environment/style, dialogue/audio, ending-frame handoff, and constraints. Multi-character dialogue uses appearance-based speaker identification.
Video generation xAI video model Receives the planner prompt plus sourceImage or referenceImages when present.
Video analysis grok-4.3 Reads first/last frame images from /api/video/analyze and returns recreation/continuation guidance.

SSE events:

Event Data Description
planning { requestId } Preparing video generation
submitted { requestId, xaiVideoRequestId, requestedModel, effectiveModel, modelFallback } Submitted to xAI
progress { requestId, progress, stalled } Progress 0.0–1.0
done { requestId, filename, url, mediaType, revisedPrompt, elapsed, usage, requestedModel, effectiveModel, modelFallback, video, videoContinuity } Video ready
error { error, code, status, requestId, guidance? } Generation failed

Video error codes:

Code Meaning
VIDEO_PROVIDER_UNSUPPORTED Provider is not "grok"
PROMPT_REQUIRED Empty or missing prompt
INVALID_GROK_VIDEO_MODEL Model not in valid set
INVALID_VIDEO_RESOLUTION Resolution not 480p or 720p
INVALID_VIDEO_ASPECT_RATIO Aspect ratio not in valid set
INVALID_VIDEO_DURATION Duration not 1–15 integer
GROK_VIDEO_REF_TOO_MANY More than 7 reference images
GROK_VIDEO_FAILED Upstream xAI video generation failed
GROK_VIDEO_FRAME_FAILED Server could not extract the parent video's last frame

POST /api/video/edit

Edit an existing video via Grok V2V. This is a blocking JSON endpoint that starts the xAI edit job, polls it, downloads the final MP4, and saves it as a generated video artifact.

{
  "prompt": "make it sunset",
  "videoUrl": "https://vidgen.x.ai/.../clip.mp4",
  "model": "grok-imagine-video"
}

videoUrl may be an HTTPS video URL, xAI file_id, data:video/* URL, or generated .mp4 filename. Generated-file inputs are restricted to real .mp4 files under the generated directory.

POST /api/video/extend

Extend a video from its last frame. This is a blocking JSON endpoint that starts the xAI extension job, polls it, downloads the combined output MP4, and saves it as a generated video artifact.

{
  "prompt": "camera pulls back",
  "videoUrl": "1780226256355_50252101.mp4",
  "duration": 6,
  "model": "grok-imagine-video"
}

duration must be an integer from 2 to 10 seconds. Edit and extension support grok-imagine-video only; grok-imagine-video-1.5-preview is not accepted for these endpoints.

GET /api/video/frame

Extract a PNG frame from a generated .mp4 file.

Query Notes
file Required generated .mp4 filename or generated-dir absolute path
position last (default) or non-negative seconds

POST /api/video/analyze

Analyze first and last frames from a generated .mp4 using Grok 4.3 image understanding. This does not upload the video as temporal video; it extracts two PNG frames and asks the vision model to infer likely motion.

{
  "videoUrl": "1780226256355_50252101.mp4"
}

Remote URLs and data: inputs are intentionally rejected to avoid server-side URL fetching through ffmpeg.

History

Method Path Notes
GET /api/history List generated assets
GET /api/history?groupBy=session Group assets by session title
DELETE /api/history/:filename Tombstone a generated asset
POST /api/history/:filename/restore Restore a recently deleted asset

History rows can include node metadata such as sessionId, nodeId, clientNodeId, requestId, and refsCount.

Sessions And Graphs

Method Path Notes
GET /api/sessions List graph sessions
POST /api/sessions Create a session
GET /api/sessions/:id Load a session and graph
PATCH /api/sessions/:id Rename a session
DELETE /api/sessions/:id Delete a session
PUT /api/sessions/:id/graph Save graph snapshot

PUT /api/sessions/:id/graph requires an If-Match header containing the current graph version.

Version mismatch returns GRAPH_VERSION_CONFLICT and the current version. This only means the client saved against a stale graph version; it is not proof that another browser tab changed the graph.

Graph save requests may include observability headers:

X-Ima2-Graph-Save-Id
X-Ima2-Graph-Save-Reason
X-Ima2-Tab-Id

Style Sheets

Method Path Notes
GET /api/sessions/:id/style-sheet Load session style sheet
PUT /api/sessions/:id/style-sheet Save style sheet
PATCH /api/sessions/:id/style-sheet/enabled Toggle style sheet usage
POST /api/sessions/:id/style-sheet/extract Extract style fields from prompt/reference

Style-sheet extraction can require an API key/openai client. Image generation also supports provider: "api" through the shared Responses API image adapter when an API key is configured.

Common Error Codes

Code Meaning
API_KEY_REQUIRED provider: "api" was requested without a configured API key
APIKEY_DISABLED Legacy/deprecated hard-block code from older builds
INVALID_IMAGE_MODEL Model name is unknown or unsupported
IMAGE_MODEL_UNSUPPORTED Model exists but cannot use image generation
INVALID_REQUEST Upstream request parameters are invalid; raw provider details may be included as upstreamCode, upstreamType, and upstreamParam
INVALID_MODERATION Moderation value is not auto or low
SAFETY_REFUSAL Upstream safety refusal
MODERATION_REFUSED Content generation refused by moderation
AUTH_CHATGPT_EXPIRED Codex/ChatGPT OAuth session expired
AUTH_API_KEY_INVALID API key is invalid, revoked, out of quota, or wrong org
NETWORK_FAILED Network, proxy, VPN, or firewall failure
OAUTH_UNAVAILABLE Local OAuth proxy is not available
OPEN_GENERATED_DIR_FAILED The server could not open the generated image folder
GRAPH_VERSION_REQUIRED Missing graph If-Match header
GRAPH_VERSION_CONFLICT Stale graph version
GRAPH_TOO_LARGE Graph exceeds node/edge limits
NODE_NOT_FOUND Node metadata was not found
INVALID_GROK_IMAGE_MODEL A Grok request used a model outside grok-imagine-image or grok-imagine-image-quality
GROK_RATE_LIMITED xAI returned a rate-limit response through progrok
GROK_AUTH_FAILED progrok could not authenticate the xAI request
GROK_SEARCH_TIMEOUT / GROK_PLANNER_TIMEOUT / GROK_IMAGE_TIMEOUT The Grok search, planner, or image API step exceeded its timeout budget
AGY_GENERATION_FAILED Gemini (agy) image generation failed
AGY_TIMEOUT Agy CLI process exceeded its 360-second timeout
AGY_PROCESS_ERROR Agy CLI binary failed to start or crashed
AGY_QUOTA_EXHAUSTED Gemini API quota exhausted (rate limit)
AGY_PARSE_FAILED Could not parse artifact path from agy output
AGY_ARTIFACT_NOT_FOUND Agy reported an artifact path that does not exist
AGY_PATH_REJECTED Agy artifact path was outside allowed directories
AGY_VIDEO_UNSUPPORTED Video generation is not supported by the Gemini (agy) provider
AGY_MASK_UNSUPPORTED Mask-based editing is not supported by the Gemini (agy) provider
AGY_REF_TOO_MANY Too many reference images for agy (max 3)
GEMINI_API_KEY_MISSING Gemini API key or Vertex AI credentials not configured
GEMINI_API_RATE_LIMITED Gemini API rate limited (429)
GEMINI_API_BAD_REQUEST Gemini API bad request (400/403)
GEMINI_API_SAFETY_BLOCKED Gemini API generation blocked by safety filter
GEMINI_API_NO_IMAGE Gemini API returned no image in response
VIDEO_PROVIDER_UNSUPPORTED Video generation requires provider "grok" or "grok-api"
SSE_CAPACITY More than 512 concurrent GET /api/events listeners
REQUEST_ID_IN_USE Async POST used a requestId that already has an active job
TOO_MANY_JOBS More than the configured concurrent active generation job limit (Retry-After: 5; default 24)

Key Management

API key management endpoints for configuring provider credentials at runtime through the web UI or HTTP API.

Endpoint Method Description
/api/keys/status GET Returns configured/valid/maskedKey status for all providers (openai, xai, gemini, vertex) plus geminiAuthMode ("apikey" or "vertex")
/api/keys/:provider PUT Save an API key. Body: { "apiKey": "..." }. Validates key format and upstream before saving to config.json. Provider: openai, xai, or gemini.
/api/keys/:provider DELETE Remove a config-sourced API key. Env-sourced keys cannot be removed (ENV_KEY_IMMUTABLE).
/api/keys/vertex PUT Save a Vertex AI service account JSON. Body: { "serviceAccountJson": "..." }. Validates JSON structure (type: "service_account", project_id required).
/api/keys/vertex DELETE Remove a config-sourced Vertex AI service account.
/api/keys/gemini-auth-mode PUT Persist the Gemini auth mode chosen in the settings dropdown. Body: { "mode": "apikey" | "vertex" }. Saved to config.json and hot-updated.

Keys saved via PUT are stored in config.json and hot-updated in the runtime context (no server restart required). Keys loaded from environment variables (OPENAI_API_KEY, XAI_API_KEY, GEMINI_API_KEY, VERTEX_SERVICE_ACCOUNT_JSON) take precedence and are immutable through the API.

Thumbnail Backfill

Endpoint Method Description
/api/history/backfill-thumbnails POST Generate missing .thumb.jpg thumbnails for all images and videos in the generated directory. Returns { ok, total, created, skipped, failed }. Also available offline via ima2 backfill-thumbs.

Thumbnails are also generated automatically on server startup for any media files that lack them.

Endpoint → CLI Mapping

Most server routes under /api/* have a CLI wrapper. The exception is Agent Mode (/api/agent/*), which is server + web-UI-only and has no ima2 subcommand. The prompt builder HTTP route (POST /api/prompt-builder/chat) is wrapped by ima2 prompt build. Use this table to find the command that calls a given endpoint. (See README.md "Client" section for full flag lists.)

Endpoint CLI
POST /api/generate ima2 gen
POST /api/edit ima2 edit
POST /api/generate/multimode (SSE) ima2 multimode
POST /api/video/generate (SSE) ima2 video
POST /api/video/generate with continueFromVideo ima2 video continue
POST /api/video/edit ima2 video edit
POST /api/video/extend ima2 video extend
GET /api/video/frame ima2 video frame
POST /api/video/analyze ima2 video analyze
POST /api/node/generate (SSE) / GET /api/node/:id ima2 node generate / ima2 node show
GET /api/history ima2 ls
DELETE /api/history/:name / …/permanent ima2 history rm [--permanent]
POST /api/history/:filename/restore ima2 history restore --trash-id
POST /api/history/favorite ima2 history favorite
POST /api/history/import-local ima2 history import
POST /api/metadata/read ima2 metadata / ima2 show --metadata
GET/POST/PUT/DELETE /api/sessions[/…] ima2 session ls/show/create/rm/rename
GET/PUT /api/sessions/:id/graph ima2 session graph load/save
GET/PUT /api/sessions/:id/style-sheet[/…] ima2 session style-sheet …
GET/PUT/DELETE /api/annotations/:name ima2 annotate get/set/rm
POST /api/canvas-versions / PUT /api/canvas-versions/:name ima2 canvas-versions save/update
GET/POST/PUT/DELETE /api/prompts[/…] ima2 prompt …
GET/POST/PATCH/DELETE /api/prompts/folders[/…] ima2 prompt folder …
…/api/prompts/import/… ima2 prompt import sources/refresh/curated/discovery/folder
…/api/cardnews/… (gated on features.cardNews) ima2 cardnews …
POST /api/comfy/export-image ima2 comfy export
GET /api/inflight / DELETE /api/inflight/:id ima2 inflight ls (alias ps) / ima2 inflight rm (alias cancel)
GET /api/events (SSE multiplex) Web UI only (persistent EventSource; no CLI wrapper)
GET /api/storage/status / POST /api/storage/open-generated-dir ima2 storage status / ima2 storage open
GET /api/billing / GET /api/providers / GET /api/oauth/status / GET /api/grok/status ima2 billing / ima2 providers / ima2 oauth status / ima2 grok status
GET /api/quota ima2 billing (includes Grok usedUsd/limitUsd)
POST /api/auth/switch / GET /api/auth/switch/:sessionId Web UI only (Settings > QuotaCard > Switch Account)
GET /api/health ima2 ping
GET /api/capabilities ima2 capabilities
GET /api/config/grok-planner — (Grok planner model query)
PATCH /api/config/grok-planner — (Grok planner model update)
GET /api/agy/status — (Antigravity CLI install status)
POST /api/history/backfill-thumbnails ima2 backfill-thumbs
GET /api/keys/status, PUT/DELETE /api/keys/:provider, PUT/DELETE /api/keys/vertex Web UI only (Settings > API Keys)
GET/POST/PATCH/DELETE /api/agent/* (sessions, turns, queue) — (Agent Mode; web UI only, no CLI)
POST /api/prompt-builder/chat ima2 prompt build

Notes:

  • ima2 history favorite and ima2 annotate … send X-Ima2-Browser-Id: cli-<sha1prefix> derived from the config dir, so CLI activity does not collide with browser sessions.
  • ima2 session graph save performs a GET-then-PUT with If-Match: "<version>" to guard against GRAPH_VERSION_CONFLICT.
  • ima2 history import and ima2 canvas-versions save/update send raw bytes with Content-Type: image/<png|jpeg|webp>; the SSE endpoints (multimode, node generate, video) use Accept: text/event-stream. The web UI instead uses GET /api/events plus async: true on POST routes.
  • ima2 cardnews … checks runtimeConfig.features.cardNews before calling the gated endpoints; when disabled the CLI exits 2 with a clear message instead of producing a 404.

CLI Discovery

The server writes an advertisement file at:

~/.ima2/server.json

CLI commands such as ima2 ping, ima2 gen, and ima2 ls use this file unless --server or IMA2_SERVER is provided.

Current shape:

{
  "port": 3334,
  "url": "http://localhost:3334",
  "pid": 12345,
  "startedAt": 1777180000000,
  "version": "1.0.0",
  "backend": {
    "configuredPort": 3333,
    "actualPort": 3334,
    "url": "http://localhost:3334"
  },
  "oauth": {
    "configuredPort": 10531,
    "actualPort": 10532,
    "url": "http://127.0.0.1:10532",
    "status": "ready"
  }
}

Top-level port and url are kept for older CLI clients. New code should prefer backend.url.