VLM Run Skills are definitions for visual AI tasks like image understanding, video processing, and document extraction. They are interoperable with Anthropic's Claude Code.
The Skills in this repository follow the standardized Agent Skill format.
In practice, skills are self-contained folders that package instructions, scripts, and resources together for an AI agent to use on a specific task. Each folder includes a SKILL.md file with YAML frontmatter (name and description) followed by the guidance your coding agent follows while the skill is active.
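For illustration, a minimal SKILL.md might look like the sketch below; the name, description, and body text are hypothetical placeholders, not a real skill from this repository:

```markdown
---
name: describe-image
description: Describe the contents of an image when the user asks what an image shows
---

# Describe Image

Guidance, examples, and guardrails for the agent go here.
```

The frontmatter tells the agent when to activate the skill; everything after it is the instruction body the agent follows while the skill is active.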
- Understanding & Captioning: Describe, analyze, and interpret images with state-of-the-art visual intelligence
- Detection & Localization: Detect and locate objects, people, faces, and custom entities with bounding boxes
- Segmentation: Segment objects, scenes, and regions with pixel-level precision
- Generation & Editing: Generate images from text, edit existing images, apply super-resolution, colorize B&W photos
- Tools: Crop, rotate, enhance resolution (4x-8x upscaling), de-oldify (colorization)
- Visual Grounding: Point to and extract specific elements using natural language queries
- UI Parsing: Extract UI elements, layouts, and hierarchies from screenshots
- Understanding & Captioning: Describe video content, generate summaries and detailed scene analysis
- Transcription: Extract audio transcripts with timestamps
- Tools: Trim videos, extract keyframes, sample frames at intervals, detect highlights
- Segmentation: Identify and segment objects across video frames
- Generation & Editing: Generate videos from text prompts, edit existing videos
- Layout Understanding: Detect headers, paragraphs, tables, figures, lists, and structural elements
- Multi-Page Analysis: Process and analyze PDFs with intelligent page-aware extraction
- Markdown Extraction: Convert documents to clean, structured markdown with preserved formatting
- Visual Grounding: Locate and extract specific fields, sections, or data points
- Data Extraction: Extract key information from invoices, receipts, contracts, forms into structured JSON
- Multi-Modal Reasoning: Execute complex multi-step workflows across images, documents, and videos
- Structured Outputs: Get results in validated JSON schemas with automatic retry logic
See docs and technical whitepaper for more information.
- Get your VLM Run API key from app.vlm.run
- Have uv installed for Python environment management
- Register the repository as a plugin marketplace:
/plugin marketplace add vlm-run/skills
- To install a skill, run:
/plugin install <skill-name>@vlm-run/skills
For example:
/plugin install vlmrun-cli-skill@vlm-run/skills
Once the skill is installed, configure your API key using the CLI (get your key from app.vlm.run):
vlmrun config init
vlmrun config set --api-key <your-api-key>
vlmrun config show

Once the skill is installed, verify it is loaded by asking Claude Code (requires a restart):
What skills are available in the /vlmrun-cli-skill?
Alternatively, you can install a skill by simply asking Claude to create a new skill from the SKILL.md file.
This repository contains skills for interacting with VLM Run's Orion visual AI agent. You can also contribute your own skills to the repository.
| Name | Description | Documentation |
|---|---|---|
| `vlmrun-cli-skill` | Use the VLM Run CLI to interact with the Orion visual AI agent. Process images, videos, and documents with natural language. Supports image understanding/generation, object detection, OCR, video summarization, document extraction, and visual AI chat. | SKILL.md |
Once a skill is installed, mention it directly while giving your coding agent instructions:
- "Use the VLM Run CLI skill to describe what's in this image"
- "Use the VLM Run CLI skill to generate an image of a sunset over mountains"
- "Use the VLM Run CLI skill to extract the text from this receipt"
- "Use the VLM Run CLI skill to summarize this meeting recording"
Your coding agent automatically loads the corresponding SKILL.md instructions and helper scripts while it completes the task.
- Copy one of the existing skill folders (for example, `skills/vlmrun-cli-skill/`) and rename it.
- Update the new folder's `SKILL.md` frontmatter:

```markdown
---
name: my-skill-name
description: Describe what the skill does and when to use it
---

# Skill Title

Guidance + examples + guardrails
```
- Add or edit supporting scripts, templates, and documents referenced by your instructions.
- Add an entry to `.claude-plugin/marketplace.json` with a concise, human-readable description.
- Reinstall or reload the skill bundle in your coding agent so the updated folder is available.
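As a rough sketch, a new entry in the marketplace file might look like the fragment below. The field names shown here are illustrative assumptions; copy the shape of the existing entries in `.claude-plugin/marketplace.json` for the exact schema:

```json
{
  "name": "my-skill-name",
  "source": "./skills/my-skill-name",
  "description": "Describe what the skill does and when to use it"
}
```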
Configure your API key and settings using the VLM Run CLI:
vlmrun config init
vlmrun config set --api-key <your-api-key>
vlmrun config show

| Setting | Description |
|---|---|
| `api_key` | Your VLM Run API key (required) |
| `base_url` | API base URL (default: `https://api.vlm.run/v1`) |
| `cache_dir` | Cache directory (default: `~/.vlmrun/cache/artifacts/`) |
See LICENSE for details.