GitHub - vlm-run/skills: Claude skill for VLM Run CLI

VLM Run Skills


VLM Run Skills are definitions for visual AI tasks like image understanding, video processing, and document extraction. They are interoperable with Anthropic's Claude Code.

The Skills in this repository follow the standardized Agent Skill format.

How do Skills work?

In practice, skills are self-contained folders that package instructions, scripts, and resources together for an AI agent to use on a specific use case. Each folder includes a SKILL.md file with YAML frontmatter (name and description) followed by the guidance your coding agent follows while the skill is active.
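
As a rough illustration of that layout, the sketch below reads the frontmatter of a SKILL.md file using only the Python standard library. It assumes the frontmatter is a flat block of `key: value` lines between two `---` delimiters. `parse_skill_md` is a hypothetical helper, not part of the VLM Run CLI or of any agent's loader, and a real loader would likely use a full YAML parser.

```python
def parse_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, instruction body).

    Assumption: the frontmatter is flat `key: value` lines between two `---` lines.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")

    # Find the closing delimiter of the frontmatter block.
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:
        raise ValueError("frontmatter is missing its closing '---' line")

    fields: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        fields[key.strip()] = value.strip()

    # The README names these two fields as the expected frontmatter.
    missing = {"name", "description"} - fields.keys()
    if missing:
        raise ValueError(f"frontmatter is missing: {sorted(missing)}")

    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


example = """---
name: my-skill-name
description: Describe what the skill does and when to use it
---

# Skill Title
Guidance + examples + guardrails
"""
fields, body = parse_skill_md(example)
print(fields["name"])
print(body)
```

Everything after the closing `---` is returned untouched as the body, since that is the guidance the agent reads while the skill is active.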

Features

Image Intelligence

Understanding & Captioning: Describe, analyze, and interpret images with state-of-the-art visual intelligence
Detection & Localization: Detect and locate objects, people, faces, and custom entities with bounding boxes
Segmentation: Segment objects, scenes, and regions with pixel-level precision
Generation & Editing: Generate images from text, edit existing images, apply super-resolution, colorize B&W photos
Tools: Crop, rotate, enhance resolution (4x-8x upscaling), de-oldify (colorization)
Visual Grounding: Point to and extract specific elements using natural language queries
UI Parsing: Extract UI elements, layouts, and hierarchies from screenshots

Video Intelligence

Understanding & Captioning: Describe video content, generate summaries and detailed scene analysis
Transcription: Extract audio transcripts with timestamps
Tools: Trim videos, extract keyframes, sample frames at intervals, detect highlights
Segmentation: Identify and segment objects across video frames
Generation & Editing: Generate videos from text prompts, edit existing videos

Document Intelligence

Layout Understanding: Detect headers, paragraphs, tables, figures, lists, and structural elements
Multi-Page Analysis: Process and analyze PDFs with intelligent page-aware extraction
Markdown Extraction: Convert documents to clean, structured markdown with preserved formatting
Visual Grounding: Locate and extract specific fields, sections, or data points
Data Extraction: Extract key information from invoices, receipts, contracts, forms into structured JSON

Multi-modal Agents

Multi-Modal Reasoning: Execute complex multi-step workflows across images, documents, and videos
Structured Outputs: Get results in validated JSON schemas with automatic retry logic
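
The phrase "validated JSON schemas with automatic retry logic" describes a general pattern that can be sketched in a few lines of Python. This illustrates the idea only and is not VLM Run's implementation: `REQUIRED_FIELDS`, `validate`, `extract_with_retry`, and the `call_model` callable are all hypothetical names, and the "schema" here is just a mapping from required keys to expected Python types.

```python
import json
from typing import Callable

# Hypothetical schema: required keys mapped to the Python type each must have.
REQUIRED_FIELDS = {"vendor": str, "total": float}


def validate(raw: str, schema: dict) -> dict:
    """Parse raw JSON text and check it against the simple schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected in schema.items():
        if not isinstance(data.get(key), expected):
            raise ValueError(f"field {key!r} is missing or not {expected.__name__}")
    return data


def extract_with_retry(call_model: Callable[[], str], schema: dict, max_attempts: int = 3) -> dict:
    """Call the model until its output validates, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return validate(call_model(), schema)
        except ValueError as err:
            last_error = err  # invalid output: ask again
    raise RuntimeError(f"no valid output after {max_attempts} attempts") from last_error


# Stand-in for a model: malformed output, then a missing field, then a valid object.
responses = iter(['not json', '{"vendor": "Acme"}', '{"vendor": "Acme", "total": 12.5}'])
result = extract_with_retry(lambda: next(responses), REQUIRED_FIELDS)
print(result)
```

The caller only ever sees output that passed validation, or a single error once the attempt budget is exhausted.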

See docs and technical whitepaper for more information.

Installation

Prerequisites

Get your VLM Run API key from app.vlm.run
Have uv installed for Python environment management

Claude Code

Register the repository as a plugin marketplace:

/plugin marketplace add vlm-run/skills

To install a skill, run:

/plugin install <skill-name>@vlm-run/skills

For example:

/plugin install vlmrun-cli-skill@vlm-run/skills

Configure your API key

Once the skill is installed, configure your API key using the CLI (get your key from app.vlm.run):

vlmrun config init
vlmrun config set --api-key <your-api-key>
vlmrun config show

Verify Installation

After installing, restart Claude Code, then verify that the skill is loaded by asking:

What skills are available in the /vlmrun-cli-skill?

Installing in Claude for Desktop

You should be able to install the skill by simply asking Claude to create a new skill from the SKILL.md file.

Skills

This repository contains skills for interacting with VLM Run's Orion visual AI agent. You can also contribute your own skills to the repository.

Available skills

| Name | Description | Documentation |
| --- | --- | --- |
| `vlmrun-cli-skill` | Use the VLM Run CLI to interact with the Orion visual AI agent. Process images, videos, and documents with natural language. Supports image understanding/generation, object detection, OCR, video summarization, document extraction, and visual AI chat. | SKILL.md |

Using skills in your coding agent

Once a skill is installed, mention it directly while giving your coding agent instructions:

"Use the VLM Run CLI skill to describe what's in this image"
"Use the VLM Run CLI skill to generate an image of a sunset over mountains"
"Use the VLM Run CLI skill to extract the text from this receipt"
"Use the VLM Run CLI skill to summarize this meeting recording"

Your coding agent automatically loads the corresponding SKILL.md instructions and helper scripts while it completes the task.

Contribute or customize a skill

Copy one of the existing skill folders (for example, skills/vlmrun-cli-skill/) and rename it.

Update the new folder's SKILL.md frontmatter:

---
name: my-skill-name
description: Describe what the skill does and when to use it
---

# Skill Title
Guidance + examples + guardrails

Add or edit supporting scripts, templates, and documents referenced by your instructions.
Add an entry to .claude-plugin/marketplace.json with a concise, human-readable description.
Reinstall or reload the skill bundle in your coding agent so the updated folder is available.
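
The folder and frontmatter steps can also be scripted. The sketch below is one possible approach, not a tool shipped with this repository: instead of copying an existing folder, the hypothetical `scaffold_skill` helper writes a fresh folder containing a SKILL.md that follows the frontmatter template above. The `skills/` root is taken from the example path above, and the script deliberately leaves `.claude-plugin/marketplace.json` untouched because that file's schema is not described here.

```python
import tempfile
from pathlib import Path

# Mirrors the SKILL.md template shown in this section.
TEMPLATE = """---
name: {name}
description: {description}
---

# {title}
Guidance + examples + guardrails
"""


def scaffold_skill(root: Path, name: str, description: str) -> Path:
    """Create <root>/skills/<name>/SKILL.md and return its path."""
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("skill name must be non-empty and contain no whitespace")
    if "\n" in description:
        raise ValueError("description must fit on one line")

    skill_dir = root / "skills" / name
    skill_dir.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an existing skill

    title = name.replace("-", " ").title()  # "my-skill-name" -> "My Skill Name"
    skill_md = skill_dir / "SKILL.md"
    skill_md.write_text(
        TEMPLATE.format(name=name, description=description, title=title),
        encoding="utf-8",
    )
    return skill_md


# Demo in a throwaway directory so nothing in a real checkout is modified.
with tempfile.TemporaryDirectory() as tmp:
    path = scaffold_skill(
        Path(tmp),
        "my-skill-name",
        "Describe what the skill does and when to use it",
    )
    print(path.read_text(encoding="utf-8"))
```

Supporting scripts, the marketplace entry, and reloading the skill bundle still need to be handled by hand as described above.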

Configuration

Configure your API key and settings using the VLM Run CLI:

vlmrun config init
vlmrun config set --api-key <your-api-key>
vlmrun config show

| Setting | Description |
| --- | --- |
| `api_key` | Your VLM Run API key (required) |
| `base_url` | API base URL (default: `https://api.vlm.run/v1`) |
| `cache_dir` | Cache directory (default: `~/.vlmrun/cache/artifacts/`) |
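
To make the defaults in the table concrete, here is a minimal Python sketch of how a client might resolve these settings. The `Settings` dataclass and `resolve_settings` helper are hypothetical and are not the VLM Run CLI's own code. Only the setting names and default values come from the table above.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    api_key: str  # required, so it has no default
    base_url: str = "https://api.vlm.run/v1"
    cache_dir: Path = Path("~/.vlmrun/cache/artifacts/")


def resolve_settings(overrides: dict) -> Settings:
    """Merge user-supplied overrides onto the documented defaults."""
    if not overrides.get("api_key"):
        raise ValueError("api_key is required")
    settings = Settings(**overrides)  # unknown keys raise TypeError
    # Expand '~' so the cache path is usable on the local machine.
    return Settings(
        api_key=settings.api_key,
        base_url=settings.base_url,
        cache_dir=Path(settings.cache_dir).expanduser(),
    )


settings = resolve_settings({"api_key": "<your-api-key>"})
print(settings.base_url)
print(settings.cache_dir)
```

Only `api_key` has to be supplied. The other two settings fall back to the defaults unless overridden.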

License

See LICENSE for details.
