[vision] torchvision import initializes CUDA context, causing gateway/agent to reserve ~7GB VRAM each #29292

## Describe the bug

After upgrading to v0.14.0, the `hermes gateway` and `hermes agent` processes each reserve ~7GB of GPU VRAM at startup, even when idle and not performing any vision tasks. This is caused by `torchvision`, now installed in the hermes venv, being imported at startup: the import initializes a CUDA context that pre-allocates memory.

## Impact

- On systems with limited GPU memory (e.g., a single GPU shared between inference services and Hermes), the combined ~14GB reservation (~7GB each for gateway and agent) significantly reduces the VRAM available to other workloads (sglang, ComfyUI, etc.)
- Users who rely on external multimodal LLMs (GPT-4o, Claude, etc.) do not benefit from the local `vision_analyze` pixel-through feature, making this VRAM cost purely wasteful

## Root cause

v0.14.0 introduced `vision_analyze` pixel-through to vision-capable models (#22955), which added `torchvision` as a dependency. When torchvision is imported, it loads `libcudart` and initializes a CUDA context, which reserves ~7GB per process on NVIDIA GPUs.
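One way to check the import-time claim is to compare PyTorch's CUDA state before and after the import in a fresh interpreter inside the hermes venv. The sketch below is a diagnostic, not part of Hermes; the helper name `cuda_state_around_import` is hypothetical. It relies on `torch.cuda.is_initialized()`, which only reports PyTorch's own lazy-initialization flag, so `nvidia-smi` remains the ground truth for actual VRAM use.

```python
import importlib
from typing import Optional, Tuple


def cuda_state_around_import(module_name: str = "torchvision") -> Optional[Tuple[bool, bool]]:
    """Report PyTorch's CUDA init flag before and after importing a module.

    Returns (initialized_before, initialized_after), or None when torch or
    the target module is not installed in the current environment.
    """
    try:
        import torch  # imported here so the helper degrades gracefully without torch

        before = torch.cuda.is_initialized()
        importlib.import_module(module_name)
        after = torch.cuda.is_initialized()
    except ImportError:
        return None
    return before, after
```

A result of `(False, True)` would point at the import itself. A result of `(False, False)` would suggest the memory is reserved by something that runs later during startup.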

## Reproduction

1. Upgrade to v0.14.0+
2. Start `hermes gateway run`
3. Run `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader`
4. Observe that the gateway process has reserved ~7GB of VRAM
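To compare runs (for example before and after a candidate fix), the query from step 3 can be wrapped in a small script. This is a minimal sketch: the function names are hypothetical, and the parser assumes `nvidia-smi` prints each row as `<pid>, <number> MiB`, recording anything else (such as `[N/A]`) as `None`.

```python
import subprocess
from typing import Dict, Optional

# Same query as step 3 of the reproduction.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,used_memory",
    "--format=csv,noheader",
]


def parse_compute_apps(csv_text: str) -> Dict[int, Optional[int]]:
    """Map each PID to its used GPU memory in MiB (None if not reported)."""
    usage: Dict[int, Optional[int]] = {}
    for line in csv_text.splitlines():
        pid_field, _, mem_field = line.partition(",")
        try:
            pid = int(pid_field.strip())
        except ValueError:
            continue  # blank or malformed row
        parts = mem_field.split()
        mem = int(parts[0]) if parts and parts[0].isdigit() else None
        if mem is None:
            usage.setdefault(pid, None)
        else:
            # A PID can appear once per GPU; sum its rows.
            usage[pid] = (usage.get(pid) or 0) + mem
    return usage


def query_compute_apps() -> Dict[int, Optional[int]]:
    """Run nvidia-smi and return per-process GPU memory usage in MiB."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Calling `query_compute_apps()` and looking up the gateway and agent PIDs gives the per-process figures described in step 4.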

## Environment

- Hermes Agent: v0.14.0
- GPU: NVIDIA RTX PRO 6000 Blackwell 96GB
- PyTorch: 2.11.0+cu130
- torchvision: 0.26.0

## Suggested fix

- Lazy-import torchvision only when `vision_analyze` is actually called with a local vision model, not at gateway/agent startup
- Or set `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` in the gateway/agent environment, which may reduce pre-allocation
- Or make torchvision an optional dependency that is only loaded when the active model supports local vision
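The first and third options could be sketched as follows. This is an illustration under stated assumptions rather than Hermes code: `LazyModule` and `module_available` are hypothetical helpers, and where the real import of torchvision lives in the codebase is not known here. The proxy defers the import until an attribute is first used, and the availability check uses `importlib.util.find_spec`, which locates a top-level package without importing it.

```python
import importlib
import importlib.util
import threading
from types import ModuleType
from typing import Any, Optional


def module_available(name: str) -> bool:
    """Return True if a top-level module is installed, without importing it."""
    return importlib.util.find_spec(name) is not None


class LazyModule:
    """Proxy that imports the named module on first attribute access."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[ModuleType] = None
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        return self._module is not None

    def _load(self) -> ModuleType:
        with self._lock:
            if self._module is None:
                self._module = importlib.import_module(self._name)
            return self._module

    def __getattr__(self, attr: str) -> Any:
        # Reached only for names not found on the proxy itself.
        if attr in ("_name", "_module", "_lock"):
            raise AttributeError(attr)  # proxy not fully constructed
        return getattr(self._load(), attr)


# Creating the proxy at module level does not import torchvision;
# the import happens only when an attribute such as
# `torchvision.transforms` is first touched.
torchvision = LazyModule("torchvision")
```

With this shape, gateway and agent startup would pay no import cost, and a local `vision_analyze` path could first call `module_available("torchvision")` to fall back cleanly when the optional dependency is not installed.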
