
[vision] torchvision import initializes CUDA context, causing gateway/agent to reserve ~7GB VRAM each #29292

@HH1162


Describe the bug

After upgrading to v0.14.0, the hermes gateway and hermes agent processes each reserve ~7GB of GPU VRAM at startup, even when idle and not performing any vision tasks. The cause is torchvision being installed in the hermes venv: importing torchvision initializes a CUDA context that pre-allocates memory.

Impact

  • On systems with limited GPU memory (e.g., a single GPU shared between inference services and Hermes), the combined ~14GB reservation (~7GB for each of the two processes) significantly reduces the VRAM available to other workloads (sglang, ComfyUI, etc.)
  • Users who rely on external multimodal LLMs (GPT-4o, Claude, etc.) get no benefit from the local vision_analyze pixel-through feature, so for them this VRAM cost is pure waste

Root cause

v0.14.0 introduced vision_analyze pixel-through to vision-capable models (#22955), which added torchvision as a dependency. When torchvision is imported, it loads libcudart and initializes a CUDA context, which reserves ~7GB per process on NVIDIA GPUs.
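One way to check the root-cause claim in isolation is to import torchvision in a fresh interpreter and ask PyTorch whether a CUDA context exists afterwards. This is a minimal sketch, assuming it is run inside the hermes venv: torch.cuda.is_initialized() is a real PyTorch function, while the helper probe_import and its result strings are hypothetical. Probing in a subprocess keeps the calling interpreter itself free of torch.

```python
import subprocess
import sys

# Child script: import the target module, then report whether torch was
# pulled in and whether a CUDA context was created as a side effect.
_PROBE = """
import importlib, sys
importlib.import_module(sys.argv[1])
torch = sys.modules.get("torch")
if torch is None:
    print("torch-not-loaded")
else:
    print("cuda-initialized" if torch.cuda.is_initialized() else "cuda-not-initialized")
"""


def probe_import(module_name: str) -> str:
    """Import module_name in a fresh interpreter and classify the outcome."""
    proc = subprocess.run(
        [sys.executable, "-c", _PROBE, module_name],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return "import-failed"
    # Take the last line in case the import itself printed anything.
    lines = proc.stdout.strip().splitlines()
    return lines[-1] if lines else "no-output"


if __name__ == "__main__":
    print("torchvision:", probe_import("torchvision"))
```

A result of cuda-initialized would support the report; cuda-not-initialized would suggest the reservation comes from code that runs after the import rather than from the import itself.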

Reproduction

  1. Upgrade to v0.14.0+
  2. Start the gateway with hermes gateway run
  3. Run nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader
  4. Observe gateway process reserving ~7GB VRAM
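Step 3 prints one "pid, used_memory" line per compute process. To compare the gateway and agent figures before and after a fix, that output can be parsed. This is a minimal sketch, assuming the default CSV formatting where memory carries a unit suffix (e.g. "7012 MiB") unless nounits is requested; the function names parse_compute_apps and query_compute_apps are hypothetical.

```python
import subprocess


def parse_compute_apps(text: str) -> dict[int, int]:
    """Map PID -> used GPU memory in MiB from nvidia-smi CSV (noheader) output."""
    usage: dict[int, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        pid_field, mem_field = (f.strip() for f in line.split(",", 1))
        # Drop the unit suffix ("MiB") if present; skip values like "[N/A]".
        mem_value = mem_field.split()[0] if mem_field else ""
        if not mem_value.isdigit():
            continue
        usage[int(pid_field)] = int(mem_value)
    return usage


def query_compute_apps() -> dict[int, int]:
    """Run the command from step 3 and parse its output."""
    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_compute_apps(out)
```

With illustrative input (not measured values), parse_compute_apps("4242, 7012 MiB\n4243, 6980 MiB\n") returns {4242: 7012, 4243: 6980}.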

Environment

  • Hermes Agent: v0.14.0
  • GPU: NVIDIA RTX PRO 6000 Blackwell 96GB
  • PyTorch: 2.11.0+cu130
  • torchvision: 0.26.0

Suggested fix

  • Lazy-import torchvision only when vision_analyze is actually called with a local vision model, not at gateway/agent startup
  • Or set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True in the process environment to reduce pre-allocation
  • Or make torchvision an optional dependency that is only loaded when the active model supports local vision
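The first and third suggestions can be combined: detect whether the optional dependency is installed without importing it, and defer the actual import to the first call that needs a local vision model. This is a minimal sketch; importlib.util.find_spec and functools.lru_cache are standard library, and torchvision.io.read_image is a real torchvision function, but module_available, lazy_load, and local_vision_preprocess are hypothetical names, not Hermes Agent APIs.

```python
import functools
import importlib
import importlib.util
from types import ModuleType


def module_available(name: str) -> bool:
    """Check whether a top-level module is installed without importing it."""
    return importlib.util.find_spec(name) is not None


@functools.lru_cache(maxsize=None)
def lazy_load(name: str) -> ModuleType:
    """Import a module on first use; later calls return the cached module."""
    return importlib.import_module(name)


def local_vision_preprocess(image_path: str, uses_local_vision: bool):
    """Hypothetical vision_analyze step that only touches torchvision when needed."""
    if not uses_local_vision:
        # External multimodal models receive the file as-is, so torchvision
        # is never imported and its import-time cost is never paid.
        return image_path
    if not module_available("torchvision"):
        raise RuntimeError("local vision requires the optional torchvision dependency")
    tv = lazy_load("torchvision")
    return tv.io.read_image(image_path)
```

Because nothing at module level imports torchvision, a gateway or agent process that never takes the local-vision branch never loads it.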

Metadata


    Labels

      • P2: Medium, degraded but workaround exists
      • comp/agent: Core agent loop, run_agent.py, prompt builder
      • comp/gateway: Gateway runner, session dispatch, delivery
      • tool/vision: Vision analysis and image generation
      • type/perf: Performance improvement or optimization
