Proposal: Optional desktop computer-use module (noVNC + screenshot + mouse/keyboard control) #15876

@0xMrBlueOps

Description

Hi Hermes maintainers,

I've built an optional desktop computer-use module for Hermes Agent and, before submitting a PR, want to check whether it aligns with the project direction.

What it adds

A new tool, tools/computer_use_tool.py, plus a containerized desktop environment (hermes-desktop/), which together give Hermes optional access to:

  • Persistent Chromium browser with logged-in session support
  • Full desktop screenshots
  • Vision analysis of the desktop
  • Mouse and keyboard control via xdotool
  • noVNC takeover for visual debugging or human-in-the-loop intervention
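As a rough illustration of how these capabilities map onto ordinary X11 command-line tools, here is a minimal sketch. It is not the code in computer_use_tool.py. It assumes ImageMagick's `import` for screenshots (xdotool moves the pointer and sends keys but does not capture images) and the conventional noVNC port 6080. All function names are hypothetical, and each function only builds an argv list or URL without running anything.

```python
import shlex
from urllib.parse import urlencode

# Hypothetical helpers: each returns an argv list (or URL) and runs nothing.
# A caller would pass the argv to subprocess.run(...) inside the container.


def screenshot_cmd(path: str) -> list[str]:
    # ImageMagick's `import` captures the whole X root window. Using it here is
    # an assumption; xdotool itself has no screenshot command.
    return ["import", "-window", "root", path]


def click_cmd(x: int, y: int, button: int = 1) -> list[str]:
    # Move to absolute screen coordinates, then click (1 = left button).
    # xdotool accepts several commands chained on one command line.
    if x < 0 or y < 0:
        raise ValueError("coordinates must be non-negative")
    return ["xdotool", "mousemove", str(x), str(y), "click", str(button)]


def type_cmd(text: str, delay_ms: int = 12) -> list[str]:
    # Type literal text with a small per-keystroke delay in milliseconds.
    return ["xdotool", "type", "--delay", str(delay_ms), text]


def key_cmd(combo: str) -> list[str]:
    # Send a key chord in xdotool's keysym syntax, e.g. "ctrl+l" or "Return".
    return ["xdotool", "key", combo]


def novnc_url(host: str, port: int = 6080, view_only: bool = False) -> str:
    # 6080 is the usual noVNC/websockify port; the port hermes-desktop actually
    # uses is an assumption. vnc.html reads these query parameters on load.
    params = {
        "autoconnect": "true",
        "resize": "scale",
        "view_only": "true" if view_only else "false",
    }
    return f"http://{host}:{port}/vnc.html?{urlencode(params)}"


if __name__ == "__main__":
    print(shlex.join(click_cmd(640, 360)))
    print(shlex.join(type_cmd("hello")))
    print(novnc_url("localhost"))
```

Keeping command construction separate from execution makes each action easy to log, test, and replay. A `view_only=True` link would suit observation, while the default link allows a human to take over.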

The whole module is gated behind COMPUTER_USE_ENABLED=true. When the variable is unset (the default), Hermes behaves exactly as before: no new dependencies are loaded, no tool is registered, and there is no behavior drift.
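The gating described above could look roughly like the following sketch. `TOOL_REGISTRY` and `maybe_register_computer_use` are hypothetical stand-ins, because the proposal does not show Hermes's actual registration API. Accepting only the value `true` (case-insensitive) is also an assumption.

```python
import os
from typing import Callable, Dict, Mapping, Optional

# Hypothetical stand-in for Hermes's tool registry; the real registration API
# is not described in this proposal.
TOOL_REGISTRY: Dict[str, Callable[..., dict]] = {}


def computer_use_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    # Only "true" (case-insensitive) enables the module; unset, empty, or any
    # other value keeps it off. Accepting only "true" is an assumption.
    source = os.environ if env is None else env
    return source.get("COMPUTER_USE_ENABLED", "").strip().lower() == "true"


def maybe_register_computer_use(env: Optional[Mapping[str, str]] = None) -> bool:
    if not computer_use_enabled(env):
        # Default path: return before touching the registry.
        return False

    # In a real module, the desktop-only imports would happen here, lazily,
    # so they are never loaded when the flag is off. A stub stands in for them.
    def computer_use(action: str, **kwargs: object) -> dict:
        return {"action": action, "args": kwargs}

    TOOL_REGISTRY["computer_use"] = computer_use
    return True
```

Passing an explicit mapping instead of reading `os.environ` directly makes both the enabled and disabled paths easy to test without mutating the process environment.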

Why it's useful

The existing browser_* tools are excellent for stateless web automation, DOM/ref interactions, and structured data extraction. They don't cover:

  • Tasks requiring persistent logged-in browser sessions (authenticated workflows)
  • Visual interpretation of complex pages (charts, dashboards, dense UIs)
  • Multi-app desktop workflows (browser + spreadsheet + notes)
  • Human-in-the-loop debugging via noVNC takeover
  • Any agent task involving non-browser desktop applications

This module complements browser_* rather than replacing it, and the division of labor stays clear:

  • browser_* for clean stateless web automation
  • computer_use for stateful desktop sessions

Implementation

  • ~470-line tools/computer_use_tool.py
  • hermes-desktop/ Docker setup with isolated profile storage
  • One generic skill file skills/computer-use-basics/SKILL.md
  • Gated behind the COMPUTER_USE_ENABLED env var, default off
  • No changes to existing Hermes core files
  • No changes to default Hermes behavior
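One plausible way to get persistent logged-in sessions with isolated profile storage is to launch Chromium with `--user-data-dir` pointing into a mounted volume. The sketch below assumes a volume at `/profiles`, a binary named `chromium`, and the virtual display `:1`. None of these details comes from the proposal, and `chromium_cmd` is a hypothetical helper. The Chromium switches themselves (`--user-data-dir`, `--no-first-run`, `--no-default-browser-check`) are standard.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# Assumption: hermes-desktop mounts a Docker volume at /profiles, one
# subdirectory per browser profile. The real layout is not specified above.
PROFILE_ROOT = PurePosixPath("/profiles")


def chromium_cmd(
    profile: str, url: str = "about:blank", display: str = ":1"
) -> Tuple[List[str], Dict[str, str]]:
    # Refuse names that could point outside PROFILE_ROOT (e.g. "../etc").
    if not profile or "/" in profile or profile in {".", ".."}:
        raise ValueError(f"invalid profile name: {profile!r}")
    user_data_dir = PROFILE_ROOT / profile
    argv = [
        "chromium",  # binary name varies by distro (e.g. chromium-browser)
        f"--user-data-dir={user_data_dir}",  # cookies and logins persist here
        "--no-first-run",
        "--no-default-browser-check",
        url,
    ]
    # Only DISPLAY is returned; a caller would merge it into os.environ so the
    # window renders on the virtual X display that noVNC exposes.
    env = {"DISPLAY": display}
    return argv, env
```

Because the profile directory lives on a volume rather than in the container's writable layer, a login performed once (for example through noVNC takeover) would survive container restarts.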

Use case demo

I can record a 30-60 second demo showing an agent using the new tool to handle a workflow that requires logged-in browser state. I'm happy to wait until you confirm interest before recording.

Question

Does this fit Hermes's roadmap? Happy to scope down, restructure, or adjust based on guidance before opening a PR.

The branch is ready locally on feature/desktop-computer-use; I just want to make sure I'm building what you'd actually want to merge.

Thanks for Hermes — it's been the foundation for everything I'm working on.

Metadata

Assignees

No one assigned

    Labels

    • P3: Low (cosmetic, nice to have)
    • area/docker: Docker image, Compose, packaging
    • comp/tools: Tool registry, model_tools, toolsets
    • type/feature: New feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
