
Fork with local MLX inference, WebGPU browser inference, clipboard image paste, Rust prompt scanner #467

@m0at

Hey — I've been maintaining a fork at m0at/hermes-agent with several features I'd love to contribute back if there's interest. ~109 commits, +4,900 lines, all backward-compatible with upstream.

What's in the fork

1. Local Qwen3.5-9B on Apple Silicon (--provider local)

  • MLX-VLM server auto-starts on port 8800, serves OpenAI-compatible /v1/chat/completions
  • Vision + text, 4-bit quantized, runs on unified GPU memory
  • Server tied to hermes session (atexit + SIGTERM cleanup, no orphan processes)
  • Files: local_models/serve.py, changes to runtime_provider.py
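
A minimal sketch of the session-tied lifecycle described above (atexit plus SIGTERM cleanup), not the fork's actual `serve.py` code. The helper name `start_server` and its return shape are hypothetical; the command to launch is left to the caller.

```python
import atexit
import signal
import subprocess
import sys
import threading


def start_server(cmd):
    """Start a child process and terminate it when the parent exits.

    Hypothetical helper: `cmd` is whatever argv launches the model server.
    """
    proc = subprocess.Popen(cmd)

    def cleanup():
        # Idempotent: does nothing if the child has already exited.
        if proc.poll() is None:
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()

    # Covers normal interpreter exit.
    atexit.register(cleanup)

    def on_sigterm(signum, frame):
        cleanup()
        sys.exit(128 + signum)

    # Signal handlers can only be installed from the main thread.
    if threading.current_thread() is threading.main_thread():
        signal.signal(signal.SIGTERM, on_sigterm)

    return proc, cleanup
```

Registering the same idempotent `cleanup` for both exit paths is what avoids orphaned server processes: atexit alone does not run when the parent is killed by an unhandled SIGTERM.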

2. WebGPU Client-Side Browser Inference (--provider webgpu)

  • Starlette bridge server (port 8801) connects Hermes CLI to WebLLM running in the user's browser
  • Models load via WebGPU — zero server-side compute, nothing leaves the machine
  • 5 quantized models (Qwen3 4B, Qwen2.5 3B, Llama 3.1 8B, Mistral 7B, SmolLM2 1.7B)
  • Files: web_client/bridge.py, web_client/index.html
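
One way a bridge like this can pair CLI requests with replies coming back from the browser is a queue of jobs plus a table of pending futures. The sketch below uses only asyncio and is an assumption about the general pattern, not the code in `web_client/bridge.py`; the `Bridge` class and its method names are hypothetical, and the HTTP/Starlette layer is omitted.

```python
import asyncio
import itertools


class Bridge:
    """Hypothetical in-memory core of a CLI-to-browser bridge."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = asyncio.Queue()   # requests waiting for the browser
        self._pending = {}             # request id -> future awaiting a reply

    async def submit(self, messages):
        """CLI side: enqueue a chat request and wait for the browser's reply."""
        req_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[req_id] = future
        await self._jobs.put((req_id, messages))
        try:
            return await future
        finally:
            self._pending.pop(req_id, None)

    async def next_job(self):
        """Browser side: fetch the next (request id, messages) pair."""
        return await self._jobs.get()

    def complete(self, req_id, reply):
        """Browser side: deliver the model output for a request id."""
        future = self._pending.get(req_id)
        if future is not None and not future.done():
            future.set_result(reply)
```

In a real bridge, `next_job` and `complete` would sit behind endpoints that the browser page calls, while `submit` would back the endpoint the CLI talks to.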

3. Clipboard Image Paste (Cmd+V / Ctrl+V)

  • Paste screenshots into the chat for VLM analysis — shows [Image #N] widget, base64-encodes as OpenAI image_url content part
  • Required running PyObjC's NSPasteboard in a subprocess because prompt_toolkit's asyncio loop starves CFRunLoop, causing NSPasteboard to silently return nil in-process
  • Extraction chain: pngpaste → PyObjC subprocess → osascript fallback (macOS), xclip (Linux)
  • Files: changes to cli.py, run_agent.py
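
A rough sketch of the two pieces described above: a fallback chain of clipboard extractors, and wrapping the resulting PNG bytes as an OpenAI-style `image_url` content part. The function names are hypothetical, and only the `pngpaste` and `xclip` steps are shown; the PyObjC subprocess and osascript steps from the fork are left out.

```python
import base64
import shutil
import subprocess


def image_content_part(png_bytes):
    """Wrap PNG bytes as an OpenAI-style image_url content part (data URL)."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{b64}"},
    }


def run_tool(argv):
    """Return a tool's stdout bytes, or None if it is missing, fails, or is empty."""
    if shutil.which(argv[0]) is None:
        return None
    try:
        result = subprocess.run(argv, capture_output=True, timeout=5)
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0 or not result.stdout:
        return None
    return result.stdout


def grab_clipboard_image(extractors):
    """Try each extractor in order and return the first non-empty result."""
    for extract in extractors:
        data = extract()
        if data:
            return data
    return None


# Order mirrors the chain described above, minus the PyObjC and osascript steps.
DEFAULT_EXTRACTORS = [
    lambda: run_tool(["pngpaste", "-"]),  # macOS: write PNG to stdout
    lambda: run_tool(["xclip", "-selection", "clipboard", "-t", "image/png", "-o"]),  # Linux
]
```

Running each tool out of process also sidesteps the in-process event loop problem mentioned above, since the child gets its own run loop.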

4. Rust-Accelerated Prompt Injection Scanner

  • PyO3 native module (hermes_rs) — compiled RegexSet for 10 threat patterns
  • 17x faster than Python regex on real context files
  • Falls back to pure Python if not installed
  • Files: hermes_rs/ directory
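
The fall-back-to-Python behaviour can be sketched as an optional import with a pure-Python equivalent of a regex set. The three patterns here are illustrative stand-ins, not the fork's 10 threat patterns, and the native entry point `hermes_rs.scan` is an assumed name; the real module's API may differ.

```python
import re

# Illustrative patterns only; the fork's actual threat patterns are not reproduced here.
PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"reveal (the |your )?system prompt",
    r"<\s*script\b",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in PATTERNS]

try:
    # Assumed function name; used only if the native module is installed.
    from hermes_rs import scan as _native_scan
except ImportError:
    _native_scan = None


def scan_python(text):
    """Return the indices of all patterns that match, like a regex set would."""
    return [i for i, rx in enumerate(_COMPILED) if rx.search(text)]


def scan(text):
    """Prefer the native scanner when available, else use the Python fallback."""
    if _native_scan is not None:
        return list(_native_scan(text))
    return scan_python(text)
```

Keeping the fallback behind the same `scan` function means callers never need to know whether the native module is present.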

5. CLI Improvements

  • Dynamic terminal width (no more 200-char overflow)
  • Think block styling (<think> → dim italic with ~ thinking ~ header)
  • Color scheme picker (cyber/synthwave)
  • /provider hot-swap mid-session
  • /copycode — import Claude Code skills
  • Context compression: fallback client chain, provider-aware token handling
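
Two of the items above, dynamic terminal width and think block styling, can be sketched with the standard library. This is an illustration under assumptions, not the fork's `cli.py` code: the function names are hypothetical, and the styling uses plain ANSI escape codes (dim plus italic) rather than whatever the CLI actually uses.

```python
import re
import shutil
import textwrap

DIM_ITALIC = "\x1b[2;3m"  # ANSI SGR: 2 = dim, 3 = italic
RESET = "\x1b[0m"
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def style_think(text):
    """Replace <think>...</think> blocks with a dim italic block and a header."""
    def repl(match):
        body = match.group(1).strip()
        return f"{DIM_ITALIC}~ thinking ~\n{body}{RESET}"
    return THINK_RE.sub(repl, text)


def wrap_to_terminal(text, max_width=None):
    """Wrap each line to the current terminal width instead of a fixed width."""
    width = max_width or shutil.get_terminal_size(fallback=(80, 24)).columns
    return "\n".join(textwrap.fill(line, width=width) for line in text.splitlines())
```

Wrapping should happen before styling in practice, since the escape codes would otherwise count toward line length.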

6. Evaluation Testbed & RL Pipeline

  • testbed/ — REPL, eval runner, task definitions
  • batch_runner.py + trajectory_compressor.py for GRPO training data

How I'd contribute

Happy to split into focused PRs if you're interested:

  1. Local MLX inference (smallest, self-contained)
  2. Rust prompt scanner (additive, no core changes)
  3. WebGPU bridge
  4. Clipboard image paste
  5. CLI improvements
  6. Eval testbed + RL pipeline

Each is independent and reviewable on its own. Let me know what (if any) of this would be useful upstream.

Fork: https://github.com/m0at/hermes-agent

Labels: P3 (Low: cosmetic, nice to have), type/feature (New feature or request)
