Hey — I've been maintaining a fork at m0at/hermes-agent with several features I'd love to contribute back if there's interest. ~109 commits, +4,900 lines, all backward-compatible with upstream.
What's in the fork
1. Local Qwen3.5-9B on Apple Silicon (--provider local)
- MLX-VLM server auto-starts on port 8800 and serves an OpenAI-compatible /v1/chat/completions endpoint
- Vision + text, 4-bit quantized, runs on unified GPU memory
- Server tied to hermes session (atexit + SIGTERM cleanup, no orphan processes)
- Files: local_models/serve.py, changes to runtime_provider.py
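
For reviewers who want a feel for the session-tied cleanup, here is a minimal sketch of the atexit + SIGTERM pattern. It is an illustration under assumptions, not the code in local_models/serve.py: start_managed_server, handle_sigterm, and the 5-second grace period are hypothetical names and values.

```python
import atexit
import signal
import subprocess
import sys


def start_managed_server(cmd, handle_sigterm=True):
    """Start a child process and arrange for it to stop when this process exits.

    Hypothetical helper: the name and parameters are illustrative only.
    Returns the Popen object and the cleanup callable.
    """
    proc = subprocess.Popen(cmd)

    def cleanup():
        # Only act if the child is still running.
        if proc.poll() is None:
            proc.terminate()
            try:
                proc.wait(timeout=5)  # assumed grace period
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()

    # Runs on normal interpreter exit, including exit via sys.exit().
    atexit.register(cleanup)

    if handle_sigterm:
        def on_sigterm(signum, frame):
            # Raising SystemExit lets the atexit handlers run.
            sys.exit(128 + signum)

        # Must be called from the main thread.
        signal.signal(signal.SIGTERM, on_sigterm)

    return proc, cleanup
```

Converting SIGTERM into a normal exit keeps a single cleanup path, so the child is stopped the same way whether the session ends normally or is terminated.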
2. WebGPU Client-Side Browser Inference (--provider webgpu)
- Starlette bridge server (port 8801) connects Hermes CLI to WebLLM running in the user's browser
- Models load via WebGPU — zero server-side compute, nothing leaves the machine
- 5 quantized models (Qwen3 4B, Qwen2.5 3B, Llama 3.1 8B, Mistral 7B, SmolLM2 1.7B)
- Files: web_client/bridge.py, web_client/index.html
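
To make the bridge idea concrete, here is a rough sketch of how a CLI request can be handed to a browser tab and matched with its reply. It is an assumption about the general shape, not the actual web_client/bridge.py: the Bridge class and its method names are hypothetical, and the Starlette/HTTP layer is left out so the sketch stays stdlib-only.

```python
import asyncio
import itertools


class Bridge:
    """Hypothetical in-memory relay between a CLI caller and a browser worker."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._outbox = asyncio.Queue()  # requests waiting for the browser
        self._pending = {}              # request id -> Future for the reply

    async def submit(self, payload, timeout=60.0):
        """CLI side: queue a request and wait for the browser's reply."""
        req_id = next(self._ids)
        fut = asyncio.get_running_loop().create_future()
        self._pending[req_id] = fut
        await self._outbox.put({"id": req_id, "payload": payload})
        try:
            return await asyncio.wait_for(fut, timeout)
        finally:
            # Drop the entry whether the request succeeded or timed out.
            self._pending.pop(req_id, None)

    async def next_request(self):
        """Browser side: fetch the next queued request."""
        return await self._outbox.get()

    def resolve(self, req_id, result):
        """Browser side: deliver the result for a given request id."""
        fut = self._pending.get(req_id)
        if fut is not None and not fut.done():
            fut.set_result(result)


async def demo():
    bridge = Bridge()

    async def fake_browser():
        # Stand-in for the browser tab running the model.
        req = await bridge.next_request()
        bridge.resolve(req["id"], {"echo": req["payload"]})

    worker = asyncio.create_task(fake_browser())
    reply = await bridge.submit({"messages": []}, timeout=5)
    await worker
    return reply
```

In a real bridge, next_request and resolve would sit behind HTTP or WebSocket routes that the page calls; the request id is what lets several in-flight prompts share one connection.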
3. Clipboard Image Paste (Cmd+V / Ctrl+V)
- Paste screenshots into the chat for VLM analysis: shows an [Image #N] widget and base64-encodes the image as an OpenAI image_url content part
- Required running PyObjC's NSPasteboard in a subprocess because prompt_toolkit's asyncio loop starves CFRunLoop, causing NSPasteboard to silently return nil in-process
- Extraction chain: pngpaste → PyObjC subprocess → osascript fallback (macOS), xclip (Linux)
- Files: changes to cli.py, run_agent.py
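
A small sketch of the two ideas above: trying clipboard extractors in order, and wrapping the PNG bytes as an OpenAI-style image_url content part. The function names are hypothetical and the tool invocations are assumptions on my part (the PyObjC subprocess and osascript steps are omitted), so treat it as illustrative rather than the code in cli.py.

```python
import base64
import shutil
import subprocess


def image_content_part(png_bytes):
    """Wrap PNG bytes as an OpenAI-style image_url content part (data URL)."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{b64}"},
    }


def run_tool(argv):
    """Run a clipboard tool and return its stdout bytes, or None if unavailable."""
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, timeout=5)
    if result.returncode == 0 and result.stdout:
        return result.stdout
    return None


def first_successful(extractors):
    """Return the first non-empty result from a list of zero-argument callables."""
    for extract in extractors:
        try:
            data = extract()
        except Exception:
            continue  # a failing extractor just means "try the next one"
        if data:
            return data
    return None


# Assumed invocations: "pngpaste -" writes the PNG to stdout on macOS,
# and xclip can emit the clipboard as image/png on Linux.
EXTRACTORS = [
    lambda: run_tool(["pngpaste", "-"]),
    lambda: run_tool(["xclip", "-selection", "clipboard", "-t", "image/png", "-o"]),
]
```

Usage would look like `data = first_successful(EXTRACTORS)` followed by `image_content_part(data)` when data is not None.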
4. Rust-Accelerated Prompt Injection Scanner
- PyO3 native module (hermes_rs): compiled RegexSet for 10 threat patterns
- 17x faster than Python regex on real context files
- Falls back to pure Python if not installed
- Files: hermes_rs/ directory
5. CLI Improvements
- Dynamic terminal width (no more 200-char overflow)
- Think block styling (<think> → dim italic with ~ thinking ~ header)
- Color scheme picker (cyber/synthwave)
- /provider hot-swap mid-session
- /copycode: import Claude Code skills
- Context compression: fallback client chain, provider-aware token handling
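
Two of the CLI items are easy to sketch: dynamic width and think-block styling. This is a simplified illustration, not the code in the fork; the function names, the 120-column cap, and the exact ANSI codes are assumptions.

```python
import re
import shutil

DIM_ITALIC = "\x1b[2;3m"  # ANSI SGR: dim + italic
RESET = "\x1b[0m"
_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def terminal_width(max_width=120, fallback=80):
    """Current terminal width, clamped to a sane range (cap is an assumption)."""
    cols = shutil.get_terminal_size(fallback=(fallback, 24)).columns
    return max(20, min(cols, max_width))


def style_think_blocks(text):
    """Replace <think>...</think> with a dim italic block under a header line."""
    def repl(match):
        body = match.group(1).strip()
        return f"{DIM_ITALIC}~ thinking ~\n{body}{RESET}"

    return _THINK.sub(repl, text)
```

Querying the width at render time, rather than once at startup, is what lets the layout follow terminal resizes.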
6. Evaluation Testbed & RL Pipeline
- testbed/: REPL, eval runner, task definitions
- batch_runner.py + trajectory_compressor.py for GRPO training data
How I'd contribute
Happy to split into focused PRs if you're interested:
- Local MLX inference (smallest, self-contained)
- Rust prompt scanner (additive, no core changes)
- WebGPU bridge
- Clipboard image paste
- CLI improvements
- Eval testbed + RL pipeline
Each is independent and reviewable on its own. Let me know what (if any) of this would be useful upstream.
Fork: https://github.com/m0at/hermes-agent