Fork with local MLX inference, WebGPU browser inference, clipboard image paste, Rust prompt scanner

Hey — I've been maintaining a fork at [m0at/hermes-agent](https://github.com/m0at/hermes-agent) with several features I'd love to contribute back if there's interest. ~109 commits, +4,900 lines, all backward-compatible with upstream.

## What's in the fork

### 1. Local Qwen3.5-9B on Apple Silicon (`--provider local`)
- MLX-VLM server auto-starts on port 8800, serves OpenAI-compatible `/v1/chat/completions`
- Vision + text, 4-bit quantized, runs on unified GPU memory
- Server lifetime is tied to the Hermes session (atexit + SIGTERM cleanup, no orphan processes)
- **Files:** `local_models/serve.py`, changes to `runtime_provider.py`
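
For reviewers, here is a minimal sketch of the "tied to the session" lifecycle pattern described above. It is not the code in `local_models/serve.py`; the function names `start_server` and `exit_on_sigterm` are hypothetical, and the command passed in is whatever launches the server.

```python
import atexit
import signal
import subprocess
import sys


def start_server(cmd):
    """Start a child process and register cleanup for interpreter exit.

    Returns the Popen handle and the cleanup callable (hypothetical API).
    """
    proc = subprocess.Popen(cmd)

    def cleanup():
        # Only act if the child is still running.
        if proc.poll() is None:
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                # Child ignored SIGTERM; force it down.
                proc.kill()
                proc.wait()

    # Runs on normal interpreter exit, including sys.exit().
    atexit.register(cleanup)
    return proc, cleanup


def exit_on_sigterm():
    """Turn SIGTERM into a normal exit so atexit handlers get a chance to run.

    Must be called from the main thread.
    """
    def handler(signum, frame):
        sys.exit(0)

    signal.signal(signal.SIGTERM, handler)
```

Without the SIGTERM handler, Python's default action on SIGTERM ends the process without running atexit handlers, which is one way a child server can end up orphaned.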

### 2. WebGPU Client-Side Browser Inference (`--provider webgpu`)
- Starlette bridge server (port 8801) connects Hermes CLI to WebLLM running in the user's browser
- Models load via WebGPU — zero server-side compute, nothing leaves the machine
- 5 quantized models (Qwen3 4B, Qwen2.5 3B, Llama 3.1 8B, Mistral 7B, SmolLM2 1.7B)
- **Files:** `web_client/bridge.py`, `web_client/index.html`

### 3. Clipboard Image Paste (Cmd+V / Ctrl+V)
- Paste screenshots into the chat for VLM analysis — shows `[Image #N]` widget, base64-encodes as OpenAI `image_url` content part
- Reads `NSPasteboard` through PyObjC in a **subprocess**, because prompt_toolkit's asyncio loop starves the CFRunLoop and in-process `NSPasteboard` calls silently return nil
- Extraction chain: pngpaste → PyObjC subprocess → osascript fallback (macOS), xclip (Linux)
- **Files:** changes to `cli.py`, `run_agent.py`
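
The extraction chain and the `image_url` encoding can be sketched roughly as follows. This is an illustration under assumptions, not the fork's actual code: the helper names (`first_image`, `via_pngpaste`, `via_xclip`, `to_image_part`) are hypothetical, and the PyObjC subprocess and osascript steps are left out for brevity.

```python
import base64
import shutil
import subprocess


def first_image(extractors):
    """Return bytes from the first extractor that yields data, else None."""
    for extract in extractors:
        try:
            data = extract()
        except (OSError, subprocess.SubprocessError):
            # Tool missing, crashed, or timed out: try the next one.
            continue
        if data:
            return data
    return None


def via_pngpaste():
    """macOS: `pngpaste -` writes the clipboard image to stdout."""
    if shutil.which("pngpaste") is None:
        return None
    result = subprocess.run(["pngpaste", "-"], capture_output=True, timeout=5)
    return result.stdout if result.returncode == 0 else None


def via_xclip():
    """Linux: ask xclip for the clipboard contents as PNG."""
    if shutil.which("xclip") is None:
        return None
    result = subprocess.run(
        ["xclip", "-selection", "clipboard", "-t", "image/png", "-o"],
        capture_output=True,
        timeout=5,
    )
    return result.stdout if result.returncode == 0 else None


def to_image_part(png_bytes):
    """Wrap PNG bytes as an OpenAI-style `image_url` content part."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{encoded}"},
    }
```

A caller would do something like `data = first_image([via_pngpaste, via_xclip])` and append `to_image_part(data)` to the message content when `data` is not `None`.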

### 4. Rust-Accelerated Prompt Injection Scanner
- PyO3 native module (`hermes_rs`) — compiled `RegexSet` for 10 threat patterns
- **17x faster** than Python regex on real context files
- Falls back to pure Python if not installed
- **Files:** `hermes_rs/` directory
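
The optional-native-module pattern looks roughly like this. Treat it as a sketch under loud assumptions: only the module name `hermes_rs` comes from the fork, while the `scan` function on it, the return shape (indices of matching patterns), and the two example patterns are hypothetical stand-ins for the real 10.

```python
import re

try:
    import hermes_rs  # native PyO3 module, if it has been built and installed
except ImportError:
    hermes_rs = None

# Hypothetical example patterns; the fork's actual threat patterns differ.
_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"ignore (all )?(previous|prior) instructions",
        r"disregard (the )?system prompt",
    )
]


def scan_py(text):
    """Pure-Python fallback: indices of every pattern that matches."""
    return [i for i, pattern in enumerate(_PATTERNS) if pattern.search(text)]


def scan(text):
    """Use the native scanner when available, else the Python fallback."""
    # Assumption: the native module exposes scan(text) with the same return shape.
    native = getattr(hermes_rs, "scan", None)
    if native is not None:
        return native(text)
    return scan_py(text)
```

The appeal of a `RegexSet` on the Rust side is that it tests all patterns in a single pass over the input, whereas the Python fallback runs each pattern separately.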

### 5. CLI Improvements
- Dynamic terminal width (no more 200-char overflow)
- Think block styling (`<think>` → dim italic with `~ thinking ~` header)
- Color scheme picker (cyber/synthwave)
- `/provider` hot-swap mid-session
- `/copycode` — import Claude Code skills
- Context compression: fallback client chain, provider-aware token handling
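
As a rough illustration of the first two items (not the fork's implementation, which lives in the CLI's own rendering code), dynamic width and think-block styling can be sketched with the standard library and plain ANSI escapes. The names `wrap_width` and `style_think` are hypothetical.

```python
import re
import shutil

DIM_ITALIC = "\x1b[2;3m"  # SGR 2 = dim, SGR 3 = italic
RESET = "\x1b[0m"
_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def wrap_width(maximum=200):
    """Current terminal width, capped at `maximum`, with an 80-column fallback."""
    columns = shutil.get_terminal_size(fallback=(80, 24)).columns
    return min(columns, maximum)


def style_think(text):
    """Replace each <think>...</think> block with a dim italic rendering."""
    def render(match):
        body = match.group(1).strip()
        return f"{DIM_ITALIC}~ thinking ~\n{body}{RESET}"

    return _THINK.sub(render, text)
```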

### 6. Evaluation Testbed & RL Pipeline
- `testbed/` — REPL, eval runner, task definitions
- `batch_runner.py` + `trajectory_compressor.py` for GRPO training data

## How I'd contribute

Happy to split into focused PRs if you're interested:
1. Local MLX inference (smallest, self-contained)
2. Rust prompt scanner (additive, no core changes)
3. WebGPU bridge
4. Clipboard image paste
5. CLI improvements
6. Eval testbed + RL pipeline

Each is independent and reviewable on its own. Let me know which parts (if any) would be useful upstream.

Fork: https://github.com/m0at/hermes-agent
