[Feature Proposal] 4 Design Patterns from Tencent Marvis: Pre-built Agent Profiles, Cloud-Local Routing, Desktop GUI Agent, Tiered Security #31793

## Summary

Tencent launched [Marvis](https://marvis.qq.com) on May 20, 2026 — an **OS-level AI assistant** that embeds between the user and the operating system. After studying its architecture in depth, I identified 4 design patterns that could significantly improve Hermes without requiring a rewrite. This issue maps each pattern to Hermes' existing architecture with concrete implementation suggestions.

> **Note for international team**: Marvis is a China-market product (QQ login required, Chinese UI). I've extracted all the relevant architecture details below so you don't need to register or download it. The value is in the *design patterns*, not the product itself.

---

## What is Marvis? (Quick Context)

Marvis is not a chatbot. It's an **AI middleware layer** sitting between the user and the OS:

```
Traditional AI:    User → Chat Window → Text Output
Marvis:            User → Natural Language → OS APIs → Files / Settings / Apps / Hardware
```

**Key stats** (for context, not to copy):
- Built by Tencent's App Store team (PC ecosystem veterans)
- Windows + Mac + Android (iOS coming), cross-device sync
- Free tier: **10 million tokens/day** (Tencent eats the cloud cost)
- Has strategic partnership with Microsoft for Windows API access
- Launched May 20, 2026 — less than a week ago, already gaining traction

---

## The 4 Design Patterns Hermes Should Adopt

### 🥇 Pattern 1: Pre-Configured Specialized Sub-Agents ("1+5 Architecture")

#### What Marvis does

Ships with **1 PM Agent (orchestrator) + 5 specialist agents** pre-configured out of the box:

```
User Natural Language
       ↓
┌─────────────────┐
│  PM Agent        │  Understands intent, decomposes tasks, parallel dispatches
│  (Orchestrator)  │  Powered by Hunyuan + DeepSeek V4 (cloud)
└──┬──┬──┬──┬──┬──┘
   │  │  │  │  │
   ↓  ↓  ↓  ↓  ↓
┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐
│ File │ │System│ │ App  │ │Browser│ │Search│
│Agent │ │Agent │ │Agent │ │Agent  │ │Agent │
└──────┘ └──────┘ └──────┘ └──────┘ └──────┘
```

Each specialist has a fixed, narrow scope:

| Agent | Scope | Implementation |
|-------|-------|---------------|
| **File Agent** | File search/read/write/convert, image content search, OCR | Local file index + semantic search |
| **Computer Agent** | System settings, hardware diagnostics, cleanup | **Windows API direct calls** (not simulated clicks) |
| **App Agent** | GUI control of desktop + Android apps | Visual recognition + simulated input |
| **Browser Agent** | Web interaction, data scraping, form filling | Browser takeover + DOM parsing |
| **Search Agent** | Web search + information aggregation | Search engine API calls |

The PM Agent uses a **structured task dispatch protocol** — not just forwarding the user's raw text, but packaging it with context, history, dependencies, and expected output schema.
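If Hermes adopted the same idea for `delegate_task`, the payload might look like the sketch below. The `TaskDispatch` fields (`goal`, `context`, `history`, `depends_on`, `output_schema`) and the `ready_tasks` helper are assumptions derived from the description above, not Marvis' actual wire format or an existing Hermes API. The point is that explicit dependencies let the orchestrator dispatch independent sub-tasks in parallel.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TaskDispatch:
    """Hypothetical payload an orchestrator hands to a specialist sub-agent."""
    task_id: str
    profile: str                                        # target specialist, e.g. "file-agent"
    goal: str                                           # decomposed sub-goal, not the raw user text
    context: dict = field(default_factory=dict)         # facts the specialist needs up front
    history: list = field(default_factory=list)         # relevant prior turns, already trimmed
    depends_on: list = field(default_factory=list)      # task_ids that must finish first
    output_schema: dict = field(default_factory=dict)   # JSON-Schema-style shape of the reply

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False)


def ready_tasks(tasks: list, done: set) -> list:
    """Tasks whose dependencies are all complete; these can be dispatched in parallel."""
    return [t for t in tasks
            if t.task_id not in done and all(dep in done for dep in t.depends_on)]


plan = [
    TaskDispatch("t1", "file-agent", "Find PDF invoices from last month",
                 output_schema={"type": "array", "items": {"type": "string"}}),
    TaskDispatch("t2", "search-agent", "Find the vendor's current billing address"),
    TaskDispatch("t3", "file-agent", "Write a summary of the invoices to ~/invoices.md",
                 context={"format": "markdown"}, depends_on=["t1", "t2"]),
]
print([t.task_id for t in ready_tasks(plan, done=set())])          # → ['t1', 't2']
print([t.task_id for t in ready_tasks(plan, done={"t1", "t2"})])   # → ['t3']
```

Tasks with no unmet dependencies (`t1`, `t2`) can go out together; `t3` waits until both report back.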

#### Hermes current state

- ✅ Has `delegate_task` for spawning sub-agents
- ✅ Has `toolsets` for scoping sub-agent capabilities
- ✅ Has #9459 tracking "agent profiles for delegate_task"
- ❌ No pre-configured specialist agents — every delegation requires manual toolset/prompt specification
- ❌ Sub-agents are generalists by default, not specialists by design

#### Concrete proposal

**Step 1 — Define agent profiles in config:**

```yaml
# ~/.hermes/agent_profiles.yaml
profiles:
  file-agent:
    description: "File search, read, write, convert, OCR"
    toolsets: [terminal, file, vision]
    system_prompt: |
      You are a file specialist. Your job is to locate, read, and process files.
      - Search by content, not just filename
      - For images: use vision tools to describe content
      - Always return absolute file paths in results
    preferred_model: "deepseek-v4-flash"  # cheaper for simple file ops
    
  system-agent:
    description: "System diagnostics, settings, cleanup"
    toolsets: [terminal]
    system_prompt: |
      You are a system operations specialist.
      - Diagnose system issues (disk, memory, network, processes)
      - NEVER run destructive commands (rm -rf, format, dd) without explicit confirmation
      - Prefer read-only diagnostics first
    risk_level: medium
    
  browser-agent:
    description: "Web interaction, scraping, form automation"
    toolsets: [browser]
    system_prompt: |
      You are a web interaction specialist.
      - Navigate, extract, fill forms, click buttons
      - Always return the URL you're on and what you found
    
  search-agent:
    description: "Web search and information aggregation"
    toolsets: [web]
    system_prompt: |
      You are a search and research specialist.
      - Search broadly first, then drill down
      - Always cite sources with URLs
      - Synthesize findings into a structured summary
```

**Step 2 — Extend `delegate_task` to accept profile names:**

```python
# Current:
delegate_task(goal="find all invoices", toolsets=["terminal", "file"])

# Proposed:
delegate_task(goal="find all invoices", profile="file-agent")
# Resolves to: toolsets + system_prompt + model from agent_profiles.yaml
```
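A sketch of how the profile could be resolved, with the YAML above already parsed into a dict (for example via PyYAML's `yaml.safe_load`). `resolve_profile`, the merge rule (explicit arguments override profile defaults) and the fallback to `deepseek-v4-pro` are assumptions, not current Hermes behavior.

```python
from typing import Optional

# Stand-in for yaml.safe_load(...)["profiles"] from ~/.hermes/agent_profiles.yaml
AGENT_PROFILES = {
    "file-agent": {
        "toolsets": ["terminal", "file", "vision"],
        "system_prompt": "You are a file specialist.",
        "preferred_model": "deepseek-v4-flash",
    },
    "search-agent": {
        "toolsets": ["web"],
        "system_prompt": "You are a search and research specialist.",
    },
}


def resolve_profile(profile: Optional[str] = None,
                    toolsets: Optional[list] = None,
                    system_prompt: Optional[str] = None,
                    model: Optional[str] = None,
                    default_model: str = "deepseek-v4-pro") -> dict:
    """Merge a named profile with explicit delegate_task arguments.

    Explicit arguments win over the profile; missing fields fall back to defaults.
    """
    base = {}
    if profile is not None:
        if profile not in AGENT_PROFILES:
            known = ", ".join(sorted(AGENT_PROFILES))
            raise ValueError(f"Unknown agent profile {profile!r}; known profiles: {known}")
        base = AGENT_PROFILES[profile]
    return {
        "toolsets": toolsets if toolsets is not None else list(base.get("toolsets", [])),
        "system_prompt": system_prompt if system_prompt is not None
                         else base.get("system_prompt", ""),
        "model": model or base.get("preferred_model") or default_model,
    }


print(resolve_profile(profile="file-agent"))
print(resolve_profile(profile="search-agent", toolsets=["web", "browser"]))
```

Raising on an unknown profile name, instead of silently falling back to a generalist sub-agent, keeps a typo in `profile=` from producing an unscoped delegate.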

**Step 3 — PM Agent auto-routing (future, optional):**

The main Agent could auto-detect task type and route to the right specialist:

```python
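# In run_agent.py, before delegation.
# Task types and profile names differ, so map one onto the other:
TASK_TYPE_TO_PROFILE = {"file_ops": "file-agent", "system_ops": "system-agent",
                        "web_search": "search-agent"}

task_type = classify_task(user_message)  # "file_ops" | "system_ops" | "web_search" | ...
profile = TASK_TYPE_TO_PROFILE.get(task_type)
if profile in agent_profiles:
    delegate_task(goal=user_message, profile=profile)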
```

This is the lowest-hanging fruit — it leverages existing infrastructure and is tracked in #9459.
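`classify_task` is left undefined in Step 3. A deliberately simple keyword classifier is enough to start with and keeps routing deterministic and testable. The keyword lists and task-type names below are illustrative assumptions; an LLM-based classifier could replace this function later without changing its signature.

```python
import re
from typing import Optional

# Hypothetical keyword rules, checked in order; the first matching task type wins.
TASK_KEYWORDS = [
    ("web_search", ["search the web", "look up", "latest news", "weather"]),
    ("web_browse", ["open the website", "fill out the form", "log in to", "scrape"]),
    ("system_ops", ["disk space", "memory usage", "cpu", "processes", "network"]),
    ("file_ops", ["file", "files", "folder", "directory", "pdf", "invoices"]),
]


def classify_task(user_message: str) -> Optional[str]:
    """Return a task type, or None when no rule matches confidently."""
    text = user_message.lower()
    for task_type, keywords in TASK_KEYWORDS:
        if any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in keywords):
            return task_type
    return None  # no match: the main agent keeps the task instead of delegating


for msg in ["Find all invoices in my Downloads folder",
            "Why is my disk space almost full?",
            "Write a haiku about autumn"]:
    print(f"{msg!r} -> {classify_task(msg)}")
```

Returning `None` on no match matters: an uncertain classifier should leave the task with the main agent rather than force it onto the wrong specialist.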

---

### 🥈 Pattern 2: Cloud-Local Auto Routing

#### What Marvis does

Marvis doesn't make users manually choose between cloud and local models. It auto-routes based on task characteristics:

```
User Input → PM Agent analyzes:
  ├─ "Organize my invoices"        → Cloud intent understanding + Local file search
  ├─ "Review this contract's risk" → Pure local, no cloud upload (privacy)
  └─ "What's the weather today"    → Cloud search
```

The routing logic:

| Factor | Routes to Cloud | Routes to Local |
|--------|----------------|-----------------|
| Task complexity | Multi-step planning, ambiguous intent | Simple, well-defined operations |
| Data sensitivity | Public information | Personal files, contracts, financial data |
| Connectivity requirement | Web search, API calls | File ops, system settings |
| Cost sensitivity | Complex reasoning needs big models | Simple tasks waste cloud tokens |

**The key insight**: Marvis does heavy pre-processing locally (file indexing, image OCR, text extraction) BEFORE sending anything to the cloud. This means cloud models get structured, pre-digested input instead of raw data — cutting token usage by 60-80%.

#### Hermes current state

- ✅ Supports multiple model providers
- ✅ `/model` command for manual switching
- ❌ One model per session — no runtime routing
- ❌ All tool output goes to the cloud model regardless of sensitivity
- ❌ No local pre-processing pipeline

#### Concrete proposal

**Step 1 — Task classifier (lightweight, rule-based first):**

```yaml
# ~/.hermes/routing.yaml
routing:
  enabled: true
  default_model: "deepseek-v4-pro"       # cloud, for general use
  
  local_models:
    primary: "qwen2.5-7b-local"          # via ollama or similar
    fallback: "deepseek-v4-flash"        # cheap cloud if local unavailable
  
  rules:
    - name: "privacy-sensitive"
      patterns:
        keywords: ["contract", "financial", "passport", "password", "confidential", "NDA"]
        file_paths: ["~/Documents/finance/", "~/Desktop/tax/"]
      route_to: local
      reason: "contains sensitive data"
      
    - name: "simple-file-ops"
      patterns:
        intents: ["read_file", "list_directory", "find_file", "file_stats"]
      route_to: local
      reason: "simple operation, no cloud needed"
      
    - name: "web-search"
      patterns:
        intents: ["web_search", "browser_navigate"]
      route_to: cloud
      reason: "requires internet access"
      
    - name: "complex-reasoning"
      patterns:
        keywords: ["explain", "analyze", "compare", "refactor", "design"]
        min_complexity: 0.7  # heuristic score
      route_to: cloud
      reason: "needs large model reasoning"
```
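Rule evaluation could be as simple as first-match-wins over the list above. This sketch assumes that the different pattern kinds within one rule are OR-ed, that `min_complexity` acts as an extra gate, and that `intent` and `complexity` come from an upstream classifier (not shown). The `Request` type and `route` function are hypothetical.

```python
import os
import re
from dataclasses import dataclass, field
from typing import Optional

# Mirrors the `rules` list in routing.yaml (as yaml.safe_load would return it), in priority order.
RULES = [
    {"name": "privacy-sensitive", "route_to": "local",
     "patterns": {"keywords": ["contract", "financial", "passport", "password", "confidential", "NDA"],
                  "file_paths": ["~/Documents/finance/", "~/Desktop/tax/"]}},
    {"name": "simple-file-ops", "route_to": "local",
     "patterns": {"intents": ["read_file", "list_directory", "find_file", "file_stats"]}},
    {"name": "web-search", "route_to": "cloud",
     "patterns": {"intents": ["web_search", "browser_navigate"]}},
    {"name": "complex-reasoning", "route_to": "cloud",
     "patterns": {"keywords": ["explain", "analyze", "compare", "refactor", "design"],
                  "min_complexity": 0.7}},
]


@dataclass
class Request:
    message: str
    intent: Optional[str] = None     # from an upstream intent classifier (assumed)
    complexity: float = 0.0          # heuristic score in [0, 1] (assumed)
    file_paths: list = field(default_factory=list)


def _has_keyword(text: str, keywords: list) -> bool:
    # Word boundaries stop "NDA" from matching inside "Monday".
    return any(re.search(r"\b" + re.escape(k.lower()) + r"\b", text) for k in keywords)


def rule_matches(patterns: dict, req: Request) -> bool:
    text = req.message.lower()
    hits = []
    if "keywords" in patterns:
        hits.append(_has_keyword(text, patterns["keywords"]))
    if "file_paths" in patterns:
        prefixes = [os.path.expanduser(p) for p in patterns["file_paths"]]
        hits.append(any(os.path.expanduser(f).startswith(pre)
                        for f in req.file_paths for pre in prefixes))
    if "intents" in patterns:
        hits.append(req.intent in patterns["intents"])
    # Content patterns are OR-ed; min_complexity is an additional AND gate.
    return any(hits) and req.complexity >= patterns.get("min_complexity", 0.0)


def route(req: Request, rules: list = RULES, default: str = "cloud") -> tuple:
    """Return (target, rule_name) for the first matching rule."""
    for rule in rules:
        if rule_matches(rule["patterns"], req):
            return rule["route_to"], rule["name"]
    return default, "default"


print(route(Request("Summarize this NDA before I sign it")))
print(route(Request("Explain the tradeoffs of this design", complexity=0.9)))
```

Ordering the privacy rule first means a request such as "analyze this NDA" stays local even though it also matches the complex-reasoning keywords.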

**Step 2 — Local pre-processing pipeline:**

Before sending context to the cloud model, run a lightweight local model to:
- Extract key information from tool outputs (summarize large file contents)
- Classify task sensitivity
- Structure raw data (JSON-ify free text)

This mirrors what Hermes already does with context compression (#31684), but using a local model instead of just truncation.
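Where such a step could sit, sketched with a `summarize` callable that would wrap the configured local model; here it is a trivial stand-in so the example runs on its own. `MAX_RAW_CHARS`, `preprocess_tool_output` and the withholding policy are hypothetical.

```python
from typing import Callable

MAX_RAW_CHARS = 2_000  # hypothetical size above which tool output is condensed locally


def naive_local_summarize(text: str, max_lines: int = 5) -> str:
    """Stand-in for a local model call: keep the first few non-empty lines."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    kept, omitted = lines[:max_lines], len(lines) - max_lines
    note = f"\n[{omitted} more lines condensed locally]" if omitted > 0 else ""
    return "\n".join(kept) + note


def preprocess_tool_output(output: str, sensitive: bool,
                           summarize: Callable[[str], str] = naive_local_summarize) -> str:
    """Decide what part of a tool result may be sent to the cloud model."""
    if sensitive:
        # Keep the content on this machine; the cloud model only learns that it exists.
        return f"[withheld: {len(output)} chars of sensitive tool output kept local]"
    if len(output) > MAX_RAW_CHARS:
        return summarize(output)
    return output
```

Withholding sensitive output entirely, rather than sending a local summary of it, is the conservative choice: a summary of a contract can still leak the contract.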

**Step 3 — Integration point in `run_agent.py`:**

```python
# Pseudo-code for where routing would hook in (method names are proposals):
LOCAL_TASK_TYPES = {"simple_file_ops"}
CLOUD_TASK_TYPES = {"web_search", "complex_reasoning"}

def select_model(self, user_message: str, context: dict) -> str:
    if not self.routing_enabled:
        return self.default_model

    task_type = self.classify_task(user_message)
    sensitivity = self.assess_sensitivity(user_message, context)

    # Check privacy first: sensitive data stays local even for cloud-type tasks.
    if sensitivity == "high" or task_type in LOCAL_TASK_TYPES:
        return self.route_to_local()
    if task_type in CLOUD_TASK_TYPES:
        return self.route_to_cloud()

    return self.default_model
```
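One detail worth pinning down: the `fallback` in `routing.yaml` is a cloud model, so falling back automatically when the local model is down would defeat the privacy rule. A sketch of a `route_to_local` that only falls back for non-sensitive tasks; the signature and the `is_available` health-check callable are assumptions.

```python
from typing import Callable


class LocalModelUnavailable(RuntimeError):
    """Raised instead of silently leaking sensitive context to a cloud fallback."""


def route_to_local(primary: str, fallback: str, sensitive: bool,
                   is_available: Callable[[str], bool]) -> str:
    """Prefer the local model; use the cloud fallback only when privacy allows it."""
    if is_available(primary):
        return primary
    if sensitive:
        raise LocalModelUnavailable(
            f"{primary} is unavailable and the task is sensitive; not falling back to {fallback}")
    return fallback


def local_down(model: str) -> bool:
    return False  # pretend the health check failed


print(route_to_local("qwen2.5-7b-local", "deepseek-v4-flash",
                     sensitive=False, is_available=local_down))  # → deepseek-v4-flash
```

Failing loudly on sensitive tasks surfaces the outage to the user, who can then decide whether to start the local model or explicitly accept cloud processing.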

---

### 🥉 Pattern 3: Desktop/GUI Agent Capabilities

#### What Marvis does

Marvis can see and control desktop applications — not just terminal. Examples:
- "Open WeChat, find the last message from Mom, tell her I'll be late"
- "Open my stock trading app, check AAPL price, screenshot the chart"
- "Find the Windows setting that disables lock screen ads" (uses Windows API, not clicking around blindly)

Implementation: **Windows API direct calls** (via Microsoft partnership) + **GUI visual recognition** for apps without APIs.

#### Hermes current state

- ✅ Excellent terminal/CLI tools
- ✅ Browser tools for web interaction
- ✅ `vision_analyze` for image understanding
- ✅ #29379 tracking Canvas Mode (GUI direction already under discussion)
- ❌ Cannot interact with desktop GUI applications
- ❌ Cannot see what's on the user's screen

#### Concrete proposal (phased)

**Phase 1 — Screen Capture + Click/Type (low effort, WSL-compatible):**

Add a `desktop` toolset with minimal primitives:

```python
# New tools (Linux/X11 via xdotool, Wayland via ydotool, Windows via the Win32 API)
@register_tool(toolset="desktop", risk_level="medium")
def desktop_screenshot(region: str = "full") -> str:
    """Capture screen and return path for vision analysis.
    region: 'full' | 'active_window' | 'selection'"""
    # Linux: import -window root /tmp/screenshot.png   (ImageMagick)
    # Windows: BitBlt from the screen device context (Win32 GDI)
    raise NotImplementedError

@register_tool(toolset="desktop", risk_level="medium")
def desktop_click(x: int, y: int, button: str = "left") -> str:
    """Click at screen coordinates."""
    # Linux: xdotool mousemove X Y click 1
    # Windows: SetCursorPos + SendInput (mouse_event is the legacy equivalent)
    raise NotImplementedError

@register_tool(toolset="desktop", risk_level="medium")
def desktop_type(text: str) -> str:
    """Type text at current focus."""
    # Linux: xdotool type "..."
    # Windows: SendInput with KEYEVENTF_UNICODE
    raise NotImplementedError

@register_tool(toolset="desktop", risk_level="low")
def desktop_list_windows() -> list:
    """List all open windows with titles and positions."""
    # Linux: wmctrl -lG   (-G adds x/y/width/height; plain -l has no positions)
    # Windows: EnumWindows + GetWindowText + GetWindowRect
    raise NotImplementedError
```
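On Linux/X11 most of the work is building commands and parsing their output, which keeps the tools unit-testable without a display. A sketch assuming `wmctrl` and `xdotool` are installed (neither works under Wayland, where `ydotool` or desktop portals would be needed); the helper names are hypothetical. Plain `wmctrl -l` does not report positions; the `-G` flag adds the x, y, width and height columns.

```python
import subprocess
from dataclasses import dataclass

BUTTONS = {"left": "1", "middle": "2", "right": "3"}  # X11 button numbers used by xdotool


@dataclass
class Window:
    window_id: str
    desktop: int       # -1 means "sticky" (shown on all desktops)
    x: int
    y: int
    width: int
    height: int
    title: str


def parse_wmctrl_geometry(output: str) -> list:
    """Parse `wmctrl -lG` lines: id, desktop, x, y, width, height, host, title."""
    windows = []
    for line in output.splitlines():
        parts = line.split(None, 7)  # titles may contain spaces, so split at most 7 times
        if len(parts) < 7:
            continue
        wid, desk, x, y, w, h = parts[:6]
        title = parts[7] if len(parts) == 8 else ""
        windows.append(Window(wid, int(desk), int(x), int(y), int(w), int(h), title))
    return windows


def click_argv(x: int, y: int, button: str = "left") -> list:
    """Build the xdotool command line for desktop_click (X11 only)."""
    if button not in BUTTONS:
        raise ValueError(f"button must be one of {sorted(BUTTONS)}")
    return ["xdotool", "mousemove", str(x), str(y), "click", BUTTONS[button]]


def desktop_list_windows_linux() -> list:
    out = subprocess.run(["wmctrl", "-lG"], capture_output=True, text=True, check=True).stdout
    return parse_wmctrl_geometry(out)
```

Passing an argument list to `subprocess.run` (no shell) also avoids quoting bugs and shell injection when the agent supplies window titles or typed text.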

This alone would enable use cases like:
- "Take a screenshot of this error dialog and explain what's wrong"
- "Fill out this form in my browser" (Agent sees the screen, identifies fields, types)
- "Close all Slack windows"

**Phase 2 — OS API Integration (higher effort, platform-specific):**

```python
# Dedicated platform adapters (similar to terminal backends)
@register_tool(toolset="desktop")
def system_setting_get(key: str) -> str:
    """Read a system setting by name."""
    # Windows: registry or Settings app API
    # Linux: gsettings / dconf
    # macOS: defaults read
    pass

@register_tool(toolset="desktop")
def file_search_semantic(query: str, path: str = "~") -> list:
    """Find files by content description, not just filename.
    'the PDF invoice from last month' → finds the right file"""
    # Uses local embedding model to index files
    pass
```
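A toy version of the indexing idea, using term-frequency vectors and cosine similarity as a stand-in for a real local embedding model (a real implementation would store embedding vectors, ideally in a vector index). `FileIndex` and its methods are hypothetical, and the extracted text is assumed to come from local PDF parsing or OCR.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    # Stand-in for a local embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class FileIndex:
    """Hypothetical in-memory index mapping each path to a vector of its content."""

    def __init__(self) -> None:
        self.vectors = {}

    def add(self, path: str, extracted_text: str) -> None:
        # The path is included so filenames still count as evidence.
        self.vectors[path] = vectorize(path + " " + extracted_text)

    def search(self, query: str, top_k: int = 3) -> list:
        q = vectorize(query)
        scored = sorted(((cosine(q, v), p) for p, v in self.vectors.items()), reverse=True)
        return [p for score, p in scored[:top_k] if score > 0]


idx = FileIndex()
idx.add("/home/u/Downloads/scan_0042.pdf", "Invoice number 1187 from Acme Corp, total due 420 EUR")
idx.add("/home/u/notes/todo.txt", "buy milk, call mom")
print(idx.search("the PDF invoice from Acme"))
```

Date phrases such as "last month" are better handled by filtering on file metadata (modification time) before ranking than by text similarity.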

**WSL consideration**: Since many Hermes users (especially developers on Windows) use WSL, the desktop tools should detect the environment and proxy through the Windows host when running in WSL. The existing `wsl-windows-interop` patterns in Hermes could be extended.
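Environment detection could be a small pure function, so the proxy decision is testable. This sketch relies on two common WSL signals: the kernel release string contains `microsoft` (for example `5.15.153.1-microsoft-standard-WSL2`), and WSL sets `WSL_DISTRO_NAME`. The backend names are hypothetical; a `wsl-proxy` backend would forward calls to the Windows host, for example by invoking Windows executables through WSL interop.

```python
import os
import platform
import sys
from typing import Mapping, Optional


def detect_desktop_backend(release: Optional[str] = None,
                           env: Optional[Mapping[str, str]] = None,
                           sys_platform: Optional[str] = None) -> str:
    """Return which backend the hypothetical `desktop` toolset should use."""
    release = (release if release is not None else platform.uname().release).lower()
    env = env if env is not None else os.environ
    sys_platform = sys_platform or sys.platform

    if sys_platform == "win32":
        return "windows"
    if sys_platform == "darwin":
        return "macos"
    # WSL: GUI calls must reach the Windows desktop, not the Linux side.
    if "microsoft" in release or "WSL_DISTRO_NAME" in env:
        return "wsl-proxy"
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "headless"
```

WSL is checked before `DISPLAY` and `WAYLAND_DISPLAY` because WSLg sets both, yet they only expose Linux GUI apps, not the Windows desktop the user actually sees.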

---

### 4️⃣ Pattern 4: Tiered Security Classification

#### What Marvis does

Every operation is classified into 3 risk tiers with automatic enforcement:

| Tier | Examples | Handling |
|------|----------|----------|
| 🟢 **Low** | Read files, search, display info | AI auto-executes |
| 🟡 **Medium** | Delete files, modify settings, install software | AI proposes plan → User must confirm → Execute |
| 🔴 **High** | Payments, transfers, auth changes, `sudo rm -rf /` | **AI forbidden from executing.** Must be done manually. |

This is NOT just a prompt-level suggestion — it's enforced at the execution layer. Medium-risk operations trigger a "hard check" (L2 hard inquiry) that cannot be bypassed by the AI.

#### Hermes current state

- ✅ Terminal confirmation dialogs for dangerous commands (implicit)
- ✅ `requires_approval` parameter in some tool calls
- ❌ No systematic risk classification across all tools
- ❌ Risk level is not visible to the Agent's reasoning
- ❌ Some dangerous operations can slip through by phrasing

#### Concrete proposal

**Step 1 — Add `risk_level` to tool registration:**

```python
# In tools/registry.py or tool definitions
@register_tool(risk_level="low")
def read_file(path: str, **kwargs): ...

@register_tool(risk_level="medium")
def delete_file(path: str): ...

@register_tool(risk_level="high", human_only=True)
def execute_payment(amount: float, **kwargs): ...
```
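A sketch of the registration side, assuming a module-level registry. The decorator records each tool's tier in `TOOL_RISK_LEVELS` / `TOOL_HUMAN_ONLY` (the lookup tables Step 3 consults), defaults to `medium` when no tier is given so omissions fail closed, and rejects invalid combinations at import time. Names and defaults are assumptions, not the current `tools/registry.py`.

```python
import os
from typing import Callable, Optional

VALID_RISK_LEVELS = ("low", "medium", "high")
TOOL_RISK_LEVELS: dict = {}
TOOL_HUMAN_ONLY: dict = {}
TOOL_TOOLSETS: dict = {}
TOOL_FUNCS: dict = {}


def register_tool(risk_level: str = "medium", human_only: bool = False,
                  toolset: Optional[str] = None) -> Callable:
    """Hypothetical decorator recording each tool's risk tier at import time."""
    if risk_level not in VALID_RISK_LEVELS:
        raise ValueError(f"risk_level must be one of {VALID_RISK_LEVELS}, got {risk_level!r}")
    if human_only and risk_level != "high":
        raise ValueError("human_only tools must also be risk_level='high'")

    def decorator(func: Callable) -> Callable:
        name = func.__name__
        if name in TOOL_FUNCS:
            raise ValueError(f"tool {name!r} is registered twice")
        TOOL_FUNCS[name] = func
        TOOL_RISK_LEVELS[name] = risk_level
        TOOL_HUMAN_ONLY[name] = human_only
        TOOL_TOOLSETS[name] = toolset
        return func

    return decorator


@register_tool(risk_level="low")
def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()


@register_tool(risk_level="medium")
def delete_file(path: str) -> None:
    os.remove(path)


@register_tool(risk_level="high", human_only=True)
def execute_payment(amount: float) -> None:
    raise NotImplementedError("registered only so the agent can be told to refuse it")
```

Validating at registration time means a mislabeled tool breaks the import instead of silently running at the wrong tier.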

**Step 2 — Inject risk rules into system prompt:**

```python
# Auto-generated from tool registry, injected into every Agent session
RISK_RULES = """
## Operation Safety Tiers

When using tools, always respect these risk levels:

🟢 LOW risk (auto-execute): read_file, search, grep, list, stat, cat
   → Execute immediately, no confirmation needed

🟡 MEDIUM risk (confirm first): delete, move, write, install, systemctl, chmod
   → Before executing: explain what you're about to do and wait for confirmation
   → NEVER batch medium-risk operations without individual confirmation

🔴 HIGH risk (FORBIDDEN): payments, sudo destructive, auth changes, rm -rf /
   → DO NOT execute these under any circumstances
   → Tell the user to perform these operations themselves
"""
```
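Generating this text from the registry, rather than maintaining it by hand, keeps the prompt and the enforcement layer from drifting apart. A sketch that takes a tool name → tier mapping (such as the `TOOL_RISK_LEVELS` table used in Step 3); `build_risk_rules` and the tier wording are assumptions.

```python
from collections import defaultdict

TIER_TEXT = {
    "low": ("🟢 LOW risk (auto-execute)", "Execute immediately, no confirmation needed"),
    "medium": ("🟡 MEDIUM risk (confirm first)",
               "Explain what you're about to do and wait for confirmation"),
    "high": ("🔴 HIGH risk (FORBIDDEN)", "Do not execute; tell the user to do it themselves"),
}


def build_risk_rules(tool_risk_levels: dict) -> str:
    """Render the safety-tier prompt section from the tool registry."""
    by_tier = defaultdict(list)
    for tool, tier in tool_risk_levels.items():
        by_tier[tier].append(tool)

    lines = ["## Operation Safety Tiers", ""]
    for tier in ("low", "medium", "high"):
        if not by_tier[tier]:
            continue  # skip tiers with no registered tools
        header, rule = TIER_TEXT[tier]
        lines.append(f"{header}: {', '.join(sorted(by_tier[tier]))}")
        lines.append(f"   → {rule}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


print(build_risk_rules({"read_file": "low", "delete_file": "medium",
                        "write_file": "medium", "execute_payment": "high"}))
```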

**Step 3 — Enforce at execution layer (not just prompt):**

```python
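# In model_tools.py handle_function_call() (sketch; TOOL_RISK_LEVELS, TOOL_HUMAN_ONLY,
# confirmation_granted and execute_tool are proposed names, not existing Hermes APIs)
def handle_function_call(tool_name: str, args: dict, task_id: str) -> dict:
    # Unknown tools default to "medium" so they fail closed (confirmation required).
    risk = TOOL_RISK_LEVELS.get(tool_name, "medium")

    # High-risk tools are never executed by the agent, however the request is phrased.
    # (Checking only human_only here would let high-risk tools without the flag run unconfirmed.)
    if risk == "high" or TOOL_HUMAN_ONLY.get(tool_name, False):
        return {"error": "This operation requires human execution. Refusing."}

    if risk == "medium" and not confirmation_granted(tool_name, args, task_id):
        return {"requires_confirmation": True, "plan": f"About to run {tool_name} with {args}"}

    return execute_tool(tool_name, args)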
```
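`confirmation_granted` is left undefined above. One option, sketched here, binds each approval to the exact tool, arguments and task and consumes it on use: an approval for one file cannot be replayed for another, and a single "yes" cannot cover a batch of medium-risk calls. The function names and the in-memory store are assumptions; the important property is that only the user-facing UI, never the model, calls `grant_confirmation`.

```python
import hashlib
import json

_APPROVALS: set = set()  # hypothetical store; real code would scope this per session


def _fingerprint(tool_name: str, args: dict, task_id: str) -> str:
    # Canonical JSON, so {"a": 1, "b": 2} and {"b": 2, "a": 1} hash identically.
    payload = json.dumps([task_id, tool_name, args], sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def grant_confirmation(tool_name: str, args: dict, task_id: str) -> None:
    """Called by the UI layer when the user approves the exact plan shown to them."""
    _APPROVALS.add(_fingerprint(tool_name, args, task_id))


def confirmation_granted(tool_name: str, args: dict, task_id: str) -> bool:
    """Single-use check: an approval covers one call with exactly these arguments."""
    key = _fingerprint(tool_name, args, task_id)
    if key in _APPROVALS:
        _APPROVALS.discard(key)
        return True
    return False
```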

---

## What NOT to Copy from Marvis

These are Marvis design choices that Hermes should deliberately avoid:

| Marvis weakness | Why Hermes should NOT adopt it |
|-----------------|-------------------------------|
| **Closed ecosystem** — cross-device relies on proprietary Tencent app engine | Hermes' MCP + A2A + multi-platform gateway is the right open approach |
| **Non-extensible** — users cannot create custom Skills/workflows | Hermes' Skill system is a core differentiator; keep it flexible |
| **Model lock-in** — local = Qwen only, cloud = Hunyuan/DS only | Hermes' provider-agnostic design is a major strength |
| **No long-running autonomy** — no cron, no idle loops, no overnight tasks | Hermes' cron system + background processes are already superior |
| **Search quality issues** — early users report poor retrieval accuracy | Hermes' multi-search backends (anysearch, browser, etc.) provide better flexibility |

---

## Related Existing Issues

- **#29379** — Native Canvas Mode (GUI/visual direction already being discussed)
- **#9459** — Agent profiles for delegate_task (pre-configured roles)
- **#514** — A2A Protocol Support (Agent-to-Agent standard, complementary)
- **#11922** — Multi-agent communication & per-channel persona
- **#31684** — compress_context (local pre-processing overlaps with Pattern 2)
- **#25545** — Skill Orchestration / Workflow Composition (pre-built workflows)

---

## Why Now

Marvis launched 5 days ago. It's the first consumer product that proves:

1. **Multi-Agent architecture is not academic** — 6 agents working in parallel, on consumer hardware, for non-technical users
2. **OS-level AI middleware is viable** — users are willing to let AI control their desktop if safety guarantees are clear
3. **Cloud-local routing saves real money** — Tencent's free 10M token/day is only sustainable because >60% of processing stays local
4. **GUI Agent is the natural extension of Code Agent** — the jump from "AI that writes code" to "AI that operates your computer" is smaller than it seems

Hermes already has better infrastructure than Marvis in many ways (MCP, multi-model, cron, Skills, platforms gateway). What's missing is:

1. **Default configurations** that make the infrastructure accessible (Pattern 1)
2. **Intelligence layer** that routes between capabilities (Pattern 2)
3. **Desktop reach** beyond the terminal (Pattern 3)
4. **Safety guardrails** that make desktop control trustworthy (Pattern 4)

None of these require a rewrite — they're incremental additions to existing architecture.

---

## Implementation Priority (my recommendation)

| Priority | Pattern | Effort | Impact | Dependency |
|----------|---------|--------|--------|------------|
| 🥇 P0 | Pattern 4: Security Tiers | ~200 LOC | **Safety baseline** for all other patterns | None |
| 🥈 P1 | Pattern 1: Pre-built Agent Profiles | ~500 LOC | Unlocks delegation UX | #9459 |
| 🥉 P2 | Pattern 2: Cloud-Local Routing | ~800 LOC | Cost optimization | Pattern 1 for routing targets |
| P3 | Pattern 3: Desktop GUI Agent | ~2000+ LOC | Major differentiation | Pattern 4 for safety |

Security first (Pattern 4), then pre-built agents (Pattern 1), then smart routing (Pattern 2), then desktop reach (Pattern 3). Each builds on the previous.

---

*Researched and drafted by a Hermes user who also runs OpenClaw. Happy to help with testing or refining any of these ideas.*
