Feature Request: Message Pre-Processing Hook for Request Routing #14150

## Problem Statement

In multi-model agent setups, the **executing model** is currently responsible for deciding whether to delegate (spawn a subagent) or handle a task itself. This creates an inherent architectural bias: the model that would execute the task is also the one deciding whether to hand it off.

This is analogous to asking a contractor to honestly assess whether they should subcontract: there is a structural incentive to self-execute. In my experience, the model fails to delegate in roughly **50-80% of qualifying cases**, even with explicit system prompt instructions, because:

1. **Helpfulness bias (RLHF):** Models are trained to be maximally helpful, so they tend to start working immediately rather than pausing to route
2. **Single-pass processing:** There is no separate "routing phase" before the "execution phase"; both happen in the same inference pass
3. **Instruction degradation:** In long system prompts (20-30K tokens), procedural routing rules compete with personality, tools, safety, and memory instructions for attention

## Proposed Solution: `hooks.messagePreProcess`

Add a configurable pre-processing hook that runs **before** the main model sees the user message. This hook classifies the message and injects routing metadata into the context.

### Architecture

```
User message → Pre-process hook (classifier) → Classification injected → Main model
```

The classifier runs as a lightweight, fast call that determines routing before the main model engages. This cleanly separates the **routing decision** from the **execution decision**.
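
To make the flow concrete, here is a minimal Python sketch of the `inject` path. Everything in it is hypothetical: `pre_process`, `ModelInput`, the `[routing: ...]` prefix format, and the toy classifier are illustrative names, not an existing API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any callable that maps a user message to a routing label.
Classifier = Callable[[str], str]


@dataclass
class ModelInput:
    system_context: str
    user_message: str


def pre_process(message: str, system_context: str, classify: Classifier) -> ModelInput:
    """Run the classifier first, then prepend its label to the system context."""
    label = classify(message)
    injected = f"[routing: {label}]\n{system_context}"  # assumed prefix format
    return ModelInput(system_context=injected, user_message=message)


def toy_classifier(message: str) -> str:
    # Stand-in for an API, local, or rules-based classifier.
    return "SPAWN_OPUS" if "investigate" in message.lower() else "DIRECT"


result = pre_process("Investigate the login outage", "You are a helpful agent.", toy_classifier)
print(result.system_context.splitlines()[0])
```

The main model only ever receives `ModelInput`, so the routing label is already in its context by the time it starts generating.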

## Implementation Approaches

The hook should support multiple classifier backends to give users flexibility:

### Option A: External API Model (e.g., Haiku, GPT-4o-mini, Gemini Flash)

Best for: Nuanced classification, ambiguous messages, complex routing rules

```yaml
hooks:
  messagePreProcess:
    enabled: true
    classifier:
      type: "api"
      model: "anthropic/claude-haiku-4-5"  # or openai/gpt-4o-mini, google/gemini-2.0-flash
      prompt: |
        Classify this user message for routing. Reply with EXACTLY one label and nothing else:
        - DIRECT: trivial, quick lookup, status check, greeting, conversational follow-up
        - SPAWN_OPUS: research, analysis, investigation, troubleshooting, building, code review, >500 words output
        - SPAWN_SONNET: batch operations, medium automation, isolated tasks
        
        Message: {{message}}
    action: "inject"  # Prepend classification to system context for main model
    # Alternative actions:
    # action: "route"   # Automatically route to different model/session based on classification
    # action: "metadata" # Add as metadata accessible via {{preprocess.result}}
```

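- **Latency:** ~150-300ms (estimate; varies with provider and network)
- **Cost:** ~$0.00005/message (Haiku), ~$0.00003/message (GPT-4o-mini) (estimates; scales with prompt and message length)
- **Accuracy:** High, since the model can weigh nuance and context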
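
Model classifiers do not always reply with a bare label, so the hook would likely need to normalize the reply before acting on it. A minimal sketch, assuming the three labels from the prompt above and assuming (my choice, not part of the proposal) that unrecognized replies fall back to `DIRECT`; `parse_label` is a hypothetical helper:

```python
import re

VALID_LABELS = ("DIRECT", "SPAWN_OPUS", "SPAWN_SONNET")


def parse_label(raw: str, default: str = "DIRECT") -> str:
    """Return the first valid label found in the classifier reply, else the default."""
    # Upper-case the reply and split it into word-like tokens (letters and underscores).
    for token in re.findall(r"[A-Z_]+", raw.upper()):
        if token in VALID_LABELS:
            return token
    return default  # assumption: unparseable replies are handled by the main model


print(parse_label("  spawn_opus.\n"))
print(parse_label("Classification: SPAWN_SONNET"))
print(parse_label("I am not sure"))
```

Falling back to `DIRECT` keeps a malformed reply from spawning an expensive subagent by accident; a stricter deployment could raise an error instead.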

### Option B: Local LLM (via llama.cpp, Ollama, LM Studio)

Best for: Privacy-conscious users, offline operation, zero marginal cost

```yaml
hooks:
  messagePreProcess:
    enabled: true
    classifier:
      type: "local"
      endpoint: "http://localhost:11434/api/generate"  # Ollama
      # Or: endpoint: "http://localhost:8080/completion"  # llama.cpp server
      # Or: endpoint: "http://localhost:1234/v1/chat/completions"  # LM Studio
      model: "llama3.2:1b"  # Small, fast model for classification
      prompt: |
        Classify: DIRECT, SPAWN_OPUS, or SPAWN_SONNET
        Message: {{message}}
    action: "inject"
```

- **Latency:** ~50-200ms (depends on model size and hardware)
- **Cost:** $0 (runs locally)
- **Accuracy:** Good for clear-cut cases, may need fine-tuning for edge cases
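
For the Ollama endpoint above, the hook call could look roughly like the following sketch. It uses only the Python standard library and Ollama's `/api/generate` request shape (`model`, `prompt`, `stream`, `options`). The function names and the 2-second timeout are my assumptions, and `classify_locally` needs a running Ollama instance, so it is defined but not called here.

```python
import json
import urllib.request


def build_ollama_request(
    message: str,
    model: str = "llama3.2:1b",
    endpoint: str = "http://localhost:11434/api/generate",
) -> urllib.request.Request:
    """Build a non-streaming classification request for a local Ollama server."""
    prompt = f"Classify: DIRECT, SPAWN_OPUS, or SPAWN_SONNET\nMessage: {message}"
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                 # return one JSON object instead of a stream
        "options": {"temperature": 0},   # keep classification as repeatable as possible
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_locally(message: str, timeout_s: float = 2.0) -> str:
    """Send the request and return the raw model reply (requires Ollama to be running)."""
    with urllib.request.urlopen(build_ollama_request(message), timeout=timeout_s) as resp:
        body = json.load(resp)
    return body["response"].strip()


request = build_ollama_request("Research our churn numbers in depth")
print(request.full_url)
```

A short timeout matters here: if the local model is slow or down, the hook should fail fast and let the message through rather than stall the conversation.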

### Option C: Deterministic Rules Engine (Regex/Keyword Matching)

Best for: Predictable routing, near-zero latency, zero cost, no model dependency

```yaml
hooks:
  messagePreProcess:
    enabled: true
    classifier:
      type: "rules"
      rules:
        - match: "\\b(analyze|research|investigate|troubleshoot|review|audit|build|implement)\\b"
          qualifiers: "\\b(comprehensive|detailed|thorough|deep dive|in-depth)\\b"
          result: "SPAWN_OPUS"
        - match: "\\b(batch|bulk|process all|migrate)\\b"
          result: "SPAWN_SONNET"
        - default: "DIRECT"
    action: "inject"
```

- **Latency:** <1ms
- **Cost:** $0
- **Accuracy:** High for well-defined patterns; misses nuanced or differently worded requests
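
The rules above could be evaluated along these lines. This is a sketch under assumptions the YAML does not spell out: rules are tried in order and the first hit wins, matching is case-insensitive, and a rule with `qualifiers` fires only when both patterns match. `classify_with_rules` is a hypothetical name.

```python
import re

# Same patterns as the YAML rules, written as Python raw strings.
RULES = [
    {
        "match": r"\b(analyze|research|investigate|troubleshoot|review|audit|build|implement)\b",
        "qualifiers": r"\b(comprehensive|detailed|thorough|deep dive|in-depth)\b",
        "result": "SPAWN_OPUS",
    },
    {"match": r"\b(batch|bulk|process all|migrate)\b", "result": "SPAWN_SONNET"},
]
DEFAULT = "DIRECT"


def classify_with_rules(message: str, rules=RULES, default: str = DEFAULT) -> str:
    """Return the result of the first rule whose patterns all match, else the default."""
    for rule in rules:
        if not re.search(rule["match"], message, re.IGNORECASE):
            continue
        qualifiers = rule.get("qualifiers")
        # Assumed semantics: when qualifiers are present, they must match as well.
        if qualifiers and not re.search(qualifiers, message, re.IGNORECASE):
            continue
        return rule["result"]
    return default


print(classify_with_rules("Do a thorough review of the auth module"))
print(classify_with_rules("Migrate all user records"))
print(classify_with_rules("Need an analysis of last month's churn"))
```

The last call returns `DIRECT` because "analysis" does not match the literal `analyze`, which illustrates the kind of request a keyword approach misses.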

### Option D: Hybrid (Rules + Model Fallback)

```yaml
hooks:
  messagePreProcess:
    enabled: true
    classifier:
      type: "hybrid"
      primary:
        type: "rules"
        rules: [...]  # Fast deterministic check first
      fallback:
        type: "api"
        model: "anthropic/claude-haiku-4-5"
        condition: "primary.result == UNCERTAIN"  # assumes the rules default to UNCERTAIN, not DIRECT as in Option C
    action: "inject"
```
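
The fallback logic itself is small. A sketch, assuming the primary classifier returns the literal string `UNCERTAIN` when no rule is decisive; `classify_hybrid` and both toy classifiers are hypothetical stand-ins:

```python
from typing import Callable, List

Classifier = Callable[[str], str]


def classify_hybrid(
    message: str,
    primary: Classifier,
    fallback: Classifier,
    uncertain: str = "UNCERTAIN",
) -> str:
    """Use the cheap primary classifier; call the fallback only when it is unsure."""
    result = primary(message)
    if result == uncertain:
        return fallback(message)
    return result


def toy_rules(message: str) -> str:
    # Stand-in for the rules engine: decisive on obvious cases, unsure otherwise.
    text = message.lower().strip()
    if "migrate" in text:
        return "SPAWN_SONNET"
    if text in {"hi", "hello", "thanks"}:
        return "DIRECT"
    return "UNCERTAIN"


fallback_calls: List[str] = []


def toy_model(message: str) -> str:
    # Stand-in for the API classifier; records calls so usage can be inspected.
    fallback_calls.append(message)
    return "SPAWN_OPUS"


print(classify_hybrid("hello", toy_rules, toy_model))
print(classify_hybrid("Why does checkout fail on Safari?", toy_rules, toy_model))
print(len(fallback_calls))
```

Only the ambiguous message reaches the model, so the API cost and latency apply to a fraction of traffic.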

## Action Types

The hook result should support multiple action modes:

| Action | Behavior |
|--------|----------|
| `inject` | Prepend classification to the system prompt or user message |
| `route` | Automatically route to a different model or spawn a subagent |
| `metadata` | Store as metadata accessible via template variables |
| `block` | Reject the message (useful for content filtering) |

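The `route` action would be particularly powerful, because it maps each classification label directly to a target model or subagent without involving the main model: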

```yaml
hooks:
  messagePreProcess:
    classifier: { ... }
    action: "route"
    routing:
      DIRECT: { model: "anthropic/claude-sonnet-4-5" }  # Default model
      SPAWN_OPUS: { spawn: true, model: "anthropic/claude-opus-4-6", label: "Complex Task" }
      SPAWN_SONNET: { spawn: true, model: "anthropic/claude-sonnet-4-5", label: "Background Task" }
```
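
Resolving a label against that table could work as sketched below. `Route` and `resolve_route` are hypothetical names, and falling back to the `DIRECT` entry for unknown labels is my assumption, not something the proposal specifies.

```python
from dataclasses import dataclass
from typing import Optional

# Mirrors the `routing` table from the YAML above.
ROUTING = {
    "DIRECT": {"model": "anthropic/claude-sonnet-4-5"},
    "SPAWN_OPUS": {"spawn": True, "model": "anthropic/claude-opus-4-6", "label": "Complex Task"},
    "SPAWN_SONNET": {"spawn": True, "model": "anthropic/claude-sonnet-4-5", "label": "Background Task"},
}


@dataclass(frozen=True)
class Route:
    model: str
    spawn: bool = False
    label: Optional[str] = None


def resolve_route(classification: str, routing: dict = ROUTING, fallback: str = "DIRECT") -> Route:
    """Look up the routing entry for a label; unknown labels use the fallback entry."""
    entry = routing.get(classification, routing[fallback])
    return Route(
        model=entry["model"],
        spawn=entry.get("spawn", False),
        label=entry.get("label"),
    )


print(resolve_route("SPAWN_OPUS"))
print(resolve_route("SOMETHING_ELSE"))
```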

## Real-World Use Case

My setup: Main session runs Sonnet 4.5 (fast, cheap) for conversation. Complex tasks should be spawned to Opus 4.6 subagents for higher quality.

**Current state:** Despite explicit routing instructions in the system prompt (~500 words of routing rules), the main model handles complex tasks inline ~50-80% of the time. In one week I documented two failures in which research/investigation tasks matching all spawn criteria were executed by Sonnet instead of being delegated to Opus.

**With pre-process hook:** A Haiku classifier (or local model, or regex rules) would catch these before Sonnet even sees the message. I would expect ~95-99% routing accuracy at negligible cost, though that figure is an estimate and not yet measured.

## Benefits

1. **Architectural separation:** The model that routes is not the model that executes, which removes the self-execution bias
2. **Reliability:** The hook always runs before the main model sees the message; it is not an instruction the model can skip
3. **Cost efficiency:** Ensures expensive models are used only when needed; cheap models handle trivial requests
4. **Flexibility:** Users choose their classifier (API model, local LLM, regex rules, or hybrid)
5. **Composability:** Can be combined with existing hooks (session-memory, etc.)
6. **Privacy option:** Local LLM classifier keeps all routing decisions on-device

## Trade-offs

- **Latency:** +50-300ms per message (depending on classifier type; rules engine is <1ms)
- **Cost:** $0 (rules/local) to ~$0.00005/message (Haiku) — negligible for most users
- **Complexity:** Additional configuration surface, but opt-in and well-structured
- **Misrouting:** Some messages will be misclassified in either direction; with `inject`, this is mitigated by allowing the main model to override the classification with an explanation
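
To put the cost line in perspective, here is a quick back-of-the-envelope calculation. The per-message figures are the estimates quoted above; the volume of 1,000 messages per day is purely an illustrative assumption.

```python
def monthly_cost(cost_per_message: float, messages_per_day: int, days: int = 30) -> float:
    """Classifier spend over a month at a fixed per-message cost."""
    return cost_per_message * messages_per_day * days


haiku = monthly_cost(0.00005, 1000)        # estimate from the Option A figures
gpt4o_mini = monthly_cost(0.00003, 1000)   # estimate from the Option A figures

print(f"Haiku: ${haiku:.2f}/month")
print(f"GPT-4o-mini: ${gpt4o_mini:.2f}/month")
```

At that volume the classifier would cost on the order of a dollar or two per month, which is small next to a single complex task mistakenly handled by (or withheld from) the more expensive model.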

## Prior Art

- **LangChain/LangGraph:** Router chains that classify and route before execution
- **Semantic Router:** Embedding-based routing for LLM pipelines
- **OpenAI Assistants:** Tool-use routing happens in a separate classification step
- **AWS Bedrock:** Agent routing with classifier-based delegation

## Summary

Adding `hooks.messagePreProcess` would solve a real architectural limitation where the executing model cannot reliably self-assess routing decisions. It is a general-purpose feature that enables request routing, content filtering, context enrichment, and more — while being simple to configure and opt-in.

Happy to discuss implementation details or provide more data on the routing failure patterns that motivated this request.
