Problem Statement
In multi-model agent setups, the executing model is currently responsible for deciding whether to delegate (spawn a subagent) or handle a task itself. This creates an inherent architectural bias: the model that would execute the task is also the one deciding whether to hand it off.
This is analogous to asking a contractor to honestly assess whether they should subcontract — there is a structural incentive to self-execute. In practice, this results in 50-80% missed delegation even with explicit system prompt instructions, because:
- Helpfulness bias (RLHF): Models are trained to be maximally helpful, meaning they instinctively start working rather than pausing to route
- Single-pass processing: There is no separate "routing phase" before the "execution phase" — both happen simultaneously
- Instruction degradation: In long system prompts (20-30K tokens), procedural routing rules compete with personality, tools, safety, and memory instructions for attention
Proposed Solution: hooks.messagePreProcess
Add a configurable pre-processing hook that runs before the main model sees the user message. This hook classifies the message and injects routing metadata into the context.
Architecture
User message → Pre-process hook (classifier) → Classification injected → Main model
The classifier runs as a lightweight, fast call that determines routing before the main model engages. This cleanly separates the routing decision from the execution decision.
Implementation Approaches
The hook should support multiple classifier backends to give users flexibility:
Option A: External API Model (e.g., Haiku, GPT-4o-mini, Gemini Flash)
Best for: Nuanced classification, ambiguous messages, complex routing rules
hooks:
messagePreProcess:
enabled: true
classifier:
type: "api"
model: "anthropic/claude-haiku-4-5" # or openai/gpt-4o-mini, google/gemini-2.0-flash
prompt: |
Classify this user message for routing. Reply with EXACTLY one token:
- DIRECT: trivial, quick lookup, status check, greeting, conversational follow-up
- SPAWN_OPUS: research, analysis, investigation, troubleshooting, building, code review, >500 words output
- SPAWN_SONNET: batch operations, medium automation, isolated tasks
Message: {{message}}
action: "inject" # Prepend classification to system context for main model
# Alternative actions:
# action: "route" # Automatically route to different model/session based on classification
# action: "metadata" # Add as metadata accessible via {{preprocess.result}}
- Latency: ~150-300ms
- Cost: ~$0.00005/message (Haiku), ~$0.00003/message (GPT-4o-mini)
- Accuracy: High — model understands nuance and context
Option B: Local LLM (via llama.cpp, Ollama, LM Studio)
Best for: Privacy-conscious users, offline operation, zero marginal cost
hooks:
messagePreProcess:
enabled: true
classifier:
type: "local"
endpoint: "http://localhost:11434/api/generate" # Ollama
# Or: endpoint: "http://localhost:8080/completion" # llama.cpp server
# Or: endpoint: "http://localhost:1234/v1/chat/completions" # LM Studio
model: "llama3.2:1b" # Small, fast model for classification
prompt: |
Classify: DIRECT, SPAWN_OPUS, or SPAWN_SONNET
Message: {{message}}
action: "inject"
- Latency: ~50-200ms (depends on model size and hardware)
- Cost: $0 (runs locally)
- Accuracy: Good for clear-cut cases, may need fine-tuning for edge cases
Option C: Deterministic Rules Engine (Regex/Keyword Matching)
Best for: Predictable routing, zero latency, zero cost, no model dependency
hooks:
messagePreProcess:
enabled: true
classifier:
type: "rules"
rules:
- match: "\\b(analyze|research|investigate|troubleshoot|review|audit|build|implement)\\b"
qualifiers: "\\b(comprehensive|detailed|thorough|deep dive|in-depth)\\b"
result: "SPAWN_OPUS"
- match: "\\b(batch|bulk|process all|migrate)\\b"
result: "SPAWN_SONNET"
- default: "DIRECT"
action: "inject"
- Latency: <1ms
- Cost: $0
- Accuracy: High for well-defined patterns, misses nuanced requests
Option D: Hybrid (Rules + Model Fallback)
hooks:
messagePreProcess:
enabled: true
classifier:
type: "hybrid"
primary:
type: "rules"
rules: [...] # Fast deterministic check first
fallback:
type: "api"
model: "anthropic/claude-haiku-4-5"
condition: "primary.result == UNCERTAIN"
action: "inject"
Action Types
The hook result should support multiple action modes:
| Action |
Behavior |
inject |
Prepend classification to the system prompt or user message |
route |
Automatically route to a different model or spawn a subagent |
metadata |
Store as metadata accessible via template variables |
block |
Reject the message (useful for content filtering) |
The route action would be particularly powerful:
hooks:
messagePreProcess:
classifier: { ... }
action: "route"
routing:
DIRECT: { model: "anthropic/claude-sonnet-4-5" } # Default model
SPAWN_OPUS: { spawn: true, model: "anthropic/claude-opus-4-6", label: "Complex Task" }
SPAWN_SONNET: { spawn: true, model: "anthropic/claude-sonnet-4-5", label: "Background Task" }
Real-World Use Case
My setup: Main session runs Sonnet 4.5 (fast, cheap) for conversation. Complex tasks should be spawned to Opus 4.6 subagents for higher quality.
Current state: Despite explicit routing instructions in the system prompt (~500 words of routing rules), the main model handles complex tasks inline ~50-80% of the time. Two documented failures in one week where research/investigation tasks matching all spawn criteria were executed by Sonnet instead of being delegated to Opus.
With pre-process hook: A Haiku classifier (or local model, or regex rules) would catch these before Sonnet even sees the message, achieving ~95-99% routing accuracy at negligible cost.
Benefits
- Architectural separation: The model that routes is not the model that executes — eliminates self-serving bias
- Reliability: Pre-processing happens deterministically before the main model, not as a skippable instruction
- Cost efficiency: Ensures expensive models are used only when needed; cheap models handle trivial requests
- Flexibility: Users choose their classifier (API model, local LLM, regex rules, or hybrid)
- Composability: Can be combined with existing hooks (session-memory, etc.)
- Privacy option: Local LLM classifier keeps all routing decisions on-device
Trade-offs
- Latency: +50-300ms per message (depending on classifier type; rules engine is <1ms)
- Cost: $0 (rules/local) to ~$0.00005/message (Haiku) — negligible for most users
- Complexity: Additional configuration surface, but opt-in and well-structured
- False positives: Some messages may be misrouted — mitigated by allowing the main model to override with explanation
Prior Art
- LangChain/LangGraph: Router chains that classify and route before execution
- Semantic Router: Embedding-based routing for LLM pipelines
- OpenAI Assistants: Tool-use routing happens in a separate classification step
- AWS Bedrock: Agent routing with classifier-based delegation
Summary
Adding hooks.messagePreProcess would solve a real architectural limitation where the executing model cannot reliably self-assess routing decisions. It is a general-purpose feature that enables request routing, content filtering, context enrichment, and more — while being simple to configure and opt-in.
Happy to discuss implementation details or provide more data on the routing failure patterns that motivated this request.
Problem Statement
In multi-model agent setups, the executing model is currently responsible for deciding whether to delegate (spawn a subagent) or handle a task itself. This creates an inherent architectural bias: the model that would execute the task is also the one deciding whether to hand it off.
This is analogous to asking a contractor to honestly assess whether they should subcontract — there is a structural incentive to self-execute. In practice, this results in 50-80% missed delegation even with explicit system prompt instructions, because:
Proposed Solution:
hooks.messagePreProcessAdd a configurable pre-processing hook that runs before the main model sees the user message. This hook classifies the message and injects routing metadata into the context.
Architecture
The classifier runs as a lightweight, fast call that determines routing before the main model engages. This cleanly separates the routing decision from the execution decision.
Implementation Approaches
The hook should support multiple classifier backends to give users flexibility:
Option A: External API Model (e.g., Haiku, GPT-4o-mini, Gemini Flash)
Best for: Nuanced classification, ambiguous messages, complex routing rules
Option B: Local LLM (via llama.cpp, Ollama, LM Studio)
Best for: Privacy-conscious users, offline operation, zero marginal cost
Option C: Deterministic Rules Engine (Regex/Keyword Matching)
Best for: Predictable routing, zero latency, zero cost, no model dependency
Option D: Hybrid (Rules + Model Fallback)
Action Types
The hook result should support multiple action modes:
injectroutemetadatablockThe
routeaction would be particularly powerful:Real-World Use Case
My setup: Main session runs Sonnet 4.5 (fast, cheap) for conversation. Complex tasks should be spawned to Opus 4.6 subagents for higher quality.
Current state: Despite explicit routing instructions in the system prompt (~500 words of routing rules), the main model handles complex tasks inline ~50-80% of the time. Two documented failures in one week where research/investigation tasks matching all spawn criteria were executed by Sonnet instead of being delegated to Opus.
With pre-process hook: A Haiku classifier (or local model, or regex rules) would catch these before Sonnet even sees the message, achieving ~95-99% routing accuracy at negligible cost.
Benefits
Trade-offs
Prior Art
Summary
Adding
hooks.messagePreProcesswould solve a real architectural limitation where the executing model cannot reliably self-assess routing decisions. It is a general-purpose feature that enables request routing, content filtering, context enrichment, and more — while being simple to configure and opt-in.Happy to discuss implementation details or provide more data on the routing failure patterns that motivated this request.