Feature Request: Message Pre-Processing Hook for Request Routing #14150

@rex05ai

Problem Statement

In multi-model agent setups, the executing model is currently responsible for deciding whether to delegate (spawn a subagent) or handle a task itself. This creates an inherent architectural bias: the model that would execute the task is also the one deciding whether to hand it off.

This is analogous to asking a contractor to honestly assess whether they should subcontract: there is a structural incentive to self-execute. In my setup, roughly 50-80% of tasks that should be delegated are handled inline instead, even with explicit routing instructions in the system prompt. Three factors contribute:

  1. Helpfulness bias (RLHF): Models are trained to be maximally helpful, meaning they instinctively start working rather than pausing to route
  2. Single-pass processing: There is no separate "routing phase" before the "execution phase" — both happen simultaneously
  3. Instruction degradation: In long system prompts (20-30K tokens), procedural routing rules compete with personality, tools, safety, and memory instructions for attention

Proposed Solution: hooks.messagePreProcess

Add a configurable pre-processing hook that runs before the main model sees the user message. This hook classifies the message and injects routing metadata into the context.

Architecture

User message → Pre-process hook (classifier) → Classification injected → Main model

The classifier runs as a lightweight, fast call that determines routing before the main model engages. This cleanly separates the routing decision from the execution decision.
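
A minimal sketch of this flow, assuming the hook is a plain function that takes the user message plus any classifier callable. The names `pre_process`, `toy_classifier`, and the `[routing: ...]` prefix format are hypothetical and only illustrate the separation between the routing step and the execution step:

```python
from typing import Callable

# Label set taken from the classifier prompts in this proposal.
LABELS = {"DIRECT", "SPAWN_OPUS", "SPAWN_SONNET"}


def pre_process(message: str, classify: Callable[[str], str]) -> dict:
    """Run the classifier first and return the context the main model would see."""
    label = classify(message)
    if label not in LABELS:
        # Assumption: an unrecognized classifier reply falls back to DIRECT.
        label = "DIRECT"
    return {
        "system_prefix": f"[routing: {label}]",  # hypothetical injection format
        "user_message": message,
    }


def toy_classifier(message: str) -> str:
    """Stand-in for any backend (API model, local LLM, or rules)."""
    return "SPAWN_OPUS" if "research" in message.lower() else "DIRECT"


context = pre_process("Research the outage from last night", toy_classifier)
print(context["system_prefix"])
```

The main model only ever receives `context`, so the routing decision is already made by the time it starts generating.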

Implementation Approaches

The hook should support multiple classifier backends to give users flexibility:

Option A: External API Model (e.g., Haiku, GPT-4o-mini, Gemini Flash)

Best for: Nuanced classification, ambiguous messages, complex routing rules

hooks:
  messagePreProcess:
    enabled: true
    classifier:
      type: "api"
      model: "anthropic/claude-haiku-4-5"  # or openai/gpt-4o-mini, google/gemini-2.0-flash
      prompt: |
        Classify this user message for routing. Reply with EXACTLY one label and nothing else:
        - DIRECT: trivial, quick lookup, status check, greeting, conversational follow-up
        - SPAWN_OPUS: research, analysis, investigation, troubleshooting, building, code review, >500 words output
        - SPAWN_SONNET: batch operations, medium automation, isolated tasks
        
        Message: {{message}}
    action: "inject"  # Prepend classification to system context for main model
    # Alternative actions:
    # action: "route"   # Automatically route to different model/session based on classification
    # action: "metadata" # Add as metadata accessible via {{preprocess.result}}

  • Latency: ~150-300ms
  • Cost: ~$0.00005/message (Haiku), ~$0.00003/message (GPT-4o-mini)
  • Accuracy: High — model understands nuance and context
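
Whichever API model is used, the hook still has to fill the `{{message}}` placeholder and turn a free-text reply into one of the three labels. The sketch below shows one way to do that without calling any provider. `render_prompt` and `parse_label` are hypothetical helper names, and defaulting to `DIRECT` on an unparseable reply is an assumption rather than part of the proposal:

```python
LABELS = ("DIRECT", "SPAWN_OPUS", "SPAWN_SONNET")

# Shortened version of the prompt from the config above.
PROMPT_TEMPLATE = "Classify this user message for routing.\n\nMessage: {{message}}"


def render_prompt(template: str, message: str) -> str:
    """Substitute the {{message}} placeholder with the user message."""
    return template.replace("{{message}}", message)


def parse_label(reply: str, default: str = "DIRECT") -> str:
    """Normalize a model reply and accept it only if it is exactly one known label."""
    cleaned = reply.strip().strip(".:\"'`").upper()
    # Anything other than an exact label (extra words, empty reply) uses the default.
    return cleaned if cleaned in LABELS else default


print(render_prompt(PROMPT_TEMPLATE, "What's the build status?"))
print(parse_label(" spawn_opus.\n"))
print(parse_label("I think SPAWN_OPUS fits best"))
```

Accepting only exact labels keeps a chatty classifier reply from being silently interpreted as a routing decision.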

Option B: Local LLM (via llama.cpp, Ollama, LM Studio)

Best for: Privacy-conscious users, offline operation, zero marginal cost

hooks:
  messagePreProcess:
    enabled: true
    classifier:
      type: "local"
      endpoint: "http://localhost:11434/api/generate"  # Ollama
      # Or: endpoint: "http://localhost:8080/completion"  # llama.cpp server
      # Or: endpoint: "http://localhost:1234/v1/chat/completions"  # LM Studio
      model: "llama3.2:1b"  # Small, fast model for classification
      prompt: |
        Classify: DIRECT, SPAWN_OPUS, or SPAWN_SONNET
        Message: {{message}}
    action: "inject"

  • Latency: ~50-200ms (depends on model size and hardware)
  • Cost: $0 (runs locally)
  • Accuracy: Good for clear-cut cases, may need fine-tuning for edge cases
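
For the Ollama endpoint, a classifier call could look roughly like the sketch below, using only the standard library. It relies on Ollama's `/api/generate` accepting a JSON body with `model`, `prompt`, and `stream`, and returning the generated text in a `response` field. The helper names, the 2-second timeout, and the `DIRECT` fallback are assumptions for illustration:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
LABELS = {"DIRECT", "SPAWN_OPUS", "SPAWN_SONNET"}


def build_request(message: str, model: str = "llama3.2:1b") -> urllib.request.Request:
    """Build a non-streaming generate request for the classification prompt."""
    prompt = f"Classify: DIRECT, SPAWN_OPUS, or SPAWN_SONNET\nMessage: {message}"
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_label(raw_body: bytes) -> str:
    """Read the 'response' field and fall back to DIRECT if it is not a known label."""
    text = json.loads(raw_body)["response"].strip().upper()
    return text if text in LABELS else "DIRECT"


def classify_locally(message: str, timeout: float = 2.0) -> str:
    """Send the request to a running Ollama server (not called in this sketch)."""
    with urllib.request.urlopen(build_request(message), timeout=timeout) as resp:
        return extract_label(resp.read())
```

A short timeout matters here: if the local server is down or slow, the hook should fail fast rather than stall every user message.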

Option C: Deterministic Rules Engine (Regex/Keyword Matching)

Best for: Predictable routing, zero latency, zero cost, no model dependency

hooks:
  messagePreProcess:
    enabled: true
    classifier:
      type: "rules"
      rules:
        - match: "\\b(analyze|research|investigate|troubleshoot|review|audit|build|implement)\\b"
          qualifiers: "\\b(comprehensive|detailed|thorough|deep dive|in-depth)\\b"
          result: "SPAWN_OPUS"
        - match: "\\b(batch|bulk|process all|migrate)\\b"
          result: "SPAWN_SONNET"
        - default: "DIRECT"
    action: "inject"

  • Latency: <1ms
  • Cost: $0
  • Accuracy: High for well-defined patterns, misses nuanced requests
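
The rules above can be evaluated in a few lines. This sketch assumes first-match-wins ordering, case-insensitive matching, and that a rule with `qualifiers` fires only when both patterns match. None of these semantics are spelled out in the config, so treat them as one possible interpretation:

```python
import re

# Same patterns as the YAML above; YAML's "\\b" becomes \b in a Python raw string.
RULES = [
    {
        "match": r"\b(analyze|research|investigate|troubleshoot|review|audit|build|implement)\b",
        "qualifiers": r"\b(comprehensive|detailed|thorough|deep dive|in-depth)\b",
        "result": "SPAWN_OPUS",
    },
    {"match": r"\b(batch|bulk|process all|migrate)\b", "result": "SPAWN_SONNET"},
]
DEFAULT = "DIRECT"


def classify(message: str) -> str:
    """Return the result of the first rule whose patterns all match, else the default."""
    for rule in RULES:
        if not re.search(rule["match"], message, re.IGNORECASE):
            continue
        qualifiers = rule.get("qualifiers")
        # Assumption: when qualifiers are present, they must match as well.
        if qualifiers and not re.search(qualifiers, message, re.IGNORECASE):
            continue
        return rule["result"]
    return DEFAULT


print(classify("Do a thorough review of the auth module"))  # SPAWN_OPUS
print(classify("Review this typo"))                         # DIRECT
print(classify("Migrate all users in bulk"))                # SPAWN_SONNET
print(classify("Researching options for a new queue"))      # DIRECT
```

The last call shows the limitation noted above: `\bresearch\b` does not match "Researching", so inflected or paraphrased requests slip through to the default.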

Option D: Hybrid (Rules + Model Fallback)

hooks:
  messagePreProcess:
    enabled: true
    classifier:
      type: "hybrid"
      primary:
        type: "rules"
        rules: [...]  # Fast deterministic check first
      fallback:
        type: "api"
        model: "anthropic/claude-haiku-4-5"
        condition: "primary.result == UNCERTAIN"  # requires the rules to use default: "UNCERTAIN" instead of "DIRECT"
    action: "inject"
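
A sketch of the hybrid control flow, assuming the primary classifier returns the sentinel `UNCERTAIN` when no rule matches and the fallback is any callable that returns a label. `rules_primary` uses a deliberately tiny, hypothetical rule set, and `stub_fallback` stands in for the API call:

```python
import re
from typing import Callable, List, Tuple

UNCERTAIN = "UNCERTAIN"


def rules_primary(message: str) -> str:
    """Cheap deterministic check; returns UNCERTAIN when no rule applies."""
    if re.search(r"\b(batch|bulk|migrate)\b", message, re.IGNORECASE):
        return "SPAWN_SONNET"
    if re.fullmatch(r"\s*(hi|hello|thanks|ok)[.!]?\s*", message, re.IGNORECASE):
        return "DIRECT"
    return UNCERTAIN


def hybrid_classify(
    message: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (label, source); the fallback runs only when the primary is uncertain."""
    label = primary(message)
    if label == UNCERTAIN:
        return fallback(message), "fallback"
    return label, "primary"


fallback_calls: List[str] = []


def stub_fallback(message: str) -> str:
    """Stand-in for the API classifier; records each call it receives."""
    fallback_calls.append(message)
    return "SPAWN_OPUS"


print(hybrid_classify("thanks!", rules_primary, stub_fallback))
print(hybrid_classify("Why did the deploy fail?", rules_primary, stub_fallback))
```

Returning the source alongside the label makes it easy to log how often the paid fallback is actually hit.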

Action Types

The hook result should support multiple action modes:

Action      Behavior
inject      Prepend classification to the system prompt or user message
route       Automatically route to a different model or spawn a subagent
metadata    Store as metadata accessible via template variables
block       Reject the message (useful for content filtering)

The route action would be particularly powerful:

hooks:
  messagePreProcess:
    classifier: { ... }
    action: "route"
    routing:
      DIRECT: { model: "anthropic/claude-sonnet-4-5" }  # Default model
      SPAWN_OPUS: { spawn: true, model: "anthropic/claude-opus-4-6", label: "Complex Task" }
      SPAWN_SONNET: { spawn: true, model: "anthropic/claude-sonnet-4-5", label: "Background Task" }
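
To illustrate how the `routing` table might be consumed, the sketch below resolves a classification into a concrete decision. The `RouteDecision` type and `resolve_route` function are hypothetical, and sending unknown labels to the `DIRECT` entry is an assumption:

```python
from dataclasses import dataclass
from typing import Optional

# Mirrors the routing table in the config above.
ROUTING = {
    "DIRECT": {"model": "anthropic/claude-sonnet-4-5"},
    "SPAWN_OPUS": {"spawn": True, "model": "anthropic/claude-opus-4-6", "label": "Complex Task"},
    "SPAWN_SONNET": {"spawn": True, "model": "anthropic/claude-sonnet-4-5", "label": "Background Task"},
}


@dataclass
class RouteDecision:
    model: str
    spawn: bool
    label: Optional[str]


def resolve_route(classification: str) -> RouteDecision:
    """Look up the routing entry; unknown classifications use the DIRECT entry."""
    entry = ROUTING.get(classification, ROUTING["DIRECT"])
    return RouteDecision(
        model=entry["model"],
        spawn=entry.get("spawn", False),  # entries without 'spawn' stay in the main session
        label=entry.get("label"),
    )


print(resolve_route("SPAWN_OPUS"))
print(resolve_route("DIRECT"))
```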

Real-World Use Case

My setup: Main session runs Sonnet 4.5 (fast, cheap) for conversation. Complex tasks should be spawned to Opus 4.6 subagents for higher quality.

Current state: Despite explicit routing instructions in the system prompt (~500 words of routing rules), the main model handles complex tasks inline ~50-80% of the time. I documented two failures in one week in which research/investigation tasks matching all spawn criteria were executed by Sonnet instead of being delegated to Opus.

With pre-process hook: A Haiku classifier (or local model, or regex rules) would classify these messages before Sonnet even sees them. My estimate is ~95-99% routing accuracy at negligible cost.

Benefits

  1. Architectural separation: The model that routes is not the model that executes — eliminates self-serving bias
  2. Reliability: Pre-processing happens deterministically before the main model, not as a skippable instruction
  3. Cost efficiency: Ensures expensive models are used only when needed; cheap models handle trivial requests
  4. Flexibility: Users choose their classifier (API model, local LLM, regex rules, or hybrid)
  5. Composability: Can be combined with existing hooks (session-memory, etc.)
  6. Privacy option: Local LLM classifier keeps all routing decisions on-device

Trade-offs

  • Latency: +50-300ms per message (depending on classifier type; rules engine is <1ms)
  • Cost: $0 (rules/local) to ~$0.00005/message (Haiku) — negligible for most users
  • Complexity: Additional configuration surface, but opt-in and well-structured
  • Misrouting: Some messages will be classified incorrectly in either direction (over- or under-delegation); this can be mitigated by allowing the main model to override the classification with an explanation

Prior Art

  • LangChain/LangGraph: Router chains that classify and route before execution
  • Semantic Router: Embedding-based routing for LLM pipelines
  • OpenAI Assistants: Tool-use routing happens in a separate classification step
  • AWS Bedrock: Agent routing with classifier-based delegation

Summary

Adding hooks.messagePreProcess would solve a real architectural limitation where the executing model cannot reliably self-assess routing decisions. It is a general-purpose feature that enables request routing, content filtering, context enrichment, and more — while being simple to configure and opt-in.

Happy to discuss implementation details or provide more data on the routing failure patterns that motivated this request.
