Feature: Message Coalescing for Gateway Platforms #345

@teknium1

Overview

When users send rapid-fire messages on platforms like Telegram, Discord, or Slack, each message currently triggers a separate LLM turn. This causes fragmented responses, wasted API calls, and an overall poor experience — the agent starts responding to message 1 while messages 2, 3, and 4 are still arriving, often with important context.

Spacedrive's Spacebot implements message coalescing — a debounce mechanism that accumulates rapid messages into a single batched turn before processing. This is a proven pattern from their multi-user community bot that handles Discord, Slack, Telegram, and Twitch.

This feature would add the same capability to Hermes Agent's gateway platforms.

Research source: Spacebot source code (lines 160-476)


Research Findings

How Spacebot's Coalescing Works

The implementation lives in src/agent/channel.rs with configuration in CoalesceConfig:

Defaults:
  enabled: true
  debounce_ms: 1500      # Wait this long after each message for more
  max_wait_ms: 5000      # Maximum total wait from first message
  min_messages: 2        # Minimum messages to trigger batching
  multi_user_only: true  # Only batch in multi-user channels (not DMs)
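
For reference, here is a minimal sketch of how these defaults might look on the Python side. The field names and values come from the list above; the `validate` method and its rules are assumptions added for illustration and are not taken from Spacebot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoalesceConfig:
    """Python mirror of the defaults quoted above (validation is hypothetical)."""

    enabled: bool = True
    debounce_ms: int = 1500       # wait this long after each message for more
    max_wait_ms: int = 5000       # maximum total wait from the first message
    min_messages: int = 2         # minimum messages to trigger batching
    multi_user_only: bool = True  # only batch in multi-user channels (not DMs)

    def validate(self) -> None:
        # Assumed sanity checks: a debounce longer than the hard cap
        # would never get a chance to fire.
        if self.debounce_ms <= 0 or self.max_wait_ms <= 0:
            raise ValueError("debounce_ms and max_wait_ms must be positive")
        if self.debounce_ms > self.max_wait_ms:
            raise ValueError("debounce_ms should not exceed max_wait_ms")
        if self.min_messages < 1:
            raise ValueError("min_messages must be at least 1")
```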

Algorithm:

  1. should_coalesce() checks: enabled? not a system retrigger? not a DM (when multi_user_only)?

  2. If yes, message is pushed to a coalesce_buffer and update_coalesce_deadline() is called.

  3. Deadline logic:

    • If buffer has >= min_messages: set deadline to debounce_ms from now, capped at max_wait_ms from the first message's timestamp
    • If buffer has < min_messages: set a short debounce_ms deadline

  4. The channel event loop (tokio::select!) checks coalesce_deadline on each iteration.

  5. On deadline expiry: flush_coalesce_buffer() is called:

    • Single message: Processed normally
    • Multiple messages: Batched via handle_message_batch() which formats all messages with attribution and timestamps, then presents them as a single user turn. A coalesce hint is injected:

    "N messages arrived in Xs. This is a fast-moving conversation with multiple participants..."
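
Steps 1 and 3 can be sketched in Python roughly as follows. This is a paraphrase of the description above, not a port of the Rust code: the function names mirror the ones mentioned, but the signatures, the `_ms` suffix, and the use of integer milliseconds are assumptions.

```python
def should_coalesce(enabled: bool, is_system_retrigger: bool,
                    is_dm: bool, multi_user_only: bool) -> bool:
    """Step 1: decide whether an incoming message goes into the buffer."""
    if not enabled or is_system_retrigger:
        return False
    if multi_user_only and is_dm:
        return False
    return True


def next_deadline_ms(now_ms: int, first_message_ms: int, buffered: int,
                     debounce_ms: int = 1500, max_wait_ms: int = 5000,
                     min_messages: int = 2) -> int:
    """Step 3: compute when the buffer should flush, in milliseconds."""
    debounce_deadline = now_ms + debounce_ms
    if buffered >= min_messages:
        # Debounce from now, but never later than max_wait_ms after
        # the first buffered message.
        return min(debounce_deadline, first_message_ms + max_wait_ms)
    # Below min_messages: plain debounce deadline.
    return debounce_deadline
```

Integer milliseconds are used here only to keep the arithmetic exact; a real implementation would more likely work with monotonic clock values.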

Why This Matters

Without coalescing:

User sends: "hey"           → Agent starts LLM call #1
User sends: "can you check" → Agent starts LLM call #2
User sends: "the logs"      → Agent starts LLM call #3
User sends: "from yesterday"→ Agent starts LLM call #4

Result: 4 separate LLM calls, 4 fragmented responses, $$$

With coalescing (1.5s debounce):

User sends: "hey"                     → Buffer: 1 msg, deadline in 1.5s
User sends: "can you check" (0.3s)    → Buffer: 2 msgs, deadline in 1.5s
User sends: "the logs" (0.8s)         → Buffer: 3 msgs, deadline in 1.5s
User sends: "from yesterday" (1.2s)   → Buffer: 4 msgs, deadline in 1.5s
                            (2.7s)    → Deadline fires → single batched turn

Result: 1 LLM call, coherent response, ¢
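
The timeline above can be reproduced with a small offline simulation. This is an illustrative sketch only: `simulate_flush` is a hypothetical helper, it works on arrival times in milliseconds rather than a real clock, and it treats a message arriving exactly at the deadline as the start of a new batch.

```python
from typing import List, Tuple


def simulate_flush(arrivals_ms: List[int], debounce_ms: int = 1500,
                   max_wait_ms: int = 5000) -> List[Tuple[int, int]]:
    """Return (flush_time_ms, batch_size) pairs for sorted arrival times."""
    batches: List[Tuple[int, int]] = []
    first = None      # arrival time of the first message in the open batch
    deadline = None   # current flush deadline for the open batch
    count = 0
    for t in arrivals_ms:
        if deadline is not None and t >= deadline:
            # The deadline fired before this message arrived.
            batches.append((deadline, count))
            first, count = None, 0
        if first is None:
            first = t
        count += 1
        # Debounce from this message, capped at max_wait_ms from the first.
        deadline = min(t + debounce_ms, first + max_wait_ms)
    if count:
        batches.append((deadline, count))
    return batches


# The four messages from the example: one batch, flushed at 2.7s.
print(simulate_flush([0, 300, 800, 1200]))  # [(2700, 4)]
```

With a steady stream of one message per second, the max_wait_ms cap is what forces a flush: the first batch closes at 5000 ms even though no 1.5s quiet period ever occurred.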

Current State in Hermes Agent

What We Have

Each gateway platform adapter processes messages immediately:

  • gateway/platforms/telegram.py — on_message triggers immediate handle_message()
  • gateway/platforms/discord_adapter.py — same pattern
  • gateway/platforms/slack.py — same pattern
  • No debounce, no accumulation window, no batching logic

What's Missing

  • No message buffer or coalesce timer per session/channel
  • No configurable debounce window
  • No multi-message formatting with attribution
  • No hint injection for batched messages

Integration Points

  • gateway/run.py — Message routing (GatewayRunner.handle_message)
  • gateway/session.py — Session management (SessionStore)
  • gateway/platforms/*.py — Platform adapters
  • ~/.hermes/config.yaml — Config (where coalescing settings would live)
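
As a rough idea of how the settings could be read, the sketch below merges a message_coalescing section over the Spacebot defaults. It assumes the YAML file has already been parsed into a dict by whatever loader the gateway uses; `coalesce_settings` and the strict handling of unknown keys are hypothetical.

```python
from typing import Any, Dict

# Defaults copied from the Spacebot values quoted in this issue.
DEFAULTS: Dict[str, Any] = {
    "enabled": True,
    "debounce_ms": 1500,
    "max_wait_ms": 5000,
    "min_messages": 2,
    "multi_user_only": True,
}


def coalesce_settings(config: Dict[str, Any]) -> Dict[str, Any]:
    """Merge the message_coalescing section of a parsed config over DEFAULTS.

    Unknown keys are rejected so that typos do not silently fall back
    to the defaults (an assumed policy, not an existing Hermes behavior).
    """
    section = config.get("message_coalescing") or {}
    unknown = set(section) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown message_coalescing keys: {sorted(unknown)}")
    return {**DEFAULTS, **section}
```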

Implementation Plan

Skill vs. Tool Classification

This should be a core codebase feature (in the gateway module) because:

  • It modifies the message flow in gateway/run.py or session management
  • It needs timer/async scheduling integrated with the event loop
  • It must work across all platform adapters consistently
  • It's a platform-level behavior, not an agent-level capability

What We'd Need

  1. CoalesceBuffer class — Per-session message accumulation with deadline tracking
  2. Config — message_coalescing section in config.yaml (enabled, debounce_ms, max_wait_ms, etc.)
  3. Batch formatter — Combine multiple messages into a single attributed turn
  4. Timer integration — asyncio deadline management in the gateway event loop
  5. Platform-specific tuning — DM vs. group behavior, platform-specific quirks
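
To make items 1 and 4 concrete, here is one possible shape for the buffer using plain asyncio. It is a sketch under assumptions, not the planned implementation: the class body, the `on_flush` callback, and the choice to restart a timer task on every message are all hypothetical, and messages are plain strings for brevity.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Optional


class CoalesceBuffer:
    """Per-session debounce buffer (hypothetical sketch)."""

    def __init__(self, on_flush: Callable[[List[str]], Awaitable[None]],
                 debounce_s: float = 1.5, max_wait_s: float = 5.0) -> None:
        self._on_flush = on_flush
        self._debounce_s = debounce_s
        self._max_wait_s = max_wait_s
        self._messages: List[str] = []
        self._first_at = 0.0
        self._timer: Optional[asyncio.Task] = None

    def push(self, text: str) -> None:
        """Buffer a message and restart the timer. Call from the event loop."""
        now = time.monotonic()
        if not self._messages:
            self._first_at = now
        self._messages.append(text)
        # Debounce from now, but never past max_wait_s after the first message.
        deadline = min(now + self._debounce_s,
                       self._first_at + self._max_wait_s)
        if self._timer is not None:
            self._timer.cancel()
        self._timer = asyncio.create_task(self._flush_at(deadline))

    async def _flush_at(self, deadline: float) -> None:
        await asyncio.sleep(max(0.0, deadline - time.monotonic()))
        batch, self._messages = self._messages, []
        # Clear the handle first so a push() during on_flush starts a new
        # batch instead of cancelling the flush that is already running.
        self._timer = None
        await self._on_flush(batch)
```

A gateway would keep one such buffer per session or channel and pass a callback that hands the batch to the normal message handler.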

Phased Rollout

Phase 1: Basic debounce for all platforms

  • CoalesceBuffer with configurable debounce_ms (default: 1500) and max_wait_ms (5000)
  • Implemented in gateway/run.py (platform-agnostic)
  • Simple concatenation of message texts with timestamps
  • Config: message_coalescing.enabled: true in config.yaml
  • Deliverable: Rapid messages batched into single turn
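
The "simple concatenation with timestamps" step could look something like this. The `concat_batch` name, the `(unix_ts, text)` tuple shape, and the `[HH:MM:SS]` UTC prefix are assumptions chosen for illustration.

```python
from datetime import datetime, timezone
from typing import List, Tuple


def concat_batch(messages: List[Tuple[float, str]]) -> str:
    """Join (unix_ts, text) pairs into one turn, one line per message."""
    if len(messages) == 1:
        # A single message is passed through unchanged.
        return messages[0][1]
    lines = []
    for ts, text in messages:
        stamp = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%H:%M:%S")
        lines.append(f"[{stamp}] {text}")
    return "\n".join(lines)
```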

Phase 2: Attributed batching and hint injection

  • Format batched messages with user attribution (important for multi-user channels)
  • Inject coalesce hint into the batched message ("N messages arrived...")
  • DM vs. group channel detection for multi_user_only mode
  • Platform-specific message metadata (Discord thread IDs, Telegram reply chains)
  • Deliverable: Clean multi-user message batching
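
A possible shape for the attributed formatter is sketched below. The `BufferedMessage` type and the exact hint wording are hypothetical; Spacebot's real hint text is only partially quoted in this issue, so the string here is a stand-in rather than a copy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BufferedMessage:
    author: str
    text: str
    received_at: float  # seconds, e.g. from time.time()


def format_batch(messages: List[BufferedMessage]) -> str:
    """Render a batch as one user turn with a hint line and attribution."""
    start = messages[0].received_at
    span_s = messages[-1].received_at - start
    participants = len({m.author for m in messages})
    # Stand-in hint wording (assumption), placed before the messages.
    lines = [
        f"[{len(messages)} messages from {participants} participant(s) "
        f"arrived in {span_s:.1f}s; respond to them together.]"
    ]
    for m in messages:
        # Offsets relative to the first message keep the lines short.
        lines.append(f"(+{m.received_at - start:.1f}s) {m.author}: {m.text}")
    return "\n".join(lines)
```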

Phase 3: Smart coalescing

  • Detect "complete thought" patterns to flush early (question marks, commands)
  • Adjust debounce based on user typing speed patterns
  • File attachment handling (images/documents should trigger flush)
  • Integrate with session context (if agent is already processing, queue next batch)
  • Deliverable: Intelligent coalescing that feels natural
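
The early-flush idea might start from a heuristic as small as the one below. Everything here is an assumption: the `should_flush_early` name, treating a leading "/" as a command, and treating a trailing "?" as a complete thought are illustrative rules that would need tuning per platform.

```python
def should_flush_early(text: str, has_attachment: bool = False) -> bool:
    """Guess whether a message completes a thought and should flush now."""
    stripped = text.strip()
    if has_attachment:
        # Images and documents trigger an immediate flush.
        return True
    if stripped.startswith("/"):
        # Assumed command syntax; platforms differ.
        return True
    # A trailing question mark is treated as a finished question.
    return stripped.endswith("?")
```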

Pros & Cons

Pros

  • Cost reduction — Fewer LLM calls for rapid multi-message input (common pattern on chat platforms)
  • Better responses — Agent sees the full context of what the user intended, not fragments
  • Proven pattern — Battle-tested in Spacebot's community bot across multiple platforms
  • Low complexity — The core is a simple debounce timer + buffer, ~100-200 lines
  • User-configurable — Can be disabled for users who prefer immediate response

Cons / Risks

  • Perceived latency — Users may feel the 1.5s wait is "slow" in DMs (mitigated by multi_user_only mode)
  • Typing indicators — Need to show "thinking" during the coalesce window so users know the bot is "listening"
  • Platform differences — Telegram, Discord, Slack have different message timing patterns
  • Edge cases — File attachments, inline images, voice messages need special handling

Open Questions

  • Should coalescing be enabled by default, or opt-in? (Spacebot defaults to enabled: true with multi_user_only: true)
  • What's the right default debounce window? 1.5s feels right for chat, but may need tuning per platform.
  • Should DMs be exempt by default? (Spacebot's multi_user_only=true exempts DMs)
  • How should the coalesced message be formatted? Simple concatenation, or attributed with timestamps?
  • Should the agent show a "typing" indicator during the coalesce window?
