Overview
When users send rapid-fire messages on platforms like Telegram, Discord, or Slack, each message currently triggers a separate LLM turn. This causes fragmented responses, wasted API calls, and an overall poor experience — the agent starts responding to message 1 while messages 2, 3, and 4 are still arriving, often with important context.
Spacedrive's Spacebot implements message coalescing — a debounce mechanism that accumulates rapid messages into a single batched turn before processing. This is a proven pattern from their multi-user community bot that handles Discord, Slack, Telegram, and Twitch.
This feature would add the same capability to Hermes Agent's gateway platforms.
Research source: Spacebot source code (lines 160-476)
Research Findings
How Spacebot's Coalescing Works
The implementation lives in src/agent/channel.rs with configuration in CoalesceConfig:
Defaults:
enabled: true
debounce_ms: 1500 # Wait this long after each message for more
max_wait_ms: 5000 # Maximum total wait from first message
min_messages: 2 # Minimum messages to trigger batching
multi_user_only: true # Only batch in multi-user channels (not DMs)
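For orientation, these defaults could be mirrored on the Hermes side roughly as follows. This is only a sketch: Spacebot's CoalesceConfig is Rust, and the Python class and its from_mapping helper below are hypothetical names. Only the five field names and their default values come from the list above.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class CoalesceConfig:
    """Hypothetical Python mirror of the Spacebot defaults listed above."""

    enabled: bool = True
    debounce_ms: int = 1500        # wait this long after each message for more
    max_wait_ms: int = 5000        # maximum total wait from the first message
    min_messages: int = 2          # minimum messages to trigger batching
    multi_user_only: bool = True   # only batch in multi-user channels (not DMs)

    @classmethod
    def from_mapping(cls, raw: Optional[Mapping[str, Any]]) -> "CoalesceConfig":
        """Build a config from an already-parsed settings section.

        Unknown keys are ignored and missing keys fall back to the defaults.
        """
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in (raw or {}).items() if k in known})
```

Passing the parsed settings section as a plain mapping keeps the sketch independent of whichever YAML loader the gateway uses.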
Algorithm:
- should_coalesce() checks: enabled? not a system retrigger? not a DM (when multi_user_only)?
- If yes, the message is pushed to a coalesce_buffer and update_coalesce_deadline() is called.
- Deadline logic:
  - If buffer has >= min_messages: set deadline to debounce_ms from now, capped at max_wait_ms from the first message's timestamp
  - If buffer has < min_messages: set a short debounce_ms deadline
- The channel event loop (tokio::select!) checks coalesce_deadline on each iteration.
- On deadline expiry, flush_coalesce_buffer() is called:
  - Single message: Processed normally
  - Multiple messages: Batched via handle_message_batch(), which formats all messages with attribution and timestamps, then presents them as a single user turn. A coalesce hint is injected: "N messages arrived in Xs. This is a fast-moving conversation with multiple participants..."
Why This Matters
Without coalescing:
User sends: "hey" → Agent starts LLM call #1
User sends: "can you check" → Agent starts LLM call #2
User sends: "the logs" → Agent starts LLM call #3
User sends: "from yesterday" → Agent starts LLM call #4
Result: 4 separate LLM calls, 4 fragmented responses, $$$
With coalescing (1.5s debounce):
User sends: "hey" → Buffer: 1 msg, deadline in 1.5s
User sends: "can you check" (0.3s) → Buffer: 2 msgs, deadline in 1.5s
User sends: "the logs" (0.8s) → Buffer: 3 msgs, deadline in 1.5s
User sends: "from yesterday" (1.2s) → Buffer: 4 msgs, deadline in 1.5s
(2.7s) → Deadline fires → single batched turn
Result: 1 LLM call, coherent response, ¢
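The timeline above can be checked with a small replay function. This is a sketch under the defaults quoted earlier; replay is a hypothetical helper, not something that exists in Spacebot or Hermes, and it treats a message that arrives exactly at the deadline as starting a new batch.

```python
from typing import List, Optional, Sequence, Tuple


def replay(arrivals_ms: Sequence[int], debounce_ms: int = 1500,
           max_wait_ms: int = 5000, min_messages: int = 2) -> List[Tuple[int, int]]:
    """Replay sorted arrival times and return (flush_time_ms, batch_size) pairs."""
    flushes: List[Tuple[int, int]] = []
    buffer: List[int] = []
    deadline: Optional[int] = None
    for t in arrivals_ms:
        # If the pending deadline passed before this message arrived,
        # the previous batch has already been flushed.
        if buffer and deadline is not None and t >= deadline:
            flushes.append((deadline, len(buffer)))
            buffer = []
        buffer.append(t)
        deadline = t + debounce_ms
        if len(buffer) >= min_messages:
            deadline = min(deadline, buffer[0] + max_wait_ms)
    if buffer and deadline is not None:
        flushes.append((deadline, len(buffer)))
    return flushes


# The four messages from the example arrive at 0 s, 0.3 s, 0.8 s and 1.2 s.
print(replay([0, 300, 800, 1200]))  # [(2700, 4)]
```

The single flush at 2700 ms matches the 2.7s figure in the example: the last message arrives at 1.2s and the 1.5s debounce runs from there. With a steady stream of one message per second, the same function shows the max_wait_ms cap forcing a flush at 5000 ms.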
Current State in Hermes Agent
What We Have
Each gateway platform adapter processes messages immediately:
- gateway/platforms/telegram.py: on_message triggers immediate handle_message()
- gateway/platforms/discord_adapter.py: same pattern
- gateway/platforms/slack.py: same pattern
- No debounce, no accumulation window, no batching logic
What's Missing
- No message buffer or coalesce timer per session/channel
- No configurable debounce window
- No multi-message formatting with attribution
- No hint injection for batched messages
Integration Points
- gateway/run.py: Message routing (GatewayRunner.handle_message)
- gateway/session.py: Session management (SessionStore)
- gateway/platforms/*.py: Platform adapters
- ~/.hermes/config.yaml: Config (where coalescing settings would live)
Implementation Plan
Skill vs. Tool Classification
This should be a core codebase feature (in the gateway module) because:
- It modifies the message flow in gateway/run.py or session management
- It needs timer/async scheduling integrated with the event loop
- It must work across all platform adapters consistently
- It's a platform-level behavior, not an agent-level capability
What We'd Need
- CoalesceBuffer class: Per-session message accumulation with deadline tracking
- Config: message_coalescing section in config.yaml (enabled, debounce_ms, max_wait_ms, etc.)
- Batch formatter: Combine multiple messages into a single attributed turn
- Timer integration: asyncio deadline management in the gateway event loop
- Platform-specific tuning: DM vs. group behavior, platform-specific quirks
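To make the buffer and timer pieces concrete, here is a minimal asyncio sketch of what a per-session CoalesceBuffer might look like. Everything in it is an assumption for discussion: the on_flush callback, the push method, and the choice to re-arm a task on every message are illustrative, and the sketch leaves out min_messages, DM detection, and error handling.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Optional


class CoalesceBuffer:
    """Hypothetical per-session buffer that flushes a batch when its deadline expires."""

    def __init__(self, on_flush: Callable[[List[str]], Awaitable[None]],
                 debounce_ms: int = 1500, max_wait_ms: int = 5000) -> None:
        self._on_flush = on_flush
        self._debounce = debounce_ms / 1000
        self._max_wait = max_wait_ms / 1000
        self._messages: List[str] = []
        self._first_at: Optional[float] = None
        self._timer: Optional[asyncio.Task] = None

    def push(self, text: str) -> None:
        """Buffer a message and (re)arm the flush timer. Call from inside the event loop."""
        now = time.monotonic()
        if self._first_at is None:
            self._first_at = now
        self._messages.append(text)
        # Debounce from now, but never later than max_wait after the first message.
        deadline = min(now + self._debounce, self._first_at + self._max_wait)
        if self._timer is not None:
            self._timer.cancel()  # replace the previous deadline
        self._timer = asyncio.create_task(self._flush_at(deadline))

    async def _flush_at(self, deadline: float) -> None:
        await asyncio.sleep(max(0.0, deadline - time.monotonic()))
        # Detach the batch and the timer before awaiting the handler, so a
        # message arriving mid-flush starts a new batch instead of cancelling it.
        batch, self._messages, self._first_at = self._messages, [], None
        self._timer = None
        await self._on_flush(batch)


async def demo() -> List[List[str]]:
    """Push four quick messages and return the batches that were flushed."""
    batches: List[List[str]] = []

    async def on_flush(batch: List[str]) -> None:
        batches.append(batch)

    buf = CoalesceBuffer(on_flush, debounce_ms=100, max_wait_ms=1000)
    for text in ["hey", "can you check", "the logs", "from yesterday"]:
        buf.push(text)
        await asyncio.sleep(0.01)
    await asyncio.sleep(0.3)  # let the debounce deadline expire
    return batches
```

Running asyncio.run(demo()) should yield one batch containing all four messages. In the gateway, one such buffer would presumably be kept per session or channel key, with on_flush handing the batch to the existing message handler.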
Phased Rollout
Phase 1: Basic debounce for all platforms
- CoalesceBuffer with configurable debounce_ms (default: 1500) and max_wait_ms (5000)
- Implemented in gateway/run.py (platform-agnostic)
- Simple concatenation of message texts with timestamps
- Config: message_coalescing.enabled: true in config.yaml
- Deliverable: Rapid messages batched into single turn
Phase 2: Attributed batching and hint injection
- Format batched messages with user attribution (important for multi-user channels)
- Inject coalesce hint into the batched message ("N messages arrived...")
- DM vs. group channel detection for multi_user_only mode
- Platform-specific message metadata (Discord thread IDs, Telegram reply chains)
- Deliverable: Clean multi-user message batching
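A batch formatter along these lines might look like the sketch below. The BufferedMessage type, the bracketed layout, and the timestamp format are assumptions made for illustration. The hint uses only the "N messages arrived in Xs." part of Spacebot's wording, because the rest of that sentence is elided in the research notes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Sequence


@dataclass(frozen=True)
class BufferedMessage:
    """Hypothetical record for one buffered chat message."""
    author: str
    text: str
    sent_at: datetime


def format_batch(messages: Sequence[BufferedMessage]) -> str:
    """Render buffered messages as the text of a single user turn."""
    if not messages:
        raise ValueError("cannot format an empty batch")
    if len(messages) == 1:
        # A single message is processed normally, with no hint or attribution.
        return messages[0].text
    span = (messages[-1].sent_at - messages[0].sent_at).total_seconds()
    hint = f"[{len(messages)} messages arrived in {span:.1f}s.]"
    lines = [f"[{m.sent_at:%H:%M:%S}] {m.author}: {m.text}" for m in messages]
    return "\n".join([hint, *lines])


start = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
batch = [
    BufferedMessage("alice", "can you check", start),
    BufferedMessage("bob", "the logs from yesterday", start + timedelta(milliseconds=300)),
]
print(format_batch(batch))
```

Keeping the hint on its own first line makes it easy to strip or reword later without touching the attributed message lines.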
Phase 3: Smart coalescing
- Detect "complete thought" patterns to flush early (question marks, commands)
- Adjust debounce based on user typing speed patterns
- File attachment handling (images/documents should trigger flush)
- Integrate with session context (if agent is already processing, queue next batch)
- Deliverable: Intelligent coalescing that feels natural
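As a starting point for the early-flush idea, the check could be a simple heuristic like the one below. The function name, the "/" command prefix, and the rule set are assumptions. It covers only the question mark, command, and attachment cases from the list above, not typing-speed adaptation or queuing behind an in-flight turn.

```python
def should_flush_early(text: str, *, has_attachment: bool = False,
                       command_prefix: str = "/") -> bool:
    """Return True if a message looks like a complete thought that should flush now."""
    stripped = text.strip()
    if has_attachment:
        # Images and documents are treated as a natural end of input.
        return True
    if command_prefix and stripped.startswith(command_prefix):
        # Explicit commands should not wait for the debounce window.
        return True
    # A trailing question mark suggests the user has finished asking.
    return stripped.endswith("?")
```

A heuristic like this will misfire on messages such as "wait?" followed by more text, so it would likely need to shorten the remaining debounce rather than flush unconditionally.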
Pros & Cons
Pros
- Cost reduction — Fewer LLM calls for rapid multi-message input (common pattern on chat platforms)
- Better responses — Agent sees the full context of what the user intended, not fragments
- Proven pattern — Battle-tested in Spacebot's community bot across multiple platforms
- Low complexity — The core is a simple debounce timer + buffer, ~100-200 lines
- User-configurable — Can be disabled for users who prefer immediate response
Cons / Risks
- Perceived latency — Users may feel the 1.5s wait is "slow" in DMs (mitigated by multi_user_only mode)
- Typing indicators — Need to show "thinking" during the coalesce window so users know the bot is "listening"
- Platform differences — Telegram, Discord, Slack have different message timing patterns
- Edge cases — File attachments, inline images, voice messages need special handling
Open Questions
- Should coalescing be enabled by default, or opt-in? (Spacebot defaults to enabled, multi_user_only)
- What's the right default debounce window? 1.5s feels right for chat, but may need tuning per platform.
- Should DMs be exempt by default? (Spacebot's multi_user_only=true exempts DMs)
- How should the coalesced message be formatted? Simple concatenation, or attributed with timestamps?
- Should the agent show a "typing" indicator during the coalesce window?