Bug: /goal continuation loop never fires on gateway (Telegram/Discord) — event enqueued after consumer returns #28649

@NivOO5

Bug Report: /goal auto-continue loop does not work on gateway platforms

Summary

The /goal command sets the goal and runs the first turn successfully, but the auto-continuation loop never fires on gateway platforms (Telegram, Discord, etc.). The continuation event is enqueued into adapter._pending_messages after _run_agent has already checked for and drained pending events, so the continuation sits orphaned until the user sends another message.

All three goals observed in state_meta show turns_used: 0 and last_verdict: null, confirming the continuation loop has never completed a single cycle on this system.

Environment

  • Hermes Agent: v0.14.0 (2026.5.16)
  • Platform: Telegram (gateway mode), also affects Discord
  • Model: Various (grok-4.3 via xai-oauth, glm-5.1 via zai)
  • OS: macOS (Apple Silicon), gateway running via launchd

Reproduction

  1. Start the gateway (hermes gateway run)
  2. Send /goal <any multi-step task> via Telegram or Discord
  3. Observe: the first turn runs and produces a response
  4. Expected: the judge evaluates, and if "continue", a continuation turn fires automatically
  5. Actual: the agent stops after the first turn. No continuation fires. turns_used remains 0.

Root Cause Analysis

The continuation event is enqueued at the wrong layer: it is added after _run_agent() has returned, by which point the pending-event consumer inside _run_agent() has already checked the queue and found nothing.

Code path (all in gateway/run.py):

  1. /goal <text> is processed by _handle_goal_command() (line 9791). Goal state is saved to state_meta DB. A kickoff MessageEvent is enqueued via _enqueue_fifo() into adapter._pending_messages[session_key].

  2. The kickoff event triggers _run_agent() (line 14663). The agent runs, produces a response.

  3. Inside _run_agent(), after the agent turn completes, the code at line 16381 dequeues pending events:

    pending_event = _dequeue_pending_event(adapter, session_key)

    At this point, _pending_messages is empty (the kickoff was already consumed). No pending event is found. _run_agent() processes the result and returns.

  4. After _run_agent() returns, back in the outer call site at line 6945:

    await self._post_turn_goal_continuation(
        session_entry=session_entry,
        source=source,
        final_response=_final_text,
    )

    This calls evaluate_after_turn() → judge says "continue" → continuation event is enqueued via _enqueue_fifo() into adapter._pending_messages[session_key].

  5. The code then returns _agent_result (line 6949), the finally block releases the running-agent state (line 6951), and the session goes idle.

  6. The continuation event is now orphaned in _pending_messages — there is no code path that re-enters the processing loop to consume it. It only gets processed when the user sends a new message, which triggers _start_session_processing() in the adapter.
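The ordering described in steps 3 to 6 can be reproduced in a toy model. This is a minimal sketch, not the gateway code: `pending`, `enqueue_fifo`, `dequeue_pending_event`, `run_agent`, and `handle_message` are simplified stand-ins (assumptions) for adapter._pending_messages, _enqueue_fifo(), _dequeue_pending_event(), _run_agent(), and the outer call site.

```python
import asyncio
from collections import defaultdict, deque

# Hypothetical stand-in for adapter._pending_messages: one FIFO per session key.
pending = defaultdict(deque)
processed = []


def enqueue_fifo(session_key, event):
    pending[session_key].append(event)


def dequeue_pending_event(session_key):
    queue = pending[session_key]
    return queue.popleft() if queue else None


async def run_agent(session_key, event):
    """Run one turn, then drain whatever is already queued (step 3)."""
    while event is not None:
        processed.append(event)
        await asyncio.sleep(0)  # stands in for the agent turn
        event = dequeue_pending_event(session_key)


async def handle_message(session_key, event):
    await run_agent(session_key, event)
    # Step 4: the continuation is enqueued only after the drain has finished.
    enqueue_fifo(session_key, "continuation")
    # Step 5: return; nothing re-enters run_agent for this session.


asyncio.run(handle_message("s1", "kickoff"))
print(processed)            # ['kickoff']
print(list(pending["s1"]))  # ['continuation']
```

The continuation stays in the queue until something else starts a new processing pass for the session, which in the gateway only happens when the user sends another message.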

Why the CLI Works but the Gateway Does Not

In the CLI (cli.py), _handle_goal_command() puts the goal into self._pending_input (line 8622), which the CLI interactive main loop reads from continuously. The CLI effectively has a built-in re-entry loop. The gateway has no equivalent — it relies on platform adapters to trigger processing, and a synthetic enqueued event does not trigger the adapter.
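For contrast, a loop that keeps reading from its own input queue picks up synthetic continuations without any outside trigger. The sketch below only illustrates that pattern; `pending_input` and `run_turn` are hypothetical stand-ins, and the judge is faked with a fixed turn budget.

```python
import queue

pending_input = queue.Queue()  # hypothetical stand-in for self._pending_input
turns = []


def run_turn(text):
    turns.append(text)
    # Post-turn hook: pretend the judge says "continue" for the first two turns.
    if len(turns) < 3:
        pending_input.put(f"continue #{len(turns)}")


pending_input.put("goal kickoff")
while True:
    try:
        text = pending_input.get_nowait()
    except queue.Empty:
        break  # a real interactive loop would wait for user input here
    run_turn(text)

print(turns)  # ['goal kickoff', 'continue #1', 'continue #2']
```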

Additional Finding: turns_used Stays at 0

Three goals observed in the DB, all with turns_used: 0:

goal:20260514_... | active | turns_used=0 | last_verdict=null
goal:20260517_... | active | turns_used=0 | last_verdict=null
goal:20260518_... | active | turns_used=0 | last_verdict=null

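This suggests evaluate_after_turn() may not be reached at all. Two possible causes: GoalManager(session_id=sid).is_active() returns False when the goal is loaded fresh from the DB in _post_turn_goal_continuation, or an exception is raised before the evaluate call and swallowed by the outer try/except (line 6947), which logs at debug level only. If either is the case, it would be a second problem in addition to the ordering issue above, because no continuation event would be enqueued in the first place.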
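One way to confirm or rule out a swallowed exception is to log it above debug level. The sketch below shows the general pattern only; `post_turn_hook`, `broken_evaluate`, and the logger name are hypothetical and do not reflect the actual code around line 6947.

```python
import logging

logger = logging.getLogger("gateway.goal")  # hypothetical logger name


def post_turn_hook(evaluate):
    """Call the evaluation step and report, rather than hide, any failure."""
    try:
        return evaluate()
    except Exception:
        # logger.exception logs at ERROR level and attaches the traceback,
        # so the failure is visible at the default WARNING threshold.
        logger.exception("goal continuation failed")
        return None


def broken_evaluate():
    # Simulated failure that happens before the judge is ever called.
    raise RuntimeError("goal state failed to load")


result = post_turn_hook(broken_evaluate)
```

With the default logging configuration the traceback is written to stderr, whereas the same failure logged with logger.debug would produce no output.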

Suggested Fix

After _post_turn_goal_continuation enqueues the continuation event, the gateway should re-enter _run_agent (or equivalent) to process it. Two approaches:

Option A: Move _post_turn_goal_continuation inside _run_agent(), before the pending-event dequeue at line 16381. This way the continuation event is already enqueued when the dequeue runs.

Option B: After _post_turn_goal_continuation returns at line 6945, check if a continuation was enqueued. If so, call _start_session_processing on the adapter to spawn a new processing task — similar to how _drain_pending_after_session_command works in base.py.
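The idea behind Option B can be sketched in a toy model. This is an illustration under stated assumptions, not a patch: all names are simplified stand-ins, the judge is replaced by a fixed turn budget (`MAX_TURNS`, hypothetical), and the queue is re-checked inline rather than by spawning a new task through _start_session_processing.

```python
import asyncio
from collections import defaultdict, deque

pending = defaultdict(deque)  # stand-in for adapter._pending_messages
processed = []
MAX_TURNS = 3                 # hypothetical budget so the toy loop terminates


def dequeue_pending_event(session_key):
    queue = pending[session_key]
    return queue.popleft() if queue else None


async def run_agent(session_key, event):
    """Run one turn, then drain events that are already queued."""
    while event is not None:
        processed.append(event)
        await asyncio.sleep(0)  # stands in for the agent turn
        event = dequeue_pending_event(session_key)


def post_turn_goal_continuation(session_key):
    """Toy judge: enqueue a continuation until the turn budget is used up."""
    if len(processed) < MAX_TURNS:
        pending[session_key].append(f"continuation-{len(processed)}")


async def handle_message(session_key, event):
    while event is not None:
        await run_agent(session_key, event)
        post_turn_goal_continuation(session_key)
        # Option B: re-check the queue after the hook instead of going idle.
        event = dequeue_pending_event(session_key)


asyncio.run(handle_message("s1", "kickoff"))
print(processed)  # ['kickoff', 'continuation-1', 'continuation-2']
```

Option A would instead call the hook inside `run_agent`, just before its own dequeue, so the existing drain loop picks up the continuation without a second check in the caller.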

Workaround

Users can manually send any message (e.g., "continue") after each goal turn to trigger processing of the queued continuation, but this defeats the purpose of the autonomous loop.

Related Code

  • gateway/run.py: _post_turn_goal_continuation() (line 9981), _enqueue_fifo() (line 2040), dequeue in _run_agent() (line 16381), continuation hook call site (line 6942)
  • hermes_cli/goals.py: GoalManager.evaluate_after_turn() (line 620), judge_goal() (line 371)
  • gateway/platforms/base.py: _start_session_processing() (line 2646), _drain_pending_after_session_command() (line 2728)

Metadata

Assignees

No one assigned

Labels

  • P1: High, major feature broken, no workaround
  • comp/gateway: Gateway runner, session dispatch, delivery
  • platform/discord: Discord bot adapter
  • platform/telegram: Telegram bot adapter
  • type/bug: Something isn't working
