
fix(discord): avoid false ❌ on self-restart cancellation #6315

@bobashopcashier

Bug Description

When Hermes is told to restart from Discord, the restart may begin or even succeed, but the original Discord message is often marked with a red ❌ and no reliable final confirmation message is sent.

This makes an intentional self-restart look like a failure in the Discord UX, even when the restart itself likely worked.

This appears to be a remaining bug after #1414 / #1427.

User-visible symptom

In Discord, a message such as "restart now" can get:

  • an in-progress reaction
  • then a final red ❌
  • and no trustworthy final "restart worked" acknowledgement

A follow-up message such as "did it work" may also get a red ❌ if the restart sequence is still interrupting the original processing lifecycle.

Observed screenshot behavior from local investigation:

  • "restart now" receives a red ❌
  • the bot appears to emit restart-related terminal output
  • a later "did it work" message also receives a red ❌

Concrete code path

Current main still has this cancellation-to-failure mapping in gateway/platforms/base.py:

  • gateway/platforms/base.py:1452-1454
    • normal completion derives processing_ok and calls on_processing_complete(..., processing_ok)
  • gateway/platforms/base.py:1472-1474
    • asyncio.CancelledError calls on_processing_complete(..., False) and re-raises

Current Discord reaction handling still maps False to a red ❌ in gateway/platforms/discord.py:

  • gateway/platforms/discord.py:741-748
    • removes 👀
    • adds a success reaction if success else ❌

That means an intentional self-restart can cancel the in-flight handler, trigger on_processing_complete(..., False), and produce a false failure reaction in Discord.
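
The path described above can be reproduced in isolation. The following is a minimal sketch, not the Hermes code: `FakeDiscordAdapter`, `process_message`, and `slow_handler` are hypothetical stand-ins, the `on_processing_complete` signature is simplified, and the task cancellation stands in for whatever the restart does to in-flight work.

```python
import asyncio


class FakeDiscordAdapter:
    """Hypothetical stand-in that records reactions instead of calling Discord."""

    def __init__(self):
        self.reactions = []

    async def on_processing_complete(self, message_id, success):
        # Mirrors the mapping described above: False becomes a red X.
        self.reactions.append("success" if success else "❌")


async def process_message(adapter, message_id, handler):
    """Simplified lifecycle wrapper modeled on the behavior described for base.py."""
    try:
        await handler()
        await adapter.on_processing_complete(message_id, True)
    except asyncio.CancelledError:
        # Cancellation is reported as an ordinary failure, then re-raised.
        await adapter.on_processing_complete(message_id, False)
        raise


async def slow_handler():
    # Represents an in-flight handler that is still running when the restart hits.
    await asyncio.sleep(10)


async def reproduce():
    adapter = FakeDiscordAdapter()
    task = asyncio.create_task(process_message(adapter, "restart now", slow_handler))
    await asyncio.sleep(0)  # let the handler start
    task.cancel()  # stands in for the restart cancelling in-flight work
    try:
        await task
    except asyncio.CancelledError:
        pass
    return adapter.reactions


reactions = asyncio.run(reproduce())
print(reactions)
```

In this sketch the only recorded reaction is the failure one, even though nothing in the handler actually went wrong.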

Why this is distinct from #1427

PR #1427 fixed an important shutdown bug:

  • tracked adapter background tasks
  • canceled them during shutdown
  • interrupted running agents before disconnecting adapters

That addresses old gateway instances continuing to unwind work after shutdown/replacement.

But the remaining Discord UX problem appears narrower:

  • intentional self-restart cancellation is still classified as ordinary failure for lifecycle completion
  • Discord therefore shows ❌ for what may be a successful or expected restart transition

Expected Behavior

On intentional self-restart from Discord:

  • Hermes should not mark the triggering Discord message as failed purely because the handler was cancelled as part of restart/shutdown
  • if a final confirmation message cannot be sent before restart, the bot should avoid a false red ❌
  • genuine handler errors should still surface as failure

Acceptance Criteria

  • intentional self-restart from Discord does not end with a false red ❌
  • cancellation caused by expected restart/shutdown is distinguished from genuine processing failure
  • genuine errors still produce failure signaling
  • regression coverage proves the Discord reaction lifecycle does not report false failure during self-restart
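
One possible shape for a fix and its regression checks is sketched below. Everything here is an assumption made for illustration: the `Outcome` enum, the `restarting` flag, and the `FakeAdapter` class are hypothetical and do not exist in the codebase as far as this issue shows. The idea is only that the completion callback receives enough information to tell an expected restart cancellation apart from a genuine failure.

```python
import asyncio
from enum import Enum


class Outcome(Enum):
    # Hypothetical tri-state replacing the boolean passed to on_processing_complete.
    SUCCESS = "success"
    FAILURE = "failure"
    CANCELLED_FOR_RESTART = "cancelled_for_restart"


class FakeAdapter:
    """Hypothetical adapter that records reactions instead of calling Discord."""

    def __init__(self):
        self.restarting = False  # assumed flag, set before in-flight work is cancelled
        self.reactions = []

    async def on_processing_complete(self, outcome):
        if outcome is Outcome.FAILURE:
            self.reactions.append("❌")
        elif outcome is Outcome.SUCCESS:
            self.reactions.append("success")
        # CANCELLED_FOR_RESTART: deliberately add no failure reaction.


async def process_message(adapter, handler):
    try:
        await handler()
    except asyncio.CancelledError:
        # Only a cancellation during an expected restart is treated as non-failure.
        expected = adapter.restarting
        outcome = Outcome.CANCELLED_FOR_RESTART if expected else Outcome.FAILURE
        await adapter.on_processing_complete(outcome)
        raise
    except Exception:
        # Genuine handler errors still surface as failure.
        await adapter.on_processing_complete(Outcome.FAILURE)
    else:
        await adapter.on_processing_complete(Outcome.SUCCESS)


async def run_case(handler, restart):
    adapter = FakeAdapter()
    task = asyncio.create_task(process_message(adapter, handler))
    await asyncio.sleep(0)  # let the handler start (or finish, if it is quick)
    if restart:
        adapter.restarting = True
        task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return adapter.reactions


async def slow():
    await asyncio.sleep(10)


async def broken():
    raise RuntimeError("boom")


async def fine():
    return None


restart_reactions = asyncio.run(run_case(slow, restart=True))
error_reactions = asyncio.run(run_case(broken, restart=False))
ok_reactions = asyncio.run(run_case(fine, restart=False))
```

The three cases line up with the criteria above: the restart case records no failure reaction, the raising handler still records ❌, and normal completion records success. In this sketch a cancellation that arrives while `restarting` is unset is still reported as failure; whether that is the right default is a design choice for the real fix.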

Historical Context

Metadata

Labels

  • P2 (Medium: degraded but workaround exists)
  • comp/gateway (Gateway runner, session dispatch, delivery)
  • platform/discord (Discord bot adapter)
  • sweeper:implemented-on-main (Sweeper: behavior already present on current main)
  • type/bug (Something isn't working)
