SMS gateway: bind to 127.0.0.1 by default, fix retry-after-giving-up loop #16258

@eschneider8255

Summary

Two related defects in gateway/platforms/sms.py:

  1. Retry-after-giving-up loop — when the SMS adapter fails to start (e.g., `SMS_WEBHOOK_URL` unset), it logs `Giving up reconnecting sms after 20 attempts` and then retries the entire start-up loop every 5 minutes. Result: ~288 ERROR lines per day per misconfigured deployment. The "give up" should actually give up until the next config reload, not silently retry forever.

  2. Defaults to 0.0.0.0 bind — the webhook listener binds to all interfaces:

    ```python
    # gateway/platforms/sms.py:107 (v2026.4.23)
    site = web.TCPSite(self._runner, "0.0.0.0", self._webhook_port)
    ```

    In a Cloudflare Tunnel deployment (which is the recommended public-facing setup), the tunnel routes external traffic to `http://127.0.0.1:`, so binding to `0.0.0.0` provides no benefit and exposes the webhook to anything in the same network/security group. Defense-in-depth would default to `127.0.0.1` and offer an opt-in env var (`SMS_WEBHOOK_BIND`) for deployments that need broader binding.

Repro

  1. Deploy with `SMS_WEBHOOK_URL` unset on a config that loads the SMS platform.
  2. Watch `journalctl -u hermes-agent` for 30 minutes -- the "Refusing to start" + "Giving up" pair recurs every ~5 minutes.

Suggested patch (the bind half)

```python
# gateway/platforms/sms.py
import os

DEFAULT_WEBHOOK_BIND = "127.0.0.1"

class SmsAdapter(BasePlatformAdapter):
    def __init__(self, config):
        ...
        # Loopback by default; set SMS_WEBHOOK_BIND to opt in to a broader bind.
        self._webhook_bind: str = os.getenv("SMS_WEBHOOK_BIND", DEFAULT_WEBHOOK_BIND)

    async def connect(self) -> bool:
        ...
        site = web.TCPSite(self._runner, self._webhook_bind, self._webhook_port)
```
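If it helps, the env-var lookup could also reject typos instead of passing them straight to the socket layer. A minimal sketch, assuming a hypothetical helper `resolve_webhook_bind` (not in the codebase today) and that only IP literals, not hostnames, need to be supported:

```python
import ipaddress
import logging
import os

logger = logging.getLogger(__name__)

DEFAULT_WEBHOOK_BIND = "127.0.0.1"


def resolve_webhook_bind(env=None) -> str:
    """Return the bind address for the webhook listener.

    Hypothetical helper: falls back to loopback when SMS_WEBHOOK_BIND is
    unset, empty, or not a valid IP literal.
    """
    env = os.environ if env is None else env
    raw = env.get("SMS_WEBHOOK_BIND", "").strip()
    if not raw:
        return DEFAULT_WEBHOOK_BIND
    try:
        addr = ipaddress.ip_address(raw)
    except ValueError:
        logger.warning(
            "Ignoring invalid SMS_WEBHOOK_BIND=%r; using %s",
            raw,
            DEFAULT_WEBHOOK_BIND,
        )
        return DEFAULT_WEBHOOK_BIND
    if addr.is_unspecified:
        # 0.0.0.0 or :: is allowed, but only as an explicit opt-in.
        logger.warning("SMS webhook will listen on all interfaces (%s)", addr)
    return str(addr)
```

With this, `self._webhook_bind = resolve_webhook_bind()` would replace the bare `os.getenv` call; whether the extra validation is worth it is a maintainer call.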

For the retry loop: the supervisor / reconnect code path that re-attempts after a 20-attempt cap should mark the platform as permanently failed for that gateway lifecycle and emit a single "sms platform disabled until restart" warning -- not silently retry every 5 minutes.
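I haven't traced the actual supervisor code, so the following is only a sketch of the intended behavior. `PlatformSupervisor`, `ensure_connected`, and the `disabled` flag are hypothetical names, not the gateway's real API; the 20-attempt cap comes from the log message.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

MAX_ATTEMPTS = 20  # cap quoted in the "Giving up reconnecting" log line


class PlatformSupervisor:
    """Hypothetical supervisor: gives up once per gateway lifecycle."""

    def __init__(self, name, connect, max_attempts=MAX_ATTEMPTS, delay=0.0):
        self.name = name
        self._connect = connect  # async callable returning True on success
        self._max_attempts = max_attempts
        self._delay = delay  # seconds between attempts
        self.disabled = False

    async def ensure_connected(self) -> bool:
        if self.disabled:
            # Already gave up: no further attempts, no further log lines.
            return False
        for _ in range(self._max_attempts):
            try:
                if await self._connect():
                    return True
            except Exception:
                # Treat an exception from connect() as a failed attempt.
                logger.debug("%s connect attempt failed", self.name, exc_info=True)
            await asyncio.sleep(self._delay)
        self.disabled = True
        logger.warning("%s platform disabled until restart", self.name)
        return False
```

A periodic caller (the 5-minute timer, if that is what drives the loop today) can keep calling `ensure_connected()`; after the first give-up it returns `False` immediately without reconnecting or logging.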

Impact

  • Log noise: 288 ERROR lines/day on misconfigured deployments.
  • Defense-in-depth: a 1-line bind change that closes a small attack surface for tunnel-fronted deployments (the now-recommended SMS deployment pattern).

Happy to send a PR once a maintainer signals which of the two fixes are in scope -- they're independent.

Metadata

    Labels

    P2: Medium, degraded but workaround exists
    comp/gateway: Gateway runner, session dispatch, delivery
    platform/sms: SMS (Twilio) adapter
    type/bug: Something isn't working