Summary
Two related defects in gateway/platforms/sms.py:
- Retry-after-giving-up loop: when the SMS adapter fails to start (e.g., `SMS_WEBHOOK_URL` unset), it logs `Giving up reconnecting sms after 20 attempts` and then retries the entire start-up loop every 5 minutes. Result: ~288 ERROR lines per day per misconfigured deployment. The "give up" should actually give up until the next config reload, not silently retry forever.
- Defaults to 0.0.0.0 bind: the webhook listener binds to all interfaces:
```python
# gateway/platforms/sms.py:107 (v2026.4.23)
site = web.TCPSite(self._runner, "0.0.0.0", self._webhook_port)
```
In a Cloudflare Tunnel deployment (the recommended public-facing setup), the tunnel routes external traffic to `http://127.0.0.1:`, so binding to `0.0.0.0` adds nothing and exposes the webhook to any host in the same network or security group. A defense-in-depth default would bind to `127.0.0.1` and offer an opt-in env var (`SMS_WEBHOOK_BIND`) for deployments that need a broader bind.
Repro
- Deploy with `SMS_WEBHOOK_URL` unset on a config that loads the SMS platform.
- Watch `journalctl -u hermes-agent` for 30 minutes -- the "Refusing to start" + "Giving up" pair recurs every ~5 minutes.
Suggested patch (the bind half)
```python
# gateway/platforms/sms.py
import os  # if not already imported

DEFAULT_WEBHOOK_BIND = "127.0.0.1"  # loopback only; override via SMS_WEBHOOK_BIND


class SmsAdapter(BasePlatformAdapter):
    def __init__(self, config):
        ...
        self._webhook_bind: str = os.getenv("SMS_WEBHOOK_BIND", DEFAULT_WEBHOOK_BIND)

    async def connect(self) -> bool:
        ...
        site = web.TCPSite(self._runner, self._webhook_bind, self._webhook_port)
```
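If it would help review, the env-var lookup could also validate the value before it reaches `web.TCPSite`. The following is only a sketch: `resolve_webhook_bind` is a hypothetical helper, not something in the current code, and the policy of falling back to loopback on an unparseable value (rather than refusing to start) is an assumption a maintainer may want to change.

```python
import ipaddress
import os

DEFAULT_WEBHOOK_BIND = "127.0.0.1"


def resolve_webhook_bind(env=None) -> str:
    """Return the bind address from SMS_WEBHOOK_BIND, defaulting to loopback.

    Hypothetical helper: accepts only literal IPv4/IPv6 addresses and falls
    back to the loopback default for anything else (assumed policy).
    """
    env = os.environ if env is None else env
    raw = env.get("SMS_WEBHOOK_BIND", "").strip()
    if not raw:
        return DEFAULT_WEBHOOK_BIND
    try:
        ipaddress.ip_address(raw)  # raises ValueError for non-IP strings
    except ValueError:
        return DEFAULT_WEBHOOK_BIND
    return raw
```

With this shape, tunnel-fronted deployments get loopback without any configuration, and `SMS_WEBHOOK_BIND=0.0.0.0` restores today's behaviour explicitly.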
For the retry loop: the supervisor / reconnect code path that re-attempts after a 20-attempt cap should mark the platform as permanently failed for that gateway lifecycle and emit a single "sms platform disabled until restart" warning -- not silently retry every 5 minutes.
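To make the intended behaviour concrete, here is a minimal sketch of a "give up once" supervisor. I have not traced the real reconnect code, so `PlatformSupervisor`, `ensure_connected`, and the `disabled` flag are all hypothetical names; only the 20-attempt cap and the warning text come from the description above.

```python
import asyncio
import logging

logger = logging.getLogger("gateway.supervisor")  # logger name is an assumption

MAX_ATTEMPTS = 20  # cap quoted in the existing "Giving up" log line


class PlatformSupervisor:
    """Hypothetical wrapper that disables a platform after the attempt cap."""

    def __init__(self, name, connect, max_attempts=MAX_ATTEMPTS, retry_delay=0.0):
        self.name = name
        self._connect = connect          # async callable returning bool
        self._max_attempts = max_attempts
        self._retry_delay = retry_delay  # seconds between attempts
        self.disabled = False
        self.attempts = 0

    async def ensure_connected(self) -> bool:
        if self.disabled:
            # Already gave up in this lifecycle: no retry, no further log line.
            return False
        for _ in range(self._max_attempts):
            self.attempts += 1
            try:
                if await self._connect():
                    return True
            except Exception:
                logger.debug("%s connect attempt failed", self.name, exc_info=True)
            await asyncio.sleep(self._retry_delay)
        # Cap reached: mark as failed for this lifecycle and warn exactly once.
        self.disabled = True
        logger.warning("%s platform disabled until restart", self.name)
        return False
```

A periodic caller (the current 5-minute timer, if that is what drives the retries) can keep invoking `ensure_connected()`; once `disabled` is set, the call returns immediately without touching the adapter or the log.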
Impact
- Log noise: ~288 ERROR lines/day on misconfigured deployments.
- Defense-in-depth: a 1-line bind change that closes a small attack surface for tunnel-fronted deployments (the now-recommended SMS deployment pattern).
Happy to send a PR once a maintainer signals which of the two fixes are in scope -- they're independent.