Skip to content

fix(sms): mark missing-config errors as non-retryable; default bind to 127.0.0.1 (#16258)#16278

Closed
briandevans wants to merge 3 commits into
NousResearch:mainfrom
briandevans:fix/sms-fatal-error-bind-16258
Closed

fix(sms): mark missing-config errors as non-retryable; default bind to 127.0.0.1 (#16258)#16278
briandevans wants to merge 3 commits into
NousResearch:mainfrom
briandevans:fix/sms-fatal-error-bind-16258

Conversation

@briandevans

Copy link
Copy Markdown
Contributor

Summary

  • Config-validation failures in SmsAdapter.connect() (missing SMS_WEBHOOK_URL, missing TWILIO_PHONE_NUMBER) now call _set_fatal_error(..., retryable=False) before returning False, so the reconnect watcher removes them from the retry queue immediately and never re-enqueues them.
  • DEFAULT_WEBHOOK_HOST changed from 0.0.0.0 to 127.0.0.1 — aligns with the recommended Cloudflare Tunnel deployment model (tunnel routes to 127.0.0.1:<port>). Operators who need 0.0.0.0 can set SMS_WEBHOOK_HOST=0.0.0.0.

The bugs

Retry-after-giving-up loop: when SMS_WEBHOOK_URL is unset, connect() returned False without setting has_fatal_error. The reconnect watcher treated this as a transient failure, scheduled exponential back-off retries, and logged an error on every attempt — up to 20 times before finally logging "Giving up reconnecting sms after N attempts". For a permanent configuration error this cycle is pure log noise. After the fix, the gateway recognises the error as non-retryable on the first attempt and drops SMS from the retry queue immediately.

0.0.0.0 bind: the Twilio webhook listener bound to all interfaces by default, exposing it to other hosts on the same network even though Cloudflare Tunnel (the recommended public-facing setup) only routes to 127.0.0.1. Changing the default eliminates the unnecessary exposure; the env-var override path SMS_WEBHOOK_HOST is unchanged for operators who require the old behaviour.

The fix

gateway/platforms/sms.py:

  • DEFAULT_WEBHOOK_HOST: "0.0.0.0""127.0.0.1" (+ module docstring)
  • connect(): two return False paths that previously set no fatal-error state now call self._set_fatal_error("<code>", msg, retryable=False) before returning

tests/gateway/test_sms.py:

  • test_default_host_is_all_interfaces renamed to test_default_host_is_localhost, assertion updated
  • test_host_from_env updated to verify env-var override to 0.0.0.0 still works (was testing override to the new default, which added no coverage)
  • Three new TestStartupGuard tests: test_missing_webhook_url_is_non_retryable, test_missing_phone_number_is_non_retryable, test_insecure_flag_does_not_set_fatal_error

Test plan

  • Before: test_missing_webhook_url_is_non_retryable fails — has_fatal_error is False on the old code path
  • After: 39 tests pass in tests/gateway/test_sms.py
  • Regression guard: reverted both _set_fatal_error calls → both new non-retryable tests fail as expected; restored → 0 failures
  • Adjacent suite (tests/gateway/) unchanged

Related

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 27, 2026 01:18

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the SMS (Twilio) gateway adapter to treat missing required configuration as non-retryable failures (so the reconnect watcher drops SMS immediately), and tightens default webhook listener binding to localhost for safer Cloudflare Tunnel deployments.

Changes:

  • Mark missing SMS_WEBHOOK_URL / TWILIO_PHONE_NUMBER startup failures as fatal + retryable=False in SmsAdapter.connect().
  • Change the default webhook bind host from 0.0.0.0 to 127.0.0.1 (env override unchanged).
  • Update/add tests to cover the new default bind and the non-retryable fatal-error behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
gateway/platforms/sms.py Switch default webhook host to localhost and set non-retryable fatal errors for missing required config in connect().
tests/gateway/test_sms.py Adjust host-default/override assertions and add startup-guard tests verifying non-retryable fatal errors.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/gateway/test_sms.py Outdated
"TWILIO_ACCOUNT_SID": "ACtest",
"TWILIO_AUTH_TOKEN": "tok",
}
with patch.dict(os.environ, env, clear=False):

Copilot AI Apr 27, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In test_missing_phone_number_is_non_retryable, the patch.dict(os.environ, env, clear=False) allows a pre-existing TWILIO_PHONE_NUMBER (or SMS_WEBHOOK_URL) from the outer environment to leak into SmsAdapter.__init__(). That can make this test flaky (it may fail a different guard than intended or even try to start a real aiohttp listener). Use clear=True (and/or explicitly unset TWILIO_PHONE_NUMBER/SMS_WEBHOOK_URL) so the test deterministically exercises the missing-phone-number path.

Suggested change
with patch.dict(os.environ, env, clear=False):
with patch.dict(os.environ, env, clear=True):

Copilot uses AI. Check for mistakes.
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists platform/sms SMS (Twilio) adapter comp/gateway Gateway runner, session dispatch, delivery labels Apr 27, 2026
@briandevans

Copy link
Copy Markdown
Contributor Author

@copilot Finding addressed in commit 2f5c9ce3:

Finding (line 261 — env leak via clear=False): Added "TWILIO_PHONE_NUMBER": "" and "SMS_WEBHOOK_URL": "" to the env dict so both vars are explicitly cleared by patch.dict, regardless of what the host environment has set. This deterministically exercises the missing-phone-number guard path without risk of TWILIO_PHONE_NUMBER from the outer env leaking through and redirecting to the webhook-URL guard instead.

briandevans and others added 3 commits April 30, 2026 13:12
…o 127.0.0.1

Two bugs in the SMS (Twilio) gateway adapter:

1. **Retry-after-giving-up**: when SMS_WEBHOOK_URL is unset (or
   TWILIO_PHONE_NUMBER is missing), connect() returned False without
   calling _set_fatal_error(). The reconnect watcher treated the
   failure as transient and retried it up to 20 times with exponential
   back-off before finally giving up — generating noise in logs and
   wasting CPU for a permanent configuration error. The fix marks both
   config-validation failures as retryable=False so the gateway removes
   them from the retry queue immediately and never re-enqueues them.

2. **0.0.0.0 bind default**: the webhook listener defaulted to binding
   on all interfaces. In the recommended Cloudflare Tunnel deployment
   the tunnel routes to 127.0.0.1:<port>, so the 0.0.0.0 bind exposed
   the endpoint to everything in the same network/security group
   without benefit. The default is now 127.0.0.1; operators who
   genuinely need to bind all interfaces can still set
   SMS_WEBHOOK_HOST=0.0.0.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n env

Copilot finding on NousResearch#16278: patch.dict(clear=False) allowed leaked env vars
from the outer environment to make test_missing_phone_number_is_non_retryable
flaky — if TWILIO_PHONE_NUMBER was already set, the adapter would proceed past
the phone-number guard and hit the webhook-URL guard instead.

Adding explicit empty values for both vars ensures the test deterministically
exercises the missing-phone-number path regardless of the host environment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prevents pre-existing TWILIO_PHONE_NUMBER or SMS_WEBHOOK_URL values in
the outer test environment from leaking into the assertion context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@teknium1

teknium1 commented May 4, 2026

Copy link
Copy Markdown
Contributor

Salvaged via #19745 onto current main. Test-file conflict had extra tests from your branch (insecure-flag regression and others); preserved them all. The sms.py changes weren't applied cleanly by the initial cherry-pick (silent file conflict), so I re-applied them manually and also updated the existing test_default_host_is_all_interfaces test assertion to match the new 127.0.0.1 default. Your authorship was preserved. Thanks @briandevans!

@teknium1 teknium1 closed this May 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists platform/sms SMS (Twilio) adapter type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SMS gateway: bind to 127.0.0.1 by default, fix retry-after-giving-up loop

4 participants