fix(sms): mark missing-config errors as non-retryable; default bind to 127.0.0.1 (#16258)#16278
fix(sms): mark missing-config errors as non-retryable; default bind to 127.0.0.1 (#16258)#16278briandevans wants to merge 3 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Updates the SMS (Twilio) gateway adapter to treat missing required configuration as non-retryable failures (so the reconnect watcher drops SMS immediately), and tightens default webhook listener binding to localhost for safer Cloudflare Tunnel deployments.
Changes:
- Mark missing
SMS_WEBHOOK_URL/TWILIO_PHONE_NUMBERstartup failures as fatal +retryable=FalseinSmsAdapter.connect(). - Change the default webhook bind host from
0.0.0.0to127.0.0.1(env override unchanged). - Update/add tests to cover the new default bind and the non-retryable fatal-error behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
gateway/platforms/sms.py |
Switch default webhook host to localhost and set non-retryable fatal errors for missing required config in connect(). |
tests/gateway/test_sms.py |
Adjust host-default/override assertions and add startup-guard tests verifying non-retryable fatal errors. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "TWILIO_ACCOUNT_SID": "ACtest", | ||
| "TWILIO_AUTH_TOKEN": "tok", | ||
| } | ||
| with patch.dict(os.environ, env, clear=False): |
There was a problem hiding this comment.
In test_missing_phone_number_is_non_retryable, the patch.dict(os.environ, env, clear=False) allows a pre-existing TWILIO_PHONE_NUMBER (or SMS_WEBHOOK_URL) from the outer environment to leak into SmsAdapter.__init__(). That can make this test flaky (it may fail a different guard than intended or even try to start a real aiohttp listener). Use clear=True (and/or explicitly unset TWILIO_PHONE_NUMBER/SMS_WEBHOOK_URL) so the test deterministically exercises the missing-phone-number path.
| with patch.dict(os.environ, env, clear=False): | |
| with patch.dict(os.environ, env, clear=True): |
|
@copilot Finding addressed in commit Finding (line 261 — env leak via clear=False): Added |
…o 127.0.0.1 Two bugs in the SMS (Twilio) gateway adapter: 1. **Retry-after-giving-up**: when SMS_WEBHOOK_URL is unset (or TWILIO_PHONE_NUMBER is missing), connect() returned False without calling _set_fatal_error(). The reconnect watcher treated the failure as transient and retried it up to 20 times with exponential back-off before finally giving up — generating noise in logs and wasting CPU for a permanent configuration error. The fix marks both config-validation failures as retryable=False so the gateway removes them from the retry queue immediately and never re-enqueues them. 2. **0.0.0.0 bind default**: the webhook listener defaulted to binding on all interfaces. In the recommended Cloudflare Tunnel deployment the tunnel routes to 127.0.0.1:<port>, so the 0.0.0.0 bind exposed the endpoint to everything in the same network/security group without benefit. The default is now 127.0.0.1; operators who genuinely need to bind all interfaces can still set SMS_WEBHOOK_HOST=0.0.0.0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n env Copilot finding on NousResearch#16278: patch.dict(clear=False) allowed leaked env vars from the outer environment to make test_missing_phone_number_is_non_retryable flaky — if TWILIO_PHONE_NUMBER was already set, the adapter would proceed past the phone-number guard and hit the webhook-URL guard instead. Adding explicit empty values for both vars ensures the test deterministically exercises the missing-phone-number path regardless of the host environment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prevents pre-existing TWILIO_PHONE_NUMBER or SMS_WEBHOOK_URL values in the outer test environment from leaking into the assertion context. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a4c77ea to
a5b5790
Compare
|
Salvaged via #19745 onto current main. Test-file conflict had extra tests from your branch (insecure-flag regression and others); preserved them all. The sms.py changes weren't applied cleanly by the initial cherry-pick (silent file conflict), so I re-applied them manually and also updated the existing |
Summary
SmsAdapter.connect()(missingSMS_WEBHOOK_URL, missingTWILIO_PHONE_NUMBER) now call_set_fatal_error(..., retryable=False)before returningFalse, so the reconnect watcher removes them from the retry queue immediately and never re-enqueues them.DEFAULT_WEBHOOK_HOSTchanged from0.0.0.0to127.0.0.1— aligns with the recommended Cloudflare Tunnel deployment model (tunnel routes to127.0.0.1:<port>). Operators who need0.0.0.0can setSMS_WEBHOOK_HOST=0.0.0.0.The bugs
Retry-after-giving-up loop: when
SMS_WEBHOOK_URLis unset,connect()returnedFalsewithout settinghas_fatal_error. The reconnect watcher treated this as a transient failure, scheduled exponential back-off retries, and logged an error on every attempt — up to 20 times before finally logging "Giving up reconnecting sms after N attempts". For a permanent configuration error this cycle is pure log noise. After the fix, the gateway recognises the error as non-retryable on the first attempt and drops SMS from the retry queue immediately.0.0.0.0 bind: the Twilio webhook listener bound to all interfaces by default, exposing it to other hosts on the same network even though Cloudflare Tunnel (the recommended public-facing setup) only routes to
127.0.0.1. Changing the default eliminates the unnecessary exposure; the env-var override pathSMS_WEBHOOK_HOSTis unchanged for operators who require the old behaviour.The fix
gateway/platforms/sms.py:DEFAULT_WEBHOOK_HOST:"0.0.0.0"→"127.0.0.1"(+ module docstring)connect(): tworeturn Falsepaths that previously set no fatal-error state now callself._set_fatal_error("<code>", msg, retryable=False)before returningtests/gateway/test_sms.py:test_default_host_is_all_interfacesrenamed totest_default_host_is_localhost, assertion updatedtest_host_from_envupdated to verify env-var override to0.0.0.0still works (was testing override to the new default, which added no coverage)TestStartupGuardtests:test_missing_webhook_url_is_non_retryable,test_missing_phone_number_is_non_retryable,test_insecure_flag_does_not_set_fatal_errorTest plan
test_missing_webhook_url_is_non_retryablefails —has_fatal_errorisFalseon the old code pathtests/gateway/test_sms.py_set_fatal_errorcalls → both new non-retryable tests fail as expected; restored → 0 failurestests/gateway/) unchangedRelated
🤖 Generated with Claude Code