Skip to content

Surface server Retry-After for rate-limit send failures#2016

Merged
AsamK merged 3 commits into
AsamK:masterfrom
tonycpsu:ratelimit-retry-after
Apr 15, 2026
Merged

Surface server Retry-After for rate-limit send failures#2016
AsamK merged 3 commits into
AsamK:masterfrom
tonycpsu:ratelimit-retry-after

Conversation

@tonycpsu

Copy link
Copy Markdown
Contributor

Summary

  • SendMessageResult now carries a rateLimitRetryAfterSeconds component populated from libsignal's RateLimitException.getRetryAfterMilliseconds() (for plain 413s) or ProofRequiredException.getRetryAfterSeconds() (for 428 challenges), giving clients a single place to read the server's advertised retry window.
  • JsonSendMessageResult.retryAfterSeconds is now populated for every RATE_LIMIT_FAILURE where the server sent Retry-After, not only for proof-required challenges. The canonical proof-required discriminator remains token != null.
  • Text output appends "retry after N seconds" to rate-limit failure lines when known.
  • The aggregate RateLimitErrorException thrown when all recipients rate-limit now carries the real nextAttemptTimestamp = now + retryAfter * 1000 instead of the hardcoded 0 placeholder.

Motivation

Closes #1996.

Before this change, clients had no way to know when it was safe to retry a rate-limited send. Bridges, bots, and other daemons would either retry too eagerly (extending the server-side window) or wait an arbitrary amount. The data was already available upstream in libsignal — signal-cli just wasn't forwarding it.

Design notes

  • Unified retry-after field. Both 413 and 428 cases feed into the same rateLimitRetryAfterSeconds component, so downstream code doesn't need to branch on failure subtype. A new SendMessageResults.maxRateLimitRetryAfterSeconds() helper returns the longest wait across a batch.
  • Ceiling division for ms → s. Plain rate-limit windows come in milliseconds; naive truncation would let clients retry up to 999ms early. Rounding up guarantees the reported window is never shorter than the server's. SendMessageResult.millisToCeilingSeconds(long) is a small package-private helper with its own boundary tests.
  • Source compatibility. Adding a component widens the record's canonical constructor. A non-canonical 8-arg delegating constructor is included so pre-existing callers still compile.
  • No new dependencies. Pure-JDK change to the existing API surface.

Test plan

  • ./gradlew build green
  • New JsonSendMessageResultTest covers: retry-after surfacing on plain 413, null retry-after when server omits Retry-After, successful sends leaving the field null, aggregate picking the longest window across a mix of populated/null entries
  • New SendMessageResultTest covers: ms→s ceiling boundaries (0, 1, 500, 999, 1000, 1001, 1500, 60000), and the 8-arg legacy constructor still compiles and produces a null retry-after
  • Live send smoke-tested end-to-end (successful send returns type: SUCCESS with no retryAfterSeconds, as expected)

The JSON contract extension (populating retryAfterSeconds for plain 413s where it was previously null) is documented in CHANGELOG.md under Unreleased.

tonycpsu and others added 3 commits April 13, 2026 01:17
libsignal-service's RateLimitException exposes retryAfterMilliseconds
for HTTP 413 responses, but signal-cli only forwarded retry-after for
ProofRequired (428) failures. Clients had no signal for when it was
safe to retry plain rate-limited sends, so every failed retry
potentially extended the server-side window.

SendMessageResult now carries an optional rateLimitRetryAfterSeconds,
populated from the upstream Optional<Long>. JsonSendMessageResult
exposes it for RATE_LIMIT_FAILURE type. Text output includes the
window when known. Aggregate RateLimitErrorException now carries the
real nextAttemptTimestamp (was hardcoded to 0).

Closes AsamK#1996.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…millis

Codex adversarial review flagged two issues in the phase 1 retry-after
plumbing:

* Aggregate retry-after ignored proof-required failures. Because
  isRateLimitFailure is true for proof-required cases but
  rateLimitRetryAfterSeconds was only populated from plain 413s, an
  all-proof-required batch (or a mixed batch where the proof-required
  delay was longer) could flow into outputResult() and produce a
  RateLimitException(0), telling callers to retry immediately.

* Millisecond Retry-After values were truncated by integer division,
  so 1..999ms became 0 and non-second-aligned values lost up to 999ms.
  A retry suggested from the floored value can land before the
  server's real deadline and re-trigger the limit.

SendMessageResult.from(...) now populates rateLimitRetryAfterSeconds
from either the proof-required seconds or the plain rate-limit ms
(converted via ceiling division), giving maxRateLimitRetryAfterSeconds
a single source of truth. JsonSendMessageResult.from(...) reads the
unified field. New millisToCeilingSeconds helper plus boundary test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a non-canonical 8-arg SendMessageResult constructor that delegates
to the canonical form with null retry-after. This keeps source
compatibility for any downstream code that constructs the record
directly (tests, mocks) without changing the canonical shape. Records
permit additional constructors alongside the canonical one.

Document the retryAfterSeconds meaning change in the CHANGELOG. The
field was previously populated only for proof-required failures; it
is now populated whenever the server sends a Retry-After header. The
canonical proof-required discriminator is still token != null.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tonycpsu tonycpsu force-pushed the ratelimit-retry-after branch from c255551 to 0b1261c Compare April 13, 2026 05:22
@AsamK AsamK merged commit e1b17bf into AsamK:master Apr 15, 2026
8 checks passed
@AsamK

AsamK commented Apr 15, 2026

Copy link
Copy Markdown
Owner

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Idea: 429 - Rate limit

2 participants