Closed
Description
The metrics consumers scale with bucket throughput. Because scaling the pipeline has been challenging lately, we want to limit throughput to what the pipeline can handle in the interim.
In the long run, the pipeline should scale to any volume needed. The rate limiter will then primarily keep metrics use cases cost-effective by placing a reasonable limit on the product.
Validation
- Metrics:
  - Number of buckets rate limited by scope.
  - Metrics messages sent to Kafka (already exists in Relay)
  - Metrics messages as seen by the consumers
- Alerts
  - Rate Limit hit
  - [Optional] Rate Limit capacity at X%
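The two alert conditions can be read as simple comparisons of bucket usage against the configured limit. The following is a minimal sketch of that reading only; the function names and the integer-percent parameter are hypothetical, and the threshold X is deliberately left as a required argument because the issue does not fix a value.

```python
def rate_limit_hit(used_buckets: int, limit: int) -> bool:
    """True once the scope's budget for the current window is exhausted."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return used_buckets >= limit


def capacity_alert(used_buckets: int, limit: int, threshold_percent: int) -> bool:
    """True when usage has reached threshold_percent of the limit.

    Integer arithmetic avoids floating point rounding at the boundary.
    threshold_percent is the "X" from the alert; it is an assumption here.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return used_buckets * 100 >= limit * threshold_percent
```

In practice these checks would likely live in the alerting system rather than in Relay itself; the sketch only pins down what each alert is meant to detect.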
Design
- Scoped Limits:
  - Global
  - Namespaces Total
  - Org Total
  - Namespace per Org
- The goal is to protect Kafka throughput, so rate limit by the number of buckets, not values/datapoints
- Use the existing abuse quota system from Sentry to communicate limits down to Relay
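To make the four scopes concrete, here is a minimal sketch of how a single bucket could count against each of them, with one counter key per scope. The `Scope` class, the `scopes_for` helper, and the key layout are all hypothetical illustrations, not Relay's actual types or Redis key format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Scope:
    """One rate-limit scope; None means "not restricted on this axis"."""

    org_id: Optional[int] = None
    namespace: Optional[str] = None

    def redis_key(self, window_start: int) -> str:
        # Hypothetical key layout for illustration only.
        org = "global" if self.org_id is None else f"org:{self.org_id}"
        ns = "all" if self.namespace is None else f"ns:{self.namespace}"
        return f"metric_buckets:{org}:{ns}:{window_start}"


def scopes_for(org_id: int, namespace: str) -> List[Scope]:
    """All four scopes a bucket counts against, widest first."""
    return [
        Scope(),                                     # Global
        Scope(namespace=namespace),                  # Namespace total
        Scope(org_id=org_id),                        # Org total
        Scope(org_id=org_id, namespace=namespace),   # Namespace per org
    ]
```

Each key would hold a count of buckets (not datapoints) for the current time window, which matches the goal of protecting Kafka throughput.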
Technical Design
- Extend Scopes to support namespaces and global, including the Redis key
- Add Rate Limiting to the EnvelopeProcessor, after splitting
- Log outcomes for discarded/rate-limited Buckets
- Rate limit on batches, not on individual buckets
- Discard all remaining batches if one hits the rate limiter
- Possibly: rate limit on the entire flush already, since splitting only increases the total number of buckets
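The batch-level behavior described above can be sketched as follows: each batch is charged by its bucket count, and once one batch is rejected every later batch is discarded as well, even if it would still fit. This is a minimal sketch under stated assumptions; `BucketLimiter` is an in-memory stand-in for the Redis-backed counter, and all names are hypothetical rather than taken from Relay.

```python
from typing import List, Tuple

Batch = List[str]  # stand-in: a batch is a list of bucket names


class BucketLimiter:
    """In-memory stand-in for a Redis-backed bucket counter."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def try_consume(self, amount: int) -> bool:
        # Reject the whole batch if it does not fit in the remaining budget.
        if self.used + amount > self.limit:
            return False
        self.used += amount
        return True


def rate_limit_batches(
    batches: List[Batch], limiter: BucketLimiter
) -> Tuple[List[Batch], int]:
    """Return accepted batches and the number of discarded buckets."""
    accepted: List[Batch] = []
    dropped_buckets = 0
    limited = False
    for batch in batches:
        if not limited and limiter.try_consume(len(batch)):
            accepted.append(batch)
        else:
            # After the first rejection, every remaining batch is dropped.
            limited = True
            dropped_buckets += len(batch)
    return accepted, dropped_buckets
```

The returned `dropped_buckets` count is the figure that would feed the outcomes logged for rate-limited buckets.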
Limitations
- Best-effort rate limiting: we don't need exact rate limits or consistency. Implementation may be consistent.
- Processing Relays only
- No propagation of rate limits, because we can only propagate Rate Limit Backoffs ("stop sending me stuff now"). To be changed later.
Rollout
- Load Test (on Loadtesting Infra)
- S4S
- Enable it in Prod
### Tasks
- [ ] https://github.com/getsentry/relay/pull/2758
- [ ] https://github.com/getsentry/sentry/pull/61666
- [ ] https://github.com/getsentry/relay/pull/2928
- [ ] https://github.com/getsentry/relay/pull/2941
- [ ] https://github.com/getsentry/sentry/pull/64574
- [x] Test global abuse quota in S4S
- [ ] https://github.com/getsentry/sentry-options-automator/pull/1117
- [x] Implement rate limiting with org/namespace scope for metrics - https://github.com/getsentry/relay/pull/3090
- [ ] https://github.com/getsentry/sentry/pull/68686
### Followups
- [ ] https://github.com/getsentry/relay/pull/3086