[EPIC] Add throughput rate limits for metric buckets #2716

@jan-auer

Description

The metrics consumers scale with bucket throughput. Because scaling the pipeline has been challenging lately, we're looking to limit throughput to what the pipeline can handle in the interim.

In the long run, the pipeline should scale to any volume needed. The rate limiter will then primarily keep metrics use cases cost-effective by placing a reasonable limit on the product.

Validation

  • Metrics:
    • Number of buckets rate limited by scope.
    • Metrics messages sent to Kafka (already exists in Relay)
    • Metrics messages as seen by the consumers
  • Alerts
    • Rate Limit hit
    • [Optional] Rate Limit capacity at X%

Design

  • Scoped Limits:
    • Global
    • Namespaces Total
    • Org Total
    • Namespace per Org
  • Goal is to protect Kafka throughput, so rate limit by number of buckets, not by values/datapoints
  • Use existing abuse quota system from Sentry to communicate down to relay
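
To illustrate how the four scopes above nest, here is a minimal sketch that derives one counter key per scope for a single bucket. The function name `scope_keys` and the key layout are hypothetical and chosen only for illustration; they are not Relay's or Sentry's actual key format.

```python
from typing import List


def scope_keys(org_id: int, namespace: str) -> List[str]:
    """Return one hypothetical counter key per scope that applies to a bucket.

    The order runs from the broadest scope (global) to the narrowest
    (namespace per org). A bucket counts against every key in the list.
    """
    prefix = "quota:metric_buckets"  # assumed prefix, not a real Relay key
    return [
        f"{prefix}:global",
        f"{prefix}:ns:{namespace}",
        f"{prefix}:org:{org_id}",
        f"{prefix}:org:{org_id}:ns:{namespace}",
    ]


if __name__ == "__main__":
    for key in scope_keys(42, "custom"):
        print(key)
```

A bucket would be accepted only if none of its scopes is over its limit, so the narrowest exhausted scope decides the outcome.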

Technical Design

  • Extend scopes to support namespaces and global, including the Redis key
  • Add Rate Limiting to the EnvelopeProcessor, after splitting
  • Log outcomes for discarded/rate-limited Buckets
  • Rate limit on batches not on buckets
    • Discard all remaining batches if one hits the rate limiter
    • Possibly: rate limit on the entire flush already, since splitting only increases the total number of buckets
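
The batch-level behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Relay implementation: `WindowCounter` is a hypothetical in-memory stand-in for a shared counter covering a single rate-limit window, and `limit_batches` is a made-up helper name.

```python
from typing import List, Tuple


class WindowCounter:
    """Hypothetical in-memory counter for one rate-limit window."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def try_consume(self, quantity: int) -> bool:
        """Reserve `quantity` buckets; refuse if that would exceed the limit."""
        if self.used + quantity > self.limit:
            return False
        self.used += quantity
        return True


def limit_batches(
    batches: List[List[str]], counter: WindowCounter
) -> Tuple[List[List[str]], int]:
    """Check the limit once per batch, counting each batch by its bucket count.

    Once one batch is rejected, all remaining batches are discarded without
    consulting the counter again. Returns the accepted batches and the number
    of dropped buckets, which could feed an outcome log.
    """
    accepted: List[List[str]] = []
    dropped = 0
    limited = False
    for batch in batches:
        if not limited and counter.try_consume(len(batch)):
            accepted.append(batch)
        else:
            limited = True
            dropped += len(batch)
    return accepted, dropped


if __name__ == "__main__":
    counter = WindowCounter(limit=3)
    kept, dropped = limit_batches([["a", "b"], ["c", "d"], ["e"]], counter)
    print(kept, dropped)
```

With a limit of 3, the first batch of two buckets is accepted, the second is rejected, and the third is dropped as well even though a single bucket would still fit. That is the trade-off of discarding everything after the first rejection: fewer limiter checks in exchange for less exact enforcement, which matches the best-effort goal.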

Limitations

  • Best-effort rate limiting: we don't need exact rate limits or consistency. Implementation may be consistent.
  • Processing Relays only
  • No propagation of rate limits, because we can only propagate Rate Limit Backoffs (Stop sending me stuff now). To be changed later.

Rollout

  • Load Test (on Loadtesting Infra)
  • S4S
  • Enable it in Prod

### Tasks
- [ ] https://github.com/getsentry/relay/pull/2758
- [ ] https://github.com/getsentry/sentry/pull/61666
- [ ] https://github.com/getsentry/relay/pull/2928
- [ ] https://github.com/getsentry/relay/pull/2941
- [ ] https://github.com/getsentry/sentry/pull/64574
- [x] Test global abuse quota in S4S
- [ ] https://github.com/getsentry/sentry-options-automator/pull/1117
- [x] Implement rate limiting with org/namespace scope for metrics - https://github.com/getsentry/relay/pull/3090
- [ ] https://github.com/getsentry/sentry/pull/68686

### Followups
- [ ] https://github.com/getsentry/relay/pull/3086
