feat(server): Org rate limit per metric bucket #2758
Conversation
Dav1dde
left a comment
Just some quick nits, I'll have to look into the outcome logging more carefully later.
I would like it if we didn't have to duplicate this and could somehow share as much code as possible with the ManagedEnvelope logic (where transaction and profile outcomes are created when we drop the envelope). But I'll need more time to think about this.
```rust
let _bucket_qty = buckets.len();
let bucket_partitions = partition_buckets(scoping.project_key, buckets, partitions);

#[cfg(feature = "processing")]
```
I think it makes sense to check the cached rate limits even on non-processing relays
Well, at the moment they just get updated from Redis, which isn't available on non-processing relays, so I'm not sure that would make sense.
I am not sure exactly how propagation of quotas/limits to PoPs works, but I don't see a reason why we can't check them now; once we actually have working propagation, everything will just work.
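The flow discussed in this thread can be sketched as follows. This is a hypothetical Python illustration, not Relay's actual Rust types: every relay consults a local cache of rate limits before forwarding buckets, and only processing relays refresh that cache from Redis (PoPs would eventually be fed by propagated limits from upstream):

```python
class CachedRateLimits:
    """Sketch of a local rate-limit cache shared by all relay flavors.

    Hypothetical names for illustration; in Relay the cache lives in the
    Rust codebase and is refreshed from Redis on processing relays only.
    """

    def __init__(self):
        self.limited_orgs = set()

    def is_limited(self, org_id):
        # Cheap local check that works on any relay, even if the cache
        # is never populated (then it simply admits everything).
        return org_id in self.limited_orgs

    def merge(self, limited_org_ids):
        # On processing relays this is fed from Redis; on non-processing
        # relays it could be fed by limits propagated from upstream.
        self.limited_orgs.update(limited_org_ids)
```

With this shape, checking the cache unconditionally is harmless on non-processing relays today and becomes effective as soon as propagation exists.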
iker-barriocanal
left a comment
Does this PR rate-limit buckets on a binary basis? Do we want to consider scenarios in which some buckets are rate-limited but others aren't?
relay-server/src/actors/processor.rs
Outdated
```rust
self.drop_buckets_with_outcomes(
    reason_code,
    total_batches,
```
What's the difference between total_batches and bucket_qty (the function param)? Is it the maximum size of the bucket? Can we use just one of the two? It seems odd to me that we sometimes use one and sometimes the other (see when emitting outcomes and in log messages).
bucket_qty is the total number of buckets in the flush; total_batches is the number of batches we are writing upstream (i.e., the number of requests to the upstream, or the number of messages in the Kafka topic).
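The relationship between the two quantities can be illustrated with a small sketch. The helper name and the fixed buckets-per-batch assumption are hypothetical (Relay's real batching is driven by payload size limits, not a fixed count):

```python
import math

def batch_counts(buckets, buckets_per_batch):
    """Illustrate bucket_qty vs. total_batches (hypothetical helper).

    bucket_qty    -- total number of buckets in the flush
    total_batches -- number of upstream requests / Kafka messages,
                     assuming a fixed number of buckets per batch
    """
    bucket_qty = len(buckets)
    total_batches = math.ceil(bucket_qty / buckets_per_batch)
    return bucket_qty, total_batches
```

So a flush of 10 buckets split into batches of 4 yields bucket_qty = 10 but total_batches = 3, which is why the two numbers diverge in outcomes and log messages.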
relay-server/src/actors/processor.rs
Outdated
```rust
event_id: None,
remote_addr: None,
category: DataCategory::Metrics,
quantity: total_batches as u32,
```
Should we exclude buckets with the usage metric from rate-limited outcomes? These metrics are data we (Sentry) generate but the user doesn't care about, and I assume these outcomes will be presented to them.
This is on the roadmap but not planned for this PR; the idea is to rate limit based on namespace (there is some info in the epic: #2716).
jan-auer
left a comment
This looks good to me. Could you add an integration test?
```c
/**
 * Metric bucket.
 */
RELAY_DATA_CATEGORY_METRIC_BUCKET = 15,
```
Note that this will require a release of the Python library and a version bump in Sentry and Snuba.
CHANGELOG.md
Outdated
- Add Mixed JS/Android Profiles events processing. ([#2706](https://github.com/getsentry/relay/pull/2706))
- Allow to ingest measurements on a span. ([#2792](https://github.com/getsentry/relay/pull/2792))
- Add size limits on metric related envelope items. ([#2800](https://github.com/getsentry/relay/pull/2800))
- Include the size offending item in the size limit error message. ([#2801](https://github.com/getsentry/relay/pull/2801))
relay-server/src/actors/processor.rs
Outdated
```rust
    source_quantities += source_quantities_from_buckets(&BucketsView::new(buckets), mode);
}

let timestamp = UnixTimestamp::now().as_datetime().unwrap_or_else(Utc::now);
```
Isn't this just Utc::now()? Why the round trip through UnixTimestamp?
```python
]

def generate_ticks():
    # Generate a new timestamp for every bucket, so they do not get merged by the aggregator
```
We could also use tags for this; that should be easier than having timestamps tied to bucket_interval.
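Both approaches from this thread can be sketched side by side. Names and constants here are hypothetical, modeled loosely on the test snippet above; the point is that the aggregator only merges buckets that share both timestamp and tags:

```python
import itertools

def generate_ticks(start=1_600_000_000, bucket_interval=10):
    """Yield a fresh timestamp per bucket so the aggregator keeps them separate."""
    for i in itertools.count():
        yield start + i * bucket_interval

def buckets_with_unique_tags(n, timestamp=1_600_000_000):
    """Alternative suggested in the review: keep one timestamp but vary a tag.

    Buckets with different tag sets never merge, so no coupling to
    bucket_interval is needed.
    """
    return [
        {
            "name": "c:transactions/count@none",  # hypothetical metric name
            "timestamp": timestamp,
            "tags": {"tick": str(i)},  # unique tag prevents merging
            "value": 1.0,
            "type": "c",
            "width": 10,
        }
        for i in range(n)
    ]
```

The tag-based variant is simpler in tests because all buckets can share one timestamp regardless of the configured aggregation window.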
Reverts #2758. This was deployed before Sentry updated librelay, so Sentry crashed with an unknown data category.
We had an incident when deploying #2758 because we unexpectedly sent outcomes of type ItemBucket without having updated Sentry. This PR only adds the new data category so we can properly synchronize on both ends before adding the business logic. #2821 (comment)
Part of: #2716. In order to protect our Kafka metric consumers, we want a way of rate limiting based on the number of buckets, since that is what determines the load placed on our Kafka topics. We are starting out with just the org throughput limits, but this will be expanded upon further as outlined in the linked epic. This is a re-revert of #2758 (reverted in #2821).

Co-authored-by: Joris Bayer <joris.bayer@sentry.io>
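The org-level bucket throughput limit described here can be illustrated with a minimal sketch. All names are hypothetical (Relay's real limiter is Redis-backed, shared across instances, and keyed by scoping), but it shows the binary per-flush decision discussed earlier in the review:

```python
import time
from collections import defaultdict

class OrgBucketRateLimiter:
    """Minimal sketch of a per-org metric-bucket throughput limit.

    Hypothetical, in-process illustration only: the real limiter is
    Redis-backed so the count is shared across processing relays.
    """

    def __init__(self, limit_per_window, window_secs=60):
        self.limit = limit_per_window
        self.window = window_secs
        # (org_id, window_index) -> number of buckets admitted so far
        self.counts = defaultdict(int)

    def check(self, org_id, bucket_qty, now=None):
        """Return True if the flush's buckets are admitted, False if rate limited."""
        now = time.time() if now is None else now
        key = (org_id, int(now // self.window))
        if self.counts[key] + bucket_qty > self.limit:
            # Binary decision: the whole flush is dropped and outcomes emitted.
            return False
        self.counts[key] += bucket_qty
        return True
```

Counting buckets (rather than bytes or requests) matches the stated goal, since the number of buckets is what drives load on the Kafka metric consumers.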