[EPIC] Move the metrics cardinality limiter to Relay #2717

@jan-auer

Description

The performance of metrics queries depends on the number of time series they have to aggregate. To put an upper bound on the number of time series - and thus the number of metric buckets with a unique combination of tag values - the metrics ingestion pipeline has a cardinality limiter.

It resides in the metrics indexer consumer, where it currently accounts for more than half of the CPU usage and adds latency due to a high number of Redis calls. There are two reasons to move the cardinality limiter out of the indexer consumers:

  1. The limiter currently does not apply to Relay.
  2. The indexer consumers are a bottleneck and rely on low latency and high throughput.
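To illustrate the limiter's core check, here is a minimal in-memory sketch (hypothetical names; the production limiter keeps these sets in Redis with a time window rather than in process memory): a bucket passes if its time-series hash is already known, or if the org is still under its cardinality budget.

```rust
use std::collections::HashSet;

/// Minimal sketch of a cardinality limiter. The real implementation
/// tracks hashes per org in Redis sets scoped to a time window.
struct CardinalityLimiter {
    limit: usize,
    seen: HashSet<u64>, // hashes of tag combinations seen in the current window
}

impl CardinalityLimiter {
    fn new(limit: usize) -> Self {
        Self { limit, seen: HashSet::new() }
    }

    /// Returns true if the bucket with this hash may pass.
    fn accept(&mut self, bucket_hash: u64) -> bool {
        if self.seen.contains(&bucket_hash) {
            return true; // existing time series, adds no new cardinality
        }
        if self.seen.len() < self.limit {
            self.seen.insert(bucket_hash);
            return true; // new time series within budget
        }
        false // would exceed the cardinality limit, reject
    }
}

fn main() {
    let mut limiter = CardinalityLimiter::new(2);
    assert!(limiter.accept(1));
    assert!(limiter.accept(2));
    assert!(limiter.accept(1)); // already known, still accepted
    assert!(!limiter.accept(3)); // third distinct series is over the limit
    println!("ok");
}
```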

Stage 1 / Re-Implementation
- [ ] https://github.com/getsentry/relay/issues/2719
- [x] Organize a Memstore/Redis Instance (We decided to use the existing Relay Redis for now)
- [x] Pass configuration via global config down from Sentry
- [ ] https://github.com/getsentry/sentry/pull/61028
- [ ] https://github.com/getsentry/sentry/pull/61024
- [ ] https://github.com/getsentry/getsentry/pull/12241
- [ ] https://github.com/getsentry/relay/pull/2745
- [x] ~~`SMISMEMBER` instead of loading the entire set~~ (Needs Redis 6.2)
- [ ] https://github.com/getsentry/relay/pull/2849
- [x] Hookup Cardinality Limiter into existing Processing Pipeline
- [x] Investigate alternative HashSet hasher, e.g. AHash for Redis implementation
- [ ] https://github.com/getsentry/snuba/pull/5349/
- [ ] https://github.com/getsentry/sentry-options-automator/pull/574
- [ ] https://github.com/getsentry/relay/pull/2947
- [ ] https://github.com/getsentry/sentry/pull/63562
- [ ] https://github.com/getsentry/relay/pull/2972
- [ ] https://github.com/getsentry/relay/pull/2990
- [ ] https://github.com/getsentry/relay/pull/2978
- [ ] https://github.com/getsentry/relay/pull/3008
- [ ] https://github.com/getsentry/relay/pull/3020
- [ ] https://github.com/getsentry/relay/pull/3018
- [ ] https://github.com/getsentry/relay/issues/3010
- [ ] https://github.com/getsentry/relay/pull/3029
- [ ] https://github.com/getsentry/relay/pull/3053
- [ ] https://github.com/getsentry/relay/pull/3071
- [ ] https://github.com/getsentry/relay/pull/3076
- [ ] https://github.com/getsentry/ops/pull/9381
- [ ] https://github.com/getsentry/relay/pull/3199
- [ ] https://github.com/getsentry/sentry/pull/66299
- [x] ~~Investigate using the hash as a hashmap hash instead of hashing the hash, e.g. https://crates.io/crates/hash_hasher~~  Performs very badly with certain hashes, minimal gain
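Several of the items above concern how bucket hashes are computed (the AHash and hash_hasher investigations). As a hypothetical sketch of deriving a stable per-time-series hash from a metric name and its tags (Relay's actual hashing scheme may differ), tags are sorted first so that tag order does not change the resulting hash:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash a metric name plus its tag set into a single u64 identifying the
/// time series. Tags are sorted so that tag order does not affect the hash.
/// (Illustrative sketch only; Relay may use a different hasher, e.g. the
/// AHash variant evaluated above.)
fn bucket_hash(name: &str, tags: &[(&str, &str)]) -> u64 {
    let mut sorted: Vec<_> = tags.to_vec();
    sorted.sort();

    let mut hasher = DefaultHasher::new();
    name.hash(&mut hasher);
    for (key, value) in &sorted {
        key.hash(&mut hasher);
        value.hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    let a = bucket_hash("transaction.duration", &[("env", "prod"), ("release", "1.0")]);
    let b = bucket_hash("transaction.duration", &[("release", "1.0"), ("env", "prod")]);
    assert_eq!(a, b); // tag order does not matter
    let c = bucket_hash("transaction.duration", &[("env", "staging")]);
    assert_ne!(a, c); // different tag values -> different time series
    println!("ok");
}
```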

Validation

  • Metrics
    • Same metrics as in the indexer cardinality limiter
    • Observability when org limits are hit, via Sentry errors and metrics

Rollout

  • New memstore
  • Load Test (cardinality), adapt existing indexer load tests
  • Rollout in S4S
    • Use a really low sample rate, slowly ramp up the rate
    • Verify the indexer limiter is no longer hit
    • Turn off limiter in Indexer
  • Rollout in Production
    • Same as above, just more informed decisions

Stage 2

Replace the Redis set with an in-memory HLL or Bloom filter that is periodically merged back into Redis to synchronize all nodes.

While using a probabilistic data structure in either Redis or Relay remains an option, a Bloom filter would reduce the number of bits we store per time series by about 50% (from 32 to ~15), while at the same time decreasing lookup and write performance. Since our current bottleneck is more likely CPU than storage, this option is less interesting and valuable.
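The ~15-bit figure is consistent with the standard Bloom filter sizing formula, bits per element = -ln(p) / (ln 2)². The issue does not state the assumed false-positive rate; 0.1% is an illustrative choice here that reproduces roughly that number:

```rust
/// Standard Bloom filter sizing: bits per element = -ln(p) / (ln 2)^2,
/// where p is the target false-positive rate.
fn bloom_bits_per_element(false_positive_rate: f64) -> f64 {
    -false_positive_rate.ln() / (2f64.ln() * 2f64.ln())
}

fn main() {
    // At a ~0.1% false-positive rate (an assumed value, not from the issue),
    // a Bloom filter needs ~14.4 bits per element, roughly half of the
    // 32 bits per stored hash cited above.
    let bits = bloom_bits_per_element(0.001);
    assert!(bits > 14.0 && bits < 15.0);
    println!("{bits:.1}"); // ~14.4
}
```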
