Skip to content

feat(dynamic-sampling): Implement prioritize by project bias [TET-574]#42939

Merged
andriisoldatenko merged 43 commits intomasterfrom
andrii/implement-cronjob-to-fetch-orgs-projects
Feb 28, 2023
Merged

feat(dynamic-sampling): Implement prioritize by project bias [TET-574]#42939
andriisoldatenko merged 43 commits intomasterfrom
andrii/implement-cronjob-to-fetch-orgs-projects

Conversation

@andriisoldatenko
Copy link
Contributor

@andriisoldatenko andriisoldatenko commented Jan 9, 2023

This PR implements prioritize by project bias.

In detail:
We run celery task every 24 at 8:00AM (UTC randomly selected) for every ORG (we call it prioritise by project snuba query ) and all projects inside this org, and for a given combination of org and projects run an adjustment model to recalculate sample rates if necessary.

Then we cache sample rate using redis cluster -> SENTRY_DYNAMIC_SAMPLING_RULES_REDIS_CLUSTER using this pattern for key: f"ds::o:{org_id}:p:{project_id}:prioritise_projects".

When relay fetches projectconfig endpoint we run generate_rules functions to generate all dynamic sampling biases, so and we check if we have adjusted sample rate for this project in the cache, so we apply it as uniform bias, otherwise we use default one.

Regarding prioritize by project snuba query is cross org snuba query that utilizes a new generic counter metric, which was introduced in relay c:transactions/count_per_root_project@none.

TODO:

  • Provision infrastructure to run clickhouse clusters for the counters tables. This is primarily dependent on ops
  • Start running the snuba consumers to read and write to the counters table. SnS can work on this
  • Add unit-tests;
  • Update snuba query using new metric
  • Hide behind feature flag

related PRs:

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jan 9, 2023
@andriisoldatenko andriisoldatenko changed the title feat(sampling): Implement prioritize by project bias [TET-569] feat(dynamic-sampling): Implement prioritize by project bias [TET-569] Jan 9, 2023
@github-actions github-actions bot added the Scope: Frontend Automatically applied to PRs that change frontend components label Jan 16, 2023
@github-actions
Copy link
Contributor

🚨 Warning: This pull request contains Frontend and Backend changes!

It's discouraged to make changes to Sentry's Frontend and Backend in a single pull request. The Frontend and Backend are not atomically deployed. If the changes are interdependent of each other, they must be separated into two pull requests and be made forward or backwards compatible, such that the Backend or Frontend can be safely deployed independently.

Have questions? Please ask in the #discuss-dev-infra channel.

@andriisoldatenko andriisoldatenko force-pushed the andrii/implement-cronjob-to-fetch-orgs-projects branch from 77f6033 to b074f9a Compare January 16, 2023 08:10
@andriisoldatenko andriisoldatenko force-pushed the andrii/implement-cronjob-to-fetch-orgs-projects branch from 8532860 to 6071553 Compare January 22, 2023 16:55
@andriisoldatenko andriisoldatenko force-pushed the andrii/implement-cronjob-to-fetch-orgs-projects branch from 6071553 to e4ed330 Compare January 26, 2023 17:01
@andriisoldatenko andriisoldatenko force-pushed the andrii/implement-cronjob-to-fetch-orgs-projects branch from b81e3bd to 955c3bf Compare February 7, 2023 12:44
@andriisoldatenko andriisoldatenko force-pushed the andrii/implement-cronjob-to-fetch-orgs-projects branch from 955c3bf to 1c0bbcf Compare February 12, 2023 17:36
@andriisoldatenko
Copy link
Contributor Author

Tested locally with 1 org 1 project - no changes:

15:33:18 worker                                | 15:33:18 [INFO] sentry.dynamic_sampling: rules_generator.generate_rules (org_id=1 project_id=1 rules=[{'id': 1002, 'type': 'ignoreHealthChecks', 'samplingValue': {'type': 'sampleRate', 'value': 0.05}, 'healthChecks': ['*healthcheck*', '*healthy*', '*live*', '*ready*', '*heartbeat*', '*/health', '*/healthz']}, {'id': 1000, 'type': 'uniformRule', 'samplingValue': {'type': 'sampleRate', 'value': 0.25}}])

With 1 org and 2 projects (blended rate - 0.25):

16:18:01 worker                                | 16:18:01 [INFO] sentry: monitor.missed-checkin (monitor_id=4)
16:18:01 worker                                | 16:18:01 [INFO] sentry: monitor.missed-checkin (monitor_id=1)
16:18:01 worker                                | 16:18:01 [INFO] sentry.auth: !!!!!!!!!!!!!!!!!!!!!!!!!!check_auth
16:18:01 worker                                | 16:18:01 [INFO] sentry.dynamic_sampling.tasks: !!! start prioritise_projects
16:18:01 worker                                | 16:18:01 [INFO] sentry.dynamic_sampling.tasks: !!! 1 [(1, 1452.0), (8, 100.0)]
16:18:01 worker                                | 16:18:01 [INFO] sentry.dynamic_sampling.tasks: !!! start process_projects_sample_rates
16:18:01 worker                                | 16:18:01 [WARNING] sentry.tasks.release_registry: Release registry URL is not specified, skipping the task.
16:18:04 worker                                | 16:18:04 [INFO] sentry.tasks.groupowner: process_suspect_commits.skipped (reason='no_release')
16:18:04 worker                                | 16:18:04 [INFO] sentry.tasks.groupowner: process_suspect_commits.skipped (reason='no_release')
16:18:07 worker                                | 16:18:07 [INFO] sentry.dynamic_sampling: rules_generator.generate_rules (org_id=1 project_id=1 rules=[{'id': 1002, 'type': 'ignoreHealthChecks', 'samplingValue': {'type': 'sampleRate', 'value': 0.03966942148760331}, 'healthChecks': ['*healthcheck*', '*healthy*', '*live*', '*ready*', '*heartbeat*', '*/health', '*/healthz']}, {'id': 1000, 'type': 'uniformRule', 'samplingValue': {'type': 'sampleRate', 'value': 0.19834710743801653}}])

],
groupby=[Column("org_id"), Column("project_id")],
where=[
Condition(Function("modulo", [Column("org_id"), 100]), Op.LT, sample_rate),
Copy link
Contributor

@onewland onewland Feb 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might work fine but I'm not sure how efficient WHERE org_id % 100 < 45 (or some other n) will be at scanning the table.

@nikhars do you have an opinion on this? I think given the data size and the filtering by metric_id it would probably work but I wonder if they'd get better ClickHouse performance if they enumerated the org_id's to check into batches of 5k or 10k and filtering before sending the query

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can run the query and provide information whether this sort of WHERE clause would be helpful or not.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Scope: Backend Automatically applied to PRs that change backend components Scope: Frontend Automatically applied to PRs that change frontend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants