feat: Update Group.filter_by_event_id to use Snuba #14035
Conversation
fpacifici left a comment:
A few questions but overall it seems ok
src/sentry/models/group.py
Outdated
)
from sentry.utils import snuba

group_ids = set([evt['issue'] for evt in snuba.raw_query(
Nit: I think this code would be more readable without the list comprehension.
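As a sketch of the suggestion (the `raw_query` stub below is hypothetical; the real function lives in `sentry.utils.snuba`), an explicit loop or a set comprehension reads more directly than wrapping a list comprehension in `set([...])`:

```python
def raw_query(**kwargs):
    # Stand-in for sentry.utils.snuba.raw_query, which returns event rows.
    # Hypothetical data for illustration only.
    return [{'issue': 1}, {'issue': 2}, {'issue': 1}]

# Explicit loop: each row contributes its group ('issue') ID to the set.
group_ids = set()
for evt in raw_query():
    group_ids.add(evt['issue'])

# Equivalently, a set comprehension avoids the intermediate list entirely:
group_ids_alt = {evt['issue'] for evt in raw_query()}
```

Both forms deduplicate to the same set of group IDs without building a throwaway list first.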
from sentry.utils import snuba

group_ids = set([evt['issue'] for evt in snuba.raw_query(
    start=datetime.utcfromtimestamp(0),
Is the retention policy we have in Snuba the same as the one we have in the Event and EventMapping tables? Snuba can apply a default maximum timespan.
https://github.com/getsentry/snuba/blob/master/snuba/api.py#L170
@JTCunning Do we use that setting in production?
We do have this setting in Production, but this should be scoped to an organization's retention.
You can find other queries scoped to retention like so:
sentry/src/sentry/search/snuba/backend.py
Lines 275 to 285 in 0a8c456
This has also led me to believe that we have most likely not properly scoped other SnubaEvent-based queries to organization retention.
As discussed on Slack, Sentry's snuba client shrinks every time window to be inside retention.
sentry/src/sentry/utils/snuba.py
Lines 561 to 574 in 576048a
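The window-shrinking behavior described above might look roughly like this (a hypothetical simplification; the actual logic lives in `src/sentry/utils/snuba.py`, and `clamp_to_retention` is an invented name for illustration):

```python
from datetime import datetime, timedelta

def clamp_to_retention(start, end, retention_days):
    """Shrink a query time window so it falls inside retention.

    Hypothetical sketch: the start of the window is moved forward to
    `now - retention_days` when it falls outside retention, and a window
    that lies entirely before retention yields no query at all.
    """
    earliest = datetime.utcnow() - timedelta(days=retention_days)
    if end < earliest:
        # The entire window is outside retention: nothing to query.
        return None
    if start < earliest:
        start = earliest
    return start, end
```

Under this sketch, a `start` of the Unix epoch (as in the diff above) would be silently clamped to the retention boundary rather than scanning all of history.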
Which goes back to my original question.
Are we going to return fewer results than in the previous implementation, since there is no retention applied to the Postgres query?
No. Those are deleted at 90d every 5 minutes, and if we weren't previously capping the 30d retention, now is the time to.
src/sentry/models/group.py
Outdated
    'event_id': [event_id],
    'project_id': project_ids,
},
limit=1000,
Can this query ever return more than one value? We may be able to make it more efficient for ClickHouse if we could limit the results.
Theoretically I think it can, since event IDs are only unique per project, not per org. We only use this code to detect a direct hit (https://github.com/getsentry/sentry/blob/master/src/sentry/api/endpoints/organization_group_index.py#L129), and we end up discarding all results apart from the first one anyway. I think it would be better to just return a single result here for the direct hit, but I'm not sure we'll want to tackle that here.
I've updated this function to only look for a single direct hit, since that's the only place where we actually use this code. As discussed on Slack, we need to preserve the existing behavior, since event IDs are only unique per project. Updated the limit to len(project_ids).
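Because an event ID is unique within a project, at most one row can match per project, so `len(project_ids)` is a safe upper bound for the limit. A hypothetical sketch of that reasoning (`filter_by_event_id_sketch` and its in-memory `rows` are invented for illustration; the real query goes through Snuba):

```python
def filter_by_event_id_sketch(rows, project_ids, event_id):
    """Direct-hit lookup sketch: find rows matching `event_id` across projects.

    At most one row can match per project because event IDs are unique
    within a project, so capping results at len(project_ids) cannot drop
    any legitimate match.
    """
    limit = len(project_ids)
    matches = [
        r for r in rows
        if r['project_id'] in project_ids and r['event_id'] == event_id
    ]
    return matches[:limit]
```

This keeps the per-project-uniqueness semantics while letting the database stop scanning much earlier than a fixed limit of 1000 would.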
This is the second part of #13905, which updates the group model to use events from Snuba instead of Postgres. This PR updates the filter_by_event_id function.
99a56f9 to e9403d9