-
-
Notifications
You must be signed in to change notification settings - Fork 4.6k
feat(attribute-distributions): parallelize stats query #104113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(attribute-distributions): parallelize stats query #104113
Conversation
44c5aa5 to
5b6b8fa
Compare
508a20b to
3ccc2e8
Compare
5a5b131 to
02db9ee
Compare
| attrs_response = snuba_rpc.attribute_names_rpc(attrs_request) | ||
|
|
||
| # Chunk attributes and run stats query in parallel | ||
| chunked_attributes: dict[int, list[AttributeKey]] = defaultdict(list[AttributeKey]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: defaultdict is incorrectly instantiated with list[AttributeKey] (a type hint) instead of a callable list, causing a TypeError on first access.
Severity: CRITICAL | Confidence: High
🔍 Detailed Analysis
The defaultdict constructor is incorrectly instantiated with list[AttributeKey]. This is a type hint (a types.GenericAlias object in Python 3.9+), not a callable function. When a missing key is accessed in chunked_attributes, defaultdict attempts to call list[AttributeKey](), leading to a TypeError: 'types.GenericAlias' object is not callable. This will cause an immediate server crash in OrganizationTraceItemsStatsEndpoint.get() and OrganizationTraceItemsAttributesRankedEndpoint.get() when processing attribute distributions.
💡 Suggested Fix
Change defaultdict(list[AttributeKey]) to defaultdict(list). The type hint dict[int, list[AttributeKey]] is correct and should remain.
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/sentry/api/endpoints/organization_trace_item_stats.py#L97
Potential issue: The `defaultdict` constructor is incorrectly instantiated with
`list[AttributeKey]`. This is a type hint (a `types.GenericAlias` object in Python
3.9+), not a callable function. When a missing key is accessed in `chunked_attributes`,
`defaultdict` attempts to call `list[AttributeKey]()`, leading to a `TypeError:
'types.GenericAlias' object is not callable`. This will cause an immediate server crash
in `OrganizationTraceItemsStatsEndpoint.get()` and
`OrganizationTraceItemsAttributesRankedEndpoint.get()` when processing attribute
distributions.
Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 5300574
0242c8b to
cee4d15
Compare
| for stats in result: | ||
| for stats_type, data in stats.items(): | ||
| stats_results[stats_type]["data"].update(data["data"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: The organization_trace_item_stats.py endpoint's response format changed, breaking API compatibility.
Severity: CRITICAL | Confidence: High
🔍 Detailed Analysis
The organization_trace_item_stats.py endpoint's response format has changed from a dictionary structure, {"data": {stat_type: {...}}}, to a list of single-key dictionaries, {"data": [{stat_type: {...}}, ...]}. This transformation is a breaking API change that will cause clients expecting the original dictionary format to fail.
💡 Suggested Fix
Revert the response format in organization_trace_item_stats.py line 135 to return Response({"data": stats_results}) instead of Response({"data": [{k: v} for k, v in stats_results.items()]}).
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/sentry/api/endpoints/organization_trace_item_stats.py#L131-L133
Potential issue: The `organization_trace_item_stats.py` endpoint's response format has
changed from a dictionary structure, `{"data": {stat_type: {...}}}`, to a list of
single-key dictionaries, `{"data": [{stat_type: {...}}, ...]}`. This transformation is a
breaking API change that will cause clients expecting the original dictionary format to
fail.
Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 5302423
| meta=attrs_meta, | ||
| limit=max_attributes, | ||
| type=attr_type, | ||
| intersecting_attributes_filter=cohort_2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Suspect cohort attributes missed due to baseline-only attribute filtering
The TraceItemAttributeNamesRequest uses intersecting_attributes_filter=cohort_2 (the baseline cohort), which means only attributes present in the baseline are fetched and analyzed. These attributes are then used to query both the suspect cohort (cohort_1) and baseline cohort (cohort_2). Attributes unique to the suspect cohort are completely missed, which defeats the purpose of comparing cohorts to find differentiating attributes. The old code didn't pre-filter attributes, so it analyzed all attributes from both cohorts independently.
Abdkhan14
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm, I would look at that seer comment to make sure we don't break the FE
stats results would have returned |
Parallelize the stats query by explicitly passing a list of attributes to fetch.
Hoping to make the stats and ranked endpoints faster this way.