@scttcper
Member

GA the empty tag calculation, which runs two queries instead of a single query with a bunch of countif expressions. Some issues have more tags than a single query allows.

Remove the temporary test, we already have another test covering this case.
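
A minimal sketch of the two-query idea, assuming that "subtraction" in the PR title means an empty count is derived as the total event count minus the non-empty count for each key. The function `empty_counts_by_subtraction` and the in-memory `events` list are hypothetical stand-ins for illustration, not the code in this PR:

```python
from collections import Counter


def empty_counts_by_subtraction(total_events, non_empty_counts, keys):
    """Derive per-key empty counts as total minus non-empty, clamped at zero."""
    return {
        key: max(total_events - non_empty_counts.get(key, 0), 0)
        for key in keys
    }


# Hypothetical in-memory stand-in for the events of one issue.
events = [
    {"browser": "Chrome", "os": "Linux"},
    {"browser": "", "os": "Linux"},
    {"os": ""},
]

# "Query" 1: total number of events.
total = len(events)

# "Query" 2: number of events with a non-empty value, per tag key.
non_empty = Counter(k for e in events for k, v in e.items() if v)

result = empty_counts_by_subtraction(total, non_empty, ["browser", "os"])
```

Under this assumption the amount of query text no longer grows with one countif expression per tag key, which is the limit the description mentions.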

@scttcper scttcper requested review from a team as code owners December 22, 2025 20:16
@github-actions github-actions bot added the Scope: Backend label (automatically applied to PRs that change backend components) Dec 22, 2025
Comment on lines 998 to 999
return {}

Bug: When conditions are provided, they are not applied to the keys_with_counts query. This can lead to an incomplete keys_to_check list and incorrect empty tag counts.
Severity: HIGH | Confidence: High

🔍 Detailed Analysis

In the get_group_tag_keys_and_top_values method, the keys_with_counts list is generated from a query that does not use the optional conditions parameter. However, the values_by_key dictionary is populated from a separate query that does apply these conditions. If the conditions filter out all events for a specific tag key, that key will be present in keys_with_counts but absent from values_by_key. The logic keys_to_check = list(values_by_key.keys()) or ... will then create an incomplete list for keys_to_check if values_by_key is partially populated. This causes the subsequent empty count calculation to skip the missing keys, resulting in an incorrect final count for those tags.

💡 Suggested Fix

The conditions parameter should be applied consistently to both the query that generates keys_with_counts and the query that generates values_by_key. This ensures that both lists are derived from the same set of events, preventing discrepancies.
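
The mismatch described above can be reproduced with a small in-memory sketch. The helpers below only borrow the names `keys_with_counts` and `values_by_key` from the analysis; their signatures, the `EVENTS` data, and the `only_prod` filter are assumptions made for illustration and do not reflect the actual code in `backend.py`:

```python
from collections import defaultdict

# Hypothetical events: (tag key, tag value, environment).
EVENTS = [
    ("browser", "Chrome", "prod"),
    ("browser", "Firefox", "prod"),
    ("device", "Pixel", "dev"),
]


def keys_with_counts(events, condition=None):
    """Count events per tag key, applying the optional filter."""
    counts = defaultdict(int)
    for key, _value, env in events:
        if condition is None or condition(env):
            counts[key] += 1
    return dict(counts)


def values_by_key(events, condition=None):
    """Collect tag values per key, applying the same optional filter."""
    values = defaultdict(list)
    for key, value, env in events:
        if condition is None or condition(env):
            values[key].append(value)
    return dict(values)


def only_prod(env):
    return env == "prod"


# Mismatch described in the review: keys unfiltered, values filtered.
unfiltered_keys = set(keys_with_counts(EVENTS))
filtered_values = set(values_by_key(EVENTS, only_prod))
missing = unfiltered_keys - filtered_values

# Suggested fix: pass the same condition to both lookups.
consistent_keys = set(keys_with_counts(EVENTS, only_prod))
```

Here `missing` contains `device`, the key that the filter removed from one result but not the other. Passing the same condition to both lookups makes the two key sets agree.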

🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/sentry/tagstore/snuba/backend.py#L998-L999

Potential issue: In the `get_group_tag_keys_and_top_values` method, the
`keys_with_counts` list is generated from a query that does not use the optional
`conditions` parameter. However, the `values_by_key` dictionary is populated from a
separate query that does apply these `conditions`. If the `conditions` filter out all
events for a specific tag key, that key will be present in `keys_with_counts` but absent
from `values_by_key`. The logic `keys_to_check = list(values_by_key.keys()) or ...` will
then create an incomplete list for `keys_to_check` if `values_by_key` is partially
populated. This causes the subsequent empty count calculation to skip the missing keys,
resulting in an incorrect final count for those tags.

Reference ID: 7842998

@scttcper scttcper changed the title from "feat(issues): GA tags subtraction query" to "feat(issues): GA empty tags subtraction query" Dec 22, 2025
@scttcper scttcper merged commit 7fce6ab into master Dec 22, 2025
69 checks passed
@scttcper scttcper deleted the scttcper/ga-subtraction-tags branch December 22, 2025 23:38
@github-actions github-actions bot locked and limited conversation to collaborators Jan 7, 2026