-
-
Notifications
You must be signed in to change notification settings - Fork 4.6k
feat(occurrences): Add tagname attr to eap #105074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Currently, when we look to get all tags from EAP we run a `TraceItemAttributeNamesRequest`. The problem there is that that request only performs "best effort" filtering, which means we might over-fetch tags for a given group_id. Instead, let's add the tag names as their own attrs to occurrences so that we can draw & use those directly.
| AnyValue(string_value="tags[release]"), | ||
| ] | ||
| ) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Set iteration produces non-deterministic order in test assertion
The tag_keys variable is a set, which has non-deterministic iteration order. When encoding via _encode_value, the resulting array will have elements in an unpredictable order. However, the test assertion compares against a hardcoded list with a specific ordering (alphabetical: environment, level, release). This will cause flaky test failures since set iteration order can vary across Python versions, environments, or even individual runs due to hash randomization. Either the production code needs to sort the set before encoding, or the test needs to compare without considering order.
Additional Locations (1)
| format_tag_key = lambda key: f"tags[{key}]" | ||
|
|
||
| tag_keys = set() | ||
| for key, value in event_data["tags"]: | ||
| if value is None: | ||
| continue | ||
| attributes[f"tags[{key}]"] = _encode_value(value) | ||
| formatted_key = format_tag_key(key) | ||
| attributes[formatted_key] = _encode_value(value) | ||
| tag_keys.add(formatted_key) | ||
|
|
||
| attributes["tag_keys"] = _encode_value(tag_keys) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: The code encodes tag_keys from an unordered set, leading to a non-deterministic order in the output. This will cause flaky tests and unpredictable production behavior.
Severity: HIGH | Confidence: High
🔍 Detailed Analysis
The code collects tag keys into a set and then encodes it directly. Since Python's set iteration order is non-deterministic by default (due to hash randomization), the resulting tag_keys attribute will have an unpredictable order across different runs. This will cause flaky tests that assert a specific order and introduces non-deterministic behavior in production, potentially affecting downstream consumers that rely on a consistent ordering for caching, deduplication, or comparison logic.
💡 Suggested Fix
Sort the tag_keys set before encoding it to ensure a deterministic and consistent order. Change the line attributes["tag_keys"] = _encode_value(tag_keys) to attributes["tag_keys"] = _encode_value(sorted(tag_keys)).
🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/sentry/eventstream/item_helpers.py#L88-L98
Potential issue: The code collects tag keys into a `set` and then encodes it directly.
Since Python's set iteration order is non-deterministic by default (due to hash
randomization), the resulting `tag_keys` attribute will have an unpredictable order
across different runs. This will cause flaky tests that assert a specific order and
introduces non-deterministic behavior in production, potentially affecting downstream
consumers that rely on a consistent ordering for caching, deduplication, or comparison
logic.
Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 7609150
shashjar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just left a question for my own understanding
| attributes[formatted_key] = _encode_value(value) | ||
| tag_keys.add(formatted_key) | ||
|
|
||
| attributes["tag_keys"] = _encode_value(tag_keys) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The array encoding for tags is temporary and we're eventually going to migrate to the map representation once supported, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to, but I'm not 100% sure.
From Pierre:
and in general, it's better to split a map with different attributes so you can aggregate on each of them
arrays were a little bit different as we can't really split an array with different attributes
without pain and modifying attribute names to add the index at the end
which then forces us to do wildcard key search, not exact match
The `encode_attributes` function was storing tag keys in a set and then encoding it as an array. Since sets are unordered in Python, this caused non-deterministic ordering in the resulting array, leading to flaky test failures. This fix sorts the tag_keys set before encoding to ensure deterministic, consistent ordering across test runs. Fixes flaky tests: - tests/sentry/eventstream/test_item_helpers.py::ItemHelpersTest::test_encode_attributes_with_tags - tests/sentry/discover/test_dashboard_widget_split.py::DashboardWidgetDatasetSplitTestCase::test_unhandled_filter_sets_error_events_dataset - tests/sentry/discover/test_dataset_split.py::DiscoverSavedQueryDatasetSplitTestCase::test_unhandled_filter_sets_error_events_dataset Root cause introduced in: 15e26cf (feat(occurrences): Add tagname attr to eap #105074)
Currently, when we look to get all tags from EAP we run a
TraceItemAttributeNamesRequest. The problem there is that that request only performs "best effort" filtering, which means we might over-fetch tags for a given group_id. Instead, let's add the tag names as their own attrs to occurrences so that we can draw & use those directly.