Skip to content

Conversation

@thetruecpaul
Copy link
Contributor

Currently, when we look to get all tags from EAP we run a TraceItemAttributeNamesRequest. The problem there is that that request only performs "best effort" filtering, which means we might over-fetch tags for a given group_id. Instead, let's add the tag names as their own attrs to occurrences so that we can draw & use those directly.

Currently, when we look to get all tags from EAP we run a `TraceItemAttributeNamesRequest`. The problem there is that that request only performs "best effort" filtering, which means we might over-fetch tags for a given group_id. Instead, let's add the tag names as their own attrs to occurrences so that we can draw & use those directly.
@thetruecpaul thetruecpaul requested a review from a team December 16, 2025 19:30
@thetruecpaul thetruecpaul requested a review from a team as a code owner December 16, 2025 19:30
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Dec 16, 2025
AnyValue(string_value="tags[release]"),
]
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Set iteration produces non-deterministic order in test assertion

The tag_keys variable is a set, which has non-deterministic iteration order. When encoding via _encode_value, the resulting array will have elements in an unpredictable order. However, the test assertion compares against a hardcoded list with a specific ordering (alphabetical: environment, level, release). This will cause flaky test failures since set iteration order can vary across Python versions, environments, or even individual runs due to hash randomization. Either the production code needs to sort the set before encoding, or the test needs to compare without considering order.

Additional Locations (1)

Fix in Cursor Fix in Web

Comment on lines +88 to +98
format_tag_key = lambda key: f"tags[{key}]"

tag_keys = set()
for key, value in event_data["tags"]:
if value is None:
continue
attributes[f"tags[{key}]"] = _encode_value(value)
formatted_key = format_tag_key(key)
attributes[formatted_key] = _encode_value(value)
tag_keys.add(formatted_key)

attributes["tag_keys"] = _encode_value(tag_keys)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The code encodes tag_keys from an unordered set, leading to a non-deterministic order in the output. This will cause flaky tests and unpredictable production behavior.
Severity: HIGH | Confidence: High

🔍 Detailed Analysis

The code collects tag keys into a set and then encodes it directly. Since Python's set iteration order is non-deterministic by default (due to hash randomization), the resulting tag_keys attribute will have an unpredictable order across different runs. This will cause flaky tests that assert a specific order and introduces non-deterministic behavior in production, potentially affecting downstream consumers that rely on a consistent ordering for caching, deduplication, or comparison logic.

💡 Suggested Fix

Sort the tag_keys set before encoding it to ensure a deterministic and consistent order. Change the line attributes["tag_keys"] = _encode_value(tag_keys) to attributes["tag_keys"] = _encode_value(sorted(tag_keys)).

🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/sentry/eventstream/item_helpers.py#L88-L98

Potential issue: The code collects tag keys into a `set` and then encodes it directly.
Since Python's set iteration order is non-deterministic by default (due to hash
randomization), the resulting `tag_keys` attribute will have an unpredictable order
across different runs. This will cause flaky tests that assert a specific order and
introduces non-deterministic behavior in production, potentially affecting downstream
consumers that rely on a consistent ordering for caching, deduplication, or comparison
logic.

Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 7609150

Copy link
Member

@shashjar shashjar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just left a question for my own understanding

attributes[formatted_key] = _encode_value(value)
tag_keys.add(formatted_key)

attributes["tag_keys"] = _encode_value(tag_keys)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The array encoding for tags is temporary and we're eventually going to migrate to the map representation once supported, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to, but I'm not 100% sure.

From Pierre:

and in general, it's better to split a map with different attributes so you can aggregate on each of them
arrays were a little bit different as we can't really split an array with different attributes
without pain and modifying attribute names to add the index at the end
which then forces us to do wildcard key search, not exact match

@thetruecpaul thetruecpaul merged commit 15e26cf into master Dec 16, 2025
68 checks passed
@thetruecpaul thetruecpaul deleted the cpaul/121625/occeap-tagnames branch December 16, 2025 22:18
JoshFerge added a commit that referenced this pull request Dec 17, 2025
The `encode_attributes` function was storing tag keys in a set and then
encoding it as an array. Since sets are unordered in Python, this caused
non-deterministic ordering in the resulting array, leading to flaky test
failures.

This fix sorts the tag_keys set before encoding to ensure deterministic,
consistent ordering across test runs.

Fixes flaky tests:
- tests/sentry/eventstream/test_item_helpers.py::ItemHelpersTest::test_encode_attributes_with_tags
- tests/sentry/discover/test_dashboard_widget_split.py::DashboardWidgetDatasetSplitTestCase::test_unhandled_filter_sets_error_events_dataset
- tests/sentry/discover/test_dataset_split.py::DiscoverSavedQueryDatasetSplitTestCase::test_unhandled_filter_sets_error_events_dataset

Root cause introduced in: 15e26cf (feat(occurrences): Add tagname attr to eap #105074)
@github-actions github-actions bot locked and limited conversation to collaborators Jan 1, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants