feat(occurrences): Add tagname attr to eap #105074

thetruecpaul · 2025-12-16T19:30:33Z

Currently, when we look to get all tags from EAP we run a TraceItemAttributeNamesRequest. The problem there is that that request only performs "best effort" filtering, which means we might over-fetch tags for a given group_id. Instead, let's add the tag names as their own attrs to occurrences so that we can draw & use those directly.

Currently, when we look to get all tags from EAP we run a `TraceItemAttributeNamesRequest`. The problem there is that that request only performs "best effort" filtering, which means we might over-fetch tags for a given group_id. Instead, let's add the tag names as their own attrs to occurrences so that we can draw & use those directly.

cursor · 2025-12-16T19:32:57Z

tests/sentry/eventstream/test_item_helpers.py

+                    AnyValue(string_value="tags[release]"),
+                ]
+            )
+        )


Bug: Set iteration produces non-deterministic order in test assertion

The tag_keys variable is a set, which has non-deterministic iteration order. When encoding via _encode_value, the resulting array will have elements in an unpredictable order. However, the test assertion compares against a hardcoded list with a specific ordering (alphabetical: environment, level, release). This will cause flaky test failures since set iteration order can vary across Python versions, environments, or even individual runs due to hash randomization. Either the production code needs to sort the set before encoding, or the test needs to compare without considering order.

Additional Locations (1)

src/sentry/eventstream/item_helpers.py#L89-L98

sentry · 2025-12-16T19:33:14Z

src/sentry/eventstream/item_helpers.py

+    format_tag_key = lambda key: f"tags[{key}]"
+
+    tag_keys = set()
    for key, value in event_data["tags"]:
        if value is None:
            continue
-        attributes[f"tags[{key}]"] = _encode_value(value)
+        formatted_key = format_tag_key(key)
+        attributes[formatted_key] = _encode_value(value)
+        tag_keys.add(formatted_key)
+
+    attributes["tag_keys"] = _encode_value(tag_keys)


Bug: The code encodes tag_keys from an unordered set, leading to a non-deterministic order in the output. This will cause flaky tests and unpredictable production behavior.
_{Severity: HIGH | Confidence: High}

🔍 Detailed Analysis

The code collects tag keys into a set and then encodes it directly. Since Python's set iteration order is non-deterministic by default (due to hash randomization), the resulting tag_keys attribute will have an unpredictable order across different runs. This will cause flaky tests that assert a specific order and introduces non-deterministic behavior in production, potentially affecting downstream consumers that rely on a consistent ordering for caching, deduplication, or comparison logic.

💡 Suggested Fix

Sort the tag_keys set before encoding it to ensure a deterministic and consistent order. Change the line attributes["tag_keys"] = _encode_value(tag_keys) to attributes["tag_keys"] = _encode_value(sorted(tag_keys)).

🤖 Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: src/sentry/eventstream/item_helpers.py#L88-L98 Potential issue: The code collects tag keys into a `set` and then encodes it directly. Since Python's set iteration order is non-deterministic by default (due to hash randomization), the resulting `tag_keys` attribute will have an unpredictable order across different runs. This will cause flaky tests that assert a specific order and introduces non-deterministic behavior in production, potentially affecting downstream consumers that rely on a consistent ordering for caching, deduplication, or comparison logic.

_{Did we get this right? 👍 / 👎 to inform future reviews.}
_{Reference ID: 7609150}

shashjar

just left a question for my own understanding

shashjar · 2025-12-16T21:04:48Z

src/sentry/eventstream/item_helpers.py

+        attributes[formatted_key] = _encode_value(value)
+        tag_keys.add(formatted_key)
+
+    attributes["tag_keys"] = _encode_value(tag_keys)


The array encoding for tags is temporary and we're eventually going to migrate to the map representation once supported, right?

I would like to, but I'm not 100% sure.

From Pierre:

and in general, it's better to split a map with different attributes so you can aggregate on each of them
arrays were a little bit different as we can't really split an array with different attributes
without pain and modifying attribute names to add the index at the end
which then forces us to do wildcard key search, not exact match

The `encode_attributes` function was storing tag keys in a set and then encoding it as an array. Since sets are unordered in Python, this caused non-deterministic ordering in the resulting array, leading to flaky test failures. This fix sorts the tag_keys set before encoding to ensure deterministic, consistent ordering across test runs. Fixes flaky tests: - tests/sentry/eventstream/test_item_helpers.py::ItemHelpersTest::test_encode_attributes_with_tags - tests/sentry/discover/test_dashboard_widget_split.py::DashboardWidgetDatasetSplitTestCase::test_unhandled_filter_sets_error_events_dataset - tests/sentry/discover/test_dataset_split.py::DiscoverSavedQueryDatasetSplitTestCase::test_unhandled_filter_sets_error_events_dataset Root cause introduced in: 15e26cf (feat(occurrences): Add tagname attr to eap #105074)

thetruecpaul requested a review from a team December 16, 2025 19:30

thetruecpaul requested a review from a team as a code owner December 16, 2025 19:30

github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Dec 16, 2025

cursor bot reviewed Dec 16, 2025

View reviewed changes

sentry bot reviewed Dec 16, 2025

View reviewed changes

shashjar approved these changes Dec 16, 2025

View reviewed changes

thetruecpaul merged commit 15e26cf into master Dec 16, 2025
68 checks passed

thetruecpaul deleted the cpaul/121625/occeap-tagnames branch December 16, 2025 22:18

JoshFerge mentioned this pull request Dec 17, 2025

fix(tests): skip flaky tests #105125

Merged

github-actions bot locked and limited conversation to collaborators Jan 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

feat(occurrences): Add tagname attr to eap #105074

feat(occurrences): Add tagname attr to eap #105074

Uh oh!

thetruecpaul commented Dec 16, 2025

Uh oh!

cursor bot Dec 16, 2025

Uh oh!

sentry bot Dec 16, 2025

Uh oh!

shashjar left a comment

Uh oh!

shashjar Dec 16, 2025

Uh oh!

thetruecpaul Dec 16, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

feat(occurrences): Add tagname attr to eap #105074

feat(occurrences): Add tagname attr to eap #105074

Uh oh!

Conversation

thetruecpaul commented Dec 16, 2025

Uh oh!

cursor bot Dec 16, 2025

Choose a reason for hiding this comment

Bug: Set iteration produces non-deterministic order in test assertion

Uh oh!

sentry bot Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

shashjar left a comment

Choose a reason for hiding this comment

Uh oh!

shashjar Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

thetruecpaul Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants