fix(attributes): migrating several numeric fields to `pii=maybe` by constantinius · Pull Request #228 · getsentry/sentry-conventions

constantinius · 2026-01-22T14:14:15Z

Description

In a recent discussion it was concluded that numeric fields (unless they are IDs of some kind) should be migrated to pii=maybe. This also removes the test that was enforcing the wrong behavior of not allowing it for non-string fields.

Contributes to https://linear.app/getsentry/issue/TET-1760/ttft-relay-treats-it-as-pii

PR Checklist

I have run yarn test and verified that the tests pass.
I have run yarn generate && yarn format to generate and format code and docs.

If an attribute was added:

I have used the correct value for pii (i.e. maybe or true. Use false only for values that should never be scrubbed such as IDs)

If an attribute was deprecated:

I've followed the policies described in CONTRIBUTING.md

Extra Info

This is the total list of updates and omissions:

============================================================
CHANGED ATTRIBUTES (PII set to 'maybe'):
============================================================
  - ai.completion_tokens.used
  - ai.frequency_penalty
  - ai.presence_penalty
  - ai.prompt_tokens.used
  - ai.temperature
  - ai.top_k
  - ai.top_p
  - ai.total_cost
  - ai.total_tokens.used
  - browser.script.source_char_position
  - cache.item_size
  - cache.ttl
  - client.port
  - cloudflare.d1.duration
  - cloudflare.d1.rows_read
  - cloudflare.d1.rows_written
  - code.line.number
  - code.lineno
  - frames.delay
  - frames.frozen
  - frames.slow
  - frames.total
  - gen_ai.cost.input_tokens
  - gen_ai.cost.output_tokens
  - gen_ai.cost.total_tokens
  - gen_ai.request.frequency_penalty
  - gen_ai.request.max_tokens
  - gen_ai.request.presence_penalty
  - gen_ai.request.temperature
  - gen_ai.request.top_k
  - gen_ai.request.top_p
  - gen_ai.response.tokens_per_second
  - gen_ai.usage.completion_tokens
  - gen_ai.usage.input_tokens
  - gen_ai.usage.input_tokens.cache_write
  - gen_ai.usage.input_tokens.cached
  - gen_ai.usage.output_tokens
  - gen_ai.usage.output_tokens.reasoning
  - gen_ai.usage.prompt_tokens
  - gen_ai.usage.total_tokens
  - http.decoded_response_content_length
  - http.request.connect_start
  - http.request.connection_end
  - http.request.domain_lookup_end
  - http.request.domain_lookup_start
  - http.request.fetch_start
  - http.request.redirect_end
  - http.request.redirect_start
  - http.request.request_start
  - http.request.resend_count
  - http.request.response_end
  - http.request.response_start
  - http.request.secure_connection_start
  - http.request.time_to_first_byte
  - http.request.worker_start
  - http.response.body.size
  - http.response.size
  - http.response.status_code
  - http.response_content_length
  - http.response_transfer_size
  - http.status_code
  - lcp.size
  - mcp.progress.current
  - mcp.progress.percentage
  - mcp.progress.total
  - mcp.prompt.result.message_count
  - mcp.protocol.ready
  - mcp.tool.result.content_count
  - messaging.message.body.size
  - messaging.message.envelope.size
  - messaging.message.receive.latency
  - messaging.message.retry.count
  - nel.elapsed_time
  - nel.sampling_function
  - net.host.port
  - net.peer.port
  - net.sock.host.port
  - net.sock.peer.port
  - network.local.port
  - network.peer.port
  - process.pid
  - rpc.grpc.status_code
  - sentry.client_sample_rate
  - sentry.exclusive_time
  - sentry.server_sample_rate
  - sentry.status_code
  - server.port
  - url.port
  - vercel.proxy.response_byte_size
  - vercel.proxy.status_code
  - vercel.proxy.timestamp
  - vercel.status_code

============================================================
OMITTED ATTRIBUTES (ID fields, not changed):
============================================================
  - event.id
  - thread.id

Total changed: 92
Total omitted: 2

…ch had it set to false

…trings`/`string[]`

linear · 2026-01-22T14:14:19Z

TET-1760 TTFT: Relay treats it as PII

constantinius · 2026-01-22T14:16:43Z

This was the vibe coded script to update the fields:

#!/usr/bin/env python3
"""
Script to update PII to 'maybe' for numeric attributes that have PII set to false,
excluding ID fields.
"""

import json
import os
from pathlib import Path


NUMERIC_TYPES = {"integer", "int", "double", "float", "long", "number"}
ATTRIBUTES_DIR = Path("model/attributes")


def is_id_field(key: str) -> bool:
    """Check if the attribute key looks like an ID field."""
    key_lower = key.lower()
    # Check for common ID patterns
    if key_lower.endswith("_id") or key_lower.endswith(".id"):
        return True
    if "_id." in key_lower or "_id_" in key_lower:
        return True
    # Check for specific ID-like suffixes
    parts = key_lower.split(".")
    if parts and parts[-1] in ("id", "request_id", "trace_id", "span_id", "session_id"):
        return True
    return False


def process_attribute_file(file_path: Path) -> tuple[str | None, str | None]:
    """
    Process a single attribute file.

    Returns:
        (changed_key, omitted_key) - one will be set, the other None
    """
    try:
        with open(file_path, "r") as f:
            data = json.load(f)
    except (json.JSONDecodeError, IOError) as e:
        print(f"Error reading {file_path}: {e}")
        return None, None

    key = data.get("key", "")
    attr_type = data.get("type", "")
    pii = data.get("pii", {})
    pii_value = pii.get("key") if isinstance(pii, dict) else None

    # Check if it's a numeric type with pii=false
    if attr_type not in NUMERIC_TYPES:
        return None, None

    if pii_value != "false":
        return None, None

    # Check if it's an ID field (should be omitted)
    if is_id_field(key):
        return None, key

    # Update the PII to "maybe"
    data["pii"]["key"] = "maybe"

    with open(file_path, "w") as f:
        json.dump(data, f, indent=2)
        f.write("\n")  # Add trailing newline

    return key, None


def main():
    changed = []
    omitted = []

    # Find all JSON files in the attributes directory
    for json_file in ATTRIBUTES_DIR.rglob("*.json"):
        changed_key, omitted_key = process_attribute_file(json_file)
        if changed_key:
            changed.append(changed_key)
        if omitted_key:
            omitted.append(omitted_key)

    # Sort for consistent output
    changed.sort()
    omitted.sort()

    print("=" * 60)
    print("CHANGED ATTRIBUTES (PII set to 'maybe'):")
    print("=" * 60)
    if changed:
        for key in changed:
            print(f"  - {key}")
    else:
        print("  (none)")

    print()
    print("=" * 60)
    print("OMITTED ATTRIBUTES (ID fields, not changed):")
    print("=" * 60)
    if omitted:
        for key in omitted:
            print(f"  - {key}")
    else:
        print("  (none)")

    print()
    print(f"Total changed: {len(changed)}")
    print(f"Total omitted: {len(omitted)}")


if __name__ == "__main__":
    main()

Dav1dde

Only checked the changed attribute names, 👍

lcian · 2026-01-22T14:31:14Z

Looks like not all of the can be sent by the SDK, but I think these should remain pii: "false":

- sentry.client_sample_rate
- sentry.exclusive_time
- sentry.server_sample_rate
- sentry.status_code

Do you agree @Dav1dde?

Thank you!

Dav1dde · 2026-01-22T15:07:49Z

Good callout, the sample rates make sense to be false and the client one is also sent by the SDK

In practice exclusive time and status code I think don't matter and are fine either way. I'd just generally default to maybe unless there is a really good reason no to, which for sample rates definitely makes sense.

constantinius · 2026-01-22T15:19:15Z

This comment was raised:

sentry.status_code is populated for Insights and is copied from the http response status code field - I think that would be safer as maybe since we don't control the source data

What's the verdict on that one? @lcian @Dav1dde

lcian · 2026-01-22T18:53:06Z

@constantinius Then I would say let's please revert just both sample rates to false, the other two we can keep as maybe.

…ample_rate as discussed

…e-for-numbers

constantinius · 2026-01-23T08:09:47Z

@lcian Done!
I still need an approval from either code owner

constantinius added 3 commits January 22, 2026 15:05

fix(attributes): setting PII to maybe for 92 numeric attributes whi…

c36a291

…ch had it set to false

chore: re-generated outputs

c24c925

test: removed wrong tests fixing pii maybe to only be allowed in `s…

7baf086

…trings`/`string[]`

constantinius requested review from AbhiPrasad, cleptric and lcian as code owners January 22, 2026 14:14

chore: update changelog

e89d2a9

Dav1dde approved these changes Jan 22, 2026

View reviewed changes

rev: reverting changes for sentry.client_sample_rate, sentry.server_s…

fda3d15

…ample_rate as discussed

constantinius requested a review from Lms24 as a code owner January 23, 2026 08:08

Merge branch 'main' into constantinius/fix/attributes/change-pii-mayb…

739a7f9

…e-for-numbers

lcian approved these changes Jan 23, 2026

View reviewed changes

constantinius merged commit b39c051 into main Jan 23, 2026
10 checks passed

constantinius deleted the constantinius/fix/attributes/change-pii-maybe-for-numbers branch January 23, 2026 08:36

sentry-release-bot bot mentioned this pull request Jan 23, 2026

publish: getsentry/sentry-conventions@0.3.2 getsentry/publish#6982

Closed

4 tasks

Lms24 mentioned this pull request Feb 2, 2026

feat: add gen_ai.input.messages original length attribute #241

Closed

sentry-release-bot bot mentioned this pull request Feb 17, 2026

publish: getsentry/sentry-conventions@0.4.0 getsentry/publish#7192

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(attributes): migrating several numeric fields to `pii=maybe`#228

fix(attributes): migrating several numeric fields to `pii=maybe`#228
constantinius merged 6 commits intomainfrom
constantinius/fix/attributes/change-pii-maybe-for-numbers

constantinius commented Jan 22, 2026 •

edited

Loading

Uh oh!

linear bot commented Jan 22, 2026

Uh oh!

constantinius commented Jan 22, 2026

Uh oh!

Dav1dde left a comment

Uh oh!

lcian commented Jan 22, 2026 •

edited

Loading

Uh oh!

Dav1dde commented Jan 22, 2026

Uh oh!

constantinius commented Jan 22, 2026

Uh oh!

lcian commented Jan 22, 2026 •

edited

Loading

Uh oh!

constantinius commented Jan 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

constantinius commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

PR Checklist

Extra Info

Uh oh!

linear bot commented Jan 22, 2026

Uh oh!

constantinius commented Jan 22, 2026

Uh oh!

Dav1dde left a comment

Choose a reason for hiding this comment

Uh oh!

lcian commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Dav1dde commented Jan 22, 2026

Uh oh!

constantinius commented Jan 22, 2026

Uh oh!

lcian commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

constantinius commented Jan 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

constantinius commented Jan 22, 2026 •

edited

Loading

lcian commented Jan 22, 2026 •

edited

Loading

lcian commented Jan 22, 2026 •

edited

Loading