fix(attributes): migrating several numeric fields to pii=maybe#228
Conversation
…ch had it set to false
…trings`/`string[]`
|
This was the vibe coded script to update the fields: #!/usr/bin/env python3
"""
Script to update PII to 'maybe' for numeric attributes that have PII set to false,
excluding ID fields.
"""
import json
import os
from pathlib import Path
NUMERIC_TYPES = {"integer", "int", "double", "float", "long", "number"}
ATTRIBUTES_DIR = Path("model/attributes")
def is_id_field(key: str) -> bool:
"""Check if the attribute key looks like an ID field."""
key_lower = key.lower()
# Check for common ID patterns
if key_lower.endswith("_id") or key_lower.endswith(".id"):
return True
if "_id." in key_lower or "_id_" in key_lower:
return True
# Check for specific ID-like suffixes
parts = key_lower.split(".")
if parts and parts[-1] in ("id", "request_id", "trace_id", "span_id", "session_id"):
return True
return False
def process_attribute_file(file_path: Path) -> tuple[str | None, str | None]:
"""
Process a single attribute file.
Returns:
(changed_key, omitted_key) - one will be set, the other None
"""
try:
with open(file_path, "r") as f:
data = json.load(f)
except (json.JSONDecodeError, IOError) as e:
print(f"Error reading {file_path}: {e}")
return None, None
key = data.get("key", "")
attr_type = data.get("type", "")
pii = data.get("pii", {})
pii_value = pii.get("key") if isinstance(pii, dict) else None
# Check if it's a numeric type with pii=false
if attr_type not in NUMERIC_TYPES:
return None, None
if pii_value != "false":
return None, None
# Check if it's an ID field (should be omitted)
if is_id_field(key):
return None, key
# Update the PII to "maybe"
data["pii"]["key"] = "maybe"
with open(file_path, "w") as f:
json.dump(data, f, indent=2)
f.write("\n") # Add trailing newline
return key, None
def main():
changed = []
omitted = []
# Find all JSON files in the attributes directory
for json_file in ATTRIBUTES_DIR.rglob("*.json"):
changed_key, omitted_key = process_attribute_file(json_file)
if changed_key:
changed.append(changed_key)
if omitted_key:
omitted.append(omitted_key)
# Sort for consistent output
changed.sort()
omitted.sort()
print("=" * 60)
print("CHANGED ATTRIBUTES (PII set to 'maybe'):")
print("=" * 60)
if changed:
for key in changed:
print(f" - {key}")
else:
print(" (none)")
print()
print("=" * 60)
print("OMITTED ATTRIBUTES (ID fields, not changed):")
print("=" * 60)
if omitted:
for key in omitted:
print(f" - {key}")
else:
print(" (none)")
print()
print(f"Total changed: {len(changed)}")
print(f"Total omitted: {len(omitted)}")
if __name__ == "__main__":
main() |
Dav1dde
left a comment
There was a problem hiding this comment.
Only checked the changed attribute names, 👍
|
Looks like not all of the can be sent by the SDK, but I think these should remain Do you agree @Dav1dde? Thank you! |
|
Good callout, the sample rates make sense to be In practice exclusive time and status code I think don't matter and are fine either way. I'd just generally default to |
|
@constantinius Then I would say let's please revert just both sample rates to |
…ample_rate as discussed
|
@lcian Done! |
Description
In a recent discussion it was concluded that numeric fields (unless they are IDs of some kind) should be migrated to
pii=maybe. This also removes the test that was enforcing the wrong behavior of not allowing it for non-string fields.Contributes to https://linear.app/getsentry/issue/TET-1760/ttft-relay-treats-it-as-pii
PR Checklist
yarn testand verified that the tests pass.yarn generate && yarn formatto generate and format code and docs.If an attribute was added:
pii(i.e.maybeortrue. Usefalseonly for values that should never be scrubbed such as IDs)If an attribute was deprecated:
Extra Info
This is the total list of updates and omissions: