Skip to content

fix: Resolved anonymization action logic#519

Merged
onuratakan merged 1 commit intomasterfrom
anonimization_fix
Feb 5, 2026
Merged

fix: Resolved anonymization action logic#519
onuratakan merged 1 commit intomasterfrom
anonimization_fix

Conversation

@onuratakan
Copy link
Copy Markdown
Member

@onuratakan onuratakan commented Feb 5, 2026

Note

Medium Risk
Touches the privacy/anonymization flow by transforming user prompts and post-processing model outputs, so mistakes could leak PII or corrupt returned content (especially for structured/Pydantic outputs). Changes are localized but affect all runs when anonymization triggers.

Overview
Adds reversible anonymization support end-to-end: safety policies can now return a transformation_map, which is stored on the Task when user input is REPLACE/ANONYMIZE and accompanied by a privacy-mode notice prepended to the prompt.

Introduces safety_engine.anonymization utilities and exports them from safety_engine.__init__, updates PolicyResult/PolicyManager to aggregate transformation maps across policies, and updates FinalizationStep to de-anonymize the final context.output (and best-effort structured/Pydantic outputs) plus task._response before returning to the caller.

Simplifies PIIAnonymizeAction to use the standardized anonymization path and hardens ActionBase.anonymize_triggered_keywords to skip PII_KEYWORD: markers and handle typed keywords consistently (case-insensitive regex replacement).

Written by Cursor Bugbot for commit 008cdcb. This will update automatically on new commits. Configure here.

@claude
Copy link
Copy Markdown

claude bot commented Feb 5, 2026

PR Review: Reversible Anonymization Support (#519)

This PR adds end-to-end reversible anonymization: PII is replaced with random-but-format-preserving placeholders before the prompt hits the LLM, and the placeholders are swapped back to originals in the finalization step. Good direction overall. Below is a detailed review organized by severity.


🔴 Bugs / Correctness Issues

1. create_anonymization_result passes bool where dict is expected (anonymization.py:313)

The convenience function signature accepts use_random_format: bool = False but forwards it directly as the third argument to anonymize_contents, whose third parameter is existing_map: Optional[Dict[int, Dict[str, str]]]. Passing False (a bool) will be treated as a falsy existing_map and silently ignored today, but the parameter is semantically wrong and will break if the guard if existing_map: is ever tightened (e.g., if existing_map is not None:). The use_random_format flag is also unused — the module only does random replacement anyway. Either remove the parameter or fix the call:

# anonymization.py line 313 — current
return anonymize_contents(original_content, triggered_keywords, use_random_format)  # bool, not dict!

# should be
return anonymize_contents(original_content, triggered_keywords)

2. Transformation-map key collision in PolicyManager.execute_policies_async (policy_manager.py:271)

The merge logic uses base_idx = len(result.transformation_map) as the new key offset, but the incoming policy_output.transformation_map keys are 1-based (counter starts at 0, increments before assignment). Example: map A has keys {1, 2} (len=2), map B has keys {1, 2}. Merged keys become 2+1=3 and 2+2=4 — that works. But if map A has keys {1, 3} (e.g., after a prior merge or gap), len=2, so map B key 1 maps to 2+1=3, colliding with A's key 3, silently overwriting an entry. Use max(result.transformation_map.keys()) instead of len(...):

base_idx = max(result.transformation_map.keys())  # not len()

3. replace_triggered_keywords_async still has the pre-existing PII_KEYWORD: bug (action_base.py:263-277)

The sync replace_triggered_keywords and the async anonymize_triggered_keywords_async both got the PII_KEYWORD: skip and typed-keyword strip fixes in this PR, but replace_triggered_keywords_async (line 263) was not touched. It still does a raw re.escape(keyword) replacement without stripping the type prefix, so a keyword like "EMAIL:user@example.com" would search for the literal string EMAIL:user@example.com in the text rather than user@example.com. This is a latent bug this PR could have caught while it was in the area.


🟡 Design & Reliability Concerns

4. Random replacements can collide with real content or with each other

_generate_random_replacement produces format-preserving random strings with no collision check against existing content or other replacements. For short values (e.g., a 3-letter name, a 2-digit number) the probability of the replacement accidentally matching a word already in the prompt or matching another replacement is non-trivial. Structured placeholders like [ANON_EMAIL_1] would be both more reliable and more debuggable, while still preventing the LLM from inferring the original value. The current approach also makes the de-anonymization fragile: if the LLM paraphrases or changes the case of the placeholder in a way that doesn't match re.IGNORECASE exactly (e.g., splits a compound replacement across a line), the original value won't be restored.

5. Silent failure on Pydantic de-anonymization (steps.py:2978)

The bare except Exception: pass on the Pydantic path means that if de-anonymization corrupts a JSON field value (e.g., the original email is longer than the placeholder and overflows a max_length validator), the user gets back the still-anonymized Pydantic object with no indication that restoration failed. At minimum, this should log a warning when agent.debug is true, or surface the error in the action_output.

6. Anonymization notice is user-visible in task description (agent.py:1990-1995)

The [PRIVACY MODE ACTIVE ...] string is prepended to task.description. If anything downstream logs, caches, or surfaces task.description to the user (e.g., task history, debugging panels), this internal LLM-steering instruction leaks. Consider storing it separately and injecting it only at prompt-assembly time, or at least stripping it after the LLM call completes.

7. _anonymization_map on a Pydantic BaseModel as a bare class variable (tasks.py:60)

Task extends BaseModel. Other private attributes in Task (e.g., _response, _tool_calls) follow the same pattern, so this is consistent with the existing code. However, Pydantic v2 treats bare _-prefixed class annotations as PrivateAttr only if declared with PrivateAttr(). Without that, the attribute may not serialize/deserialize correctly if a Task is ever round-tripped through .model_dump() / .model_validate(). This is low risk today but worth noting for long-term correctness.


🟢 Minor / Style

8. Dead import in steps.py:2974
import json is imported inside the try block but never used — model_dump_json() / model_validate_json() are Pydantic methods, not stdlib json calls.

9. Eager import at module top in __init__.py (line 14) alongside lazy __getattr__
The top-level from .anonymization import (...) at line 14 (inside the try block that also imports other modules) runs eagerly on first access to the safety_engine package. The _get_anonymization_classes() lazy helper added later is redundant for anything already imported at the top. Pick one pattern.

10. Anonymizer class is stateful but instantiated fresh in every module-level helper
anonymize_content and anonymize_contents both create a brand-new Anonymizer() every call. The class and its reset() method, _counter, and _anonymized_cache exist to support reuse across calls, but nothing actually reuses an instance. Either document that callers should manage their own Anonymizer lifetime (and simplify the module-level helpers) or remove the class in favor of plain functions.


🔵 Test Coverage Gaps

No tests were added for the new anonymization.py module or for the de-anonymization path in FinalizationStep. The following cases should be covered:

  1. Round-trip correctness: anonymize → de-anonymize returns the original for emails, phones, SSNs, credit cards, and addresses.
  2. PII_KEYWORD: skip: keywords prefixed with PII_KEYWORD: are ignored during anonymization.
  3. Typed-keyword extraction: EMAIL:x@y.com correctly extracts x@y.com.
  4. Collision in transformation-map merge: two policies both anonymize, merged map has no key collisions.
  5. Pydantic round-trip de-anonymization: a structured output with anonymized fields is correctly restored.
  6. create_anonymization_result type bug: the bool/dict mismatch is caught.
  7. Case-insensitive de-anonymization: if the LLM returns a placeholder in different casing, the original is still restored.

The existing test_pii_anonymize_phone test does not assert that de-anonymization actually runs (the mock response doesn't contain the anonymized placeholder, so the de-anon path is never exercised).


Summary

Severity Count Items
Bug 3 #1 type mismatch, #2 key collision, #3 async replace not fixed
Design 4 #4 placeholder collisions, #5 silent Pydantic failure, #6 notice leakage, #7 PrivateAttr
Minor 3 #8 dead import, #9 dual import pattern, #10 stateful-but-not-reused class
Tests 1 No unit tests for the new anonymization module or de-anon path

The core idea is sound and the PIIAnonymizeAction simplification is a clear improvement. The bugs in #1#3 should be fixed before merge; the design items are worth at least tracking as follow-ups.

🤖 Generated with Claude Code

@onuratakan onuratakan merged commit 5c60ede into master Feb 5, 2026
6 checks passed
@onuratakan onuratakan deleted the anonimization_fix branch February 5, 2026 15:07
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

This PR is being reviewed by Cursor Bugbot

Details

You are on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

Returns:
Tuple of (anonymized_contents, transformation_map)
"""
return anonymize_contents(original_content, triggered_keywords, use_random_format)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bool passed where Dict expected causes AttributeError

High Severity

The create_anonymization_result function passes use_random_format (a bool) as the third argument to anonymize_contents, but that function expects existing_map (an Optional[Dict]). When use_random_format=True is passed, the code at line 217 will attempt True.copy(), raising an AttributeError since booleans don't have a copy() method.

Additional Locations (1)

Fix in Cursor Fix in Web

debug_log(
f"De-anonymized response using {len(task._anonymization_map)} mappings",
"FinalizationStep"
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing null check for agent before accessing debug

Low Severity

The code accesses agent.debug at line 2986 without first checking if agent is not None. Earlier in the same function (line 2957), there's a defensive pattern if agent and hasattr(agent, 'run_id') suggesting agent may be None in some cases. If agent is None, this would cause an AttributeError.

Fix in Cursor Fix in Web

debug_log(
f"De-anonymized response using {len(task._anonymization_map)} mappings",
"FinalizationStep"
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

De-anonymization corrupts data when multiple values share placeholder

Medium Severity

The de-anonymization logic applies to both REPLACE and ANONYMIZE actions identically. For REPLACE actions using a fixed placeholder (e.g., [REDACTED]) for multiple PII items, the transformation_map contains multiple entries all mapping to the same anonymous value. When deanonymize_content processes these, it performs sequential replacements where later entries overwrite earlier ones, causing incorrect data restoration. The privacy notice also misleadingly states "random placeholders" for REPLACE actions.

Additional Locations (1)

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant