Skip to content

Improved PII stripping.#1

Merged
Agalin merged 4 commits intooperasoftware:pymongofrom
antonpirker:pymonog-strip-pii
Nov 2, 2022
Merged

Improved PII stripping.#1
Agalin merged 4 commits intooperasoftware:pymongofrom
antonpirker:pymonog-strip-pii

Conversation

@antonpirker
Copy link

I have now created a function, that does nicer PII stripping, where most of the keys are still there, but the values are gone.

@Agalin
Copy link
Collaborator

Agalin commented Oct 28, 2022

TBH I'm not sure if it's safe to do it like that - those keys depend on the protocol version used by mongo client and can change in the future so PII may be leaked.

Need to check how does it look with older queries but probably won't be able to do it this week.

@antonpirker
Copy link
Author

Good point! I added this, because when we redact all the values then the resulting query does not show anything, and the developer can not figure out what query is the slow one. See this screenshot:

Screenshot 2022-10-28 at 16 18 50

Having more information is very helpful:

Screenshot 2022-10-28 at 16 20 38

But you are right about old versions. It could leak PII and that is not good.

So we could:
a) go back to redacting everything.
b) only keep the first element (to have at least the collection on that the query is performed)
c) go through some older version of MongoDB (so maybe the last 2 major version or such) and make proper stripping for them and limit the SDK to those versions.

(side note: this code will come in handy if we do a Elasticsearch integration in the future, because those queries are also json)

@Agalin
Copy link
Collaborator

Agalin commented Oct 31, 2022

I'm worried about future ones, not the past. Monitoring API was only introduced in 3.1 so there is not much to check in that part. And bounding upper version is a really bad idea.

There is also an option of adding an optional callback but with it being a kind of an integration that should enable itself by default (in the future) it's not a great solution either.

TBH maybe just adding a warning in docs "PII stripping tested up to Mongo x.y.z" is enough?

Copy link
Collaborator

@Agalin Agalin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also rerun your test code (that one spans from which you've included) with pymongo 3.1?

@antonpirker
Copy link
Author

TBH maybe just adding a warning in docs "PII stripping tested up to Mongo x.y.z" is enough?

Yes, I think this should be enough.
I have now changed the stripping code to make it easier to read and comprehend. And I guess it is good enough. If new fields are added to the command they will be stripped by default. One will have to add them to SAFE_COMMAND_ATTRIBUTES or to one of the special stripping ifs [1] for them not to be stripped away in their entirety.

1: https://github.com/antonpirker/sentry-python/blob/pymonog-strip-pii/sentry_sdk/integrations/pymongo.py#L58-L79

@antonpirker
Copy link
Author

antonpirker commented Nov 2, 2022

Could you please merge this into your branch @Agalin
We then can continue in your PR. (Fixing the broken tests in Python 2.7 and so on)
Thanks!

@Agalin Agalin merged commit 116c40a into operasoftware:pymongo Nov 2, 2022
@Agalin
Copy link
Collaborator

Agalin commented Nov 2, 2022

Yeah, LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants