Skip to content

fix(pii): scrub PII in span description [INGEST-1709]#1590

Merged
olksdr merged 4 commits intomasterfrom
fix/pii-in-snap-desc
Nov 17, 2022
Merged

fix(pii): scrub PII in span description [INGEST-1709]#1590
olksdr merged 4 commits intomasterfrom
fix/pii-in-snap-desc

Conversation

@olksdr
Copy link
Contributor

@olksdr olksdr commented Nov 17, 2022

Filter description if it contains PII

@olksdr olksdr self-assigned this Nov 17, 2022
@olksdr olksdr requested a review from a team November 17, 2022 10:24

/// Human readable description of a span (e.g. method URL).
#[metastructure(pii = "maybe")]
#[metastructure(pii = "true")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep I will add this

Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I would add a bit more explanation to the PR description (that this change was motivated by an incident, etc.).

CHANGELOG.md Outdated
**Internal**:

- Emit a `service.back_pressure` metric that measures internal back pressure by service. ([#1583](https://github.com/getsentry/relay/pull/1583))
- Scrub PII in the span description. ([#1590](https://github.com/getsentry/relay/pull/1590))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Scrub PII in the span description. ([#1590](https://github.com/getsentry/relay/pull/1590))
- Always scrub PII in the span description. So far, the user had to configure this by hand. ([#1590](https://github.com/getsentry/relay/pull/1590))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very consistent with this myself, but let's also add this to py/CHANGELOG.md. See https://github.com/getsentry/relay#instructions-for-changelogs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

@marandaneto
Copy link
Contributor

Can we add a test for breadcrumbs as well?

/// Arbitrary data associated with this breadcrumb.

SDKs sets the data.url as well https://develop.sentry.dev/sdk/event-payloads/breadcrumbs/

@marandaneto
Copy link
Contributor

request.url is also maybe

///The query string can be declared either as part of the `url`, or separately in `query_string`.

Should we change that too? Same for query_string.

@olksdr
Copy link
Contributor Author

olksdr commented Nov 17, 2022

Can we add a test for breadcrumbs as well?

/// Arbitrary data associated with this breadcrumb.

SDKs sets the data.url as well https://develop.sentry.dev/sdk/event-payloads/breadcrumbs/

this already is tested in

fn test_breadcrumb_message() {

@olksdr
Copy link
Contributor Author

olksdr commented Nov 17, 2022

request.url is also maybe

///The query string can be declared either as part of the `url`, or separately in `query_string`.

Should we change that too? Same for query_string.

we can add, but it can be a lot of false positives, if anything is in the URL , domain name which will contain the key words we look for we will strip entire URL. Is that ok?


let mut processor = PiiProcessor::new(config.compiled());
process_value(&mut event, &mut processor, ProcessingState::root()).unwrap();
assert_annotated_snapshot!(event);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we're only interested in ensuring the description is [Filtered], so it may be easier to read that assertion over a snapshot?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is to ensure that the things are working as expected now, and it will show the problem if we have the regression. But I do not really care who this implemented.

@marandaneto
Copy link
Contributor

request.url is also maybe

///The query string can be declared either as part of the `url`, or separately in `query_string`.

Should we change that too? Same for query_string.

we can add, but it can be a lot of false positives, if anything is in the URL , domain name which will contain the key words we look for we will strip entire URL. Is that ok?

mm I am unsure, this will definatelly degrade the feature (filtering the whole URL instead of the only key words), Mobile already scrub out the query parameters (by not sending them), if you think this is ok, we are good already.

@olksdr
Copy link
Contributor Author

olksdr commented Nov 17, 2022

mm I am unsure, this will definatelly degrade the feature (filtering the whole URL instead of the only key words), Mobile already scrub out the query parameters (by not sending them), if you think this is ok, we are good already.

@marandaneto , checked this with @jjbayer - we already normalize the URL and push query string from url field to query_string which has pii=true and will be scrubbed

@olksdr olksdr merged commit c7eb483 into master Nov 17, 2022
@olksdr olksdr deleted the fix/pii-in-snap-desc branch November 17, 2022 13:23
olksdr added a commit that referenced this pull request Nov 18, 2022
olksdr added a commit that referenced this pull request Nov 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants