Skip to content

Sanitize privacy sensitive data before sending to sentry.#16780

Merged
danjm merged 16 commits intodevelopfrom
sentry-update-init
Dec 16, 2022
Merged

Sanitize privacy sensitive data before sending to sentry.#16780
danjm merged 16 commits intodevelopfrom
sentry-update-init

Conversation

@danjm
Copy link
Copy Markdown
Contributor

@danjm danjm commented Dec 2, 2022

This PR will ensure that we don't send any privacy sensitive information along with our sentry errors.

Still need to add some unit tests.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 2, 2022

CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.

@metamaskbot
Copy link
Copy Markdown
Collaborator

Builds ready [871dfb7]
Page Load Metrics (2340 ± 108 ms)
PlatformPageMetricMin (ms)Max (ms)Average (ms)StandardDeviation (ms)MarginOfError (ms)
ChromeHomefirstPaint1093501564823
domContentLoaded185026842325211101
load185026842340225108
domInteractive185026842325211101
Bundle size diffs [🚨 Warning! Bundle size has increased!]
  • background: 652 bytes
  • ui: 0 bytes
  • common: 652 bytes

@metamaskbot
Copy link
Copy Markdown
Collaborator

Builds ready [3f04b09]
Page Load Metrics (2106 ± 88 ms)
PlatformPageMetricMin (ms)Max (ms)Average (ms)StandardDeviation (ms)MarginOfError (ms)
ChromeHomefirstPaint99172118168
domContentLoaded17062388207817885
load17062388210618388
domInteractive17062388207817885
Bundle size diffs [🚨 Warning! Bundle size has increased!]
  • background: 652 bytes
  • ui: 0 bytes
  • common: 652 bytes

@danjm danjm marked this pull request as ready for review December 8, 2022 16:30
@danjm danjm requested a review from a team as a code owner December 8, 2022 16:30
@danjm danjm requested a review from segun December 8, 2022 16:30
@PeterYinusa PeterYinusa added this to the v10.24.0 milestone Dec 8, 2022
Co-authored-by: Mark Stacey <markjstacey@gmail.com>
const re = /(([A-z]+:\/\/)|(www\.))\S+[@:.]\S+/gu;
const urlsInMessage = newErrorMessage.match(re) || [];
urlsInMessage.forEach((url) => {
const urlObj = new URL(url);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. This will throw if the URL isn't valid. Luckily if this throws, it stops trying to send the whole report so we're still not leaking anything. Though it does lose us the report, which is not ideal.

Perhaps it would be safer to take the hostname from the same regex we're using to detect URLs? We have the two forward slashes as a starting point. The valid characters within a hostname are quite a lot more restricted than for a URL in general, so it should be easy enough to match it.

Something like... [-a-zA-Z0-9.]+

Then once we have the hostname, we can look for any that match an allowlist entry or end with .[entry] (to capture subdomains)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of the approach I took in 526a5d3?

If we try to get the hostname from the regex, then we need to make the regex sophisticated enough to match the hostname but not subdomains, and also not things that look like host names in other parts of the url. There is a decent amount of safety that comes with relying on URL.

My approach here is just to continue to rely on that, but just catch errors and hide any url that is invalid,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach taken in that commit does not match subdomains, which is a bad thing. We do want to match subdomains - otherwise we'll be filtering out URLs we use internally. codefi.network for example is just the root TLD we use; it's the subdomains that we're actually sending requests to.

If we try to get the hostname from the regex, then we need to make the regex sophisticated enough to match the hostname but not subdomains, and also not things that look like host names in other parts of the url.

Hmm, wouldn't my suggestion work here? Use the regex I suggested right after the double forward slash.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hold on, if we extract subdomain.codefi.network it will never successfully match against the hardcoded string we have for the URL allowlist: codefi.network

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have to do as done in 352d306

Copy link
Copy Markdown
Member

@Gudahtt Gudahtt Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by that comment. But that latest commit won't work either, because now it will incorrectly match domains that have codefi.network as a subdomain (e.g. codefi.network.another.domain.com). We can't look for matches indiscriminately in the hostname, we'll have to look just at hostnames that either match exactly, or end with a period followed by the domain we're trying to match.

That's what I meant by this sentence:

we can look for any that match an allowlist entry or end with .[entry] (to capture subdomains)

Copy link
Copy Markdown
Contributor Author

@danjm danjm Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense. I have now added another commit that I think works correctly. 981d56b (And correctly follows the sentence "we can look for any that match an allowlist entry or end with .[entry] (to capture subdomains)")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing your approach to what I suggested, it looks like this strategy will be equivalent except in the case where a matched URL is not a valid URL and causes the URL constructor to throw. In that case, we'll fallback to hiding the matched string.

This might at worst result in some of our internal URLs being hidden when they otherwise wouldn't be. I can't think of any examples offhand that this would affect. This is probably fine.

@metamaskbot
Copy link
Copy Markdown
Collaborator

Builds ready [2ec2e71]
Page Load Metrics (2161 ± 128 ms)
PlatformPageMetricMin (ms)Max (ms)Average (ms)StandardDeviation (ms)MarginOfError (ms)
ChromeHomefirstPaint1032701383718
domContentLoaded171427662133260125
load171427662161266128
domInteractive171427662133260125
Bundle size diffs [🚨 Warning! Bundle size has increased!]
  • background: 860 bytes
  • ui: 0 bytes
  • common: 860 bytes

@metamaskbot
Copy link
Copy Markdown
Collaborator

Builds ready [526a5d3]
Page Load Metrics (2017 ± 102 ms)
PlatformPageMetricMin (ms)Max (ms)Average (ms)StandardDeviation (ms)MarginOfError (ms)
ChromeHomefirstPaint97144114126
domContentLoaded15632383200620599
load156323832017212102
domInteractive15632382200520599
Bundle size diffs [🚨 Warning! Bundle size has increased!]
  • background: 900 bytes
  • ui: 0 bytes
  • common: 900 bytes

@metamaskbot
Copy link
Copy Markdown
Collaborator

Builds ready [352d306]
Page Load Metrics (2132 ± 154 ms)
PlatformPageMetricMin (ms)Max (ms)Average (ms)StandardDeviation (ms)MarginOfError (ms)
ChromeHomefirstPaint961264182250120
domContentLoaded163333262105330158
load163333262132321154
domInteractive163333262105329158
Bundle size diffs [🚨 Warning! Bundle size has increased!]
  • background: 908 bytes
  • ui: 0 bytes
  • common: 908 bytes

@Gudahtt
Copy link
Copy Markdown
Member

Gudahtt commented Dec 12, 2022

LGTM except for the e2e test failure. Unsure whether that's related to these changes or not.

danjm and others added 3 commits December 13, 2022 14:33
@metamaskbot
Copy link
Copy Markdown
Collaborator

Builds ready [c022190]
Page Load Metrics (2167 ± 111 ms)
PlatformPageMetricMin (ms)Max (ms)Average (ms)StandardDeviation (ms)MarginOfError (ms)
ChromeHomefirstPaint98181132209
domContentLoaded170225012151227109
load170225022167231111
domInteractive170225012151227109
Bundle size diffs [🚨 Warning! Bundle size has increased!]
  • background: 927 bytes
  • ui: 0 bytes
  • common: 927 bytes

Copy link
Copy Markdown
Contributor

@digiwand digiwand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

In regards to #16780 (comment), I think it would be a good idea to create a follow-up ticket for this

@danjm danjm merged commit ba3914e into develop Dec 16, 2022
@danjm danjm deleted the sentry-update-init branch December 16, 2022 15:26
@github-actions github-actions bot locked and limited conversation to collaborators Dec 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants