Sanitize privacy sensitive data before sending to sentry.#16780
Sanitize privacy sensitive data before sending to sentry.#16780
Conversation
Temp Near complete Complete
|
CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes. |
Builds ready [871dfb7]
Page Load Metrics (2340 ± 108 ms)
Bundle size diffs [🚨 Warning! Bundle size has increased!]
|
Builds ready [3f04b09]
Page Load Metrics (2106 ± 88 ms)
Bundle size diffs [🚨 Warning! Bundle size has increased!]
|
Co-authored-by: Mark Stacey <markjstacey@gmail.com>
app/scripts/lib/setupSentry.js
Outdated
| const re = /(([A-z]+:\/\/)|(www\.))\S+[@:.]\S+/gu; | ||
| const urlsInMessage = newErrorMessage.match(re) || []; | ||
| urlsInMessage.forEach((url) => { | ||
| const urlObj = new URL(url); |
There was a problem hiding this comment.
Hmmm. This will throw if the URL isn't valid. Luckily if this throws, it stops trying to send the whole report so we're still not leaking anything. Though it does lose us the report, which is not ideal.
Perhaps it would be safer to take the hostname from the same regex we're using to detect URLs? We have the two forward slashes as a starting point. The valid characters within a hostname are quite a lot more restricted than for a URL in general, so it should be easy enough to match it.
Something like... [-a-zA-Z0-9.]+
Then once we have the hostname, we can look for any that match an allowlist entry or end with .[entry] (to capture subdomains)
There was a problem hiding this comment.
What do you think of the approach I took in 526a5d3?
If we try to get the hostname from the regex, then we need to make the regex sophisticated enough to match the hostname but not subdomains, and also not things that look like host names in other parts of the url. There is a decent amount of safety that comes with relying on URL.
My approach here is just to continue to rely on that, but just catch errors and hide any url that is invalid,
There was a problem hiding this comment.
The approach taken in that commit does not match subdomains, which is a bad thing. We do want to match subdomains - otherwise we'll be filtering out URLs we use internally. codefi.network for example is just the root TLD we use; it's the subdomains that we're actually sending requests to.
If we try to get the hostname from the regex, then we need to make the regex sophisticated enough to match the hostname but not subdomains, and also not things that look like host names in other parts of the url.
Hmm, wouldn't my suggestion work here? Use the regex I suggested right after the double forward slash.
There was a problem hiding this comment.
Hold on, if we extract subdomain.codefi.network it will never successfully match against the hardcoded string we have for the URL allowlist: codefi.network
There was a problem hiding this comment.
I think we have to do as done in 352d306
There was a problem hiding this comment.
I'm not sure what you mean by that comment. But that latest commit won't work either, because now it will incorrectly match domains that have codefi.network as a subdomain (e.g. codefi.network.another.domain.com). We can't look for matches indiscriminately in the hostname, we'll have to look just at hostnames that either match exactly, or end with a period followed by the domain we're trying to match.
That's what I meant by this sentence:
we can look for any that match an allowlist entry or end with
.[entry](to capture subdomains)
There was a problem hiding this comment.
Yes, that makes sense. I have now added another commit that I think works correctly. 981d56b (And correctly follows the sentence "we can look for any that match an allowlist entry or end with .[entry] (to capture subdomains)")
There was a problem hiding this comment.
Comparing your approach to what I suggested, it looks like this strategy will be equivalent except in the case where a matched URL is not a valid URL and causes the URL constructor to throw. In that case, we'll fallback to hiding the matched string.
This might at worst result in some of our internal URLs being hidden when they otherwise wouldn't be. I can't think of any examples offhand that this would affect. This is probably fine.
Builds ready [2ec2e71]
Page Load Metrics (2161 ± 128 ms)
Bundle size diffs [🚨 Warning! Bundle size has increased!]
|
Co-authored-by: Mark Stacey <markjstacey@gmail.com>
Co-authored-by: Mark Stacey <markjstacey@gmail.com>
Builds ready [526a5d3]
Page Load Metrics (2017 ± 102 ms)
Bundle size diffs [🚨 Warning! Bundle size has increased!]
|
Builds ready [352d306]
Page Load Metrics (2132 ± 154 ms)
Bundle size diffs [🚨 Warning! Bundle size has increased!]
|
|
LGTM except for the e2e test failure. Unsure whether that's related to these changes or not. |
Co-authored-by: Mark Stacey <markjstacey@gmail.com>
Builds ready [c022190]
Page Load Metrics (2167 ± 111 ms)
Bundle size diffs [🚨 Warning! Bundle size has increased!]
|
digiwand
left a comment
There was a problem hiding this comment.
👍
In regards to #16780 (comment), I think it would be a good idea to create a follow-up ticket for this
This PR will ensure that we don't send any privacy sensitive information along with our sentry errors.
Still need to add some unit tests.