feat(replay): Restrict network JSON bodies to max depth#7730
Conversation
// How many levels deep the body JSON payloads should be normalized.
// Everything deeper than this will be replaced with [removed]
const BODY_DEPTH = 6;
This is a random number I selected, which felt somewhat reasonable.
What's the reason to cut off at a certain depth? Does it affect performance on the client or at ingest time? PII stripping? Something else?
Asking to better understand what value we could consider here.
So the idea is two-fold:
- Cutting off at a certain depth will generally make payloads a bit smaller. Esp. when dealing with heavily nested stuff this could reduce payloads significantly.
- Less stuff sent means less stuff to parse for PII etc.
Basically it is a very simple way to make payloads smaller. Not bulletproof (you can obviously still have massive strings at shallower depths), but easy to reason about.
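As a rough illustration of the depth cut-off being discussed (the names `MAX_DEPTH` and `normalizeToDepth` are hypothetical, not the SDK's actual code):

```typescript
// Hypothetical sketch: anything nested deeper than MAX_DEPTH is replaced
// with the placeholder string '[removed]'.
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

const MAX_DEPTH = 6;

function normalizeToDepth(value: JsonValue, depth = 0): JsonValue {
  // Primitives pass through untouched at any depth
  if (value === null || typeof value !== 'object') {
    return value;
  }
  // Objects/arrays at the cut-off are replaced wholesale
  if (depth >= MAX_DEPTH) {
    return '[removed]';
  }
  if (Array.isArray(value)) {
    return value.map(item => normalizeToDepth(item, depth + 1));
  }
  const result: { [key: string]: JsonValue } = {};
  for (const key of Object.keys(value)) {
    result[key] = normalizeToDepth(value[key], depth + 1);
  }
  return result;
}
```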
Makes sense. Do we check again if the payload is too large after doing this to take additional measures? Like dropping it, or something else?
What are your thoughts about pulling in a library like truncate-json? This library truncates to a specific byte size and can tell you what it truncated.
I think it's more likely that API responses will hit size limits rather than depth limits. (Think index pages where we have lists of shallowly nested objects). This way we're almost always guaranteed some data in the response body as it'll always be trimmed down to meet our limits. This is especially important when we lower the max size.
We could also potentially include the paths that get truncated so that our UI can show them and show that their respective values were truncated.
Would there be a way to lazy load the package if users do not opt into network req details?
Hmm, truncate-json is 3kb, so not trivial. Also it really just omits any exceeding props, which is confusing IMHO, so we'd need to build on top of that to get what we actually want. IMHO in that case it's better we rewrite this implementation to take total size (vs. depth, or maybe in addition to depth) into account.
One thing I want to highlight is that processing JSON like this is potentially slow and may bring a noticeable performance impact with it. The more stuff we do, the more this may become noticeable. As this will run on each HTTP request for both request & response bodies, esp. at startup this could be quite a few requests at the same time. So we should be careful in what we do: the more complex our logic becomes, the bigger the perf impact. (Also, lazy loading this will not work consistently, as we cannot assume `await import` works in all our target platforms 😢)
Maybe a simple implementation on top of this PR could be:
- Check & limit depth to e.g. 6 levels
- In addition, check and limit length of strings to e.g. 250 chars
- In addition, check and limit number of children to e.g. 30 (so any array items after that are redacted, and any object properties after that as well)
IMHO an approach like this will also potentially have a more useful outcome than to truncate based on size, because it means you'll have some data across the whole payload, vs. maybe just having the full first record but nothing of the second record, or similar.
If we want to take the output of this process as the size to check against the 300k limit, we will have to do additional processing (json stringify the new object, run through TextEncoder). If this is important to us, we can do it!
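A minimal sketch of the three limits proposed above, combined in one pass (all names, placeholders, and thresholds are illustrative, not the actual SDK implementation):

```typescript
// Hypothetical sketch of the proposal: limit depth, string length,
// and number of children per object/array in a single walk.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

const MAX_DEPTH = 6;
const MAX_STRING_LENGTH = 250;
const MAX_CHILDREN = 30;

function limitPayload(value: Json, depth = 0): Json {
  // Long strings are trimmed regardless of depth
  if (typeof value === 'string') {
    return value.length > MAX_STRING_LENGTH ? `${value.slice(0, MAX_STRING_LENGTH)}…` : value;
  }
  if (value === null || typeof value !== 'object') {
    return value;
  }
  if (depth >= MAX_DEPTH) {
    return '[removed]';
  }
  if (Array.isArray(value)) {
    const limited: Json[] = value.slice(0, MAX_CHILDREN).map(item => limitPayload(item, depth + 1));
    if (value.length > MAX_CHILDREN) {
      // Mark that trailing items were dropped
      limited.push('[redacted]');
    }
    return limited;
  }
  const out: { [key: string]: Json } = {};
  for (const key of Object.keys(value).slice(0, MAX_CHILDREN)) {
    out[key] = limitPayload(value[key], depth + 1);
  }
  return out;
}
```

The appeal of this shape is that every record in a list keeps *some* data, rather than the first record surviving intact and later ones disappearing entirely.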
IMO having the full first record is more helpful to me (in the case of a list of records), since I'd want to see the full picture from a collection of pictures, rather than a collection of partial records. The reality is, regardless of method, truncation will make some users unhappy.
My concern with the custom implementation would be trying to explain the different truncation techniques we are employing vs size limit which is more simple and easy to understand. Having a size limit means we could potentially optimize perf by early returning after limit has been reached so we don't iterate over the rest of the object.
Let's see if other folks have thoughts here. I lean towards file size, but I'm not fully convinced either way. Maybe the more important problem is figuring out the best way to handle this on a worker thread first.
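The size-limit-with-early-return idea above could look roughly like this (an entirely hypothetical sketch; the byte accounting via per-leaf `JSON.stringify` is approximate, and once the budget is exhausted each remaining subtree is replaced with a cheap placeholder instead of being walked):

```typescript
// Hypothetical sketch: stop serializing detail once a byte budget is spent,
// so most of a huge payload is never processed.
function truncateToBudget(value: unknown, budgetBytes: number): { result: unknown; truncated: boolean } {
  let remaining = budgetBytes;
  let truncated = false;

  function walk(v: unknown): unknown {
    if (remaining <= 0) {
      // Budget exhausted: replace the whole subtree without visiting it
      truncated = true;
      return '[truncated]';
    }
    if (v !== null && typeof v === 'object') {
      if (Array.isArray(v)) {
        return v.map(walk);
      }
      const out: Record<string, unknown> = {};
      for (const [key, child] of Object.entries(v)) {
        out[key] = walk(child);
      }
      return out;
    }
    // Charge the (approximate) serialized size of this leaf against the budget
    remaining -= JSON.stringify(v)?.length ?? 0;
    return v;
  }

  return { result: walk(value), truncated };
}
```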
There are good ideas and relevant approaches discussed here but I agree with the complexity argument.
Perhaps easiest to communicate and implement (and lowest overhead) is to simply truncate the payload.
bruno-garcia left a comment:
Trying to understand the value defined and the reasoning behind it.
const json = item as JsonObject;
...
return Object.keys(json).reduce((acc, key) => {
Are we doing this on the worker?
It's not in the worker. There are some things to think about if we move it to the worker:
- would we also do this when compression is turned off (or worker doesn't load)?
- Correct me if I'm wrong Francesco, but I think that means we'd need duplicate code in the non-worker and worker? Not significant here but if we decided to use a larger library?
- Would require extra (de)serialization steps in the worker in order to parse the event body
Yup, we would either have to duplicate this in the worker and the main script, or decide not to do any processing when the worker is disabled. Currently the worker is pretty dumb and just compresses stuff it receives; we actually just pass strings around (you can only pass strings to the worker) and don't even need to JSON.parse it there.
Closing this in favor of #7730
This restricts JSON bodies for replay network breadcrumbs to a max. depth of 6. If exceeding this, it will:
- Replace anything deeper with `[~MaxDepth]`
- Set the meta key `MAX_JSON_DEPTH_EXCEEDED`

This is not perfect, but relatively straightforward.
Caveats
We still do the check for MAX_BODY_SIZE_EXCEEDED before doing this check. The reason is that otherwise we would always need to process this first, serialize it again, and get the length in order to check this - feels like a lot of overhead 😬 IMHO this is OK for now (so no processing and skipping over 300k; below that we limit to 6 levels deep).

Closes #7531