feat(replay): Keep min. 30 seconds of data in error mode#7025
feat(replay): Keep min. 30 seconds of data in error mode#7025
Conversation
size-limit report 📦
|
f0283ca to
7146b49
Compare
| describe('Unit | eventBuffer | EventBufferArray', () => { | ||
| it('adds events to normal event buffer', async function () { | ||
| const buffer = createEventBuffer({ useCompression: false }); | ||
| for (let recordingMode of ['session', 'error'] as ReplayRecordingMode[]) { |
There was a problem hiding this comment.
We just make sure here that this works the same in both error & session mode.
| expect(replay.isEnabled()).toEqual(false); | ||
| }); | ||
|
|
||
| it.each([ |
There was a problem hiding this comment.
These are the most relevant tests here, making sure it works the same with & without compression for both error & session mode.
7146b49 to
07d63b6
Compare
07d63b6 to
47bdb74
Compare
packages/replay/src/replay.ts
Outdated
|
|
||
| this.eventBuffer = createEventBuffer({ | ||
| useCompression: this._options.useCompression, | ||
| recordingMode: this.recordingMode, |
There was a problem hiding this comment.
Is there gonna be an issue when we switch from error -> session recording mode?
There was a problem hiding this comment.
Shouldn't, it only means that (for now) we will continue in partitioned mode for compression - meaning even after the error was sent, we'll still not be using the streaming compression but the "all at once" compression. We may adjust this in the future, but this'll mean to swap out the event buffer at runtime, which I wanted to avoid for this first step.
Maybe it is clearer to rename this, e.g. instead of recordingMode make it rememberLastCheckout: boolean? Then it's a bit clearer what it does and that it is not really directly related to the recording mode?
There was a problem hiding this comment.
changed this to keepLastCheckout: boolean, which should hopefully be clearer!
packages/replay/test/unit/eventBuffer/EventBufferPartitionedCompressionWorker.tes.ts
Show resolved
Hide resolved
| if (replay.recordingMode === 'error') { | ||
| // Do not wait on it, just do it | ||
| // We know in this mode this is actually "sync" | ||
| void eventBuffer.clear(true); | ||
|
|
||
| // Ensure we have the correct first checkout timestamp when an error occurs | ||
| if (!session.segmentId) { | ||
| replay.getContext().earliestEvent = eventBuffer.getEarliestTimestamp(); | ||
| } |
There was a problem hiding this comment.
Just to clarify for myself, we examine the buffer for the earliest event because we want the replay "start time" to be when the 1st checkout in the buffer happens.
So if session has been running for 2.5 minutes, we would want the replay to "start" at ~2 minutes. And replay.context.earliestEvent (before this block of code), would represent the earliest event of the entire "session".
So an issue I can think of is with the performance observer. I believe we only add them when we flush, so those events will end up becoming the earliest events anyway (which might be fine).
There was a problem hiding this comment.
Yeah, to be honest all of this is a bit brittle to me 😅 but this was the best I could come up with to handle this in a somewhat reasonable way. I wonder if it makes sense to maybe compute this on the server? Or generally change how we keep this a bit, not sure...
There was a problem hiding this comment.
semi-related to this PR but I was also wondering about that - do we discard events with timestamps before the initial full checkout on the server? Or would sending such events cause problems?
There was a problem hiding this comment.
We discard events that are older than SESSION_IDLE_TIMEOUT minutes ago. Because we get buffered data from performance observer, those occur before initial checkout.
I wonder if we need to change/move this logic as well
sentry-javascript/packages/replay/src/replay.ts
Lines 552 to 557 in c3dd355
I think this is to meant to update session start time so we know when to correctly timeout a session.
47bdb74 to
3602fc4
Compare
|
FYI since main focus right now is on stability, let's probably wait for until GA to merge this in. WDYT @billyvg ? |
|
@mydea agreed! |
3602fc4 to
22275b0
Compare
22275b0 to
7ba8c7b
Compare
|
This pull request has gone three weeks without activity. In another week, I will close it. But! If you comment or otherwise update it, I will reset the clock, and if you label it "A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀 |
|
Closing in favor of #7992 |
Supersedes #6924
This PR adjusts the event buffer to keep 30-60s of recording data (instead of 0-60s) in error mode.
This is accomplished by keeping the events in a queue that tracks when the last checkout happens, and allows to delete everything up until this event.
In compression mode, this means that we do not stream the events to the compressor, but compress them all at once. That's necessary because it is not really possible to remove parts of the compression stream.
In session mode, we continue to use the streaming compression as we do right now.