Add new client_report category for replay #6774
Description
We want to add two new client report categories so that we can capture replay-related drops that the existing categories do not cover:
- send_error: A more generic variant of network_error, for cases where sending an event fails for reasons other than a network error, e.g. when receiving a 400 response or when something inside the transport fails.
- internal_sdk_error: Replay has internal code paths that can fail (e.g. the compression worker), which stops the recording and drops events. We want to be able to capture this as well.
Initial draft (for reference)
For replay, we have retry logic that retries a failed event a few times before fully stopping. We do this because we depend on segment order, so a single segment that definitively fails to send means the whole replay has to stop. We therefore decided it is fine to retry up to 3 times with increasing backoff before we stop.
Note: We do not retry when we hit either a rate limit or an API error - in these cases, we stop immediately (in the case of rate limit, we will re-start once the rate limit is over).
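The retry rules above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the SDK's actual code: the names `SendFailure`, `NextStep` and `nextStep` are hypothetical, and the base delay and the linear backoff are assumed, since the description only says "increasing backoff".

```typescript
// Hypothetical names throughout; this is not the SDK's actual implementation.
type SendFailure = "rate_limit" | "api_error" | "other";

type NextStep =
  | { kind: "retry"; delayMs: number }
  | { kind: "stop"; restartAfterRateLimit: boolean };

const MAX_RETRIES = 3; // from the description: retry up to 3 times
const BASE_DELAY_MS = 5000; // assumed value, not stated in the issue

// Decide what to do after a failed send.
// `retriesSoFar` counts the retries already made for this segment.
function nextStep(failure: SendFailure, retriesSoFar: number): NextStep {
  // Rate limits and API errors stop immediately, without retrying.
  if (failure === "rate_limit") {
    return { kind: "stop", restartAfterRateLimit: true };
  }
  if (failure === "api_error") {
    return { kind: "stop", restartAfterRateLimit: false };
  }
  // Retry budget exhausted: stop the replay for good.
  if (retriesSoFar >= MAX_RETRIES) {
    return { kind: "stop", restartAfterRateLimit: false };
  }
  // Otherwise retry, waiting longer each time (linear backoff is an assumption).
  return { kind: "retry", delayMs: BASE_DELAY_MS * (retriesSoFar + 1) };
}
```

The final `stop` after the retry budget is used up is the case the client report is meant to count.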
In order to know how often we actually fully stop, we would like to use client reports.
We propose to add a new category send_error which we can use once we have exceeded the retry budget: send_error:replay. This needs to be synced with backend changes, where the category also needs to be added.
This can be used when sending failed for generic reasons (that are not covered by other categories) - for example when hitting a 400 or 500 error, or when sending fails for unspecified reasons (e.g. we throw on send).
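As a rough sketch of how such drops could be tallied before being reported, the snippet below counts each dropped item under a key of the same form as `send_error:replay`. The names `Outcomes`, `recordDropped` and `toDiscardedEvents`, as well as the shape of the resulting entries, are assumptions made for illustration and not taken from the SDK.

```typescript
// Hypothetical sketch; names and shapes are illustrative, not the SDK's API.
type Outcomes = Map<string, number>;

interface DiscardedEntry {
  reason: string;
  category: string;
  quantity: number;
}

// Count one dropped item under a "<reason>:<category>" key, e.g. "send_error:replay".
function recordDropped(outcomes: Outcomes, reason: string, category: string): void {
  const key = `${reason}:${category}`;
  outcomes.set(key, (outcomes.get(key) ?? 0) + 1);
}

// Turn the tallies into a list of entries for a report payload (shape assumed).
function toDiscardedEvents(outcomes: Outcomes): DiscardedEntry[] {
  return Array.from(outcomes, ([key, quantity]) => {
    const sep = key.indexOf(":");
    return { reason: key.slice(0, sep), category: key.slice(sep + 1), quantity };
  });
}
```

With this shape, a replay that stops after exhausting its retries would call `recordDropped(outcomes, "send_error", "replay")` once, and a failing compression worker would be counted under `internal_sdk_error` instead.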