Dealing with thousands of errors (Or, event_from_exception is slow. Or, Rails 7's error_reporter is awkward) #1765

@jdelStrother

Description

We recently had a cache server go down, which caused our servers to become overwhelmed and start dropping requests like crazy. This seemed unusual: previously we've been fine with a cache server disappearing - we might get a handful of errors, but Rails' cache normally degrades fairly gracefully, treating reads as cache misses and silently failing on writes.

I think the new problem is a combination of Rails 7's error-reporting behaviour and Sentry's event_from_exception being a relatively slow method.

In Rails 7, every single cache error gets sent to the error_reporter (https://github.com/rails/rails/blob/0169d15bc7ec4557971d6ac6120e48b2cac9c407/activesupport/lib/active_support/cache/redis_cache_store.rb#L460-L466)
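That behaviour can be illustrated with a small, self-contained simulation. FakeReporter and the simplified failsafe below are illustrative stand-ins paraphrasing the linked Rails source, not the actual internals:

```ruby
# Counts how many times the error reporter is invoked.
class FakeReporter
  attr_reader :count

  def initialize
    @count = 0
  end

  def report(error, **)
    @count += 1
  end
end

REPORTER = FakeReporter.new

# Mirrors the shape of RedisCacheStore#failsafe: the reporter is called for
# every rescued error, even when a custom error_handler is also configured.
def failsafe(method, returning: nil, error_handler: nil)
  yield
rescue RuntimeError => error
  REPORTER.report(error, handled: true, severity: :warning)
  error_handler&.call(method: method, exception: error, returning: returning)
  returning
end

100.times { failsafe(:read) { raise "redis down" } }
puts REPORTER.count  # one report per failed cache operation
```

So with the cache server down, every read and write produces a report - there's no built-in coalescing.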

These then get sent to Sentry. We're using Sentry's BackgroundWorker to process these asynchronously (and drop events that exceed the max_queue size), but before that happens, Sentry calls event_from_exception on the current thread. This seems like quite a heavy method, mostly due to StackTraceBuilder -

[profiling screenshot: StackTraceBuilder dominating event_from_exception's runtime]

and worse, it's called for every single error report, regardless of whether BackgroundWorker is about to discard the event because the queue is full. config.before_send is called after event_from_exception, so there's no obvious way for me to control whether that work gets done.
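Here's a runnable sketch of that ordering - the stubs stand in for sentry-ruby internals, and names like enqueue and QUEUE_MAX are illustrative, not the real API:

```ruby
$heavy_calls = 0

# Stands in for Sentry's event_from_exception: in the real library this is
# where StackTraceBuilder does its expensive work, on the calling thread.
def event_from_exception(exception)
  $heavy_calls += 1
  { exception: exception.message }
end

QUEUE_MAX = 2
$queue = []

# Stands in for handing the event to BackgroundWorker: events beyond the
# queue limit are simply dropped.
def enqueue(event)
  return false if $queue.size >= QUEUE_MAX
  $queue << event
  true
end

# The event is always built before the queue check, so the heavy work
# happens even for events that are about to be discarded.
10.times { enqueue(event_from_exception(RuntimeError.new("boom"))) }
puts $heavy_calls  # 10 heavy builds for only 2 queued events
```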

I'm not sure what a good fix would look like here -

  • Possibly the fault lies with Rails 7. For now, I think I'm going to patch the RedisCacheStore#failsafe method to not call error_reporter&.report. It seems unfortunate that ActiveSupport.error_reporter gets called even if you provide your own error_handler callback.
  • It would be nice if event_from_exception happened asynchronously in BackgroundWorker - that way a) it wouldn't block the current thread, and b) it would skip the unnecessary work when the event is about to be discarded anyway. But that looks like quite a big change from the current flow, especially if it has to stay compatible with the other config.async options.
  • Maybe Sentry.capture_exception could have its own throttle that simply discards exceptions once there have been more than X in the past Y seconds?
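For that last option, a minimal sliding-window throttle might look like the sketch below. ThrottledCapture and its limit/window parameters are illustrative names, not anything Sentry provides:

```ruby
# Drops exceptions once more than `limit` have been captured in the
# trailing `window` seconds.
class ThrottledCapture
  def initialize(limit:, window:)
    @limit = limit
    @window = window
    @timestamps = []
  end

  # Returns true if the exception was forwarded, false if throttled.
  def capture(exception)
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @timestamps.reject! { |t| now - t > @window }
    return false if @timestamps.size >= @limit

    @timestamps << now
    Sentry.capture_exception(exception) if defined?(Sentry)
    true
  end
end

throttle = ThrottledCapture.new(limit: 5, window: 10)
results = 8.times.map { throttle.capture(RuntimeError.new("cache down")) }
puts results.count(true)  # only the first 5 get through
```

The appeal is that the check runs before any event is built, so the expensive event_from_exception work is skipped entirely for throttled exceptions.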

Any other suggestions?
