We recently had a cache server go down, which caused our servers to become overwhelmed and start dropping requests like crazy. This seemed unusual: previously we've been fine with a cache server disappearing - we might get a handful of errors, but Rails' cache normally behaves fairly sensibly, treating failed reads as cache misses and silently failing on writes.
I think the new problem lies in a mix of Rails 7's error-reporting behaviour and Sentry's event_from_exception being a relatively slow method.
In Rails 7, every single cache error gets sent to the error_reporter (https://github.com/rails/rails/blob/0169d15bc7ec4557971d6ac6120e48b2cac9c407/activesupport/lib/active_support/cache/redis_cache_store.rb#L460-L466)
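To illustrate the pattern (a minimal stand-in with simplified class names, not the actual Rails source - see the link above for the real thing): even when you configure your own error_handler, the global error reporter is still called for every rescued error, with no way to opt out:

```ruby
# Minimal stand-in for the RedisCacheStore#failsafe pattern in Rails 7.
# NOT the real Rails code - FakeErrorReporter/FakeCacheStore are invented
# here purely to demonstrate the behaviour being discussed.

class FakeErrorReporter
  attr_reader :reports

  def initialize
    @reports = []
  end

  def report(error, **context)
    @reports << [error, context]
  end
end

class FakeCacheStore
  def initialize(error_reporter:, error_handler: nil)
    @error_reporter = error_reporter
    @error_handler = error_handler
  end

  def read(_key)
    # Any connection failure is swallowed and treated as a cache miss.
    failsafe(:read) { raise "connection refused" }
  end

  private

  # Mirrors the shape of RedisCacheStore#failsafe: the reporter is called
  # unconditionally, *in addition to* any user-supplied error_handler.
  def failsafe(method, returning: nil)
    yield
  rescue StandardError => error
    @error_reporter&.report(error, handled: true, severity: :warning)
    @error_handler&.call(method: method, exception: error, returning: returning)
    returning
  end
end
```

Running a read against a dead backend returns the fallback value, but the reporter has still been invoked once per error - which is where the per-error Sentry cost comes from.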
These then get sent to Sentry. We're using Sentry's BackgroundWorker to process these asynchronously (and drop events that exceed the max_queue size), but before that happens, Sentry calls event_from_exception on the current thread. This seems like quite a heavy method, mostly due to StackTraceBuilder - and worse, it's called for every single error report, regardless of whether BackgroundWorker is about to discard it because the queue is full. config.before_send is called after event_from_exception, so there's no obvious way for me to control whether that work gets done.
I'm not sure what a good fix would look like here -
- Possibly the fault lies with Rails 7 - for now, I think I'm going to patch the RedisCacheStore#failsafe method to not call error_reporter&.report. It does seem unfortunate that even if you provide your own error_handler callback, ActiveSupport.error_reporter gets called regardless.
- It would be nice if event_from_exception happened asynchronously in BackgroundWorker - that way a) it's not blocking the current thread, and b) it would benefit from not doing unnecessary work when we're just about to discard the exception. But that looks like quite a big change from the current flow, especially if it's keeping compatibility with the other config.async options.
- Maybe Sentry.capture_exception could have its own throttle that just discards exceptions if there's been more than X in the past Y seconds?
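As a sketch of that last idea - a hypothetical caller-side wrapper, not anything sentry-ruby provides - the key property is that once the budget is spent, the block never runs, so event_from_exception is skipped entirely rather than built and then discarded:

```ruby
# Hypothetical fixed-window throttle for error reporting: allow at most
# max_events calls per window_seconds, silently dropping the rest.
class ErrorReportThrottle
  def initialize(max_events:, window_seconds:)
    @max_events = max_events
    @window_seconds = window_seconds
    @lock = Mutex.new
    @window_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @count = 0
  end

  # Yields only while under budget; when throttled, the expensive work
  # inside the block (e.g. Sentry.capture_exception, and therefore
  # event_from_exception) never happens at all.
  def report
    allowed = @lock.synchronize do
      now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      if now - @window_start > @window_seconds
        @window_start = now
        @count = 0
      end
      (@count += 1) <= @max_events
    end
    yield if allowed
  end
end
```

Usage would look something like `THROTTLE.report { Sentry.capture_exception(error) }` wherever we currently report cache errors. A fixed window is crude (a leaky bucket would smooth bursts better), but it's cheap and thread-safe.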
Any other suggestions?
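For reference, the failsafe patch I mentioned above would look something like this (hypothetical and untested - it prepends over a private Rails method, and the method signature and rescued error class may change between Rails/redis versions):

```ruby
# config/initializers/redis_cache_store_failsafe_patch.rb
#
# Hypothetical sketch: restore the pre-7.x failsafe behaviour by skipping
# ActiveSupport.error_reporter and only invoking our configured
# error_handler. The method name, arguments, and @error_handler ivar are
# taken from the linked Rails source and are private internals.
module QuietCacheFailsafe
  private

  def failsafe(method, returning: nil)
    yield
  rescue ::Redis::BaseError => error
    # Deliberately NOT calling ActiveSupport.error_reporter&.report here.
    @error_handler&.call(method: method, exception: error, returning: returning)
    returning
  end
end

ActiveSupport::Cache::RedisCacheStore.prepend(QuietCacheFailsafe)
```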