
Add a resilient sidekiq client (like resilient logged webhooks) #965

Merged
rgalanakis merged 4 commits into main from reliability
Jun 9, 2025

@rgalanakis
Contributor

Add a resilient sidekiq client

There are generally four situations the app can be in
when a webhook comes in:

  • Stable, everything is handled well.
  • Programming error, which should 500.
  • Postgres is unavailable; in this case, we use the resilient
    logged webhooks, which store the webhook in another Postgres
    in the meantime and automatically retry it later.
  • Redis is unavailable; in this case, we were 500ing,
    but because the LoggedWebhook insert had already succeeded,
    we would never retry the webhook.

This last condition means we weren't meeting the uptime guarantees
we otherwise should: being able to ingest webhooks for processing
even when data stores are unavailable.

To solve this, we add a new 'resilient' Sidekiq client
that will push to a Postgres database when Redis is unavailable,
the same way logged webhooks write to another Postgres database
if the initial insert fails.

Using the same logic as resilient logged webhooks,
we replay these events when Redis becomes available.

In this case, because the job is done async,
we can still 200 from the webhook handler;
it doesn't matter to the caller whether we are processing the job
directly via Sidekiq, or through the resilient postgres datastore
and then eventually onto Sidekiq.
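
The fallback-and-replay flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `QueueUnavailable`, `FallbackStore`, and `ResilientPusher` are hypothetical names, the fallback Postgres table is replaced by an in-memory array, and the "primary" push is any callable that raises when Redis is down.

```ruby
require "json"

# Hypothetical error standing in for whatever is raised when Redis is down.
class QueueUnavailable < StandardError; end

# In-memory stand-in for the fallback Postgres table (assumption for the sketch).
class FallbackStore
  def initialize
    @rows = []
  end

  # Persist the job payload as JSON, as a database row would.
  def insert(payload)
    @rows << JSON.generate(payload)
  end

  def size
    @rows.size
  end

  # Yield stored payloads oldest first. A row is deleted only after
  # the block returns without raising, so a failed replay loses nothing.
  def drain
    until @rows.empty?
      yield JSON.parse(@rows.first)
      @rows.shift
    end
  end
end

# Hypothetical wrapper: try the primary queue, fall back to the store.
class ResilientPusher
  def initialize(primary:, fallback:)
    @primary = primary   # callable that enqueues a job, raising QueueUnavailable on outage
    @fallback = fallback
  end

  # Returns :queued when the primary accepted the job, :deferred when it
  # was stored for later. Either way the webhook handler can respond 200.
  def push(payload)
    @primary.call(payload)
    :queued
  rescue QueueUnavailable
    @fallback.insert(payload)
    :deferred
  end

  # Re-push stored jobs; stops at the first failure and returns the
  # number of jobs successfully replayed.
  def replay
    replayed = 0
    @fallback.drain do |payload|
      @primary.call(payload)
      replayed += 1
    end
    replayed
  rescue QueueUnavailable
    replayed
  end
end
```

In this sketch, other exceptions (for example a programming error) are deliberately not rescued, so they still surface as a 500.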


Refactor LoggedWebhook::Resilient into reusable base class and helper


Fix non-optimal query in avoid_writes?

Was using COUNT with LIMIT 1,
which doesn't do anything: COUNT already returns a single row,
so the LIMIT never reduces the rows scanned.
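
To illustrate why the LIMIT is a no-op here, the following is a small in-memory simulation, not the actual `avoid_writes?` query. `ScanCountingTable` and its methods are hypothetical; it only counts how many rows each style of query has to visit.

```ruby
# Hypothetical in-memory table that records how many rows a query visits.
class ScanCountingTable
  attr_reader :rows_visited

  def initialize(rows)
    @rows = rows
    @rows_visited = 0
  end

  # Mimics `SELECT COUNT(*) ... WHERE <predicate> LIMIT 1`:
  # the aggregate visits every row and produces one result row,
  # so LIMIT 1 has nothing left to trim.
  def count_with_limit(&predicate)
    result_rows = [@rows.count { |row| visit(row, &predicate) }]
    result_rows.first(1).first
  end

  # Mimics an existence check (`SELECT 1 ... WHERE <predicate> LIMIT 1`):
  # the scan can stop at the first matching row.
  def any_match?(&predicate)
    @rows.any? { |row| visit(row, &predicate) }
  end

  private

  # Count the visit, then evaluate the predicate block against the row.
  def visit(row)
    @rows_visited += 1
    yield row
  end
end
```

With many matching rows, `count_with_limit` visits all of them, while `any_match?` stops after the first match.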

@codecov

codecov bot commented Jun 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.57%. Comparing base (20f73d5) to head (8bf1b80).
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #965      +/-   ##
==========================================
+ Coverage   97.07%   97.57%   +0.49%     
==========================================
  Files         488      490       +2     
  Lines       31046    30991      -55     
==========================================
+ Hits        30139    30240     +101     
+ Misses        907      751     -156     


- Delete unused code
- Cover some easy code missing coverage
- Rewrite NotImplemented to single line methods that pass coverage
@rgalanakis rgalanakis merged commit 36267a9 into main Jun 9, 2025
4 checks passed
@rgalanakis rgalanakis deleted the reliability branch June 9, 2025 15:33