Idempotent counter caches, fix concurrency issues with counter caches#14735

Merged
tenderlove merged 2 commits into rails:master from byroot:idempotent-counter-caches
Apr 14, 2014

Conversation

@byroot (Member) commented Apr 13, 2014

Fixes #7965, among others that I remember but can't find anymore.

Problem

In SQL, a DELETE statement is inherently idempotent: you can repeat it indefinitely, and it will always succeed.

On the other hand, decrementing a counter cache is not. So if you have two processes following this scenario:

  • Process 1 loads the record
  • Process 2 loads the record
  • Process 1 destroys the record
  • Process 2 destroys the record

Then the counter cache is decremented twice.

Solution

This patch makes the decrement happen in after_destroy so it can check the affected_rows value returned by the database and only perform the decrement if a record was actually deleted.

We (Shopify) have been running a monkey-patched version of this PR for 3 months now, and so far the problem has disappeared.

Implementation

So far I'm not very happy with my implementation, but I think the only way to make it better is to make the counter cache logic happen in destroy_row, insert_row, etc., instead of doing it in callbacks.

But that means moving a lot of code, so I prefer to validate the idea before attempting it.

Unaddressed issues

The counter update that happens when you change the foreign key is subject to the same issue, but it is much harder to fix using this pattern. It would mean updating the foreign key in distinct UPDATE queries before the main update, and checking affected_rows for each before incrementing / decrementing the counters.

So it's feasible, but again I prefer to validate the idea before attempting it.

/cc @ArthurN @jonleighton @tenderlove : What do you think?

@byroot (Member, Author) commented Apr 13, 2014

Oops, sorry @ArthurN. I meant @arthurnn

Member

This will cause uninitialized variable warnings.

Member Author

Not if called only in after_destroy, which is the case here since this method is supposed to remain purely private.

I could actually access the variable directly.

Anyway, as I said, I think it would be better to integrate counter cache code in the CounterCache module instead of using callbacks.

@byroot (Member, Author) commented Apr 14, 2014

@tenderlove besides the uninitialized variable thing, are you interested in this PR? Should I go ahead?

@tenderlove (Member)

Have you thought about adding the expected cache value to the WHERE clause of the SQL statement? Right now, it looks like the update query is this:

UPDATE "topics"
SET "replies_count" = COALESCE("replies_count", 0) - 1
WHERE "topics"."id" = 7

But we could change it to something like this:

UPDATE "topics"
SET "replies_count" = COALESCE("replies_count", 0) - 1
WHERE "topics"."id" = 7
AND COALESCE("replies_count", 0) = 10

If something else had updated the count, then this query would fail. Also, it looks like we have access to the expected count; we just don't use it. Here is a non-working proof-of-concept patch to show you the gist. WDYT?

@tenderlove (Member)

Also, I am obligated to say that if you don't make your apps acquire a lock when updating the count, it doesn't matter what Ruby changes we make, it's possible for the count to be incorrect.

@byroot (Member, Author) commented Apr 14, 2014

With all due respect what you propose is a terrible idea.

The topics record is shared by multiple replies records. So if 2 distinct replies are destroyed in the following timeline:

  • Process 1 loads Reply 1
  • Process 2 loads Reply 2
  • Process 1 destroys Reply 1 and decrements the counter
  • Process 2 destroys Reply 2 but doesn't decrement the counter -> FAIL

@byroot (Member, Author) commented Apr 14, 2014

Also, I am obligated to say that if you don't make your apps acquire a lock when updating the count, it doesn't matter what Ruby changes we make, it's possible for the count to be incorrect.

Don't forget that the destroy and the decrement happen in a transaction. So when you destroy the record, it is implicitly locked. If another transaction tries to destroy the same record while the first one is not committed, it will block.

@tenderlove (Member)

With all due respect what you propose is a terrible idea.

That escalated quickly.

Don't forget that the destroy and the decrement happen in a transaction.

Ah, makes sense. This should be fine then I think.

@byroot (Member, Author) commented Apr 14, 2014

That escalated quickly.

Sorry :/ I didn't know how to say it...

This should be fine then I think.

For what it's worth, we used to have thousands of negative counters, not to mention the slightly desynchronized ones. So far, with this monkey patch, our monitoring task has not detected a single desynchronized counter in 3 months.

Also, what do you think of only fixing the destroy? Should I fix the update too, even if it means one more SQL query per counter cache? Or is that not acceptable?

And how would you feel about moving the code out of the callbacks?

@tenderlove (Member)

For what it's worth, we used to have thousands of negative counters, not to mention the slightly desynchronized ones. So far, with this monkey patch, our monitoring task has not detected a single desynchronized counter in 3 months.

Awesome, I'll merge this patch in.

Also, what do you think of only fixing the destroy? Should I fix the update too, even if it means one more SQL query per counter cache? Or is that not acceptable?

Honestly, we don't use the counter caches at work, so I'm not 100% sure. It seems fine to me, will it be OK for your systems?

And how would you feel about moving the code out of the callbacks?

👍

tenderlove added a commit that referenced this pull request Apr 14, 2014
Idempotent counter caches, fix concurrency issues with counter caches
@tenderlove tenderlove merged commit a1e2db2 into rails:master Apr 14, 2014
@byroot (Member, Author) commented Apr 14, 2014

Woo! Awesome! Thanks! ❤️

will it be OK for your systems?

We never update the foreign key, so we don't really care about fixing that part... But my conscience says that we should not fix only half of the feature...

I'll submit one or two other PRs to refactor counter caches, and maybe fix the update too, then.

Thanks again!

@byroot byroot deleted the idempotent-counter-caches branch April 14, 2014 17:20
bquorning added a commit to zendesk/zombie_record that referenced this pull request Aug 24, 2015
Previously we have just overridden the behavior of
`ActiveRecord::Persistence#destroy_row` to apply soft-delete
functionality. This does not work, since Rails depends on inheritance
to add extra functionality to `#destroy_row`. In Rails 4.2, this
includes `ActiveRecord::CounterCache#destroy_row` calling `super`, as
well as `ActiveRecord::Locking::Optimistic#destroy_row` also calling
`super`.

See also

rails/rails#14735
rails/rails#14765