-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Prefill and invalidate getEvent cache are happening out of order when persisting transactions. #15757
Description
While working on learning how the _get_event_cache work, I spotted a strange condition of invalidation and prefill being added to a transaction to be run after the transaction was finished, but it was running the prefill, then invalidating it. Subsequently, it would just re-get the event the next time it was retrieved from the database.
In PersistEventsStore.persist_event_txn():
synapse/synapse/storage/databases/main/events.py
Lines 393 to 399 in fcc3ca3
| # Once the txn completes, invalidate all of the relevant caches. Note that we do this | |
| # up here because it captures all the events_and_contexts before any are removed. | |
| for event, _ in events_and_contexts: | |
| self.store.invalidate_get_event_cache_after_txn(txn, event.event_id) | |
| if event.redacts: | |
| self.store.invalidate_get_event_cache_after_txn(txn, event.redacts) | |
Which uses txn.call_after() and txn.async_call_after() under the hood.
Lower down(inside the _update_metadata_tables_txn() which then calls add_to_cache()) near the end, there is a database read of the events and then the prefill bits(which are inside add_to_cache()):
synapse/synapse/storage/databases/main/events.py
Lines 1732 to 1739 in fcc3ca3
| async def prefill() -> None: | |
| for cache_entry in to_prefill: | |
| await self.store._get_event_cache.set( | |
| (cache_entry.event.event_id,), cache_entry | |
| ) | |
| txn.async_call_after(prefill) | |
This uses txn.async_call_after() (as you can see at the bottom)
The docustring for LoggingTransaction has the information on what happens with txn.call_after() and txn.async_call_after():
synapse/synapse/storage/database.py
Lines 245 to 252 in fcc3ca3
| after_callbacks: A list that callbacks will be appended to | |
| that have been added by `call_after` which should be run on | |
| successful completion of the transaction. None indicates that no | |
| callbacks should be allowed to be scheduled to run. | |
| async_after_callbacks: A list that asynchronous callbacks will be appended | |
| to by `async_call_after` which should run, before after_callbacks, on | |
| successful completion of the transaction. None indicates that no | |
| callbacks should be allowed to be scheduled to run. |
...which summarizes nicely as 'async_call_after() runs before call_after()'.
So:
- The invalidation of events is added with
call_after()andasync_call_after() - The prefill is added with
async_call_after() - The transaction finishes
- The async invalidation runs, purging the event from the cache
- The prefill runs(because it's async, and those run before the non-async)
- The non-async invalidation runs, purging the cache of the event we just loaded.
- ...time passes(probably not much)
- The event is reloaded from the database, because it's not in the cache.
Which, I think, means it's hitting the database for a read twice as often as necessary.
Bonus points attempt:
These were added to assist in the preparatory work for an external caching system, and for that reason I want to take that into account when looking at this. The external system will require the code to run async/await, so this will have to be maintained in any fix. I recommend an additional method to be added to AsyncLruCache that will explicitly handle setting a value for the external cache. However that isn't actually implemented yet(or upstreamed, whatever), so will no-op just as get_external does now.