Deduplicate redundant lazy-loaded members (#3331)
Conversation
As per the proposal, we can deduplicate redundant lazy-loaded members which are sent in the same sync sequence. We do this heuristically, rather than requiring the client to somehow tell us which members it has chosen to cache, by instead caching the last N members sent to a client and not sending them again. For now we hardcode N to 100. Each cache for a given (user, device) tuple is in turn cached for up to X minutes (to avoid the caches building up). For now we hardcode X to 30.
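A minimal sketch of the heuristic, using a plain `OrderedDict` as the per-device LRU (the class and method names here are hypothetical illustrations; Synapse's real implementation in `synapse/handlers/sync.py` uses its own `LruCache` and `ExpiringCache` types):

```python
from collections import OrderedDict

MAX_MEMBERS_PER_DEVICE = 100  # the "N" described above


class MemberDedupCache:
    """Remembers the last N member event IDs sent to one (user, device)."""

    def __init__(self, max_len=MAX_MEMBERS_PER_DEVICE):
        self.max_len = max_len
        self.sent = OrderedDict()  # event_id -> True, in LRU order

    def filter_new(self, event_ids):
        """Return only the event IDs not recently sent, and record them all."""
        new = [e for e in event_ids if e not in self.sent]
        for e in event_ids:
            self.sent.pop(e, None)  # refresh LRU position if already present
            self.sent[e] = True
            if len(self.sent) > self.max_len:
                self.sent.popitem(last=False)  # evict least recently sent
        return new


cache = MemberDedupCache()
cache.filter_new(["$alice", "$bob"])  # first sync: both members sent
cache.filter_new(["$bob", "$carol"])  # next sync: only "$carol" is new
```

The server would keep one such cache per (user, device) tuple, with the outer map itself expiring idle entries after X minutes.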
@matrixbot retest this please

@matrixbot retest this please
```python
        return self.filter_json.get("lazy_load_members", False)

    def include_redundant_members(self):
        return self.filter_json.get("include_redundant_members", False)
```
as per https://docs.google.com/document/d/11yn-mAkYll10RJpN0mkYEVqraTbU3U4eQx9MNrzqX1U/edit?disco=AAAACEx0noo, we should consider validating the value passed by the client - presumably in the constructor rather than here.
(this applies to lazy_load_members too, of course; I just forgot it there.)
i've fixed up the proposal doc to explicitly demand {true|false} there. This is however being strictly validated anyway via the JSON schema validation over at: https://github.com/matrix-org/synapse/pull/3331/files#diff-ed81002a2d319904392e1a6f871eb2edR121
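For illustration, the boolean constraint could also be enforced directly in code (a hypothetical helper; as noted above, Synapse actually relies on JSON schema validation for this):

```python
def validate_filter(filter_json):
    """Reject non-boolean values for the lazy-loading flags.

    Hypothetical constructor-time check; Synapse enforces the same
    constraint via its JSON schema instead.
    """
    for key in ("lazy_load_members", "include_redundant_members"):
        value = filter_json.get(key, False)
        if not isinstance(value, bool):
            raise ValueError("%s must be a boolean, got %r" % (key, value))
    return filter_json


validate_filter({"lazy_load_members": True})       # accepted
# validate_filter({"lazy_load_members": "yes"})    # would raise ValueError
```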
oooh I hadn't spotted that. well, yay!
synapse/handlers/sync.py (Outdated)

```python
        # ExpiringCache((User, Device)) -> LruCache(membership event_id)
        self.lazy_loaded_members_cache = ExpiringCache(
            "lazy_loaded_members_cache", self.clock,
            max_len=0, expiry_ms=LAZY_LOADED_MEMBERS_CACHE_MAX_AGE,
        )
```
synapse/handlers/sync.py (Outdated)

```python
                )
            ]

        if not include_redundant_members:
```
why are we doing this here, rather than where the cache is used below?
hysterical reasons. fixed.
```python
            }
            logger.debug("...to %r", state_ids)

            # add any member IDs we are about to send into our LruCache
```
it seems problematic that we only populate the cache if lazy_load_members and not include_redundant_members. what if the client calls sync with include_redundant_members=False, and then later calls it with it True?
I can see an efficiency argument, but if we're going to say that's a thing that clients can't do, let's spell it out in the proposal, along with the steps they would need to take to change their mind (presumably a re-initial-sync?)
Relatedly, is there a danger of it breaking for people who switch between client versions that have support and those that don't? I can't think of a failure offhand, but it might be worth thinking a bit harder about it?
@bwindels already hit this actually whilst implementing it on riot-web. we'll need to mandate that clients do a re-initial-sync if they change their lazy-loading config (whether that's wrt redundancy or laziness). i'll add it to the prop.
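A client could enforce the re-initial-sync rule by comparing the lazy-loading flags stored alongside its sync token with the filter it is about to use, and discarding the token on any mismatch (a hypothetical client-side helper sketching the rule above, not riot-web's actual code):

```python
def next_sync_args(stored_filter, current_filter, sync_token):
    """Drop the sync token (forcing a re-initial-sync) if the
    lazy-loading settings changed since the token was issued.

    Hypothetical helper illustrating the proposed client requirement.
    """
    keys = ("lazy_load_members", "include_redundant_members")
    changed = any(
        stored_filter.get(k, False) != current_filter.get(k, False)
        for k in keys
    )
    return (None if changed else sync_token), current_filter


# turning lazy loading on invalidates the token:
token, _ = next_sync_args(
    {"lazy_load_members": False}, {"lazy_load_members": True}, "s123_456"
)
```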
synapse/handlers/sync.py (Outdated)

```python
                state_ids = {
                    t: state_id
                    for t, state_id in state_ids.iteritems()
                    if not cache.get(state_id)
                }
```
I've got a feeling this isn't going to be adequate. It's possible for state to revert to an earlier event thanks to state resolution: so for example Bob's member event might be A, then B, then back to A. In this case we won't tell clients it's gone back to A, because A is already in the cache.
(Admittedly there are probably other bugs in the sync code in this area, but let's not add more.)
I suspect you need to maintain the latest (type, state_key) => event_id mapping in the cache, rather than just a list of event ids.
(although I maintain the cache as state_key => event_id for simplicity and efficiency, as type is redundant)
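The distinction matters when state resolution reverts Bob's membership from event A to B and back to A: a bare set of sent event IDs would suppress the second A, whereas a `state_key -> event_id` map only suppresses an event if it is still the *latest* one sent for that key. A sketch of the mapping approach (hypothetical function, illustrating the fix described above):

```python
def filter_members(cache, state_ids):
    """Filter out member events the client was already sent last.

    cache:     dict of state_key -> event_id last sent to this device
    state_ids: dict of (type, state_key) -> event_id for this sync
    Returns only entries whose event_id differs from the last one sent,
    so a revert back to an earlier event is still delivered.
    """
    out = {}
    for (typ, state_key), event_id in state_ids.items():
        if cache.get(state_key) != event_id:
            out[(typ, state_key)] = event_id
            cache[state_key] = event_id
    return out


cache = {}
filter_members(cache, {("m.room.member", "@bob:x"): "$A"})  # $A sent
filter_members(cache, {("m.room.member", "@bob:x"): "$B"})  # $B sent
# state resolution reverts Bob to $A: still sent, unlike an id-set cache
filter_members(cache, {("m.room.member", "@bob:x"): "$A"})
```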
synapse/handlers/sync.py (Outdated)

```python
                logger.debug("filtering state from %r...", state_ids)
                state_ids = {
                    t: state_id
                    for t, state_id in state_ids.iteritems()
```
I won't ask you change the existing state_ids var, but can you s/state_id/event_id/ if it's an event id? To me a state_id sounds more like a state group id than an event id.
@richvdh ptal

(It's worrying that the postgres tests are failing, though it looks unrelated :/)

retest this please
As per the proposal; we can deduplicate redundant lazy-loaded members which are sent in the same sync sequence (by assuming any client capable of persisting a sync token and understanding lazy-loaded members is also capable of persisting membership details).
We do this heuristically rather than requiring the client to somehow tell us which members it has chosen to cache, by instead caching the last N members sent to a client in a LruCache, and not sending them again. For now we hardcode N to 100.
Each such cache for a given (user, device) tuple is in turn cached for up to X minutes (to avoid the caches building up) in an ExpiringCache. For now we hardcode X to 30.

Builds on #2970.
Sytest at matrix-org/sytest#467
It deliberately doesn't attempt to deduplicate redundant members in a limited sync response (for now).
To do: