Skip to content
This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Begin adding implementing room chunks#3226

Merged
erikjohnston merged 9 commits intoerikj/room_chunksfrom
erikj/chunk_base
May 18, 2018
Merged

Begin adding implementing room chunks#3226
erikjohnston merged 9 commits intoerikj/room_chunksfrom
erikj/chunk_base

Conversation

@erikjohnston
Copy link
Member

This commit adds the necessary tables and columns, as well as an
implementation of an online topological sorting algorithm to maintain an
absolute ordering of the room chunks.

Note: This is the start of a series of PRs. I'm landing them on a separate branch as we should probably land the full set onto develop at once, but want to make separate PRs to make review easier

This commit adds the necessary tables and columns, as well as an
implementation of an online topological sorting algorithm to maintain an
absolute ordering of the room chunks.
psql_only=True,
)

self.register_background_index_update(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is going to be expensive. Might want to warn users about it somewhere.

"""This module contains an implementation of the Katriel-Bodlaender algorithm,
which is used to do online topological ordering of graphs.

Note that the ordering derived from the graph has the first node one with no
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the first node one"?


class OrderedListStore(object):
"""An abstract base class that is used to store a topological ordering of
a graph. Suitable for use with the Katriel-Bodlaender algorithm.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused: "Suitable for use with" or "Implements" ?

class OrderedListStore(object):
"""An abstract base class that is used to store a topological ordering of
a graph. Suitable for use with the Katriel-Bodlaender algorithm.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some more words about the assumptions being made here would be useful. Likewise re the interface being exposed, to save me having to go and grok the paper.

For example:

  • Looks like this assumes nodes are identified by unique strings?
  • How do edges and ordering interact?
  • does adding a node before/after another imply an edge between those nodes?

in the database.

Internally the ordering is implemented using floats, and the average is
taken when a node is inserted inbetween other nodes. To avoid presicion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/inbetween/between/

in the database.

Internally the ordering is implemented using floats, and the average is
taken when a node is inserted inbetween other nodes. To avoid presicion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

precision

in a range around the node, where the bounds are rounded to this
number of digits.
min_difference (int): A rebalance is triggered when the difference
between two successive orderings are less than the reverse of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/reverse/reciprocal/ ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/are/is/. Two orderings, one difference.

logger = logging.getLogger(__name__)


class ChunkDBOrderedListStore(OrderedListStore):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be worth noting what a room chunk actually is, as well as noting that we are storing them in an ordered graph

with Measure(self.clock, "chunk_rebalance"):
# We pick the interval to try and minimise the number of decimal
# places, i.e. we round to nearest float with `rebalance_digits` and
# use that as the middle of the interval
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like we use it as the end of the interval to me

@richvdh richvdh assigned erikjohnston and unassigned richvdh May 17, 2018
This both simplifies the code, and ensures that the target node is
roughly in the center of the range rather than at an end.
This makes it clearer what the public interface is vs what subclasses
need to implement.
@erikjohnston erikjohnston assigned richvdh and unassigned erikjohnston May 17, 2018
CHANGES.rst Outdated
=======================

This release adds an index to the events table. This means that on first
startup there will be an inceased amount of IO until the index is created.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and it will use up a bunch of disk space


A room chunk is a connected portion of the room events DAG. As such it
inherits a DAG, i.e. if an event in one chunk references an event in a
second chunk, then we say that the first chunk references the second, and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably explain how we get into this situation, rather than the events just being in the same chunk.

the room DAG, as newer messages would be added to the start rather than the
end.

***We therefore invert the direction of edges when using the algorithm***
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/we/ChunkDBOrderedListStore/ ?

self._insert_before(node_id, None)

def add_edge(self, source, target):
"""Adds a new edge is added to the graph and updates the ordering.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/is added//



class InMemoryOrderedListStore(OrderedListStore):
"""An in memory OrderedListStore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you planning to use this outside the tests? if not I'd move it to the tests heirarchy

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there's a few cases where we want to order events by depth where this would be useful. Though admittedly I'm not 100% sure that it'll be needed in the end.

"""Used as the list store for room chunks, efficiently maintaining them in
topological order on updates.

A room chunk is a connected portion of the room events DAG. As such it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"it" -> the set of chunks in a room?

A room chunk is a connected portion of the room events DAG. As such it
inherits a DAG, i.e. if an event in one chunk references an event in a
second chunk, then we say that the first chunk references the second, and
thus forming a DAG.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, but potentially a disconnected DAG. How does the ordering happen in that case?


The server may only have a subset of all events in a room, in which case
its possible for the server to have chunks that are unconnected from each
other. The ordering between unconnected chunks is arbitrary.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I have four chunks, with two connected parts (A->B , C->D), so we know that A<B and C<D, is it also the case that either B<C (so A<B<C<D) or D<A (so C<D<A<B) ? or could we end up with B in the middle of C and D or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is anything stopping B being in the middle, since theoretically if you inserted the chunks in the order A, C, B, D, then that obeys all the constraints and so the KB algorithm will leave things alone (I think).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally this lgtm and I am now quibbling about words which can easily be updated in the future

an event in a second chunk, then we say that the first chunk references the
second, and thus forming a DAG.

Chunks are constructed so that they have the aditional property that for all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aditional

reference an event in the chunk and be referenced by an event in the chunk
(assuming no cycles).

Multiple chunks can therefore happen when the server misses some events,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"We can therefore end up with multiple chunks in a room when..."

inherits a DAG, i.e. if an event in one chunk references an event in a
second chunk, then we say that the first chunk references the second, and
thus forming a DAG.
A room chunk is a connected portion of the room events DAG. As such the set
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest moving "As such" to after the explanation about how they are constructed

an event in a second chunk, then we say that the first chunk references the
second, and thus forming a DAG.

Chunks are constructed so that they have the aditional property that for all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably worth noting somewhere here that chunks start out disconnected, and then get connected later when we receive events that, uh, connect them.

@erikjohnston erikjohnston merged commit 0a325e5 into erikj/room_chunks May 18, 2018
@hawkowl hawkowl deleted the erikj/chunk_base branch September 20, 2018 13:59
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants