Add endpoints for backfilling history (MSC2716)#9247
Conversation
Edits to make TARDIS work with Synapse while writing Complement tests for [MSC 2716](matrix-org/matrix-spec-proposals#2716). - matrix-org/synapse#9247 - matrix-org/complement#68
Also edits for TARDIS visualation: matrix-org/tardis#1
TODO: Is the assumption of anytime we pass in prev_event_ids, we use same depth good enough? What corner cases are there? I see that we also pass in prev_event_ids this in synapse/handlers/room_member.py so need to make sure that still work as expected
46625b7 to
9b5e057
Compare
…n prev_event query param is used
|
@MadLittleMods This seems to have some style / CI issues. Were you looking for general feedback or were you hoping to get this merged? |
|
@clokep Some general comments and answer questions in the PR. What tests and where to add, etc? What's more needed content wise to merge. Outside are a few lints to clean up. Behind feature flag or unstable build type of thing? |
55815e3 to
e7d7f92
Compare
| old_depth = await self._store.get_max_depth_of(prev_event_ids) | ||
| depth = old_depth + 1 | ||
| # Otherwise, progress the depth as normal | ||
| if depth is None: |
There was a problem hiding this comment.
The last test failures are coming from this new line (if I comment it out to always calculate depth, the tests pass). Is there a way to see logger.info from within the app while running the twisted.trial tests? I am having digging into this further without being able to log.
Test failure: https://github.com/matrix-org/synapse/pull/10049/checks?check_run_id=2764607442#step:8:2373
Reproduction locally:
python -m twisted.trial tests.handlers.test_presence.PresenceJoinTestCase.test_remote_gets_presence_when_local_user_joins
There was a problem hiding this comment.
Setting the env to SYNAPSE_TEST_LOG_LEVEL=DEBUG will give you debug logs in _trial_temp/test.log 🙂
There was a problem hiding this comment.
It looks like I accidentally set depth to False in the tests.
Found the problem first by logging what the weird depth value was above the line in question and saw it was False at some point:
logger.info("depth1=%s", depth)Then I added this line which will cause it to throw an exception (because no %s in the formatting string)when this condition is met and give us a full stack trace.
if depth == False:
logging.info("Why is depth False?", depth)$ SYNAPSE_TEST_LOG_LEVEL=DEBUG python -m twisted.trial tests.handlers.test_presence.PresenceJoinTestCase.test_remote_gets_presence_when_local_user_joins
tests.handlers.test_presence
PresenceJoinTestCase
test_remote_gets_presence_when_local_user_joins ... [FAIL]
===============================================================================
[FAIL]
Traceback (most recent call last):
File "/Users/eric/Documents/github/element/synapse/tests/handlers/test_presence.py", line 815, in test_remote_gets_presence_when_local_user_joins
self._add_new_user(room_id, "@alice:server2")
- File "/Users/eric/Documents/github/element/synapse/tests/handlers/test_presence.py", line 866, in _add_new_user
- event = self.get_success(builder.build(prev_event_ids, None, False))
File "/Users/eric/Documents/github/element/synapse/tests/unittest.py", line 500, in get_success
return self.successResultOf(d)
File "/Users/eric/.pyenv/versions/3.8.6/lib/python3.8/site-packages/twisted/trial/_synctest.py", line 706, in successResultOf
self.fail(
twisted.trial.unittest.FailTest: Success result expected on <Deferred at 0x10de9a3a0 current result: None>, found failure result instead:
Traceback (most recent call last):
File "/Users/eric/.pyenv/versions/3.8.6/lib/python3.8/site-packages/twisted/internet/defer.py", line 477, in callback
self._startRunCallbacks(result)
File "/Users/eric/.pyenv/versions/3.8.6/lib/python3.8/site-packages/twisted/internet/defer.py", line 580, in _startRunCallbacks
self._runCallbacks()
File "/Users/eric/.pyenv/versions/3.8.6/lib/python3.8/site-packages/twisted/internet/defer.py", line 662, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/Users/eric/.pyenv/versions/3.8.6/lib/python3.8/site-packages/twisted/internet/defer.py", line 1514, in gotResult
current_context.run(_inlineCallbacks, r, g, status)
--- <exception caught here> ---
File "/Users/eric/.pyenv/versions/3.8.6/lib/python3.8/site-packages/twisted/internet/defer.py", line 1445, in _inlineCallbacks
result = current_context.run(g.send, result)
File "/Users/eric/Documents/github/element/synapse/synapse/events/builder.py", line 142, in build
logging.info("Why is depth False?", depth)
File "/Users/eric/.pyenv/versions/3.8.6/lib/python3.8/logging/__init__.py", line 2070, in info
root.info(msg, *args, **kwargs)
File "/Users/eric/.pyenv/versions/3.8.6/lib/python3.8/logging/__init__.py", line 1434, in info
self._log(INFO, msg, args, **kwargs)
File "/Users/eric/.pyenv/versions/3.8.6/lib/python3.8/logging/__init__.py", line 1577, in _log
self.handle(record)
File "/Users/eric/.pyenv/versions/3.8.6/lib/python3.8/logging/__init__.py", line 1587, in handle
self.callHandlers(record)
File "/Users/eric/.pyenv/versions/3.8.6/lib/python3.8/logging/__init__.py", line 1649, in callHandlers
hdlr.handle(record)
File "/Users/eric/.pyenv/versions/3.8.6/lib/python3.8/logging/__init__.py", line 950, in handle
self.emit(record)
File "/Users/eric/Documents/github/element/synapse/tests/test_utils/logging_setup.py", line 28, in emit
log_entry = self.format(record)
File "/Users/eric/.pyenv/versions/3.8.6/lib/python3.8/logging/__init__.py", line 925, in format
return fmt.format(record)
File "/Users/eric/.pyenv/versions/3.8.6/lib/python3.8/logging/__init__.py", line 664, in format
record.message = record.getMessage()
File "/Users/eric/.pyenv/versions/3.8.6/lib/python3.8/logging/__init__.py", line 369, in getMessage
msg = msg % self.args
builtins.TypeError: not all arguments converted during string formattingIs there a better way to get a stack trace/call stack? I tried logger.exception and adding exc_info=True to the logger functions but they just gave small trace back to the defer and don't play well with the async stuff.
2021-06-09 01:56:40-0500 [-] 2021-06-09 01:56:40,722 - root - 142 - INFO - sentinel - Why is depth False?
Traceback (most recent call last):
File "/Users/eric/.pyenv/versions/3.8.6/lib/python3.8/site-packages/twisted/internet/defer.py", line 1445, in _inlineCallbacks
result = current_context.run(g.send, result)
StopIteration: _GetStateGroupDelta(prev_group=5, delta_ids={('m.room.history_visibility', ''): '$6KwfF_Su1zGCDsZn4FvqkfkcPqI2CnPn_-Uue7pXRCc'})
There was a problem hiding this comment.
Documented the SYNAPSE_TEST_LOG_LEVEL=DEBUG( -> _trial_temp/test.log) trick in #10148. Thanks for sharing!
There was a problem hiding this comment.
Can you raise if depth is wrong? That gives me a good stack trace. I think the defer code does special things to make the stack traces work out?
There was a problem hiding this comment.
raise does work well 👍 Thanks for the better way to do this!
Is there a way I can just log a line and get a stack trace without breaking the normal execution flow?
In JavaScript, I would use console.log('something bad happened', new Error().stack);
There was a problem hiding this comment.
I think creating a twisted Failure object might work: https://github.com/matrix-org/synapse/blob/develop/synapse/handlers/pagination.py#L285-L288, but I'm not 100% sure
Conflicts: synapse/config/experimental.py synapse/handlers/room_member.py
Conflicts: synapse/config/experimental.py
erikjohnston
left a comment
There was a problem hiding this comment.
I think once we've fixed up the paths we should be done 🙂
|
On the complemenet tests: you may need to merge in latest develop into this branch and the same named branch on complement repo (or delete that branch)? The tests failures look to be mostly for things like spaces and knocking, which have recently had work done on them? |
|
Woot! Thank you @erikjohnston for all the help guiding this along and review to push this to the right places! 🐗 |

Implement MSC2716 to add endpoints for backfilling history. This PR does not support federation use cases with the "marker" and "insertion" events.
For reviewers, it's probably best to see this in action with the associated Complement tests.
Complement MR: matrix-org/complement#68
TARDIS visualization MR: matrix-org/tardis#1
Getting started
The PR adds the
POST /_matrix/client/unstable/org.matrix.msc2716/rooms/<roomID>/batch_send?prev_event=<eventID>&chunk_id=<chunkID>endpoint which can insert a chunk of events historically back in time next to the givenprev_event.chunk_idcomes fromnext_chunk_idin the response of the batch send endpoint and is derived from the "insertion" events added to each chunk. It's not required for the first batch send.The
/batchsendendpoint is behind a feature flag:experimental_features->msc2716_enabled(defined inhomseserver.yaml). And is only available to application services so you will need to add one to yourhomeserver.ymland use theas_tokendefined to interact with the API (other tokens will 403).state_events_at_startis used to define the historical state events needed to auth theeventslike join events. These events will float outside of the normal DAG asoutlier's and won't be visible in the chat history which also allows us to insert multiple chunks without having a bunch of@mxid joined the roomnoise between each chunk.eventsis chronological(oldest to newest) chunk/list of events you want to insert. There is a reverse-chronological constraint on chunks so once you insert some messages, you can only insert older ones after that. tldr; Insert from your most recent history -> oldest history.Why?
depthis not re-calculated when historical messages are inserted into the DAG. This means we have to take care to insert in the right order. Events are sorted by(topological_ordering, stream_ordering)wheretopological_orderingis justdepth. Normally,stream_orderingis an auto incrementing integer but forbackfilled=trueevents, it decrements. Historical messages are inserted all at the same depth, and marked as backfilled so thestream_orderingdecrements and each event is sorted behind the next. (from #9247 (comment))If you're curious to look at a known working example, the Complement tests have barebones test cases interacting with this API, matrix-org/complement#68
Steps to reproduce:
homeserver.yaml, add the feature flag to enable the/batchsendendpointhomeserver.yaml. This could be one of your existing bridges. See the application service guide for an example of what the registration file would look like. We only care about theas_tokenin this case.POST /_matrix/client/unstable/org.matrix.msc2716/rooms/<roomID>/batch_send?prev_event=<eventID>&chunk_id=<chunkID>with theAuthorization: Bearer <as_token>header and the followingbody:-
prev_eventis the event you want to insert next to. Your historical messages will appear after this event so pick one where the timestamp makes sense. To be a little more idiomatic for inserting historical events that happened before the Matrix room creation,prev_eventcould be some primordial creation event for the room.-
chunk_idcomes fromnext_chunk_idin the previous batch send response. It connects the last (most recent) message to the insertion event of the previous chunk. The parameter is not needed for your first chunk because there is nothing to connect to yet. Note: The messages will appear correctly on your local server without it but it's important to have this set for federated servers so messages backfill correctly.- You can change
origin_server_tsin your events to whatever you want to display it as- Add a join event in
state_events_at_startfor any message author in theevents-
m.historicalwill automatically be added to each yourevents. This is important to mark them asbackfilled, sort correctly, and skip the push notifications actions.{ "events": [{ "type": "m.room.message", "sender": "@maria:hs1", "origin_server_ts": 1620336731128, "content": { "msgtype": "m.text", "body": "Some historical message", "m.historical": true, }, }], "state_events_at_start": [{ "type": "m.room.member", "sender": "@maria:hs1", "origin_server_ts": 1620336731128, "content": { "membership": "join", }, "state_key": "@maria:hs1", }] }Dev notes
API:
PUT /_matrix/client/r0/rooms/{roomId}/send/{eventType}/{txnId}GET /_matrix/client/r0/rooms/{roomId}/messagesRelevant code:
scripts-dev/complement.shsynapse/rest/client/v1/room.py#L218->/rooms/$roomid/send/$event_type[/$txn_id]]PATTERNS = "/rooms/(?P<room_id>[^/]*)/send/(?P<event_type>[^/]*)"synapse/handlers/message.py#L683->event_creation_handler.create_and_send_nonmember_eventtsquery parameter to override theorigin_server_tsif the request is coming from an app service,synapse/rest/client/v1/room.py#L233-L234(via add ?ts massaging for ASes #2754)synapse/federation/transport/server.py#L372Call stacks for passing
depth,prev_event_ids,auth_event_ids:Accessing the database
Access the sqlite database in the Docker container after Complement runs. Be sure to change the
defer deployment.Destroy(t)call in the Complement tests to adefer time.Sleep(2 * time.Hour)so the Docker container stays alive after it's finished.Event signing
test-event-signing.py(v1)test-event-signing.pytest-event-signing.py(v2)test-event-signing.pyTodo
tsquery param to overrideorigin_server_tsprev_eventquery paramdepthm.historicalevent fieldPull Request Checklist
EventStoretoEventWorkerStore.".code blocks.Pull request includes a sign off