Conversation
`testReceiveEventDuringPartialStateJoin()` sends an event over federation, checks that a client can see it and checks the state at the event.
…isisng_events` requests
There was a problem hiding this comment.
We do this because interface{}(nil) is not equal to []string(nil)
There was a problem hiding this comment.
This part tests matrix-org/synapse#13002 . If Synapse does not correctly flag that it does not have partial state in the CanReceiveEventsWithMissingPrevWithHalfMissingPrevsDuringPartialStateJoin test case, it will erroneously return state (which was derived via state res and only happens to be correct) instead of blocking.
There was a problem hiding this comment.
Suggestions for better names are welcome.
There was a problem hiding this comment.
maybe "can receive events with some missing grandparents" or something?
… missing and one known prev event
6b43730 to
5e38528
Compare
richvdh
left a comment
There was a problem hiding this comment.
looks generally good, a few suggestions.
| // derek sends an event in the room | ||
| event := psjResult.CreateMessageEvent(t, "derek", nil) | ||
| psjResult.Server.MustSendTransaction(t, deployment, "hs1", []json.RawMessage{event.JSON()}, nil) | ||
| t.Logf("Derek sent event event ID %s", event.EventID()) |
| // fire off a /state_ids request for the last event. | ||
| // it must either: | ||
| // * block because the homeserver does not have full state at the last event | ||
| // * or 403 because the homeserver does not have full state yet and does not consider the | ||
| // Complement homeserver to be in the room |
There was a problem hiding this comment.
shouldn't we know which, and test for that specifically?
There was a problem hiding this comment.
Right now only the last test case blocks instead of 403s. I'm somewhat reluctant to be more specific in the tests, because it feels like an implementation detail of synapse (which will change when we get around to matrix-org/synapse#13288).
If you're really keen on being more specific, I'm happy to add a boolean parameter or suchlike. (It's not easy to move this part of the test out, since it needs to happen after the /event request above has succeeded)
There was a problem hiding this comment.
ah right. tbh I don't think either blocking or returning a 403 is correct behaviour, though I'm not quite sure what is.
It would be very useful to add a link to #13288 and say that this is somewhat temporary until we solve it properly.
There was a problem hiding this comment.
hm, is there a better way to test whether synapse thinks an event has partial state?
this part of the test was intended to check that _resolve_state_at_missing_prevs calculates the partial state flag correctly and it sounds like it's not a good idea to rely on /state_ids
There was a problem hiding this comment.
none that I can immediately think of. The ideal thing would be to check that after the resync, /state_ids returns some result that would have been impossible without the information conveyed during the resync - but that sounds rather fiddly for a test (and, probably, a hard-to-understand test failure if it later turns out not to be true).
| select { | ||
| case <-time.After(1 * time.Second): | ||
| t.Logf("/state_ids request for event %s blocked as expected", event.EventID()) | ||
| defer func() { <-stateIdsResponseChan }() |
There was a problem hiding this comment.
when the /state_ids request blocks, the send of the response happens some time later, perhaps even after the test ends.
If we don't read from the channel, the goroutine above blocks on sending into the channel and complement gets very unhappy 10 minutes later because it's still not finished.
There was a problem hiding this comment.
right, so we're making sure we read from the channel so that the goroutine doesn't block. Can you add a comment to that effect?
| if err := psjResult.Server.SendFederationRequest(deployment, stateReq, &respStateIDs); err != nil { | ||
| httpErr, ok := err.(gomatrix.HTTPError) | ||
| t.Logf("%v", httpErr) | ||
| if ok && httpErr.Code == 403 { | ||
| stateIdsResponseChan <- nil | ||
| return | ||
| } | ||
| t.Errorf("/state_ids request returned non-200: %s", err) | ||
| return | ||
| } | ||
| stateIdsResponseChan <- &respStateIDs |
There was a problem hiding this comment.
rather than matching for the 403 here and sending a nil magic value down the channel, I suggest you define
type stateIDsResult struct {
RespStateIDs gomatrixserverlib.RespStateIDs
Error error
}... and then send that back down the channel, and do the matching in the main thread.
| result.ServerRoom, | ||
| result.fedStateIdsRequestReceivedWaiter, | ||
| result.fedStateIdsSendResponseWaiter, | ||
| result.fedStateIdsAllowedEvents, |
There was a problem hiding this comment.
That looks a lot cleaner. I've borrowed that commit and it works great. Do we want to split that change into a separate PR?
There was a problem hiding this comment.
maybe "can receive events with some missing grandparents" or something?
| t.Logf("Derek sent event A with ID %s", eventA.EventID()) | ||
| t.Logf("Derek sent event B with ID %s", eventB.EventID()) | ||
| t.Logf("Derek sent event C with ID %s", eventC.EventID()) |
There was a problem hiding this comment.
s/sent/created/. He hasn't sent them to hs1 yet.
… to complete" This reverts commit 2cbeaae.
I'm going to add a test which will involve *multiple* /state and /state_ids requests, so we need to make the registered handlers more selective.
| // read from the channel and discard the result, so that the goroutine above doesn't block | ||
| // indefinitely on the send |
There was a problem hiding this comment.
(I'm a bit surprised that closing the channel isn't sufficient for this, but fair enough)
There was a problem hiding this comment.
We get a panic: send on closed channel without this. To clarify:
no close and no read -> complement times out after 10 minutes
defer close + no read -> panic
defer close + defer read -> complement is happy
There was a problem hiding this comment.
I don't really see why this is done in a goroutine at all - is this because SendFederationRequest can block indefinitely? There is a default 30s timeout - if that is too long then you need to send a custom context with a timeout here -
complement/internal/federation/server.go
Line 227 in d842670
context.Background() so it won't ever expire, but you can change that easily by doing:
ctx, cancel := context.WithTimeout(context.Background(), time.Second)
defer cancel()
httpClient.DoRequestAndParseResponse(ctx, httpReq, resBody)This will require a change to Complement's API to:
func (s *Server) SendFederationRequest(ctx context.Context, deployment *docker.Deployment, req gomatrixserverlib.FederationRequest, resBody interface{}) error {but that's fine.
There was a problem hiding this comment.
Thanks for the suggestion, that's a lot cleaner!
…ns_calculate_partial_state_flag_for_backfill
Before we merge, is there a better way to check that Synapse has calculated the partial state flag for an event correctly? |
| if time.Since(start) > time.Second { | ||
| t.Fatalf("timeout waiting for received event to be visible") | ||
| } | ||
| res := alice.DoFunc(t, "GET", []string{"_matrix", "client", "r0", "rooms", psjResult.ServerRoom.RoomID, "event", event.EventID()}) |
There was a problem hiding this comment.
You can now use WithRetryUntil in your DoFunc so you can remove the for loop etc. It would look a bit like:
res := alice.DoFunc(t, "GET", []string{"_matrix", "client", "r0", "rooms", psjResult.ServerRoom.RoomID, "event", event.EventID()},
client.WithRetryUntil(time.Second, func(res *http.Response) bool {
if res.StatusCode == 200 {
return true
}
eventResBody := client.ParseJSON(t, res)
if res.StatusCode == 404 && gjson.GetBytes(eventResBody, "errcode").String() == "M_NOT_FOUND" {
return false
}
t.Fatalf("GET /event failed with %d: %s", res.StatusCode, string(eventResBody))
return false
})
kegsay
left a comment
There was a problem hiding this comment.
Use client.WithRetryUntil and add a context to SendFederationRequest and you should be able to remove a fair chunk of boilerplate here, and help other test authors.
.vscode/settings.json
Outdated
| { | ||
| "go.testEnvVars": { | ||
| "COMPLEMENT_BASE_IMAGE": "complement-synapse" | ||
| } | ||
| } |
There was a problem hiding this comment.
not sure if we should be commiting this?
There was a problem hiding this comment.
Ideally not, as it will make Dendrite people sad.
There was a problem hiding this comment.
Good spot! I must have added it by accident while merging in main.
|
Thank you both for the reviews! |
Add tests for three cases where a homeserver receives a latest event with missing prev events while it is still resyncing state after a partial join.
Namely:
and
where event M is the last event the homeserver knows about and the rightmost event is the one it receives.
The last case tests the code path mentioned in matrix-org/synapse#13002
Best reviewed commit by commit.