Test homeserver restarts during a partial state join#378
Test homeserver restarts during a partial state join#378
Conversation
…serve requests Signed-off-by: Sean Quah <seanq@matrix.org>
…hey will change on restart Signed-off-by: Sean Quah <seanq@matrix.org>
Signed-off-by: Sean Quah <seanq@matrix.org>
Signed-off-by: Sean Quah <seanq@matrix.org>
| HostIP: "127.0.0.1", | ||
| HostPort: strconv.Itoa(port1), | ||
| }, | ||
| }, | ||
| nat.Port("8448/tcp"): []nat.PortBinding{ | ||
| { | ||
| HostIP: "127.0.0.1", | ||
| HostIP: "127.0.0.1", | ||
| HostPort: strconv.Itoa(port2), |
There was a problem hiding this comment.
We must provide an explicit port, otherwise we'll get a different random port when the container is restarted. Which breaks everything.
There was a problem hiding this comment.
BaseURL and FedBaseURL of HomeserverDeployment become incorrect because they include the port number. We can fix up those URLs. But any CSAPI clients will continue to use the previous port number after the restart and that's not so easy to fix up.
There was a problem hiding this comment.
That seems like a reasonably easy thing to fix? There's several possible solutions:
- Add documentation to
Restart()which explicitly states that any clients created with this deployment need to be recreated. Or... - Add a callback hook to
client.CSAPIto update the base URL and get test authors to call that callback with the updated URL. Or... - Remember all
client.CSAPIinstances created viafunc (d *Deployment) Client(t *testing.T, hsName, userID string) *client.CSAPIand automatically call the callback hook when the deployment is restarted to re-point port numbers.
The last option is the most preferable because:
- All the complexity is in one place and does not need to be repeated in tests.
- Test authors don't need to know or care that the port number changes from under them when they call
deployment.Restart().
In addition, I would change the function signature of restart from:
func (dep *Deployment) Restart() errorto
func (dep *Deployment) Restart(t *testing.T)which will fail the test if the restart fails (returns an error).
At present, this PR cannot be accepted because it will introduce flakiness because these port numbers will race: we have to let the underlying OS decide the ports.
There was a problem hiding this comment.
I've split the PR and made the changes in #396. Let me know if I've missed anything.
| }() | ||
|
|
||
| // wait for the state_ids request to arrive | ||
| psjResult.AwaitStateIdsRequest(t) |
There was a problem hiding this comment.
AwaitStateIdsRequest cannot be used twice.
There was a problem hiding this comment.
I'm tempted to drop the second usage rather than try to rejig things.
…ns_resume_state_resyncing_on_restart
|
@matrix-org/dendrite-core The first three commits touch some core complement machinery and might be worth taking a look at? |
|
@erikjohnston I made one small change: c0caa5e |
kegsay
left a comment
There was a problem hiding this comment.
My main problem with this lies with the usage of an explicit port. We cannot guarantee this as:
- Tests in the same file can run in parallel via
.Parallel() - Tests in different directories can run in parallel (modifiable via
go test -p N)
We need to preserve that ability to ensure we can grow the test case library without suffering increasingly long test times, so we cannot add features which would compromise that. I've asked for a bit more information than just "which breaks everything".
| HostIP: "127.0.0.1", | ||
| HostPort: strconv.Itoa(port1), | ||
| }, | ||
| }, | ||
| nat.Port("8448/tcp"): []nat.PortBinding{ | ||
| { | ||
| HostIP: "127.0.0.1", | ||
| HostIP: "127.0.0.1", | ||
| HostPort: strconv.Itoa(port2), |
|
|
||
| // Picks two free ports on localhost. Does not reserve them in any way. | ||
| // The returned ports must be used before the next call to `allocateHostPorts`, | ||
| // otherwise the same pair of ports may be returned. |
There was a problem hiding this comment.
We cannot guarantee that as homeservers are deployed in parallel.
There was a problem hiding this comment.
That's a good point, I didn't realise that we ran tests in parallel at the time.
internal/docker/deployer.go
Outdated
| // The returned ports must be used before the next call to `allocateHostPorts`, | ||
| // otherwise the same pair of ports may be returned. | ||
| func allocateHostPorts() (int, int, error) { | ||
| localhost_any_port := net.TCPAddr{ |
There was a problem hiding this comment.
no snake_case please. Be consistent with existing code.
Tests matrix-org/synapse#12642