Add some extra closing handshake tests#1906
Add some extra closing handshake tests#1906pimterry wants to merge 4 commits intowebsockets:simplify/closing-handshakefrom
Conversation
- When the socket emits the `'end'` event, do not call `socket.end()`. End only the `receiver` stream. - Do not wait for a close frame to be received and sent before calling `socket.end()`. Call it right after the close frame is sent. - When the `receiver` stream emits `'finish'`, send a close frame if no close frame is received. The assumption is that the socket is allowed to be half-open. On the server side this is always true (unless the user explicitly sets the `allowHalfOpen` property of the socket to `false`). On the client side the user might use an agent so we set `socket.allowHalfOpen` to `true` when the `http.ClientRequest` object emits the `'upgrade'` event. Refs: websockets#1899
|
I was testing in Node 14 locally - works great there but it turns out there are test failures in 8, 10 & 12. I'll look into those now. |
If we're already closing, then we shouldn't move into the new -2 ready state (i.e. closing-because-the-remote-end-closed). We're already closing, we shouldn't go back to a state that allows sending another close frame. This didn't seem to affect the happy path, or v14, or the existing tests, but in the new error test on v10 and similar it resulted in a double close(), and a write-after-end error on the socket itself.
04dca84 to
104c4d3
Compare
|
Ok, I think I've fixed this. This was an actual bug in the PR that the tests found, affecting older node versions only. In Node older than v14, after a receiver error, the receiver still fires a 'finish' event when the socket itself ends. In Node v14+ it seems that this doesn't happen. No idea why, but this causes the bug. In all cases: when the receiver hits the error, we close() the websocket with an error code, we Unfortunately later when socket actually ends, we set _readyState back to -2 (the new 'PRECLOSING' state where we're still allowed to send a close frame) even though we've already sent a close frame. That wouldn't normally break anything, and it doesn't with Node v14+. With older node though, we get an extra 'finish' event. When that fires it tries to close the websocket, and it's allowed to close it again, even though we've already closed it, because the readyState is no longer CLOSING and it's not yet CLOSED. This double-close then throws write-after-end on the socket. This is fixed here by not setting _readyState to -2 unless the connection is currently OPEN, so you can't go from CLOSING back to -2 again. Does that seem reasonable? |
| // `WebSocket.prototype.close()` from sending a close frame. | ||
| // | ||
| websocket._readyState = -2; | ||
| if (websocket._readyState === WebSocket.OPEN) { |
There was a problem hiding this comment.
Why not removing the receiverOnFinish listener from receiverOnError() instead?
Nvm. I think that is not sufficient because the issue could be triggered even on Node.js >= 14 and even if the receiver does not emit an 'error' event. For example if the user calls websocket.close() right after a 2 -> -2 readyState change.
|
Good catch. This confirms your finding const { Writable } = require('stream');
const writable = new Writable({
write(chunk, encoding, callback) {
process.nextTick(function () {
callback(new Error('Oops'));
});
}
});
writable.on('error', function () {
writable.end();
});
writable.on('finish', function () {
throw new Error("The 'finish' event was emitted after the 'error' event");
});
writable.write('foo');It seems it was fixed by nodejs/node@ba3be578d8 included in Node.js 13. |
|
I've cherry picked the fix into the simplify/closing-handshake branch (ef7c3c7). Will look at tests later. |
|
I didn't notice before, but the Mac & Windows tests were still running, not passing. They've since timed out, because they were actually failing & stuck in that same parsing error test too. In their case the failure is different, and I think it's acceptable behaviour, so I've updated the test. The cause is that on Windows & Mac, unlike Linux, during this error shutdown the event order is:
Meanwhile on Linux, I see:
I.e. the order of server/client 'close' events is different. I suspect on Windows & Mac the 'close' event fires synchronously if the other end has already ended, while on Linux it's async, or there's some other step involved. The test expected the client to always fire 'close' first because I tested it on Linux, but actually I don't think it really matters. Either way the test checks that both sides have sent close frames and closed their sockets without error, so I think it's fine either way. Does that sound right to you? For now I've updated the test to allow 'close' to fire in either order, as long as both sides get their close frames with no errors, and it seems to work nicely now. |
|
|
||
| ws.on('open', () => { | ||
| // Add 200ms delay to compression: | ||
| makeDeflateSlow(ws, 200); |
There was a problem hiding this comment.
Can we avoid using timers? We can push the data directly to the socket without relying on the other peer, similarly to how it is done here
ws/test/create-websocket-stream.test.js
Line 445 in 145480a
sender._deflating === true.
| }); | ||
| }); | ||
|
|
||
| it('handles connection closures over both half-open-allowed sockets', (done) => { |
There was a problem hiding this comment.
I think this and the next tests are not very useful because no data is buffered/compressed/decompressed. They are not testing edge cases.
Yes, I did the same in #1902. |
d5b758f to
e350b86
Compare
|
|
||
| wss.on('connection', (ws) => { | ||
| // Server closes after 100ms, before the client finishes sending its compressed message: | ||
| setTimeout(() => ws.close(1000), 100); |
There was a problem hiding this comment.
Also, this behaves differently depending on where the test is run:
- On the master branch this just calls
socket.write(closeFrame); - On this branch it calls
socket.write(closeFrame)andsocket.end().
The test should be independent from websocket.close() behavior. Should we test case 1, 2, or both?
|
Great! Sorry I haven't had time to work through the comments above myself. I think that's fine though, these tests were mostly useful as a way to explore how the old & new behaviour differ and confirm that we've fixed the original issues. If you've pulled in the useful ones then we can close this now. |
This is an attempt to test some of the behaviour discussed in #1899 (comment) and test how that's improved with the changes currently pending in #1902.
The tests are a bit rough in places, feel free to tweak or drop some of them if you think they aren't relevant. When run against master, 2 of these tests currently fail:
Both of these are current bugs on master: simultaneous errors (or anything else that means the remote peer doesn't respond) will break clean connection shutdown so the socket stays fully open until timeout, and also a remote
ws.end()that starts socket shutdown immediately drops all pending outgoing data, incorrectly & unnecessarily.Various tests here fail in interesting ways given the partial fixes I suggested in that PR thread, basically as predicted there. When run against the new PR with the full set of changes though, all the tests do happily pass, which is great!
This tries to cover the shutdown for the various half-open combinations too, by directly simulating no-half-open behaviour rather than using allowHalfOpen=false, so that it's still effective even when the new PR resets that. It's not a super rigorous real-world test by any means, and it doesn't test various possible data loss cases for each combination, but from this alone it seems like the basic behaviour is working great 👍