packets: discard connection on zero sequence ID#1471
packets: discard connection on zero sequence ID#1471owbone wants to merge 1 commit intogo-sql-driver:masterfrom
Conversation
MySQL 8 introduced some error packets that can be sent to otherwise idle connections, notably `ER_CLIENT_INTERACTION_TIMEOUT` and `ER_DISCONNECTING_REMAINING_CLIENTS`. In rare circumstances, or in extremely busy applications, these packets can be sent by the server at the same instant that the client sends a command. This results in the client receiving a packet with a zero sequence ID when trying to read the result of a command. Currently, when the client receives a sequence ID that it doesn't expect then it returns a `ErrPktSync` error and leaves the connection open, since it normally indicates that the user is trying to run a command at the wrong time (e.g. before having finished reading the results of a previous query) rather than the connection having gone bad. However, in the case of these errors, the connection has gone bad, since we expect that the server has closed the connection and there is unread data left in the buffer. Since a connection in this state is not closed then it gets returned to the pool, and when it gets picked up from the pool then many commands will fail with `ErrBusyBuffer` (because the buffer still contains the unread error packet), and it will return to the pool again. This can turn a single error packet into thousands and thousands of errors, as the connections are reused again and again until they eventually timeout. To add some perspective, GitHub recently encountered 146 connections which failed with `ErrPktSync` due to a replica forcibly disconnecting clients (`ER_DISCONNECTING_REMAINING_CLIENTS`). These 146 connections resulted in around 1.3 million request failures with `ErrBusyBuffer` before they timed out within 30 seconds. Ideally, the client should be able to read these error packets and handle them properly, but doing so is a larger project. This change instead makes a relatively small and uninvasive fix to add special handling for zero sequence IDs. With this, when we receive a packet with a zero sequence ID then we always assume that the server has sent a terminal error packet, close the connection, and return `ErrInvalidConn`.
I am not sure that this is safe assumption. |
You're right. The assumption breaks if the sequence ID wraps around. I have two other options in mind:
|
Not only it. MySQL protocol is used by not only MySQL.
I think we should not reuse such connection even if we read the packet payload. |
Ok, I'll open a separate PR for the first option then. |
Description
MySQL 8 introduced some error packets that can be sent to otherwise idle connections, notably
ER_CLIENT_INTERACTION_TIMEOUTandER_DISCONNECTING_REMAINING_CLIENTS. In rare circumstances, or in extremely busy applications, these packets can be sent by the server at the same instant that the client sends a command. This results in the client receiving a packet with a zero sequence ID when trying to read the result of a command.Currently, when the client receives a sequence ID that it doesn't expect then it returns a
ErrPktSyncerror and leaves the connection open, since it normally indicates that the user is trying to run a command at the wrong time (e.g. before having finished reading the results of a previous query) rather than the connection having gone bad.However, in the case of these errors, the connection has gone bad, since we expect that the server has closed the connection and there is unread data left in the buffer. Since a connection in this state is not closed then it gets returned to the pool, and when it gets picked up from the pool then many commands will fail with
ErrBusyBuffer(because the buffer still contains the unread error packet), and it will return to the pool again. This can turn a single error packet into thousands and thousands of errors, as the connections are reused again and again until they eventually timeout.To add some perspective, GitHub recently encountered 146 connections which failed with
ErrPktSyncdue to a replica forcibly disconnecting clients (ER_DISCONNECTING_REMAINING_CLIENTS). These 146 connections resulted in around 1.3 million request failures withErrBusyBufferbefore they timed out within 30 seconds.Ideally, the client should be able to read these error packets and handle them properly, but doing so is a larger project. This change instead makes a relatively small and uninvasive fix to add special handling for zero sequence IDs. With this, when we receive a packet with a zero sequence ID then we always assume that the server has sent a terminal error packet, close the connection, and return
ErrInvalidConn.Closes #1394.