copy: optimize I/O for small data segments #93156
Description
The pg protocol allows any sizing for the CopyData messages that make up a copy. psql sends many small messages (one per line), while pgconn does something smarter and sends the data in 64 KB segments. Because of some inefficient buffering and some extraneous data copies ([]byte->string->[]byte conversions), the psql path is slower. We should optimize COPY for the small-buffer case and arrange for data to be copied straight from the underlying pgwire conn.rd (bufio.Reader) to the COPY machine's buffer. The pgwire conn.rd buffer uses the Go bufio default size of 4 KB (4096 bytes); we should probably bump that up to something beefier like 64 KB.
Another data point is that apparently the AWS DMS service uses the tiny buffer approach:
W2e21206 20:24:39.743311 22794 sql/copy.go:448 ⋮ [n1,client=35.174.169.2:39170,user=‹dms›] 80984 copy processing 41 bytes
If I change the pgconn buffer from 64k to 50 bytes I get this performance difference: 3.92 MB/s vs 2.56 MB/s, so we're definitely losing performance when dealing with small messages.
Jira issue: CRDB-22191