copy: optimize I/O for small data segments #93156

@cucaroach

The pg protocol allows any sizing for the CopyData messages that make up a copy. psql sends many small messages (one per line), while pgconn does something smarter and sends the data in 64 KB segments. Because of some inefficient buffering and some extraneous data copies ([]byte->string->[]byte conversions), COPY from psql is slower. We should optimize COPY for the small-buffer case and arrange for data to be copied straight from the underlying pgwire conn.rd (bufio.Reader) to the COPY machine's buffer. The pgwire conn.rd buffer defaults to the Go bufio default of 4 KB; we should probably bump that up to something beefier like 64 KB.

Another data point is that apparently the AWS DMS service uses the tiny buffer approach:

W2e21206 20:24:39.743311 22794 sql/copy.go:448 ⋮ [n1,client=35.174.169.2:39170,user=‹dms›] 80984  copy processing 41 bytes

If I change the pgconn buffer from 64 KB to 50 bytes, throughput drops from 3.92 MB/s to 2.56 MB/s, so we're definitely losing performance when dealing with small messages.

Jira issue: CRDB-22191

Labels: C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.), T-sql-queries (SQL Queries Team)
