Skip to content

Add QueryExecMode to CopyFrom#1412

Merged
jackc merged 1 commit intojackc:masterfrom
alejandrodnm:adn/copy-cache
Dec 23, 2022
Merged

Add QueryExecMode to CopyFrom#1412
jackc merged 1 commit intojackc:masterfrom
alejandrodnm:adn/copy-cache

Conversation

@alejandrodnm
Copy link
Copy Markdown
Contributor

@alejandrodnm alejandrodnm commented Dec 5, 2022

Use DefaultQueryExecMode in CopyFrom

CopyFrom had to create a prepared statement to get the OIDs of the data
types that were going to be copied into the table. Every COPY operation
required an extra round trips to retrieve the type information. There
was no way to customize this behavior.

By leveraging the QueryExecMode feature, like in Conn.Query, users can
specify if they want to cache the prepared statements, execute
them on every request (like the old behavior), or bypass the prepared
statement relying on the pgtype.Map to get the type information.

The QueryExecMode behave exactly like in Conn.Query in the way the
data type OIDs are fetched, meaning that:

  • QueryExecModeCacheStatement: caches the statement.
  • QueryExecModeCacheDescribe: caches the statement and assumes they do
    not change.
  • QueryExecModeDescribeExec: gets the statement description on every
    execution. This is like to the old behavior of CopyFrom.
  • QueryExecModeExec and QueryExecModeSimpleProtocol: maintain the
    same behavior as before, which is the same as QueryExecModeDescribeExec.
    It will keep getting the statement description on every execution

The QueryExecMode can only be set via
ConnConfig.DefaultQueryExecMode, unlike Conn.Query there's no
support for specifying the QueryExecMode via optional arguments
in the function signature.

@cristaloleg
Copy link
Copy Markdown
Contributor

Hi, thanks for the PR, are there any performance improvements in benchmarks?

@jackc
Copy link
Copy Markdown
Owner

jackc commented Dec 6, 2022

I like the direction this is going, but I am concerned about the change in the CopyFrom() signature. It won't directly break any calling code, but it does break any interfaces defined with Conn, Tx, etc. in mind. e.g.

type DB interface {
	Query(ctx context.Context, sql string, optionsAndArgs ...interface{}) (pgx.Rows, error)
	QueryRow(ctx context.Context, sql string, optionsAndArgs ...interface{}) pgx.Row
	CopyFrom(ctx context.Context, tableName pgx.Identifier, columnNames []string, rowSrc pgx.CopyFromSource) (int64, error)
}

An application or library using interface DB would break.

Would it be sufficient to use the default query exec mode without the ability to override? AFAIK, overriding the mode per query is extremely rare. I only included it on Query, QueryRow, and Exec for completeness and testing.

If it really needs to be controlled per call, perhaps a new optional interface could be created that rowSrc could implement. This interface could have a method to get the query exec mode. A wrapper struct could be created that wrapped the original CopyFromSource and added the new method.

@alejandrodnm
Copy link
Copy Markdown
Contributor Author

@jackc I really need to override it to avoid the the prepare statement and the cache 😅

We are using pgx in timescale/promscale to ingest metrics and traces, we insert the data using COPY. Basically we have a table per metric, and our COPY operations are always of the same 3 columns.

We want to get rid of the extra roundtrip per copy. In our use case the cache would be a lot of unnecessary memory since we are always sending (time, int64 and float64), which the Type map resolves correctly.

If it really needs to be controlled per call, perhaps a new optional interface could be created that rowSrc could implement. This interface could have a method to get the query exec mode. A wrapper struct could be created that wrapped the original CopyFromSource and added the new method.

I'll try to give ☝️ a shot.

@alejandrodnm
Copy link
Copy Markdown
Contributor Author

alejandrodnm commented Dec 7, 2022

I'm going to try our use case with allowing setting the Exec mode only via ConnConfig.DefaultQueryExecMode.

I reverted the signature changes to CopyFrom.

This is the benchmark comparison. One thing, this is running against a DB listening on localhost, meaning that this does not account for the extra latency of having to go to connect to an external instance:

λ ➜ benchstat no_cache.txt with_cache.txt
name                      old time/op  new time/op  delta
Write5RowsViaCopy-10       152µs ± 4%   122µs ± 3%  -19.56%  (p=0.008 n=5+5)
Write10RowsViaCopy-10      165µs ± 5%   134µs ± 1%  -18.65%  (p=0.008 n=5+5)
Write100RowsViaCopy-10     392µs ± 3%   379µs ± 7%     ~     (p=0.310 n=5+5)
Write1000RowsViaCopy-10   3.53ms ±12%  3.43ms ± 8%     ~     (p=0.841 n=5+5)
Write10000RowsViaCopy-10  14.9ms ± 3%  14.6ms ± 3%     ~     (p=0.222 n=5+5)


func TestConnCopyWithKnownOIDQueryExecModes(t *testing.T) {

for _, mode := range pgxtest.KnownOIDQueryExecModes {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests is only for the pgxtest.KnownOIDQueryExecModes because we can't resolved between varchar and text, or between date and timestampz, with the exec modes that rely on the pgtype.Map

@alejandrodnm alejandrodnm requested a review from jackc December 12, 2022 13:48
@alejandrodnm alejandrodnm force-pushed the adn/copy-cache branch 3 times, most recently from fd2d502 to ed05619 Compare December 21, 2022 15:09
CopyFrom had to create a prepared statement to get the OIDs of the data
types that were going to be copied into the table. Every COPY operation
required an extra round trips to retrieve the type information. There
was no way to customize this behavior.

By leveraging the QueryExecMode feature, like in `Conn.Query`, users can
specify if they want to cache the prepared statements, execute
them on every request (like the old behavior), or bypass the prepared
statement relying on the pgtype.Map to get the type information.

The `QueryExecMode` behave exactly like in `Conn.Query` in the way the
data type OIDs are fetched, meaning that:

- `QueryExecModeCacheStatement`: caches the statement.
- `QueryExecModeCacheDescribe`: caches the statement and assumes they do
  not change.
- `QueryExecModeDescribeExec`: gets the statement description on every
  execution. This is like to the old behavior of `CopyFrom`.
- `QueryExecModeExec` and `QueryExecModeSimpleProtocol`: maintain the
  same behavior as before, which is the same as `QueryExecModeDescribeExec`.
  It will keep getting the statement description on every execution

The `QueryExecMode` can only be set via
`ConnConfig.DefaultQueryExecMode`, unlike `Conn.Query` there's no
support for specifying the `QueryExecMode` via optional arguments
in the function signature.
Comment on lines +111 to +119
case QueryExecModeExec, QueryExecModeSimpleProtocol:
// These modes don't support the binary format. Before the inclusion of the
// QueryExecModes, Conn.Prepare was called on every COPY operation to get
// the OIDs. These prepared statements were not cached.
//
// Since that's the same behavior provided by QueryExecModeDescribeExec,
// we'll default to that mode.
ct.mode = QueryExecModeDescribeExec
fallthrough
Copy link
Copy Markdown
Contributor Author

@alejandrodnm alejandrodnm Dec 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackc I updated the PR, I'd say this is the most contentious part. Is this ok? or do you want me to just call Prepare directly here?

@jackc jackc merged commit c4ac6d8 into jackc:master Dec 23, 2022
@jackc
Copy link
Copy Markdown
Owner

jackc commented Dec 23, 2022

LGTM! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants