fix: Handle max concurrent queries errors #20907
Conversation
```go
have := have.Get(want.Name)
if have == nil {
	tableName := want.Name
```
This is the part where re-using `tables` changes state
```go
}
```
```go
func (c *Client) createTable(ctx context.Context, table *schema.Table, partition []spec.PartitionStrategy, orderBy []spec.OrderByStrategy) (err error) {
	c.logger.Debug().Str("table", table.Name).Msg("Table doesn't exist, creating")
```
This was confusing because force migration called dropTable and then createTable, so this log was printed even when the table already existed.
marianogappa left a comment:
Congrats on finding and fixing this bug 👍
My only concern here is the retry logic.
Obviously, without more context there's no smartness possible to the retry, like we do on InsertSplitter, so we agree that it's not gonna get a lot smarter than "try running the same query again later".
However, retrying up to 10 times, 1s +/- .5s later in the case of "Too many simultaneous queries" sounds 💣 🤔 no? I think this will by default increase congestion. My proposed remediation isn't very smart...I'd 10x the delay or something like that.
The default strategy is backoff + random jitter, but I don't mind increasing the retry delay. Based on my tests with the current values and 2000 concurrent syncs (from the test), we hardly ever retry more than twice.
```go
retry.Attempts(5),
retry.Delay(3 * time.Second),
retry.MaxJitter(1 * time.Second),
```
Would it make sense to expose these variables in the spec? Not blocking, just thinking out loud.
Not sure, actually. I think we can expose them based on user feedback; it could be too much control. The default retry logic is backoff + random jitter, so that should cover quite a lot of cases (especially on newer versions).
maaarcelino left a comment:
LGTM, one minor non-blocking suggestion
🤖 I have created a release *beep* *boop*

---

## [7.1.1](plugins-destination-clickhouse-v7.1.0...plugins-destination-clickhouse-v7.1.1) (2025-07-02)

### Bug Fixes

* **deps:** Update golang.org/x/exp digest to b7579e2 ([#20935](#20935)) ([aac340d](aac340d))
* **deps:** Update module github.com/cloudquery/codegen to v0.3.29 ([#20947](#20947)) ([af179be](af179be))
* Handle max concurrent queries errors ([#20907](#20907)) ([92c2827](92c2827))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
```go
defer rows.Close()

row := rowArr(rows.ColumnTypes())
builder := array.NewRecordBuilder(memory.DefaultAllocator, table.ToArrowSchema())
```
You can create the record builder once and then reset it before scanning in the retried function.
Summary

This PR fixes an issue that's more evident on older ClickHouse versions (e.g. 22, which we don't support anymore), where `max_concurrent_queries` defaulted to `100`. On newer versions the default is `1500`.

The result of hitting the limit is that some tables can be dropped unexpectedly, since `checkPartitionOrOrderByChanged`, called from `checkForced` here:

cloudquery/plugins/destination/clickhouse/client/migrate.go
Line 31 in a69aa8c

can return an error when called further down the line here:

cloudquery/plugins/destination/clickhouse/client/migrate.go
Line 149 in a69aa8c

Had to override the `max_concurrent_queries` configuration, so the docker compose file is used for that.

The solution implemented in this PR has 2 parts: `connection.Exec`, `row.Scan` and so on, so I'd basically need to wrap all of ClickHouse's driver interfaces.
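For reference, a minimal sketch of how `max_concurrent_queries` could be overridden in a local docker compose setup to reproduce the old limit. The file names, image tag, and mount paths here are assumptions for illustration, not the actual files in this PR:

```yaml
# docker-compose.yml (hypothetical)
services:
  clickhouse:
    image: clickhouse/clickhouse-server:22.8
    volumes:
      # any *.xml dropped into config.d is merged into the server configuration
      - ./max-concurrent-queries.xml:/etc/clickhouse-server/config.d/max-concurrent-queries.xml
```

```xml
<!-- max-concurrent-queries.xml (hypothetical): reproduce the old default -->
<clickhouse>
    <max_concurrent_queries>100</max_concurrent_queries>
</clickhouse>
```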