bug: mysql destination does not actually bulk write to the database #19122
Description
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
When syncing thousands of rows into a table, the destination is noticeably slow. On a MySQL server you can execute show full processlist; to get a snapshot of the queries running at that moment. Monitoring the INSERT statements issued by CloudQuery this way confirms that the plugin inserts only one row at a time and does not bulk insert, even though it claims to bulk write:
BatchSize int64 `json:"batch_size,omitempty" jsonschema:"minimum=1,default=1000"`
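Per that struct tag, batch_size is already accepted in the destination spec. A sketch of what setting it looks like (values are illustrative; per the behavior described above, it does not change the one-row-per-INSERT pattern):

```yaml
kind: destination
spec:
  name: "mysql"
  path: "cloudquery/mysql"
  registry: "cloudquery"
  version: "v5.2.7"
  spec:
    connection_string: "..."
    # Accepted per the BatchSize struct tag, but the observed
    # INSERTs still carry a single value tuple each.
    batch_size: 1000
```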
Here is an example query run by the MySQL plugin (the tables are from a plugin I am working on):
INSERT INTO `jamf_computer_inventory_services` (`_cq_sync_time`, `_cq_source_name`, `_cq_id`, `_cq_parent_id`, `computer_id`, `name`) VALUES (?,?,?,?,?,?) ON DUPLICATE KEY UPDATE `_cq_sync_time` = VALUES(`_cq_sync_time`), `_cq_source_name` = VALUES(`_cq_source_name`), `_cq_id` = VALUES(`_cq_id`), `_cq_parent_id` = VALUES(`_cq_parent_id`), `computer_id` = VALUES(`computer_id`), `name` = VALUES(`name`)
Expected Behavior
You can run bulk inserts by simply providing multiple value tuples separated by commas:
INSERT INTO `jamf_computer_inventory_services` (`_cq_sync_time`, `_cq_source_name`, `_cq_id`, `_cq_parent_id`, `computer_id`, `name`)
VALUES (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), ...
ON DUPLICATE KEY UPDATE `_cq_sync_time` = VALUES(`_cq_sync_time`), `_cq_source_name` = VALUES(`_cq_source_name`), `_cq_id` = VALUES(`_cq_id`), `_cq_parent_id` = VALUES(`_cq_parent_id`), `computer_id` = VALUES(`computer_id`), `name` = VALUES(`name`)
CloudQuery (redacted) config
kind: destination
spec:
name: "mysql"
path: "cloudquery/mysql"
registry: "cloudquery"
version: "v5.2.7"
spec:
connection_string: "..."
Steps To Reproduce
- Pick a source that will lead to thousands of records being inserted into your destination
- Watch the queries being executed using show full processlist
- See that the query is not inserting in bulk; it is inserting rows one by one
CloudQuery (redacted) logs
unnecessary
CloudQuery version
6.5.0
Additional Context
I have a PR upcoming to fix this.
Pull request (optional)
- I can submit a pull request