bug: mysql destination does not actually bulk write to the database #19122

@jsifuentes

Description

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When syncing thousands of rows into a table, the sync is noticeably slow. On a MySQL server, you can execute show full processlist; to get a snapshot of the queries running at that moment, and use it to monitor the INSERT queries CloudQuery is executing. Doing so confirms that the plugin inserts only one row at a time rather than bulk inserting, even though the plugin claims to bulk write:

BatchSize int64 `json:"batch_size,omitempty" jsonschema:"minimum=1,default=1000"`

Here is an example query being run by the MySQL plugin (the tables are from a plugin I am working on):

INSERT INTO `jamf_computer_inventory_services` (`_cq_sync_time`, `_cq_source_name`, `_cq_id`, `_cq_parent_id`, `computer_id`, `name`) VALUES (?,?,?,?,?,?) ON DUPLICATE KEY UPDATE `_cq_sync_time` = VALUES(`_cq_sync_time`), `_cq_source_name` = VALUES(`_cq_source_name`), `_cq_id` = VALUES(`_cq_id`), `_cq_parent_id` = VALUES(`_cq_parent_id`), `computer_id` = VALUES(`computer_id`), `name` = VALUES(`name`)

Expected Behavior

You can run bulk inserts simply by providing multiple value blocks separated by commas:

INSERT INTO `jamf_computer_inventory_services` (`_cq_sync_time`, `_cq_source_name`, `_cq_id`, `_cq_parent_id`, `computer_id`, `name`)
VALUES (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), (?,?,?,?,?,?), ...
ON DUPLICATE KEY UPDATE `_cq_sync_time` = VALUES(`_cq_sync_time`), `_cq_source_name` = VALUES(`_cq_source_name`), `_cq_id` = VALUES(`_cq_id`), `_cq_parent_id` = VALUES(`_cq_parent_id`), `computer_id` = VALUES(`computer_id`), `name` = VALUES(`name`)

CloudQuery (redacted) config

kind: destination
spec:
  name: "mysql"
  path: "cloudquery/mysql"
  registry: "cloudquery"
  version: "v5.2.7"
  spec:
    connection_string: "..."

Steps To Reproduce

  1. Pick a source that will lead to thousands of records being inserted into your destination
  2. Watch the queries being executed using show full processlist
  3. See that rows are inserted one at a time rather than in bulk

CloudQuery (redacted) logs

Not needed for this issue.

CloudQuery version

6.5.0

Additional Context

I have a PR upcoming to fix this.

Pull request (optional)

  • I can submit a pull request
