[Proposal]: Dbal Bulk Data Optimization #1712

@norberttech

Describe the Proposal

Currently, BulkData accepts only a list of rows and later tries to guess the best parameter type for each value: https://github.com/flow-php/flow/blob/1.x/src/lib/doctrine-dbal-bulk/src/Flow/Doctrine/Bulk/BulkData.php#L120-L160

This detection runs for every row, which means a lot of redundant checks, and there is currently no way to override it when needed.

API Adjustments

Parameter type detection should be moved to a standalone class and used only when parameter types are not provided.

Parameter types can be passed through the BulkData constructor.

/**
 * @param array<int, array<string, mixed>> $rows
 * @param array<string, ParameterType|ArrayParameterType> $parameters
 */
public function __construct(array $rows, array $parameters = [])

Then, in src/lib/doctrine-dbal-bulk/src/Flow/Doctrine/Bulk/Bulk.php, if BulkData comes without all (or only some) of the parameter types, we can detect the missing ones automatically using the same technique we use now.
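A minimal sketch of "detect only what is missing" could look like the following. Everything here is an assumption for illustration: SketchParameterType is a local stand-in for Doctrine's ParameterType so the snippet runs without DBAL installed, completeParameterTypes() is a hypothetical helper, and the detection rule is a placeholder rather than the logic currently in BulkData.

```php
<?php

// Local stand-in for Doctrine\DBAL\ParameterType (assumption, keeps the sketch self-contained).
enum SketchParameterType
{
    case STRING;
    case INTEGER;
    case BOOLEAN;
}

/**
 * Hypothetical helper: keeps the provided types and detects only the missing ones.
 * Each column is detected once, from its first non-null value, instead of per row.
 *
 * @param array<int, array<string, mixed>> $rows
 * @param array<string, SketchParameterType> $provided
 *
 * @return array<string, SketchParameterType>
 */
function completeParameterTypes(array $rows, array $provided = []): array
{
    $types = $provided;

    foreach ($rows as $row) {
        foreach ($row as $column => $value) {
            // Skip columns that already have a type and values that carry no type information.
            if (isset($types[$column]) || $value === null) {
                continue;
            }

            // Placeholder detection rule, not the existing BulkData logic.
            $types[$column] = match (true) {
                \is_int($value) => SketchParameterType::INTEGER,
                \is_bool($value) => SketchParameterType::BOOLEAN,
                default => SketchParameterType::STRING,
            };
        }
    }

    return $types;
}
```

With this shape, a caller that passes every type up front pays no detection cost, while a caller that passes none gets roughly the current behavior. Columns that are null in every row stay undetected in this sketch.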

Implementation steps

Currently, BulkData::toSqlParameters uses TableDefinition to get the column type and convert each value to a format that satisfies the column.

1) Extract the toSqlParameters logic to a standalone class.

We need to move that logic out of BulkData, ideally into some kind of BulkParametersFactory.

BulkParametersFactory should receive a Connection, from which it will create $tableDefinition = new TableDefinition($table, ...\array_values($connection->createSchemaManager()->listTableColumns($table))); (but only when needed).

BulkParametersFactory should have one method, toParameters(BulkData $data) : array, and return an array that is exactly the same as the one currently returned by BulkData::toSqlParameters().

2) Allow passing column types to BulkData.

When column types are passed to BulkData, BulkParametersFactory should take them from there instead of initializing the TableDefinition and reading them from it.
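The "only when needed" part of steps 1 and 2 could be sketched roughly as below. This is not the proposed API: SketchBulkData and SketchBulkParametersFactory are hypothetical names, column types are plain strings, the $table argument is an assumption about where the table name comes from, and a closure stands in for the Connection and schema manager lookup so the snippet runs on its own.

```php
<?php

// Hypothetical stand-in for BulkData carrying optional column types.
final class SketchBulkData
{
    /**
     * @param array<int, array<string, mixed>> $rows
     * @param array<string, string> $columnTypes
     */
    public function __construct(
        public readonly array $rows,
        public readonly array $columnTypes = [],
    ) {
    }
}

// Hypothetical factory: reads the table schema lazily and at most once per table.
final class SketchBulkParametersFactory
{
    /** @var array<string, array<string, string>> */
    private array $schemaCache = [];

    /**
     * @param \Closure(string): array<string, string> $loadColumnTypes
     *        stand-in for the schema manager lookup done through Connection
     */
    public function __construct(private readonly \Closure $loadColumnTypes)
    {
    }

    /** @return array<string, string> column name => column type */
    public function columnTypes(string $table, SketchBulkData $data): array
    {
        // Collect every column name used by the rows.
        $columns = [];
        foreach ($data->rows as $row) {
            $columns += \array_fill_keys(\array_keys($row), true);
        }

        // Nothing missing: the schema is never touched.
        $missing = \array_diff_key($columns, $data->columnTypes);
        if ($missing === []) {
            return $data->columnTypes;
        }

        $this->schemaCache[$table] ??= ($this->loadColumnTypes)($table);

        // Explicit types win; the schema fills only the gaps.
        return $data->columnTypes + \array_intersect_key($this->schemaCache[$table], $missing);
    }
}
```

The design choice here is that explicitly passed types always take precedence, so the schema lookup becomes a fallback rather than a mandatory step.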

3) DX Improvements

The first two steps will give us a bit more flexibility and possibly some performance improvement. On top of that, we should create another interface called ValueConverter, with implementations that convert a value to the format expected by the column, based on the value type and the column type.

For example:

StringJsonConverter - value type string, column type json, json_array; returns \json_decode($value, true, 512, JSON_THROW_ON_ERROR)

StringDateTimeImmutable - value type string, column type datetime_immutable, datetimetz_immutable, date_immutable, time_immutable; returns new \DateTimeImmutable($value)

And so on. The logic should be the same as in the match at line 121 of BulkData, just split into smaller converters.
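One possible shape for the converters is sketched below. The interface name and the two class names come from the proposal; the method names supports() and convert() and their signatures are assumptions, as is representing column types as plain strings.

```php
<?php

// Assumed shape of the proposed interface; method names are hypothetical.
interface ValueConverter
{
    public function supports(mixed $value, string $columnType): bool;

    public function convert(mixed $value): mixed;
}

final class StringJsonConverter implements ValueConverter
{
    public function supports(mixed $value, string $columnType): bool
    {
        return \is_string($value) && \in_array($columnType, ['json', 'json_array'], true);
    }

    public function convert(mixed $value): mixed
    {
        // Throws \JsonException on malformed input instead of silently returning null.
        return \json_decode($value, true, 512, JSON_THROW_ON_ERROR);
    }
}

final class StringDateTimeImmutable implements ValueConverter
{
    private const COLUMN_TYPES = [
        'datetime_immutable',
        'datetimetz_immutable',
        'date_immutable',
        'time_immutable',
    ];

    public function supports(mixed $value, string $columnType): bool
    {
        return \is_string($value) && \in_array($columnType, self::COLUMN_TYPES, true);
    }

    public function convert(mixed $value): mixed
    {
        return new \DateTimeImmutable($value);
    }
}
```

Keeping supports() separate from convert() lets the factory pick a converter once per column and reuse it for every row, rather than re-running a match for each value.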

So the factory will now have two mechanisms:

  1. Get or detect the destination column type
  2. Convert the value based on the detected column type and the value type
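The two mechanisms above can be sketched as a single per-row pass. To keep it short, this illustration assumes converters are plain callables keyed by "<value type>:<column type>" rather than ValueConverter objects, and convertRow() is a hypothetical helper; the column types passed in are the result of step 1.

```php
<?php

/**
 * Hypothetical helper combining both steps for one row.
 *
 * @param array<string, mixed> $row
 * @param array<string, string> $columnTypes result of step 1 (provided or detected)
 * @param array<string, callable(mixed): mixed> $converters keyed "<value type>:<column type>"
 *
 * @return array<string, mixed>
 */
function convertRow(array $row, array $columnTypes, array $converters): array
{
    $converted = [];

    foreach ($row as $column => $value) {
        // Step 2: look up a converter by value type and destination column type.
        $key = \get_debug_type($value) . ':' . ($columnTypes[$column] ?? '');

        // Values without a matching converter pass through unchanged.
        $converted[$column] = isset($converters[$key]) ? $converters[$key]($value) : $value;
    }

    return $converted;
}

$converters = [
    'string:json' => static fn (string $value): mixed => \json_decode($value, true, 512, JSON_THROW_ON_ERROR),
    'string:datetime_immutable' => static fn (string $value): \DateTimeImmutable => new \DateTimeImmutable($value),
];

$row = convertRow(
    ['id' => 5, 'payload' => '{"a":1}'],
    ['id' => 'integer', 'payload' => 'json'],
    $converters
);
```

In this example `$row['payload']` becomes a decoded array while `$row['id']` is left as is, because no converter is registered for an int going into an integer column.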

Are you intending to also work on proposed change?

None

Are you interested in sponsoring this change?

None

Integration & Dependencies

No response


Status

Done
