@norberttech
Member

Resolves: #1712

Change Log


Added

  • option to pass column types to DbalLoader explicitly, skipping type auto-detection
  • DbalTypesDetector, which detects target column types from the dataset schema
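
To illustrate the idea behind the two items above, here is a minimal sketch of resolving column types once from a known schema instead of inspecting every value. It is not the code from this PR: the function name `detect_column_types`, its parameters, and the logical type labels are hypothetical. Only the returned strings correspond to standard Doctrine DBAL type names.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not the Flow PHP API): resolves a DBAL type name
 * for every column up front, based on a known dataset schema.
 *
 * @param array<string, string> $schema    column name => logical type (assumed labels)
 * @param array<string, string> $overrides column name => DBAL type name passed explicitly
 *
 * @return array<string, string> column name => DBAL type name
 */
function detect_column_types(array $schema, array $overrides = []): array
{
    // Assumed mapping from logical types to Doctrine DBAL type names.
    $map = [
        'integer' => 'integer',
        'float' => 'float',
        'boolean' => 'boolean',
        'string' => 'string',
        'datetime' => 'datetime_immutable',
        'json' => 'json',
    ];

    $types = [];

    foreach ($schema as $column => $logicalType) {
        // An explicit override wins; unknown logical types fall back to string.
        $types[$column] = $overrides[$column] ?? $map[$logicalType] ?? 'string';
    }

    return $types;
}
```

Calling it with an explicit override for one column would leave detection in place for the rest, for example `detect_column_types(['id' => 'integer', 'meta' => 'json'], ['meta' => 'text'])`.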

Fixed

  • unknown DB types when writing to the database

Changed

Removed

Deprecated

Security

Before (total SQL queries): [image]

After (total SQL queries): [image]

@codecov

codecov bot commented Jul 18, 2025

Codecov Report

Attention: Patch coverage is 92.92929% with 7 lines in your changes missing coverage. Please review.

Project coverage is 81.87%. Comparing base (b65cc2e) to head (64ebc1d).
Report is 2 commits behind head on 1.x.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##              1.x    #1776      +/-   ##
==========================================
+ Coverage   81.79%   81.87%   +0.08%     
==========================================
  Files         726      729       +3     
  Lines       20835    20878      +43     
==========================================
+ Hits        17041    17094      +53     
+ Misses       3794     3784      -10     
+------------------------------+------------------------------+
| component                    | coverage Δ                   |
+------------------------------+------------------------------+
| etl                          | 88.46% <ø> (+0.04%) ⬆️       |
| cli                          | 85.46% <ø> (ø)               |
| lib-array-dot                | 94.56% <ø> (ø)               |
| lib-azure-sdk                | 61.35% <ø> (ø)               |
| lib-doctrine-dbal-bulk       | 95.02% <100.00%> (+1.14%) ⬆️ |
| lib-filesystem               | 78.02% <ø> (ø)               |
| lib-types                    | 53.55% <ø> (ø)               |
| lib-parquet                  | 85.50% <ø> (ø)               |
| lib-parquet-viewer           | 83.11% <ø> (ø)               |
| lib-snappy                   | 89.76% <ø> (ø)               |
| bridge-filesystem-async-aws  | 90.38% <ø> (ø)               |
| bridge-filesystem-azure      | 89.92% <ø> (ø)               |
| bridge-monolog-http          | 97.04% <ø> (ø)               |
| bridge-openapi-specification | 93.16% <ø> (ø)               |
| symfony-http-foundation      | 74.41% <ø> (ø)               |
| adapter-chartjs              | 86.70% <ø> (ø)               |
| adapter-csv                  | 88.85% <ø> (ø)               |
| adapter-doctrine             | 91.03% <89.39%> (+1.14%) ⬆️  |
| adapter-elasticsearch        | 97.23% <ø> (ø)               |
| adapter-google-sheet         | 83.87% <ø> (ø)               |
| adapter-http                 | 58.10% <ø> (ø)               |
| adapter-json                 | 87.98% <ø> (ø)               |
| adapter-logger               | 53.84% <ø> (ø)               |
| adapter-meilisearch          | 97.95% <ø> (ø)               |
| adapter-parquet              | 78.92% <ø> (ø)               |
| adapter-text                 | 84.44% <ø> (ø)               |
| adapter-xml                  | 82.73% <ø> (ø)               |
+------------------------------+------------------------------+

@github-actions
Contributor

github-actions bot commented Jul 18, 2025

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
| benchmark             | subject                | revs | its | mem_peak        | mode             | rstdev          |
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k      | 1    | 3   | 4.871mb +0.02%  | 440.256ms +0.47% | ±0.44% -28.65%  |
| ExcelExtractorBench   | bench_extract_10k_ods  | 1    | 3   | 65.566mb +0.00% | 1.065s +0.52%    | ±0.67% -22.32%  |
| ExcelExtractorBench   | bench_extract_10k_xlsx | 1    | 3   | 67.666mb +0.00% | 1.691s +0.03%    | ±0.57% +47.78%  |
| JsonExtractorBench    | bench_extract_10k      | 1    | 3   | 5.463mb +0.01%  | 1.134s -0.71%    | ±1.10% +141.08% |
| ParquetExtractorBench | bench_extract_10k      | 1    | 3   | 10.670mb -0.17% | 9.227s -20.12%   | ±0.20% -71.44%  |
| TextExtractorBench    | bench_extract_10k      | 1    | 3   | 4.593mb +0.02%  | 41.460ms -1.95%  | ±0.97% +44.33%  |
| XmlExtractorBench     | bench_extract_10k      | 1    | 3   | 4.579mb +0.02%  | 594.996ms -0.31% | ±0.75% -24.56%  |
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
Transformers
+---------------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                       | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEachEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 18.590mb +0.00%  | 73.007ms -2.00% | ±0.70% +12.66% |
| RenameEntryTransformerBench     | bench_transform_10k_rows | 1    | 3   | 123.328mb +0.00% | 67.645ms -0.58% | ±1.16% +0.84%  |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev         |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 62.532mb +0.00%  | 84.064ms -0.02%  | ±1.74% +82.44% |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 80.613mb +0.00%  | 100.417ms -0.60% | ±0.37% -34.08% |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 819.273mb +0.04% | 20.045s -26.24%  | ±0.50% +44.15% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.896mb +0.00%  | 29.814ms -1.51%  | ±0.12% -73.50% |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
Building Blocks
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark         | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 106.010mb +0.00% | 650.790ms -1.67% | ±0.56% -62.99%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 55.284mb +0.00%  | 328.527ms -0.76% | ±0.52% -32.65%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.870mb +0.01%  | 70.201ms -2.96%  | ±1.63% +159.75% |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 42.540mb +0.00%  | 402.853ms -0.96% | ±0.11% -67.99%  |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 11.598mb +0.01%  | 81.490ms +0.11%  | ±1.83% +5.66%   |
| RowsBench         | bench_chunk_10_on_10k      | 2    | 3   | 93.481mb +0.00%  | 3.222ms -23.62%  | ±2.21% +99.94%  |
| RowsBench         | bench_diff_left_1k_on_10k  | 2    | 3   | 110.851mb +0.00% | 236.297ms -1.10% | ±0.23% -65.19%  |
| RowsBench         | bench_diff_right_1k_on_10k | 2    | 3   | 93.571mb +0.00%  | 23.666ms -2.84%  | ±0.71% -7.54%   |
| RowsBench         | bench_drop_1k_on_10k       | 2    | 3   | 94.355mb +0.00%  | 1.324ms -30.07%  | ±1.91% +43.95%  |
| RowsBench         | bench_drop_right_1k_on_10k | 2    | 3   | 94.355mb +0.00%  | 1.366ms -28.26%  | ±0.60% -60.95%  |
| RowsBench         | bench_entries_on_10k       | 2    | 3   | 92.516mb +0.00%  | 3.326ms -11.24%  | ±0.28% -82.92%  |
| RowsBench         | bench_filter_on_10k        | 2    | 3   | 93.045mb +0.00%  | 14.985ms -6.54%  | ±1.62% -26.82%  |
| RowsBench         | bench_find_on_10k          | 2    | 3   | 93.045mb +0.00%  | 15.357ms +0.04%  | ±1.82% +82.72%  |
| RowsBench         | bench_find_one_on_10k      | 10   | 3   | 91.734mb +0.00%  | 1.794μs -14.82%  | ±2.67% +20.75%  |
| RowsBench         | bench_first_on_10k         | 10   | 3   | 91.734mb +0.00%  | 0.300μs -25.00%  | ±0.00% -100.00% |
| RowsBench         | bench_flat_map_on_1k       | 2    | 3   | 100.794mb +0.00% | 14.430ms -10.40% | ±0.56% -81.30%  |
| RowsBench         | bench_map_on_10k           | 2    | 3   | 130.222mb +0.00% | 67.106ms -4.30%  | ±1.70% +128.97% |
| RowsBench         | bench_merge_1k_on_10k      | 2    | 3   | 93.565mb +0.00%  | 1.196ms -32.43%  | ±2.12% -34.22%  |
| RowsBench         | bench_partition_by_on_10k  | 2    | 3   | 96.934mb +0.00%  | 60.605ms -2.89%  | ±0.72% -2.52%   |
| RowsBench         | bench_remove_on_10k        | 2    | 3   | 94.618mb +0.00%  | 3.507ms -9.52%   | ±1.97% +309.92% |
| RowsBench         | bench_sort_asc_on_1k       | 2    | 3   | 92.096mb +0.00%  | 39.024ms -2.70%  | ±0.46% -40.96%  |
| RowsBench         | bench_sort_by_on_1k        | 2    | 3   | 92.096mb +0.00%  | 39.976ms +0.27%  | ±1.00% -20.33%  |
| RowsBench         | bench_sort_desc_on_1k      | 2    | 3   | 92.096mb +0.00%  | 39.583ms -3.81%  | ±1.17% +60.22%  |
| RowsBench         | bench_sort_entries_on_1k   | 2    | 3   | 94.177mb +0.00%  | 7.932ms -3.61%   | ±0.50% +27.18%  |
| RowsBench         | bench_sort_on_1k           | 2    | 3   | 91.927mb +0.00%  | 28.781ms -3.02%  | ±1.19% +121.51% |
| RowsBench         | bench_take_1k_on_10k       | 10   | 3   | 91.734mb +0.00%  | 13.743μs -7.27%  | ±2.38% +106.28% |
| RowsBench         | bench_take_right_1k_on_10k | 10   | 3   | 91.734mb +0.00%  | 16.134μs -5.30%  | ±2.72% +65.24%  |
| RowsBench         | bench_unique_on_1k         | 2    | 3   | 110.852mb +0.00% | 237.947ms -1.63% | ±0.11% -84.54%  |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
Parquet Library
+--------------------+---------------------------------+------+-----+------------------+-------------------+-----------------+
| benchmark          | subject                         | revs | its | mem_peak         | mode              | rstdev          |
+--------------------+---------------------------------+------+-----+------------------+-------------------+-----------------+
| ParquetReaderBench | bench_page_headers              | 1    | 3   | 6.668mb +0.01%   | 3.273s -0.53%     | ±1.36% -2.25%   |
| ParquetReaderBench | bench_read_metadata             | 1    | 3   | 5.353mb +0.01%   | 18.043ms -2.03%   | ±0.55% -34.09%  |
| ParquetReaderBench | bench_read_schema               | 1    | 3   | 5.353mb +0.01%   | 18.129ms -1.73%   | ±0.46% -40.83%  |
| ParquetReaderBench | bench_read_values_all_columns   | 1    | 3   | 9.102mb -0.20%   | 5.635s -29.07%    | ±0.72% +151.91% |
| ParquetReaderBench | bench_read_values_single_column | 1    | 3   | 6.400mb -0.28%   | 237.576ms -47.85% | ±0.88% +149.20% |
| ParquetReaderBench | bench_read_values_with_limit    | 1    | 3   | 6.930mb -0.44%   | 28.815ms -15.29%  | ±0.21% -53.73%  |
| ParquetWriterBench | bench_write_batch               | 1    | 3   | 11.720mb -14.44% | 191.287ms -14.03% | ±0.64% +0.42%   |
| ParquetWriterBench | bench_write_gzip                | 1    | 3   | 10.345mb +0.02%  | 217.486ms -0.16%  | ±0.04% -96.22%  |
| ParquetWriterBench | bench_write_row_by_row          | 1    | 3   | 11.720mb -14.44% | 193.081ms -12.99% | ±0.34% +85.71%  |
| ParquetWriterBench | bench_write_snappy              | 1    | 3   | 11.720mb -14.44% | 193.682ms -12.83% | ±0.30% -69.07%  |
| ParquetWriterBench | bench_write_uncompressed        | 1    | 3   | 10.021mb +0.03%  | 193.003ms +0.03%  | ±0.62% -5.29%   |
+--------------------+---------------------------------+------+-----+------------------+-------------------+-----------------+

norberttech and others added 3 commits July 18, 2025 23:08
Removed a problematic column from the Dbal Bulk insert integration tests,
because different versions of dbal handle jsonb columns differently.
@norberttech norberttech merged commit 4f6ed64 into 1.x Jul 19, 2025
21 checks passed
@norberttech norberttech deleted the 1712-proposal-dbal-bulk-data-optimization branch July 19, 2025 09:05

Successfully merging this pull request may close these issues.

[Proposal]: Dbal Bulk Data Optimization