@norberttech
Member

Resolves: #xxx

Change Log


Added

Fixed

  • performance of comparison transformations
  • performance of calculating rows/nulls in parquet row groups
  • make json and string types comparable

Changed

  • updated nix packages version in nix shell

Removed

Deprecated

Security

@github-actions
Contributor

github-actions bot commented Jun 26, 2025

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
| benchmark             | subject                | revs | its | mem_peak        | mode             | rstdev          |
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k      | 1    | 3   | 4.828mb +0.01%  | 438.068ms +1.33% | ±3.38% +181.34% |
| ExcelExtractorBench   | bench_extract_10k_ods  | 1    | 3   | 65.539mb +0.00% | 1.067s -1.21%    | ±0.82% -26.70%  |
| ExcelExtractorBench   | bench_extract_10k_xlsx | 1    | 3   | 67.585mb +0.00% | 1.697s -0.97%    | ±0.52% -62.20%  |
| JsonExtractorBench    | bench_extract_10k      | 1    | 3   | 5.436mb -0.00%  | 1.168s +0.12%    | ±0.33% -26.10%  |
| ParquetExtractorBench | bench_extract_10k      | 1    | 3   | 86.399mb -0.00% | 905.099ms -1.54% | ±0.42% +53.89%  |
| TextExtractorBench    | bench_extract_10k      | 1    | 3   | 4.567mb +0.01%  | 42.113ms +0.30%  | ±0.57% -18.32%  |
| XmlExtractorBench     | bench_extract_10k      | 1    | 3   | 4.552mb +0.01%  | 602.401ms -2.48% | ±1.46% +555.66% |
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
Transformers
+---------------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                       | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEachEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 18.563mb +0.00%  | 73.059ms -0.78% | ±0.85% -40.38% |
| RenameEntryTransformerBench     | bench_transform_10k_rows | 1    | 3   | 123.301mb +0.00% | 68.129ms +2.56% | ±1.13% +18.26% |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev         |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 62.490mb +0.00%  | 89.606ms +1.49%  | ±1.10% +69.51% |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 80.585mb +0.00%  | 104.030ms +0.41% | ±0.16% -51.07% |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 166.298mb +0.00% | 2.017s -0.21%    | ±0.40% -23.92% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.869mb +0.00%  | 31.353ms +0.20%  | ±0.69% +49.39% |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
Building Blocks
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark         | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 42.512mb +0.00%  | 405.754ms +0.23% | ±0.40% -62.29%  |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 11.570mb +0.01%  | 82.596ms +0.85%  | ±1.20% +61.70%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 105.982mb +0.00% | 652.418ms +0.11% | ±0.29% -51.90%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 55.257mb +0.00%  | 333.930ms +1.25% | ±0.81% +10.74%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.843mb +0.01%  | 70.216ms -0.13%  | ±0.74% +225.18% |
| RowsBench         | bench_chunk_10_on_10k      | 2    | 3   | 93.453mb +0.00%  | 3.953ms -2.86%   | ±3.31% +9.95%   |
| RowsBench         | bench_diff_left_1k_on_10k  | 2    | 3   | 110.823mb +0.00% | 240.381ms +1.02% | ±1.13% +37.37%  |
| RowsBench         | bench_diff_right_1k_on_10k | 2    | 3   | 93.543mb +0.00%  | 24.369ms -0.54%  | ±1.38% -14.63%  |
| RowsBench         | bench_drop_1k_on_10k       | 2    | 3   | 94.328mb +0.00%  | 1.709ms -13.58%  | ±2.93% +152.66% |
| RowsBench         | bench_drop_right_1k_on_10k | 2    | 3   | 94.328mb +0.00%  | 1.710ms -5.87%   | ±3.16% +31.75%  |
| RowsBench         | bench_entries_on_10k       | 2    | 3   | 92.489mb +0.00%  | 3.950ms +9.99%   | ±1.84% +0.63%   |
| RowsBench         | bench_filter_on_10k        | 2    | 3   | 93.018mb +0.00%  | 15.872ms -5.24%  | ±0.98% -19.95%  |
| RowsBench         | bench_find_on_10k          | 2    | 3   | 93.018mb +0.00%  | 16.001ms -3.93%  | ±0.37% -85.18%  |
| RowsBench         | bench_find_one_on_10k      | 10   | 3   | 91.707mb +0.00%  | 1.994μs +5.28%   | ±2.40% -5.08%   |
| RowsBench         | bench_first_on_10k         | 10   | 3   | 91.707mb +0.00%  | 0.400μs 0.00%    | ±0.00% 0.00%    |
| RowsBench         | bench_flat_map_on_1k       | 2    | 3   | 100.767mb +0.00% | 15.004ms -4.99%  | ±0.30% -78.36%  |
| RowsBench         | bench_map_on_10k           | 2    | 3   | 130.194mb +0.00% | 68.885ms +0.40%  | ±1.38% +38.29%  |
| RowsBench         | bench_merge_1k_on_10k      | 2    | 3   | 93.538mb +0.00%  | 1.617ms -0.23%   | ±3.66% +127.68% |
| RowsBench         | bench_partition_by_on_10k  | 2    | 3   | 96.907mb +0.00%  | 64.033ms +1.26%  | ±0.37% -60.89%  |
| RowsBench         | bench_remove_on_10k        | 2    | 3   | 94.590mb +0.00%  | 4.075ms +3.26%   | ±1.73% +35.92%  |
| RowsBench         | bench_sort_asc_on_1k       | 2    | 3   | 92.068mb +0.00%  | 40.354ms -4.27%  | ±1.85% -22.05%  |
| RowsBench         | bench_sort_by_on_1k        | 2    | 3   | 92.069mb +0.00%  | 40.568ms -1.59%  | ±1.23% -19.50%  |
| RowsBench         | bench_sort_desc_on_1k      | 2    | 3   | 92.068mb +0.00%  | 40.922ms -2.69%  | ±3.12% +58.12%  |
| RowsBench         | bench_sort_entries_on_1k   | 2    | 3   | 94.150mb +0.00%  | 8.809ms +4.43%   | ±1.35% -33.26%  |
| RowsBench         | bench_sort_on_1k           | 2    | 3   | 91.899mb +0.00%  | 30.956ms -0.03%  | ±1.34% -0.10%   |
| RowsBench         | bench_take_1k_on_10k       | 10   | 3   | 91.707mb +0.00%  | 15.142μs +2.86%  | ±1.96% +130.79% |
| RowsBench         | bench_take_right_1k_on_10k | 10   | 3   | 91.707mb +0.00%  | 17.004μs -9.61%  | ±2.34% +367.69% |
| RowsBench         | bench_unique_on_1k         | 2    | 3   | 110.824mb +0.00% | 243.445ms +1.06% | ±0.21% +63.98%  |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+

@codecov

codecov bot commented Jun 26, 2025

Codecov Report

Attention: Patch coverage is 98.18182% with 1 line in your changes missing coverage. Please review.

Project coverage is 81.26%. Comparing base (c86657f) to head (8cf5dbf).
Report is 1 commit behind head on 1.x.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##              1.x    #1739      +/-   ##
==========================================
+ Coverage   81.23%   81.26%   +0.03%     
==========================================
  Files         715      715              
  Lines       19873    19901      +28     
==========================================
+ Hits        16143    16173      +30     
+ Misses       3730     3728       -2     
+-----------------------------+-----------------------------+
| Components                  | Coverage Δ                  |
+-----------------------------+-----------------------------+
| etl                         | 88.40% <100.00%> (+0.07%) ⬆️ |
| cli                         | 85.46% <ø> (ø)              |
| lib-array-dot               | 94.56% <ø> (ø)              |
| lib-azure-sdk               | 61.35% <ø> (ø)              |
| lib-doctrine-dbal-bulk      | 93.88% <ø> (ø)              |
| lib-filesystem              | 78.02% <ø> (ø)              |
| lib-types                   | 53.43% <85.71%> (-0.15%) ⬇️  |
| lib-parquet                 | 84.17% <100.00%> (+0.03%) ⬆️ |
| lib-parquet-viewer          | 83.11% <ø> (ø)              |
| lib-snappy                  | 90.69% <ø> (ø)              |
| bridge-filesystem-async-aws | 90.38% <ø> (ø)              |
| bridge-filesystem-azure     | 89.92% <ø> (ø)              |
| bridge-monolog-http         | 97.04% <ø> (ø)              |
| symfony-http-foundation     | 74.41% <ø> (ø)              |
| adapter-chartjs             | 86.70% <ø> (ø)              |
| adapter-csv                 | 87.74% <ø> (ø)              |
| adapter-doctrine            | 89.89% <ø> (ø)              |
| adapter-elasticsearch       | 97.23% <ø> (ø)              |
| adapter-google-sheet        | 83.87% <ø> (ø)              |
| adapter-http                | 58.10% <ø> (ø)              |
| adapter-json                | 87.98% <ø> (ø)              |
| adapter-logger              | 53.84% <ø> (ø)              |
| adapter-meilisearch         | 97.95% <ø> (ø)              |
| adapter-parquet             | 78.64% <ø> (ø)              |
| adapter-text                | 84.44% <ø> (ø)              |
| adapter-xml                 | 82.73% <ø> (ø)              |
+-----------------------------+-----------------------------+

public function eval(Row $row) : bool
{
    $left = (new Parameter($this->left))->eval($row);
    $leftType = (new Parameter($this->left))->asType($row);
Member

If we get null for both $left and $right, should we still run version comparison?

Member Author

@norberttech Jun 26, 2025

Not really, but it won't hurt, since the types comparator will get type_null twice and will report that the types are comparable.
Adding an extra condition just for nulls doesn't seem to bring any significant performance boost (I just tested this locally).
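
To illustrate the point above, here is a minimal sketch of why a null/null pair passes through a type-compatibility check without a dedicated branch. Everything in it is hypothetical and is not Flow PHP's actual API: `typesComparable()`, `equalsIfComparable()`, the string type names, and the integer/float pair are assumptions made for illustration. The json/string pair only mirrors the changelog entry.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a types comparator; NOT Flow PHP's real implementation.
// Types are modelled as plain strings purely for illustration.
function typesComparable(string $left, string $right) : bool
{
    // Identical types, including null vs null, are always comparable,
    // so no dedicated null branch is needed.
    if ($left === $right) {
        return true;
    }

    // Assumed list of mixed-type pairs that may be compared with each other.
    $comparablePairs = [
        ['json', 'string'],
        ['integer', 'float'],
    ];

    foreach ($comparablePairs as [$a, $b]) {
        if (($left === $a && $right === $b) || ($left === $b && $right === $a)) {
            return true;
        }
    }

    return false;
}

// Hypothetical comparison that consults the comparator before comparing values.
function equalsIfComparable(mixed $left, string $leftType, mixed $right, string $rightType) : bool
{
    if (!typesComparable($leftType, $rightType)) {
        return false;
    }

    return $left == $right;
}

var_dump(equalsIfComparable(null, 'null', null, 'null')); // bool(true)
```

In this sketch the null case costs one string comparison, which is consistent with the observation that a separate null condition brings no measurable gain.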

@norberttech merged commit 1f550f0 into 1.x on Jun 26, 2025
21 checks passed
@norberttech deleted the bug/parquet-reading-performance branch on June 26, 2025 18:37

3 participants