Member

@stloyd stloyd commented May 20, 2025

Change Log

Added

Fixed

Changed

  • Improve reading of headers in the `CSVExtractor`

Removed

Deprecated

Security


Description

@stloyd stloyd force-pushed the csv-extractor-header-rework branch from 6e4d047 to 921f3d7 Compare May 20, 2025 16:22
@github-actions github-actions bot added size: XS and removed size: S labels May 20, 2025
@stloyd stloyd changed the title from "Close Excel extractor when the limit is reached" to "Improve reading of headers in the CSVExtractor" May 20, 2025
Contributor

github-actions bot commented May 20, 2025

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
| benchmark             | subject                | revs | its | mem_peak        | mode             | rstdev          |
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k      | 1    | 3   | 4.775mb +0.02%  | 591.057ms -2.29% | ±0.93% +216.60% |
| ExcelExtractorBench   | bench_extract_10k_ods  | 1    | 3   | 65.486mb +0.00% | 1.052s -0.14%    | ±0.95% -9.98%   |
| ExcelExtractorBench   | bench_extract_10k_xlsx | 1    | 3   | 67.532mb +0.00% | 1.694s -0.23%    | ±2.28% +131.96% |
| JsonExtractorBench    | bench_extract_10k      | 1    | 3   | 5.018mb +0.00%  | 1.280s +0.47%    | ±0.60% +100.89% |
| ParquetExtractorBench | bench_extract_10k      | 1    | 3   | 86.321mb +0.00% | 941.025ms +2.98% | ±0.65% +115.97% |
| TextExtractorBench    | bench_extract_10k      | 1    | 3   | 4.499mb +0.01%  | 38.590ms -1.28%  | ±0.76% +141.71% |
| XmlExtractorBench     | bench_extract_10k      | 1    | 3   | 4.494mb +0.01%  | 603.226ms +0.19% | ±0.16% -69.81%  |
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
Transformers
+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| benchmark                       | subject                  | revs | its | mem_peak         | mode            | rstdev          |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| RenameEntryTransformerBench     | bench_transform_10k_rows | 1    | 3   | 123.236mb +0.00% | 67.288ms +1.46% | ±0.26% -76.99%  |
| RenameEachEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 18.498mb +0.00%  | 72.274ms -0.48% | ±0.67% +397.59% |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
Loaders
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode            | rstdev          |
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 62.435mb +0.00%  | 85.563ms -0.48% | ±1.07% +104.86% |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 79.706mb +0.00%  | 96.912ms +0.42% | ±0.93% +19.52%  |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 165.387mb +0.00% | 21.084s +1.70%  | ±0.90% +110.81% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.805mb +0.00%  | 31.387ms +0.36% | ±0.91% -42.97%  |
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
Building Blocks
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark         | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 101.784mb +0.00% | 644.074ms -0.25% | ±0.58% +323.26% |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 53.134mb +0.00%  | 332.340ms +0.38% | ±2.09% +26.05%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.384mb +0.00%  | 70.150ms -0.02%  | ±1.21% +33.59%  |
| RowsBench         | bench_chunk_10_on_10k      | 2    | 3   | 93.389mb +0.00%  | 3.945ms +10.66%  | ±3.07% +175.19% |
| RowsBench         | bench_diff_left_1k_on_10k  | 2    | 3   | 110.758mb +0.00% | 236.345ms +0.42% | ±0.12% -86.21%  |
| RowsBench         | bench_diff_right_1k_on_10k | 2    | 3   | 93.478mb +0.00%  | 24.362ms +2.91%  | ±1.07% +430.06% |
| RowsBench         | bench_drop_1k_on_10k       | 2    | 3   | 94.264mb +0.00%  | 1.811ms +15.80%  | ±2.28% -37.82%  |
| RowsBench         | bench_drop_right_1k_on_10k | 2    | 3   | 94.264mb +0.00%  | 1.904ms +28.47%  | ±1.82% -49.10%  |
| RowsBench         | bench_entries_on_10k       | 2    | 3   | 92.424mb +0.00%  | 3.715ms +9.42%   | ±2.25% +165.97% |
| RowsBench         | bench_filter_on_10k        | 2    | 3   | 92.953mb +0.00%  | 16.336ms +5.62%  | ±1.10% -29.28%  |
| RowsBench         | bench_find_on_10k          | 2    | 3   | 92.953mb +0.00%  | 16.327ms +6.62%  | ±3.34% +163.34% |
| RowsBench         | bench_find_one_on_10k      | 10   | 3   | 91.642mb +0.00%  | 2.006μs +5.92%   | ±2.32% -8.20%   |
| RowsBench         | bench_first_on_10k         | 10   | 3   | 91.642mb +0.00%  | 0.500μs +25.00%  | ±0.00% -100.00% |
| RowsBench         | bench_flat_map_on_1k       | 2    | 3   | 100.703mb +0.00% | 16.177ms +12.61% | ±2.01% +144.25% |
| RowsBench         | bench_map_on_10k           | 2    | 3   | 130.130mb +0.00% | 67.699ms +1.12%  | ±0.92% +26.34%  |
| RowsBench         | bench_merge_1k_on_10k      | 2    | 3   | 93.473mb +0.00%  | 1.827ms +36.55%  | ±2.29% +292.36% |
| RowsBench         | bench_partition_by_on_10k  | 2    | 3   | 96.841mb +0.00%  | 64.032ms +3.62%  | ±0.37% -53.43%  |
| RowsBench         | bench_remove_on_10k        | 2    | 3   | 94.526mb +0.00%  | 4.160ms +18.57%  | ±3.36% +125.46% |
| RowsBench         | bench_sort_asc_on_1k       | 2    | 3   | 92.003mb +0.00%  | 41.159ms +2.34%  | ±1.18% +11.60%  |
| RowsBench         | bench_sort_by_on_1k        | 2    | 3   | 92.004mb +0.00%  | 41.823ms +4.00%  | ±0.60% -62.36%  |
| RowsBench         | bench_sort_desc_on_1k      | 2    | 3   | 92.003mb +0.00%  | 41.575ms +3.02%  | ±0.80% -72.42%  |
| RowsBench         | bench_sort_entries_on_1k   | 2    | 3   | 94.085mb +0.00%  | 8.635ms +6.13%   | ±2.56% +125.58% |
| RowsBench         | bench_sort_on_1k           | 2    | 3   | 91.835mb +0.00%  | 31.163ms +4.90%  | ±3.57% +416.94% |
| RowsBench         | bench_take_1k_on_10k       | 10   | 3   | 91.642mb +0.00%  | 14.734μs +7.21%  | ±1.40% -41.13%  |
| RowsBench         | bench_take_right_1k_on_10k | 10   | 3   | 91.642mb +0.00%  | 16.924μs +6.07%  | ±1.11% -51.79%  |
| RowsBench         | bench_unique_on_1k         | 2    | 3   | 110.759mb +0.00% | 241.265ms +2.30% | ±0.93% +73.57%  |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 42.070mb +0.00%  | 426.867ms +1.35% | ±0.83% +86.04%  |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 11.448mb +0.00%  | 86.169ms +0.85%  | ±3.05% +201.69% |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+

@stloyd stloyd marked this pull request as ready for review May 20, 2025 16:27
@stloyd stloyd requested a review from norberttech as a code owner May 20, 2025 16:27

codecov bot commented May 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.08%. Comparing base (60072b2) to head (6f3df22).
Report is 2 commits behind head on 1.x.

Additional details and impacted files
@@            Coverage Diff             @@
##              1.x    #1664      +/-   ##
==========================================
+ Coverage   82.07%   82.08%   +0.01%     
==========================================
  Files         703      703              
  Lines       19053    19064      +11     
==========================================
+ Hits        15637    15649      +12     
+ Misses       3416     3415       -1     
Components                    Coverage   Δ
etl                           88.27%     <ø> (ø)
cli                           84.42%     <ø> (ø)
lib-array-dot                 94.53%     <ø> (ø)
lib-azure-sdk                 62.56%     <ø> (ø)
lib-doctrine-dbal-bulk        90.11%     <ø> (ø)
lib-filesystem                78.02%     <ø> (ø)
lib-parquet                   84.37%     <ø> (ø)
lib-parquet-viewer            82.02%     <ø> (ø)
lib-snappy                    91.16%     <ø> (+0.46%) ⬆️
bridge-filesystem-async-aws   90.38%     <ø> (ø)
bridge-filesystem-azure       89.92%     <ø> (ø)
bridge-monolog-http           96.38%     <ø> (ø)
symfony-http-foundation       74.41%     <ø> (ø)
adapter-chartjs               86.45%     <ø> (ø)
adapter-csv                   90.00%     <100.00%> (+0.42%) ⬆️
adapter-doctrine              89.69%     <ø> (ø)
adapter-elasticsearch         97.19%     <ø> (ø)
adapter-google-sheet          83.87%     <ø> (ø)
adapter-http                  59.15%     <ø> (ø)
adapter-json                  90.62%     <ø> (ø)
adapter-logger                53.84%     <ø> (ø)
adapter-meilisearch           97.75%     <ø> (ø)
adapter-parquet               78.42%     <ø> (ø)
adapter-text                  84.44%     <ø> (ø)
adapter-xml                   83.15%     <ø> (ø)
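
For anyone checking the numbers: the project percentages in the coverage diff above are consistent with hits divided by lines, truncated (not rounded) to two decimals. Truncation is an assumption inferred from the figures themselves (15649 / 19064 is roughly 82.0867%, yet the report shows 82.08%), and the helper name below is hypothetical, not part of Codecov or Flow PHP. A minimal Python sketch under those assumptions:

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Return coverage as a percentage truncated to two decimals.

    Truncating instead of rounding is an assumption chosen because it
    matches the figures shown in the coverage diff.
    """
    if lines <= 0:
        raise ValueError("lines must be positive")
    # Integer floor division truncates to hundredths of a percent
    # before converting to a float.
    return (hits * 10000 // lines) / 100


base = coverage_percent(15637, 19053)  # base (60072b2)
head = coverage_percent(15649, 19064)  # head (6f3df22)
delta = round(head - base, 2)
print(base, head, delta)
```

The hits and lines values are taken directly from the diff; the sketch only shows how the reported 82.07%, 82.08% and +0.01% relate to them.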

@stloyd stloyd force-pushed the csv-extractor-header-rework branch from bd21d54 to 6f3df22 Compare May 20, 2025 16:44
@norberttech norberttech merged commit 981b9ba into flow-php:1.x May 20, 2025
21 checks passed
@stloyd stloyd deleted the csv-extractor-header-rework branch May 20, 2025 17:07