Skip to content

Conversation

@stloyd
Copy link
Member

@stloyd stloyd commented Jul 2, 2025

Resolves: #1750

Added new (in fact, it's a copy of the existing filesystem stream) filesystem with a custom scheme to confirm that the protocol is lost when partitioning is done.

Change Log


Added

Fixed

  • Partitioning should not overwrite schema

Changed

Removed

Deprecated

Security

@github-actions
Copy link
Contributor

github-actions bot commented Jul 2, 2025

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
| benchmark             | subject                | revs | its | mem_peak        | mode             | rstdev          |
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k      | 1    | 3   | 4.846mb -0.00%  | 437.282ms -0.62% | ±0.11% -67.95%  |
| ExcelExtractorBench   | bench_extract_10k_ods  | 1    | 3   | 65.541mb -0.00% | 1.063s -0.05%    | ±1.89% +91.16%  |
| ExcelExtractorBench   | bench_extract_10k_xlsx | 1    | 3   | 67.587mb -0.00% | 1.680s -0.41%    | ±0.77% +15.89%  |
| JsonExtractorBench    | bench_extract_10k      | 1    | 3   | 5.438mb -0.03%  | 1.162s -1.66%    | ±0.56% -71.35%  |
| ParquetExtractorBench | bench_extract_10k      | 1    | 3   | 86.399mb -0.00% | 901.605ms +0.07% | ±1.12% +266.17% |
| TextExtractorBench    | bench_extract_10k      | 1    | 3   | 4.569mb -0.00%  | 42.004ms -0.95%  | ±0.10% -91.25%  |
| XmlExtractorBench     | bench_extract_10k      | 1    | 3   | 4.554mb -0.00%  | 602.358ms -1.21% | ±0.90% -23.90%  |
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
Transformers
+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| benchmark                       | subject                  | revs | its | mem_peak         | mode            | rstdev          |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| RenameEachEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 18.565mb -0.00%  | 73.624ms +0.66% | ±1.23% +47.98%  |
| RenameEntryTransformerBench     | bench_transform_10k_rows | 1    | 3   | 123.303mb -0.00% | 67.322ms +0.54% | ±0.90% +600.18% |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev         |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 62.506mb +0.00%  | 88.919ms +0.28%  | ±0.43% -76.11% |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 80.587mb +0.00%  | 102.855ms -0.73% | ±0.52% +31.25% |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 166.299mb +0.00% | 2.012s +0.29%    | ±0.30% -52.69% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.871mb +0.00%  | 30.673ms -2.85%  | ±0.41% -55.45% |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
Building Blocks
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark         | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 42.515mb +0.00%  | 409.041ms +0.09% | ±0.85% +85.20%  |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 11.572mb +0.00%  | 80.910ms -1.81%  | ±0.58% -68.92%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 105.984mb +0.00% | 649.393ms +0.19% | ±0.49% -56.10%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 55.259mb +0.00%  | 326.852ms +0.50% | ±0.63% -51.88%  |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.845mb +0.00%  | 69.885ms -1.07%  | ±0.50% -33.37%  |
| RowsBench         | bench_chunk_10_on_10k      | 2    | 3   | 93.455mb +0.00%  | 3.500ms -2.81%   | ±0.41% -80.01%  |
| RowsBench         | bench_diff_left_1k_on_10k  | 2    | 3   | 110.826mb +0.00% | 243.040ms +1.55% | ±0.99% +165.58% |
| RowsBench         | bench_diff_right_1k_on_10k | 2    | 3   | 93.546mb +0.00%  | 24.463ms -0.57%  | ±0.59% +53.73%  |
| RowsBench         | bench_drop_1k_on_10k       | 2    | 3   | 94.330mb +0.00%  | 1.520ms -12.19%  | ±3.11% +1.97%   |
| RowsBench         | bench_drop_right_1k_on_10k | 2    | 3   | 94.330mb +0.00%  | 1.442ms -19.49%  | ±1.74% -51.98%  |
| RowsBench         | bench_entries_on_10k       | 2    | 3   | 92.490mb +0.00%  | 3.661ms -3.44%   | ±1.99% -33.42%  |
| RowsBench         | bench_filter_on_10k        | 2    | 3   | 93.019mb +0.00%  | 16.177ms +1.09%  | ±1.44% +39.55%  |
| RowsBench         | bench_find_on_10k          | 2    | 3   | 93.019mb +0.00%  | 15.615ms +0.85%  | ±1.27% -20.16%  |
| RowsBench         | bench_find_one_on_10k      | 10   | 3   | 91.708mb +0.00%  | 1.994μs -0.30%   | ±2.40% +0.00%   |
| RowsBench         | bench_first_on_10k         | 10   | 3   | 91.708mb +0.00%  | 0.400μs 0.00%    | ±0.00% 0.00%    |
| RowsBench         | bench_flat_map_on_1k       | 2    | 3   | 100.769mb +0.00% | 14.851ms -0.71%  | ±2.32% -14.53%  |
| RowsBench         | bench_map_on_10k           | 2    | 3   | 130.196mb +0.00% | 68.055ms -1.07%  | ±1.08% +218.24% |
| RowsBench         | bench_merge_1k_on_10k      | 2    | 3   | 93.539mb +0.00%  | 1.436ms -3.34%   | ±0.68% -74.03%  |
| RowsBench         | bench_partition_by_on_10k  | 2    | 3   | 96.908mb +0.00%  | 64.933ms +2.10%  | ±1.31% +50.33%  |
| RowsBench         | bench_remove_on_10k        | 2    | 3   | 94.592mb +0.00%  | 3.589ms -1.48%   | ±2.61% +58.07%  |
| RowsBench         | bench_sort_asc_on_1k       | 2    | 3   | 92.070mb +0.00%  | 40.427ms -0.13%  | ±1.36% -51.19%  |
| RowsBench         | bench_sort_by_on_1k        | 2    | 3   | 92.070mb +0.00%  | 39.577ms -5.26%  | ±2.31% +65.13%  |
| RowsBench         | bench_sort_desc_on_1k      | 2    | 3   | 92.070mb +0.00%  | 40.167ms -0.80%  | ±1.10% +213.14% |
| RowsBench         | bench_sort_entries_on_1k   | 2    | 3   | 94.151mb +0.00%  | 8.227ms -3.01%   | ±2.62% +227.16% |
| RowsBench         | bench_sort_on_1k           | 2    | 3   | 91.901mb +0.00%  | 29.975ms -1.25%  | ±1.81% +74.24%  |
| RowsBench         | bench_take_1k_on_10k       | 10   | 3   | 91.708mb +0.00%  | 14.442μs -3.01%  | ±2.26% -26.62%  |
| RowsBench         | bench_take_right_1k_on_10k | 10   | 3   | 91.708mb +0.00%  | 16.462μs -4.53%  | ±2.30% +33.34%  |
| RowsBench         | bench_unique_on_1k         | 2    | 3   | 110.827mb +0.00% | 241.318ms -1.44% | ±1.27% -12.67%  |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+

@codecov
Copy link

codecov bot commented Jul 2, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.30%. Comparing base (7f0cd6d) to head (d315052).
Report is 2 commits behind head on 1.x.

Additional details and impacted files
@@           Coverage Diff           @@
##              1.x    #1752   +/-   ##
=======================================
  Coverage   81.30%   81.30%           
=======================================
  Files         717      717           
  Lines       19949    19948    -1     
=======================================
  Hits        16219    16219           
+ Misses       3730     3729    -1     
Components Coverage Δ
etl 88.40% <100.00%> (-0.01%) ⬇️
cli 85.46% <ø> (ø)
lib-array-dot 94.56% <ø> (ø)
lib-azure-sdk 61.35% <ø> (ø)
lib-doctrine-dbal-bulk 93.88% <ø> (ø)
lib-filesystem 78.02% <ø> (ø)
lib-types 53.43% <ø> (ø)
lib-parquet 84.17% <ø> (ø)
lib-parquet-viewer 83.11% <ø> (ø)
lib-snappy 91.16% <ø> (+0.46%) ⬆️
bridge-filesystem-async-aws 90.38% <ø> (ø)
bridge-filesystem-azure 89.92% <ø> (ø)
bridge-monolog-http 97.04% <ø> (ø)
symfony-http-foundation 74.41% <ø> (ø)
adapter-chartjs 86.70% <ø> (ø)
adapter-csv 88.85% <ø> (ø)
adapter-doctrine 89.89% <ø> (ø)
adapter-elasticsearch 97.23% <ø> (ø)
adapter-google-sheet 83.87% <ø> (ø)
adapter-http 58.10% <ø> (ø)
adapter-json 87.98% <ø> (ø)
adapter-logger 53.84% <ø> (ø)
adapter-meilisearch 97.95% <ø> (ø)
adapter-parquet 78.64% <ø> (ø)
adapter-text 84.44% <ø> (ø)
adapter-xml 82.73% <ø> (ø)
🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@stloyd stloyd marked this pull request as ready for review July 2, 2025 10:34
@stloyd stloyd requested a review from norberttech as a code owner July 2, 2025 10:34
{
$this->setupFiles([__FUNCTION__ => []]);

$fs = new FakeNativeLocalFilesystem();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this be replaced with src/lib/filesystem/src/Flow/Filesystem/Local/MemoryFilesystem.php maybe? I'm not sure how I feel about Fake Filesystem in ETL package

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory doesn't support mv() method, so it will unfortunately not help :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could create a dummy to log action calls, if that copy is a problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I tried before merging, ended up moving this double to Filesystem package

if ($this->saveMode === SaveMode::Overwrite) {
if ($fileStream->path()->partitions()->count()) {
$partitionFilesPatter = new Path($fileStream->path()->parentDirectory()->path() . '/*', $fileStream->path()->options());
$partitionFilesPatter = new Path($fileStream->path()->parentDirectory()->uri() . '/*', $fileStream->path()->options());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

@norberttech norberttech merged commit 6158974 into flow-php:1.x Jul 3, 2025
27 checks passed
@stloyd stloyd deleted the bugfix/1750-1 branch July 3, 2025 06:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Protocol lost using partitioning mechanism whilst uploading to S3

2 participants