norberttech (Member) commented Dec 19, 2023

Change Log

Added

  • FileListExtractor

Fixed

Changed

Removed

Deprecated

Security


Description

Refs: #881

The new extractor can serve as a more advanced and flexible replacement for partition pruning, or simply be used to list directory content.

etl-adapter-filesystem should provide a similar extractor for remote files, but that belongs in a separate PR.
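To illustrate the idea in the description, here is a minimal sketch in plain PHP of listing directory content together with the partition values encoded in each path. It does not use or reproduce the extractor added in this PR: the function name `list_local_files` and the shape of the yielded rows are assumptions made for illustration, and only standard SPL iterators are used.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Flow PHP): yields one row per file found
 * under $directory, including Hive-style "name=value" partitions from its path.
 *
 * @return \Generator<int, array{path: string, file_name: string, size: int, partitions: array<string, string>}>
 */
function list_local_files(string $directory) : \Generator
{
    $iterator = new \RecursiveIteratorIterator(
        new \RecursiveDirectoryIterator($directory, \FilesystemIterator::SKIP_DOTS)
    );

    foreach ($iterator as $file) {
        /** @var \SplFileInfo $file */
        if (!$file->isFile()) {
            continue;
        }

        // Only look at directory segments below the listed root.
        $relativeDirectory = \substr($file->getPath(), \strlen($directory));

        $partitions = [];

        foreach (\explode(DIRECTORY_SEPARATOR, $relativeDirectory) as $segment) {
            if (\preg_match('/^([^=]+)=(.+)$/', $segment, $matches) === 1) {
                $partitions[$matches[1]] = $matches[2];
            }
        }

        yield [
            'path' => $file->getPathname(),
            'file_name' => $file->getFilename(),
            'size' => $file->getSize(),
            'partitions' => $partitions,
        ];
    }
}
```

Because every file becomes a regular row, pruning reduces to an ordinary filter over those rows, for example keeping only the rows whose `partitions['date']` value falls in a given range, and only then reading the files that remain.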

github-actions bot (Contributor) commented Dec 19, 2023

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak         | mode             | rstdev          |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| AvroExtractorBench    | bench_extract_10k | 1    | 3   | 35.138mb +0.01%  | 722.629ms -1.81% | ±0.63% -79.78%  |
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 4.836mb +0.06%   | 303.106ms -1.43% | ±0.22% -78.84%  |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 4.935mb +0.05%   | 945.146ms -0.34% | ±1.02% +130.92% |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 239.713mb +0.00% | 1.117s -0.29%    | ±0.95% -18.39%  |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.711mb +0.06%   | 27.278ms +0.38%  | ±0.41% -75.86%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.713mb +0.06%   | 408.549ms -1.50% | ±0.35% -57.24%  |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+---------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev        |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+---------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 110.407mb +0.00% | 64.382ms -1.07% | ±1.37% +5.93% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+---------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev          |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| AvroLoaderBench    | bench_load_10k | 1    | 3   | 94.806mb +0.00%  | 441.480ms -5.48% | ±1.11% -60.22%  |
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.884mb +0.00%  | 70.186ms -3.57%  | ±0.84% +399.75% |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 105.466mb +0.00% | 57.482ms -2.27%  | ±1.21% +181.11% |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 320.676mb +0.00% | 1.288s -1.83%    | ±2.50% -1.15%   |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.761mb +0.02%  | 40.866ms -1.42%  | ±1.21% -33.88%  |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
Building Blocks
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 116.044mb +0.00% | 378.767ms -2.93% | ±1.07% -25.43%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 59.762mb +0.00%  | 193.722ms +0.80% | ±1.17% -39.62%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.837mb +0.02%  | 39.641ms -2.52%  | ±2.35% +23.93%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 59.393mb +0.00%  | 337.972ms +0.77% | ±0.95% +45.61%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 14.316mb +0.02%  | 65.208ms -0.88%  | ±0.80% -51.80%  |
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 76.458mb +0.00%  | 3.732ms -7.01%   | ±2.38% -28.25%  |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 96.251mb +0.00%  | 181.669ms -1.57% | ±0.59% +271.53% |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 74.776mb +0.00%  | 18.008ms -3.14%  | ±1.03% -51.28%  |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 77.698mb +0.00%  | 1.652ms -14.10%  | ±0.24% -84.74%  |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 77.698mb +0.00%  | 1.653ms -17.04%  | ±1.16% -67.59%  |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 74.810mb +0.00%  | 2.492ms -11.65%  | ±0.78% +70.59%  |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 75.339mb +0.00%  | 14.178ms -5.69%  | ±1.19% -29.83%  |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 75.339mb +0.00%  | 14.307ms -3.84%  | ±0.67% -6.43%   |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 73.242mb +0.00%  | 1.594μs -6.24%   | ±3.01% +0.00%   |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 73.242mb +0.00%  | 0.400μs 0.00%    | ±0.00% 0.00%    |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 86.865mb +0.00%  | 12.595ms -0.28%  | ±0.88% -3.00%   |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 116.158mb +0.00% | 63.067ms -4.19%  | ±0.11% -95.95%  |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 75.859mb +0.00%  | 1.208ms -16.22%  | ±3.52% +124.94% |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 78.132mb +0.00%  | 35.491ms -4.66%  | ±0.20% -83.20%  |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 77.960mb +0.00%  | 3.799ms -6.69%   | ±1.55% +29.75%  |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 73.387mb +0.00%  | 40.966ms +0.92%  | ±2.23% +503.67% |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 73.388mb +0.00%  | 39.513ms -1.10%  | ±0.47% -66.69%  |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 73.387mb +0.00%  | 39.783ms -2.50%  | ±0.68% +62.91%  |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 75.684mb +0.00%  | 7.344ms -0.10%   | ±0.83% +188.62% |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 73.242mb +0.00%  | 28.634ms -3.84%  | ±3.11% +986.26% |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 73.242mb +0.00%  | 14.390μs +3.44%  | ±3.18% +371.34% |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 73.242mb +0.00%  | 16.200μs -3.43%  | ±1.01% -10.90%  |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 96.252mb +0.00%  | 184.464ms -2.38% | ±1.03% +66.76%  |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+

norberttech changed the title from "Added FileListExtractor" to "Added LocalFileListExtractor" on Dec 19, 2023
norberttech merged commit c143d41 into flow-php:1.x on Dec 19, 2023