Contributor

@flavioheleno flavioheleno commented Jul 3, 2024

Change Log

Added

  • Added support for the LZ4 compression algorithm to Parquet

Fixed

Changed

Removed

Deprecated

Security


Description

Add support for LZ4 compression.

Closes #783.

Contributor

github-actions bot commented Jul 3, 2024

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from the 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+------------------+------------------+------------------+
| benchmark             | subject           | revs | its | mem_peak         | mode             | rstdev           |
+-----------------------+-------------------+------+-----+------------------+------------------+------------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 3.912mb +0.04%   | 511.742ms +0.71% | ±3.08% +1129.74% |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 3.944mb +0.04%   | 1.063s -1.11%    | ±0.95% -64.43%   |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 135.378mb +0.00% | 735.216ms -0.49% | ±0.80% +156.88%  |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 3.671mb +0.04%   | 33.514ms -0.94%  | ±0.36% -78.04%   |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 3.618mb +0.04%   | 433.853ms +0.74% | ±0.39% -75.53%   |
+-----------------------+-------------------+------+-----+------------------+------------------+------------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 115.964mb +0.00% | 60.352ms +0.79% | ±2.47% +81.71% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode            | rstdev          |
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.067mb +0.00%  | 84.376ms -0.67% | ±0.58% -38.68%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 106.499mb +0.00% | 52.001ms -2.75% | ±0.63% -29.58%  |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 225.835mb +0.00% | 1.396s +0.61%   | ±0.44% +150.69% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 16.860mb +0.01%  | 43.636ms -1.58% | ±0.47% -9.09%   |
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
Building Blocks
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 116.514mb +0.00% | 495.351ms +1.69% | ±3.29% +25.51%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 59.992mb +0.00%  | 247.766ms -1.16% | ±3.26% +80.56%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.926mb +0.01%  | 52.707ms -1.63%  | ±1.18% +164.02% |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 59.693mb +0.00%  | 433.731ms +1.07% | ±3.59% +573.34% |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 14.232mb +0.01%  | 94.167ms +2.86%  | ±1.93% -46.04%  |
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 86.784mb +0.00%  | 3.387ms +5.13%   | ±1.03% +53.99%  |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 102.382mb +0.00% | 187.891ms -1.95% | ±1.01% -40.27%  |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 85.102mb +0.00%  | 18.625ms -1.38%  | ±0.80% +89.01%  |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 88.024mb +0.00%  | 1.705ms +0.85%   | ±0.44% -79.50%  |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 88.024mb +0.00%  | 1.719ms +0.40%   | ±1.64% -18.41%  |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 85.136mb +0.00%  | 2.555ms -0.69%   | ±1.13% -32.38%  |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 85.665mb +0.00%  | 16.999ms +14.24% | ±0.75% -46.85%  |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 85.665mb +0.00%  | 16.778ms +13.64% | ±1.61% +130.23% |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 83.569mb +0.00%  | 1.600μs 0.00%    | ±0.00% 0.00%    |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 83.569mb +0.00%  | 0.300μs -25.00%  | ±0.00% -100.00% |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 92.919mb +0.00%  | 12.041ms -0.71%  | ±0.73% -49.62%  |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 122.290mb +0.00% | 61.088ms -0.85%  | ±2.70% +87.87%  |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 86.185mb +0.00%  | 1.198ms -3.98%   | ±0.47% -82.33%  |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 89.531mb +0.00%  | 62.147ms -0.07%  | ±1.92% -13.84%  |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 88.286mb +0.00%  | 3.880ms -1.37%   | ±0.61% -45.09%  |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 83.712mb +0.00%  | 39.381ms +1.88%  | ±0.97% +250.92% |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 83.713mb +0.00%  | 39.367ms +0.63%  | ±0.71% +85.77%  |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 83.712mb +0.00%  | 39.072ms +0.18%  | ±1.02% +142.87% |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 86.010mb +0.00%  | 7.244ms -1.23%   | ±0.23% -72.69%  |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 83.569mb +0.00%  | 29.382ms +2.84%  | ±1.32% +80.92%  |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 83.569mb +0.00%  | 13.420μs 0.00%   | ±1.27% 0.00%    |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 83.569mb +0.00%  | 15.888μs +1.88%  | ±0.60% +96.63%  |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 102.383mb +0.00% | 192.733ms +0.36% | ±1.09% +189.19% |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+

@norberttech norberttech merged commit 9436986 into flow-php:1.x Jul 4, 2024
@flavioheleno flavioheleno deleted the feat/lz4 branch July 4, 2024 11:55