@norberttech
Member

Change Log

Added

Fixed

  • Date String detection

Changed

  • Significantly reduced complexity of parquet schema converter

Removed

  • EntryClass from Schema Definition

Deprecated

Security


Description

This is a first step toward reducing schema conversion complexity, needed to make progress on #1353.

@github-actions
Contributor

github-actions bot commented Jan 28, 2025

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak        | mode             | rstdev          |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 4.773mb +0.23%  | 556.000ms +0.24% | ±0.48% -52.17%  |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 4.842mb +0.31%  | 1.060s -0.66%    | ±1.01% +64.48%  |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 86.491mb -0.22% | 896.991ms +0.83% | ±0.25% -61.08%  |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.503mb +0.25%  | 35.765ms +1.62%  | ±0.46% -59.34%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.480mb +0.20%  | 606.402ms +0.23% | ±0.44% +276.32% |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev          |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 127.302mb +0.01% | 72.561ms +3.66% | ±0.82% +129.33% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev          |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 63.905mb +0.12%  | 103.379ms +1.91% | ±0.42% -86.27%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 84.313mb +0.02%  | 100.079ms +0.39% | ±0.69% +49.84%  |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 161.207mb -0.02% | 20.537s +0.07%   | ±0.41% +464.89% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.969mb +0.06%  | 31.068ms +0.00%  | ±0.55% -64.75%  |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
Building Blocks
+-------------------+----------------------------+------+-----+------------------+------------------+------------------+
| benchmark         | subject                    | revs | its | mem_peak         | mode             | rstdev           |
+-------------------+----------------------------+------+-----+------------------+------------------+------------------+
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 105.939mb +0.01% | 461.800ms +1.48% | ±0.70% +12.79%   |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 55.130mb +0.03%  | 230.317ms +0.07% | ±1.31% -42.21%   |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.652mb +0.09%  | 50.180ms +1.38%  | ±1.28% +224.32%  |
| RowsBench         | bench_chunk_10_on_10k      | 2    | 3   | 96.988mb +0.01%  | 3.142ms +1.25%   | ±1.48% +91.24%   |
| RowsBench         | bench_diff_left_1k_on_10k  | 2    | 3   | 114.270mb +0.01% | 184.152ms -2.67% | ±0.14% -83.45%   |
| RowsBench         | bench_diff_right_1k_on_10k | 2    | 3   | 96.990mb +0.01%  | 18.834ms -0.26%  | ±0.71% +892.67%  |
| RowsBench         | bench_drop_1k_on_10k       | 2    | 3   | 97.863mb +0.01%  | 1.484ms +0.30%   | ±1.72% -43.99%   |
| RowsBench         | bench_drop_right_1k_on_10k | 2    | 3   | 97.863mb +0.01%  | 1.518ms +2.26%   | ±0.18% -37.60%   |
| RowsBench         | bench_entries_on_10k       | 2    | 3   | 96.023mb +0.01%  | 4.481ms +3.69%   | ±1.28% -43.82%   |
| RowsBench         | bench_filter_on_10k        | 2    | 3   | 96.552mb +0.01%  | 16.599ms +1.19%  | ±0.95% +40.83%   |
| RowsBench         | bench_find_on_10k          | 2    | 3   | 96.552mb +0.01%  | 16.545ms +1.69%  | ±1.17% +56.46%   |
| RowsBench         | bench_find_one_on_10k      | 10   | 3   | 95.244mb +0.01%  | 1.906μs +6.25%   | ±2.44% -8.62%    |
| RowsBench         | bench_first_on_10k         | 10   | 3   | 95.244mb +0.01%  | 0.400μs 0.00%    | ±0.00% 0.00%     |
| RowsBench         | bench_flat_map_on_1k       | 2    | 3   | 104.462mb +0.01% | 14.506ms +1.05%  | ±0.60% -56.00%   |
| RowsBench         | bench_map_on_10k           | 2    | 3   | 134.529mb +0.01% | 73.403ms +3.93%  | ±1.80% +1498.79% |
| RowsBench         | bench_merge_1k_on_10k      | 2    | 3   | 97.072mb +0.01%  | 1.340ms +6.23%   | ±1.96% +127.60%  |
| RowsBench         | bench_partition_by_on_10k  | 2    | 3   | 100.369mb +0.01% | 65.009ms +3.25%  | ±0.74% -47.09%   |
| RowsBench         | bench_remove_on_10k        | 2    | 3   | 98.125mb +0.01%  | 3.679ms -0.53%   | ±0.39% -49.15%   |
| RowsBench         | bench_sort_asc_on_1k       | 2    | 3   | 95.532mb +0.01%  | 42.633ms +2.10%  | ±1.16% +68.30%   |
| RowsBench         | bench_sort_by_on_1k        | 2    | 3   | 95.532mb +0.01%  | 42.269ms -0.48%  | ±0.06% -96.51%   |
| RowsBench         | bench_sort_desc_on_1k      | 2    | 3   | 95.532mb +0.01%  | 42.558ms +2.44%  | ±0.71% -6.17%    |
| RowsBench         | bench_sort_entries_on_1k   | 2    | 3   | 97.684mb +0.01%  | 8.271ms +0.35%   | ±0.74% -53.35%   |
| RowsBench         | bench_sort_on_1k           | 2    | 3   | 95.434mb +0.01%  | 29.282ms -0.27%  | ±0.36% -60.01%   |
| RowsBench         | bench_take_1k_on_10k       | 10   | 3   | 95.244mb +0.01%  | 13.160μs +2.81%  | ±2.55% +299.68%  |
| RowsBench         | bench_take_right_1k_on_10k | 10   | 3   | 95.244mb +0.01%  | 16.096μs +6.87%  | ±2.54% -17.43%   |
| RowsBench         | bench_unique_on_1k         | 2    | 3   | 114.271mb +0.01% | 191.918ms +0.15% | ±0.86% -17.78%   |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 43.778mb +0.02%  | 360.663ms +0.03% | ±0.93% +23.67%   |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 11.588mb +0.09%  | 73.137ms +1.75%  | ±1.57% +406.53%  |
+-------------------+----------------------------+------+-----+------------------+------------------+------------------+
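A note on reading the `rstdev` column: each run uses only 3 iterations, so large relative swings can correspond to very small absolute values. The sketch below recovers the approximate baseline figures for `bench_map_on_10k`. It assumes each percentage is the relative difference against the 1.x baseline, i.e. `(current - baseline) / baseline * 100`; that formula and the helper name `baseline_from_diff` are assumptions for illustration, not something stated in the report.

```python
def baseline_from_diff(current: float, pct_diff: float) -> float:
    """Recover the baseline value from the current value and its relative change.

    Assumes pct_diff = (current - baseline) / baseline * 100.
    """
    return current / (1.0 + pct_diff / 100.0)


# RowsBench / bench_map_on_10k, values copied from the table above.
base_rstdev = baseline_from_diff(1.80, 1498.79)  # rstdev: ±1.80% +1498.79%
base_mode = baseline_from_diff(73.403, 3.93)     # mode: 73.403ms +3.93%

print(f"baseline rstdev: about {base_rstdev:.2f}%")
print(f"baseline mode:   about {base_mode:.2f}ms")
```

Under that assumption, the +1498.79% on `rstdev` is a move from roughly a tenth of a percent to 1.80%, while the mode itself changed by under 4%.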

@codecov

codecov bot commented Jan 28, 2025

Codecov Report

Attention: Patch coverage is 78.65169% with 38 lines in your changes missing coverage. Please review.

Project coverage is 82.89%. Comparing base (0cc69d7) to head (135798c).
Report is 3 commits behind head on 1.x.

Additional details and impacted files
@@            Coverage Diff             @@
##              1.x    #1415      +/-   ##
==========================================
+ Coverage   82.56%   82.89%   +0.32%     
==========================================
  Files         655      655              
  Lines       17601    17522      -79     
==========================================
- Hits        14533    14524       -9     
+ Misses       3068     2998      -70     
+-----------------------------+------------------+--------------+
| Components                  | Coverage         | Δ            |
+-----------------------------+------------------+--------------+
| etl                         | 85.73% <71.20%>  | (-0.12%) ⬇️  |
| cli                         | 85.17% <ø>       | (ø)          |
| lib-array-dot               | 94.53% <ø>       | (ø)          |
| lib-azure-sdk               | 62.56% <ø>       | (ø)          |
| lib-doctrine-dbal-bulk      | 97.36% <ø>       | (ø)          |
| lib-filesystem              | 76.23% <ø>       | (ø)          |
| lib-parquet                 | 84.33% <100.00%> | (-0.24%) ⬇️  |
| lib-parquet-viewer          | 82.02% <ø>       | (ø)          |
| lib-rdsl                    | 87.09% <ø>       | (ø)          |
| lib-snappy                  | 90.69% <ø>       | (-0.47%) ⬇️  |
| bridge-filesystem-async-aws | 90.38% <ø>       | (ø)          |
| bridge-filesystem-azure     | 89.92% <ø>       | (ø)          |
| bridge-monolog-http         | 96.38% <ø>       | (ø)          |
| symfony-http-foundation     | 77.10% <ø>       | (ø)          |
| adapter-chartjs             | 86.45% <ø>       | (ø)          |
| adapter-csv                 | 89.49% <ø>       | (ø)          |
| adapter-doctrine            | 90.14% <ø>       | (ø)          |
| adapter-elasticsearch       | 97.19% <ø>       | (ø)          |
| adapter-google-sheet        | 78.04% <ø>       | (ø)          |
| adapter-http                | 59.15% <ø>       | (ø)          |
| adapter-json                | 92.85% <ø>       | (ø)          |
| adapter-logger              | 53.84% <ø>       | (ø)          |
| adapter-meilisearch         | 97.75% <ø>       | (ø)          |
| adapter-parquet             | 80.85% <96.00%>  | (+20.96%) ⬆️ |
| adapter-text                | 84.44% <ø>       | (ø)          |
| adapter-xml                 | 83.15% <ø>       | (ø)          |
+-----------------------------+------------------+--------------+
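
The project-level numbers in the coverage diff above can be checked by hand: coverage is hits divided by tracked lines, and hits plus misses equals lines on both sides. The sketch below redoes that arithmetic from the reported counts. The patch line count of 178 is inferred from "78.65169% with 38 lines missing" and is not stated in the report; the helper name `coverage_pct` is made up for this example.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage of tracked lines."""
    return 100.0 * hits / lines


# Counts copied from the coverage diff (base 0cc69d7, head 135798c).
base_hits, base_misses, base_lines = 14533, 3068, 17601
head_hits, head_misses, head_lines = 14524, 2998, 17522

# Sanity check: every tracked line is either a hit or a miss.
assert base_hits + base_misses == base_lines
assert head_hits + head_misses == head_lines

base = coverage_pct(base_hits, base_lines)
head = coverage_pct(head_hits, head_lines)
delta = head - base

# Inferred, not reported: patch size implied by 38 missing lines
# at 78.65169% patch coverage.
patch_missing = 38
patch_lines = round(patch_missing / (1 - 78.65169 / 100))
patch_cov = coverage_pct(patch_lines - patch_missing, patch_lines)

print(f"base={base:.4f}% head={head:.4f}% delta={delta:+.4f}")
print(f"patch: {patch_lines - patch_missing}/{patch_lines} lines = {patch_cov:.5f}%")
```

Overall coverage rises even though hits drop by 9, because the change removes 79 tracked lines and 70 of those were misses.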

@norberttech force-pushed the 1353-openapi-tofrom-flow-schema branch from 93008af to e9919c7 on January 28, 2025 16:48
@norberttech force-pushed the 1353-openapi-tofrom-flow-schema branch from cf70750 to 4296d00 on January 28, 2025 17:03
@norberttech merged commit 9f6ed86 into flow-php:1.x on Jan 28, 2025
20 of 22 checks passed