Skip to content

Conversation

@norberttech
Copy link
Member

Change Log

Added

  • Dremel to properly shred/assemble nested structures with nullable elements

Fixed

Changed

  • Dremel algorithms are no longer working as Generators

Removed

Deprecated

Security


Description

While working on Dremel algorithms implementation I was mostly focused on non nullable nested structures, that's how this bug was missed in the first implementation.
After this fix, it will not only properly handle maps/lists with nulls but also empty maps/lists.
Dremel is no longer working as a generator because all data needs to sit in the memory anyway and since we are operating at small integers, this was bringing very little value compared to complexity of debugging it.

);
}

public function test_dremel_with_rows_with_combined_scalar_values_and_array_values() : void
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constraint was incorrect since there is a scenario where arrays also contain nulls, not only scalar values.

@github-actions
Copy link
Contributor

github-actions bot commented Nov 11, 2023

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+------------------+-------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak         | mode              | rstdev          |
+-----------------------+-------------------+------+-----+------------------+-------------------+-----------------+
| AvroExtractorBench    | bench_extract_10k | 1    | 3   | 35.271mb +0.00%  | 538.754ms +46.99% | ±2.45% +189.05% |
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 4.837mb +0.01%   | 438.088ms +62.60% | ±0.99% +409.14% |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 5.007mb +0.01%   | 857.833ms +44.48% | ±3.21% +172.24% |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 233.597mb +2.61% | 1.209s +54.30%    | ±1.61% +288.61% |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.831mb +0.01%   | 25.594ms +32.32%  | ±1.04% -28.38%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.831mb +0.01%   | 639.734ms +58.02% | ±0.46% -62.15%  |
+-----------------------+-------------------+------+-----+------------------+-------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+-----------------+------------------+---------------+
| benchmark                   | subject                  | revs | its | mem_peak        | mode             | rstdev        |
+-----------------------------+--------------------------+------+-----+-----------------+------------------+---------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 87.098mb +0.00% | 82.279ms +66.81% | ±1.14% -6.46% |
+-----------------------------+--------------------------+------+-----+-----------------+------------------+---------------+
Loaders
+--------------------+----------------+------+-----+------------------+-------------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode              | rstdev          |
+--------------------+----------------+------+-----+------------------+-------------------+-----------------+
| AvroLoaderBench    | bench_load_10k | 1    | 3   | 95.281mb +0.00%  | 857.591ms +48.70% | ±2.04% +104.61% |
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 46.114mb +0.00%  | 82.650ms +17.13%  | ±0.61% -65.91%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 90.558mb +0.00%  | 93.111ms +46.61%  | ±1.19% +9.18%   |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 286.030mb +2.23% | 2.463s +87.11%    | ±1.21% +52.68%  |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 16.608mb +0.00%  | 43.523ms +7.33%   | ±1.22% +141.73% |
+--------------------+----------------+------+-----+------------------+-------------------+-----------------+
Building Blocks
+-------------------------+----------------------------+------+-----+-----------------+-------------------+-------------------------------+
| benchmark               | subject                    | revs | its | mem_peak        | mode              | rstdev                        |
+-------------------------+----------------------------+------+-----+-----------------+-------------------+-------------------------------+
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 60.707mb +0.00% | 5.616ms +151.11%  | ±1.25% -39.90%                |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 80.499mb +0.00% | 225.495ms +48.52% | ±0.62% -71.82%                |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 59.025mb +0.00% | 20.509ms +35.79%  | ±1.28% +139.97%               |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 59.846mb +0.00% | 3.071ms +74.09%   | ±2.95% +163.78%               |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 59.846mb +0.00% | 3.197ms +66.44%   | ±3.17% +92.95%                |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 59.059mb +0.00% | 4.767ms +73.34%   | ±1.95% +47.36%                |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 59.588mb +0.00% | 24.109ms +71.44%  | ±3.39% +173.25%               |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 59.588mb +0.00% | 23.871ms +68.72%  | ±2.92% +604.03%               |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 57.659mb +0.00% | 2.206μs +16.11%   | ±2.11% +18061489817610000.00% |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 57.659mb +0.00% | 0.400μs 0.00%     | ±0.00% 0.00%                  |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 65.892mb +0.00% | 16.885ms +61.12%  | ±3.42% +67.57%                |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 91.412mb +0.00% | 73.053ms +54.02%  | ±2.54% +506.37%               |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 60.109mb +0.00% | 3.261ms +66.17%   | ±1.09% -26.66%                |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 62.379mb +0.00% | 59.537ms +77.61%  | ±2.45% +3531.50%              |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 62.209mb +0.00% | 9.349ms +96.04%   | ±2.84% +83.45%                |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 57.659mb +0.00% | 61.812ms +63.11%  | ±0.82% +16.28%                |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 57.659mb +0.00% | 64.372ms +70.50%  | ±0.93% -21.45%                |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 57.659mb +0.00% | 63.206ms +65.30%  | ±1.91% -28.01%                |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 59.933mb +0.00% | 12.215ms +67.36%  | ±3.52% +369.21%               |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 57.658mb +0.00% | 42.951ms +50.60%  | ±2.90% +99.45%                |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 57.659mb +0.00% | 25.596μs +97.20%  | ±2.41% +85.34%                |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 57.659mb +0.00% | 37.688μs +144.63% | ±0.25% -17.98%                |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 80.500mb +0.00% | 223.549ms +42.78% | ±1.20% +307.85%               |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 93.715mb +0.00% | 203.846ms +64.64% | ±1.07% +166.39%               |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 48.632mb +0.00% | 109.705ms +75.31% | ±3.51% +98.71%                |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 12.653mb +0.01% | 24.297ms +62.36%  | ±0.88% -67.86%                |
+-------------------------+----------------------------+------+-----+-----------------+-------------------+-------------------------------+

@norberttech norberttech merged commit b11906c into flow-php:1.x Nov 11, 2023
@norberttech norberttech deleted the bug/serialize-nulls branch December 5, 2024 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant