norberttech commented Aug 4, 2024

Change Log

Added

Fixed

  • Use basenamePrefix instead of a suffix when creating the temporary file in overwrite save mode

Changed

Removed

Deprecated

Security


Description

Previously, the FilesystemStreams mechanism added a ._flow_tmp suffix to the destination path when creating a temporary file. This caused a problem: if the path points to a file such as file.csv, the suffix prevents the Path object from properly detecting the extension of file.csv._flow_tmp.
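A minimal illustration of the problem, using Python's pathlib as a stand-in for Flow's Path object (an assumption; Flow's own extension detection may differ in detail). The helper suffixed_tmp_path is hypothetical and only mirrors the old naming: once ._flow_tmp is appended, the last dot-separated part of the file name is no longer csv.

```python
from pathlib import PurePosixPath


# Hypothetical helper mirroring the old behaviour: append a suffix to the full path.
def suffixed_tmp_path(destination: str, suffix: str = "._flow_tmp") -> PurePosixPath:
    return PurePosixPath(destination + suffix)


destination = PurePosixPath("/var/files/file.csv")
tmp = suffixed_tmp_path(str(destination))

print(destination.suffix)  # .csv
print(tmp)                 # /var/files/file.csv._flow_tmp
print(tmp.suffix)          # ._flow_tmp  (the csv extension is no longer detected)
```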

So let's say we want to write to /var/files/file.csv using df->saveMode(overwrite()).

Flow will first create a temporary file:

  • /var/files/file.csv._flow_tmp

This is a safety mechanism: writing to a temporary file first prevents an unfinished transformation pipeline from overwriting the existing file.
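The general write-then-swap pattern can be sketched as follows. This is a minimal Python illustration, not Flow's implementation: overwrite_via_tmp and failing_pipeline are hypothetical names, and the sketch assumes the temporary file is promoted to the destination with an atomic rename (os.replace) only after writing completes, so a failed pipeline leaves the existing file untouched.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, TextIO


def overwrite_via_tmp(destination: Path, write: Callable[[TextIO], object]) -> None:
    """Write to a temporary sibling file and replace the destination only on success."""
    # Assumed naming, matching the old suffix scheme described above.
    tmp = destination.with_name(destination.name + "._flow_tmp")
    try:
        with tmp.open("w", encoding="utf-8") as stream:
            write(stream)
    except BaseException:
        # The pipeline failed: discard the partial temp file, leave the destination untouched.
        tmp.unlink(missing_ok=True)
        raise
    # The pipeline finished: swap the complete file into place in one step.
    os.replace(tmp, destination)


def failing_pipeline(stream: TextIO) -> None:
    # Hypothetical pipeline that writes some data and then crashes.
    stream.write("partial")
    raise RuntimeError("pipeline failed")


with tempfile.TemporaryDirectory() as directory:
    target = Path(directory) / "file.csv"
    target.write_text("old\n", encoding="utf-8")
    try:
        overwrite_via_tmp(target, failing_pipeline)
    except RuntimeError:
        pass
    print(target.read_text(encoding="utf-8"), end="")  # old
```

Keeping the temporary file in the same directory as the destination keeps the final rename on one filesystem, which is what makes os.replace a single atomic step on POSIX systems.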

That's why, instead of adding a suffix, it now uses basenamePrefix(), which creates something like this:

  • /var/files/._flow_php_tmp.file.csv
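The prefix approach can be sketched with the same pathlib stand-in. prefixed_tmp_path is a hypothetical helper, not Flow's basenamePrefix() itself; it only shows that prefixing the basename keeps both the directory and the .csv extension intact.

```python
from pathlib import PurePosixPath


# Hypothetical helper mirroring the new behaviour: prefix the basename only.
def prefixed_tmp_path(destination: str, prefix: str = "._flow_php_tmp.") -> PurePosixPath:
    path = PurePosixPath(destination)
    # with_name swaps the final component, so the parent directory is preserved.
    return path.with_name(prefix + path.name)


tmp = prefixed_tmp_path("/var/files/file.csv")

print(tmp)         # /var/files/._flow_php_tmp.file.csv
print(tmp.suffix)  # .csv
```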

Why is detecting the extension important? Some loaders, JsonLoader for example, close open streams based on their file extension.
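A hypothetical sketch of why this matters: a finalizer that picks its close step from the path's extension. The CLOSERS table and finalize function are illustrative names, not Flow's API, and the assumption that a JSON loader writes rows as an array that needs a closing ] is for illustration only. With the old suffix the extension lookup misses and the output is left as invalid JSON.

```python
import json
from pathlib import PurePosixPath

# Hypothetical close steps keyed by file extension (illustrative, not Flow's API).
CLOSERS = {
    ".json": "]",  # assumed: rows were streamed as a JSON array opened with "["
}


def finalize(path: str, body: str) -> str:
    """Return the stream contents after the extension-specific close step, if any."""
    return body + CLOSERS.get(PurePosixPath(path).suffix, "")


partial = '[{"id": 1},{"id": 2}'  # what was streamed before the stream is closed

ok = finalize("/var/files/._flow_php_tmp.file.json", partial)
broken = finalize("/var/files/file.json._flow_tmp", partial)

print(json.loads(ok))  # [{'id': 1}, {'id': 2}]
# json.loads(broken) would raise json.JSONDecodeError: the closing "]" was never written.
```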

github-actions bot commented Aug 4, 2024

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from the 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak        | mode             | rstdev          |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 3.953mb +0.03%  | 510.822ms +0.15% | ±2.36% +322.18% |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 4.085mb +0.03%  | 1.069s +0.47%    | ±0.20% -50.33%  |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 28.545mb +0.00% | 425.211ms -1.89% | ±0.84% +61.84%  |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 3.713mb +0.04%  | 33.547ms -0.25%  | ±1.16% +42.03%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 3.659mb +0.04%  | 434.047ms -1.67% | ±0.46% -48.60%  |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 116.054mb +0.00% | 58.147ms -3.71% | ±0.20% -89.44% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode            | rstdev          |
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.175mb +0.00%  | 83.434ms -2.50% | ±0.57% -46.90%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 102.497mb +0.00% | 51.699ms -3.99% | ±0.56% -71.84%  |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 123.824mb +0.00% | 1.218s -1.08%   | ±0.21% -45.60%  |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 16.969mb +0.01%  | 42.730ms -4.33% | ±1.09% +130.69% |
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
Building Blocks
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev           |
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 52.638mb +0.00%  | 410.872ms +5.46% | ±1.74% +333.93%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 12.904mb +0.00%  | 77.612ms -1.69%  | ±0.49% -85.25%   |
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 86.812mb +0.00%  | 3.186ms -9.10%   | ±0.08% -97.40%   |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 102.413mb +0.00% | 186.796ms -1.64% | ±1.30% -37.35%   |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 85.133mb +0.00%  | 18.588ms -1.21%  | ±0.32% -78.82%   |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 88.052mb +0.00%  | 1.568ms -12.10%  | ±2.65% +48.39%   |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 88.052mb +0.00%  | 1.628ms -11.71%  | ±1.54% -46.54%   |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 85.164mb +0.00%  | 2.535ms -4.89%   | ±1.22% -61.17%   |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 85.693mb +0.00%  | 15.073ms -1.44%  | ±2.10% +151.80%  |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 85.693mb +0.00%  | 14.783ms -2.23%  | ±1.00% +434.44%  |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 83.596mb +0.00%  | 1.594μs -11.15%  | ±3.01% +12.77%   |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 83.596mb +0.00%  | 0.300μs -25.00%  | ±0.00% -100.00%  |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 92.947mb +0.00%  | 12.058ms -0.20%  | ±0.85% -52.43%   |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 122.318mb +0.00% | 59.771ms -3.37%  | ±1.17% +11.05%   |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 86.212mb +0.00%  | 1.201ms -6.64%   | ±2.73% +244.14%  |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 89.561mb +0.00%  | 60.037ms -5.60%  | ±2.08% +109.94%  |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 88.314mb +0.00%  | 3.763ms -6.99%   | ±1.09% -18.93%   |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 83.743mb +0.00%  | 38.820ms -3.50%  | ±0.88% -10.47%   |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 83.744mb +0.00%  | 39.158ms -0.29%  | ±1.08% +103.43%  |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 83.743mb +0.00%  | 39.481ms -0.15%  | ±1.05% +184.57%  |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 86.038mb +0.00%  | 7.317ms -1.20%   | ±0.11% -91.10%   |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 83.596mb +0.00%  | 29.251ms +0.51%  | ±1.82% +2563.89% |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 83.596mb +0.00%  | 13.066μs -5.40%  | ±1.56% -18.53%   |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 83.596mb +0.00%  | 15.824μs -4.00%  | ±1.18% -61.41%   |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 102.414mb +0.00% | 190.480ms -1.31% | ±0.84% -23.59%   |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 106.829mb +0.00% | 459.518ms -0.81% | ±0.47% +192.06%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 55.187mb +0.00%  | 233.463ms -2.39% | ±2.97% +131.11%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.025mb +0.00%  | 50.642ms -5.09%  | ±0.23% -92.35%   |
+-------------------------+----------------------------+------+-----+------------------+------------------+------------------+

@norberttech norberttech merged commit a0234b3 into flow-php:1.x Aug 4, 2024
@norberttech norberttech deleted the feature/overwriting_stream_suffix branch December 5, 2024 20:19