@jmortlock (Contributor) commented Feb 11, 2025

Change Log

Added

Fixed

  • JSONLines Loader would occasionally write a newline to the start of the file.

Changed

Removed

Deprecated

Security


Description

In my local usage I would occasionally get a newline at the start of the file, which causes upstream systems to reject it. I could not reproduce this in a test case, but while investigating I found that the writing can be simplified by not tracking the write count. That removes the newline-at-start-of-file issue and simplifies the code.

The JSONL specification says:

"The last character in a file following the last JSON value may be a line separator. In this case the line separator does not indicate the start of another JSON value."

This means we can happily writeln every row without worrying that the file ends with a bare \n.
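
The idea can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the Flow PHP loader code; `write_jsonl` and `read_jsonl` are hypothetical names. Each row is written as one JSON value followed by a line separator, so no write counter is needed and nothing is ever written before the first value:

```python
import io
import json


def write_jsonl(stream, rows):
    """Write each row as one JSON value followed by a line separator."""
    for row in rows:
        stream.write(json.dumps(row))
        stream.write("\n")


def read_jsonl(text):
    """Parse JSON Lines text, allowing one trailing line separator."""
    if not text:
        return []
    if text.endswith("\n"):
        # The final separator does not start another JSON value.
        text = text[:-1]
    # Any other empty line (for example a leading one) fails to parse.
    return [json.loads(line) for line in text.split("\n")]


buffer = io.StringIO()
# Two separate batches; no state is shared between the calls.
write_jsonl(buffer, [{"id": 1}, {"id": 2}])
write_jsonl(buffer, [{"id": 3}])
output = buffer.getvalue()
```

The strict reader stands in for a consumer that rejects a file beginning with an empty line, while still accepting the trailing separator that the specification permits.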

@github-actions (Contributor)

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+-----------------+------------------+------------------+
| benchmark             | subject           | revs | its | mem_peak        | mode             | rstdev           |
+-----------------------+-------------------+------+-----+-----------------+------------------+------------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 4.799mb +0.01%  | 555.830ms -0.12% | ±0.12% -42.70%   |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 4.872mb +0.01%  | 1.057s -0.26%    | ±0.48% +179.79%  |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 86.315mb +0.00% | 891.871ms -0.03% | ±2.99% +1168.76% |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.529mb +0.01%  | 35.435ms -0.01%  | ±0.38% -65.54%   |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.504mb +0.01%  | 607.052ms +0.69% | ±0.42% +11.63%   |
+-----------------------+-------------------+------+-----+-----------------+------------------+------------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 127.325mb +0.00% | 72.753ms +2.09% | ±0.72% -26.29% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev         |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 63.998mb +0.00%  | 103.788ms -0.03% | ±0.39% +14.47% |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 84.345mb +0.00%  | 96.590ms -1.79%  | ±0.42% -40.39% |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 161.186mb +0.00% | 20.534s -0.32%   | ±0.07% -72.21% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.995mb +0.00%  | 31.181ms -1.90%  | ±0.43% +96.62% |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
Building Blocks
+-------------------+----------------------------+------+-----+------------------+------------------+-------------------------------+
| benchmark         | subject                    | revs | its | mem_peak         | mode             | rstdev                        |
+-------------------+----------------------------+------+-----+------------------+------------------+-------------------------------+
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 105.968mb +0.00% | 456.030ms -0.57% | ±0.61% -79.98%                |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 55.159mb +0.00%  | 231.813ms +0.91% | ±1.74% +149.96%               |
| EntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.681mb +0.00%  | 50.035ms +0.17%  | ±0.45% -58.75%                |
| RowsBench         | bench_chunk_10_on_10k      | 2    | 3   | 97.012mb +0.00%  | 3.173ms +1.32%   | ±1.73% +511.87%               |
| RowsBench         | bench_diff_left_1k_on_10k  | 2    | 3   | 114.295mb +0.00% | 189.405ms +3.47% | ±1.20% +400.40%               |
| RowsBench         | bench_diff_right_1k_on_10k | 2    | 3   | 97.015mb +0.00%  | 18.885ms +1.33%  | ±0.89% -48.19%                |
| RowsBench         | bench_drop_1k_on_10k       | 2    | 3   | 97.887mb +0.00%  | 2.076ms +31.14%  | ±1.35% -17.44%                |
| RowsBench         | bench_drop_right_1k_on_10k | 2    | 3   | 97.887mb +0.00%  | 1.429ms -2.20%   | ±1.98% -48.03%                |
| RowsBench         | bench_entries_on_10k       | 2    | 3   | 96.047mb +0.00%  | 4.216ms -2.03%   | ±2.06% +49.89%                |
| RowsBench         | bench_filter_on_10k        | 2    | 3   | 96.576mb +0.00%  | 16.355ms -0.94%  | ±0.13% -18.72%                |
| RowsBench         | bench_find_on_10k          | 2    | 3   | 96.576mb +0.00%  | 16.491ms +1.16%  | ±0.68% +10.77%                |
| RowsBench         | bench_find_one_on_10k      | 10   | 3   | 95.268mb +0.00%  | 1.806μs -4.94%   | ±2.57% +22002178505089000.00% |
| RowsBench         | bench_first_on_10k         | 10   | 3   | 95.268mb +0.00%  | 0.400μs 0.00%    | ±0.00% 0.00%                  |
| RowsBench         | bench_flat_map_on_1k       | 2    | 3   | 104.486mb +0.00% | 14.528ms +0.08%  | ±0.83% -23.47%                |
| RowsBench         | bench_map_on_10k           | 2    | 3   | 134.553mb +0.00% | 70.354ms -3.78%  | ±0.65% +2.80%                 |
| RowsBench         | bench_merge_1k_on_10k      | 2    | 3   | 97.096mb +0.00%  | 1.438ms +8.67%   | ±2.15% -10.40%                |
| RowsBench         | bench_partition_by_on_10k  | 2    | 3   | 100.394mb +0.00% | 62.793ms -2.33%  | ±0.57% -60.92%                |
| RowsBench         | bench_remove_on_10k        | 2    | 3   | 98.149mb +0.00%  | 3.716ms -6.99%   | ±1.64% -22.58%                |
| RowsBench         | bench_sort_asc_on_1k       | 2    | 3   | 95.557mb +0.00%  | 42.055ms +2.27%  | ±0.33% -41.69%                |
| RowsBench         | bench_sort_by_on_1k        | 2    | 3   | 95.557mb +0.00%  | 41.400ms -0.98%  | ±1.04% -23.11%                |
| RowsBench         | bench_sort_desc_on_1k      | 2    | 3   | 95.557mb +0.00%  | 41.924ms +1.42%  | ±0.15% -89.85%                |
| RowsBench         | bench_sort_entries_on_1k   | 2    | 3   | 97.708mb +0.00%  | 8.199ms -2.73%   | ±0.50% -26.23%                |
| RowsBench         | bench_sort_on_1k           | 2    | 3   | 95.458mb +0.00%  | 29.296ms -1.24%  | ±0.18% -71.62%                |
| RowsBench         | bench_take_1k_on_10k       | 10   | 3   | 95.268mb +0.00%  | 13.499μs -7.81%  | ±2.42% +8.26%                 |
| RowsBench         | bench_take_right_1k_on_10k | 10   | 3   | 95.268mb +0.00%  | 15.188μs +0.18%  | ±1.71% -43.96%                |
| RowsBench         | bench_unique_on_1k         | 2    | 3   | 114.296mb +0.00% | 192.956ms +2.67% | ±0.89% -19.37%                |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 43.804mb +0.00%  | 361.683ms -0.47% | ±0.41% -52.95%                |
| TypeDetectorBench | bench_type_detector        | 1    | 3   | 11.613mb +0.00%  | 73.166ms -0.30%  | ±0.72% -10.73%                |
+-------------------+----------------------------+------+-----+------------------+------------------+-------------------------------+

@codecov (bot) commented Feb 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.03%. Comparing base (dbba71d) to head (d1b9c0e).
Report is 4 commits behind head on 1.x.

Additional details and impacted files
@@            Coverage Diff             @@
##              1.x    #1469      +/-   ##
==========================================
- Coverage   83.05%   83.03%   -0.02%     
==========================================
  Files         663      663              
  Lines       17853    17847       -6     
==========================================
- Hits        14828    14820       -8     
- Misses       3025     3027       +2     
+-----------------------------+------------------------------+
| Components                  | Coverage Δ                   |
+-----------------------------+------------------------------+
| etl                         | 85.77% <ø> (ø)               |
| cli                         | 86.73% <ø> (ø)               |
| lib-array-dot               | 94.53% <ø> (ø)               |
| lib-azure-sdk               | 62.56% <ø> (ø)               |
| lib-doctrine-dbal-bulk      | 97.36% <ø> (ø)               |
| lib-filesystem              | 76.75% <ø> (ø)               |
| lib-parquet                 | 84.33% <ø> (ø)               |
| lib-parquet-viewer          | 82.02% <ø> (ø)               |
| lib-rdsl                    | 87.09% <ø> (ø)               |
| lib-snappy                  | 90.69% <ø> (-0.94%) ⬇️       |
| bridge-filesystem-async-aws | 90.38% <ø> (ø)               |
| bridge-filesystem-azure     | 89.92% <ø> (ø)               |
| bridge-monolog-http         | 96.38% <ø> (ø)               |
| symfony-http-foundation     | 77.10% <ø> (ø)               |
| adapter-chartjs             | 86.45% <ø> (ø)               |
| adapter-csv                 | 89.57% <ø> (ø)               |
| adapter-doctrine            | 88.68% <ø> (ø)               |
| adapter-elasticsearch       | 97.19% <ø> (ø)               |
| adapter-google-sheet        | 78.04% <ø> (ø)               |
| adapter-http                | 59.15% <ø> (ø)               |
| adapter-json                | 90.62% <100.00%> (-0.29%) ⬇️ |
| adapter-logger              | 53.84% <ø> (ø)               |
| adapter-meilisearch         | 97.75% <ø> (ø)               |
| adapter-parquet             | 80.85% <ø> (ø)               |
| adapter-text                | 84.44% <ø> (ø)               |
| adapter-xml                 | 83.15% <ø> (ø)               |
+-----------------------------+------------------------------+

@norberttech (Member)

Hey @jmortlock, sorry for the super late response, I was out most of the day. Looks great! 🙌

@norberttech norberttech merged commit e720fc5 into flow-php:1.x Feb 12, 2025
25 of 30 checks passed