@stloyd (Member) commented Oct 13, 2023

Change Log

Added

  • Add PHPBench tool and first benchmark example

Description

Docs: https://phpbench.readthedocs.io/en/latest/quick-start.html

Refs: #560

Report:

composer run-script test:benchmark
> tools/phpbench/vendor/bin/phpbench run --report=aggregate --retry-threshold=5
PHPBench (1.2.14) running benchmarks... #standwithukraine
with configuration file: /Users/stloyd/Documents/flow/phpbench.json
with PHP version 8.1.24, xdebug ❌, opcache ❌

.......... 

Subjects: 10, Assertions: 0, Failures: 0, Errors: 0
+-------------------------------------+----------------------------+-----+------+-----+----------+----------+--------+
| benchmark                           | subject                    | set | revs | its | mem_peak | mode     | rstdev |
+-------------------------------------+----------------------------+-----+------+-----+----------+----------+--------+
| AvroExtractorBench                  | bench_extract              |     | 1000 | 5   | 3.627mb  | 3.816μs  | ±1.03% |
| CSVExtractorBench                   | bench_extract              |     | 1000 | 5   | 3.627mb  | 4.139μs  | ±2.73% |
| JsonExtractorBench                  | bench_extract              |     | 1000 | 5   | 3.627mb  | 4.242μs  | ±2.88% |
| ParquetExtractorBench               | bench_extract              |     | 1000 | 5   | 3.627mb  | 4.039μs  | ±2.38% |
| TextExtractorBench                  | bench_extract              |     | 1000 | 5   | 3.627mb  | 3.713μs  | ±2.49% |
| XmlExtractorBench                   | bench_extract              |     | 1000 | 5   | 3.627mb  | 3.053μs  | ±2.32% |
| RenameEntryTransformerBench         | bench_transform            |     | 1000 | 5   | 3.627mb  | 23.397μs | ±2.78% |
| EntryExpressionEvalTransformerBench | bench_transform_json_row   |     | 1000 | 5   | 3.627mb  | 14.574μs | ±1.08% |
| EntryExpressionEvalTransformerBench | bench_transform_string_row |     | 1000 | 5   | 3.627mb  | 14.349μs | ±0.62% |
| EntryExpressionEvalTransformerBench | bench_transform_xml_row    |     | 1000 | 5   | 3.627mb  | 40.931μs | ±1.04% |
+-------------------------------------+----------------------------+-----+------+-----+----------+----------+--------+

@stloyd requested a review from norberttech October 13, 2023 16:23
@stloyd changed the title from "Add note about rework of transformers into UPGRADE.md file" to "Add PHPBench tool and first benchmark example" Oct 13, 2023
@stloyd force-pushed the feature/phpbench-intro branch 2 times, most recently from 3e2f98d to 637bdd9, October 14, 2023 07:31
@stloyd force-pushed the feature/phpbench-intro branch 4 times, most recently from ef2dca0 to 2075d7c, October 14, 2023 07:39
@stloyd requested a review from norberttech October 14, 2023 07:40
@norberttech (Member) commented:

This looks great!
Now, we need to think about what we would like to monitor.
Your example looks nice, but it does not say anything about what is being tested. It can help us notice memory leaks and maybe even performance degradation, but without any details on what is leaking or where the bottleneck is.

I was thinking about creating separate benchmarks for specific building blocks, for example:

  • Extractors - we could come up with a dataset schema, save the dataset in every supported file format, and benchmark extraction alone, without doing any operations on the dataset.
  • Transformers - since we reduced the number of transformers, keeping only the critical ones, we might want to start with at least the most frequently used ones, like the one that evaluates expressions. Here, I think we can take a similar approach, but instead of using extractors, we can pass prepared Rows directly to the transformer and measure the performance of the transformations themselves.
  • Expressions - just like with Transformers, but here we don't even need Rows. A single Row should be enough.
  • Loaders - similarly to Transformers, prepare Rows and load them directly into the destination.
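As a rough illustration of the Transformers item, a granular benchmark could prepare its input in a setup method and measure only the transformation. The sketch below follows PHPBench's conventions (methods prefixed with `bench` are subjects, and the `Revs`, `Iterations`, and `BeforeMethods` attributes from `PhpBench\Attributes` control repetition and setup), but the class name `RenameKeyBench`, the plain-array rows, and the rename logic are hypothetical stand-ins, not Flow's `Row`/`Rows` API or any of its transformers. PHP only resolves attributes when they are reflected, so the class also loads and runs without PHPBench installed.

```php
<?php

declare(strict_types=1);

use PhpBench\Attributes as Bench;

// Hypothetical benchmark: plain associative arrays stand in for Flow's Rows,
// so this sketch does not depend on Flow's actual API.
final class RenameKeyBench
{
    /** @var list<array<string, mixed>> input prepared outside the measured code */
    private array $rows = [];

    /** @var list<array<string, mixed>> result of the last transformation */
    public array $output = [];

    public function setUp(): void
    {
        // Build the fixture up front so only the transformation is timed.
        $rows = [];
        for ($i = 0; $i < 100; $i++) {
            $rows[] = ['id' => $i, 'name' => 'user_' . $i];
        }
        $this->rows = $rows;
    }

    #[Bench\BeforeMethods(['setUp'])]
    #[Bench\Revs(1000)]
    #[Bench\Iterations(5)]
    public function bench_transform(): void
    {
        // Measured code: rename the "id" entry to "user_id" in every row.
        $this->output = array_map(
            static function (array $row): array {
                $row['user_id'] = $row['id'];
                unset($row['id']);

                return $row;
            },
            $this->rows,
        );
    }
}
```

Keeping the fixture out of the measured method means the reported time reflects the transformation rather than data preparation; an Expressions benchmark could follow the same shape with a single prepared row.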

Those are very granular benchmarks that test each building block in isolation, providing clear insights about each element. However, on top of that, I would probably still try to benchmark entire Pipelines on a selected subset of the most frequently used extractors/loaders/transformers (we would need to develop a few scenarios here).

So, to summarize, to finish this initial setup I would start by preparing benchmarks for each of the elements described above, leaving the global (whole-pipeline) scenarios for later. This would be not only a good starting point for us but also a pretty nice template for anyone who would like to contribute, even without a full understanding of how the entire project works.

I'm not sure what the right numbers are for revs (revolutions) and iterations; we would need to find a sweet spot between run time and the value of the results. We might need different values for different building blocks because, for example, Expressions might show a performance degradation only after a couple of hundred iterations, while extractors might need only a few iterations but a bigger input.
If we could run the benchmarks for each building block in parallel, that would be even better, since it could significantly reduce the total benchmark time.
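In PHPBench, revs is short for revolutions: the number of consecutive calls made inside a single time measurement, while iterations is how many times that measurement is repeated. The plain-PHP sketch below illustrates that model together with the rstdev column from the report (the standard deviation of the per-iteration times as a percentage of their mean). It is a simplified illustration, not PHPBench's implementation: PHPBench's report shows a mode rather than the mean printed here, and the use of the population (rather than sample) standard deviation is an assumption. The `measure()` and `rstdev()` helpers are hypothetical names.

```php
<?php

declare(strict_types=1);

/**
 * Time $revs consecutive calls of $subject, $iterations times, and return
 * the average time per revolution (in microseconds) for each iteration.
 *
 * @return list<float>
 */
function measure(callable $subject, int $revs, int $iterations): array
{
    $perRev = [];
    for ($it = 0; $it < $iterations; $it++) {
        $start = hrtime(true); // monotonic clock, nanoseconds
        for ($rev = 0; $rev < $revs; $rev++) {
            $subject();
        }
        $elapsedNs = hrtime(true) - $start;
        $perRev[] = $elapsedNs / $revs / 1000.0; // ns -> μs per revolution
    }

    return $perRev;
}

/**
 * Relative standard deviation in percent: population standard deviation
 * divided by the mean, comparable to the "rstdev" report column.
 *
 * @param list<float> $values
 */
function rstdev(array $values): float
{
    $mean = array_sum($values) / count($values);
    if ($mean == 0.0) {
        return 0.0; // avoid division by zero for an all-zero sample
    }
    $variance = 0.0;
    foreach ($values as $v) {
        $variance += ($v - $mean) ** 2;
    }
    $variance /= count($values);

    return sqrt($variance) / $mean * 100;
}

// Example: 5 iterations of 1000 revolutions each, matching the report's settings.
$times = measure(static fn () => hash('md5', 'Hello World!'), 1000, 5);
printf("mean: %.3fμs, rstdev: ±%.2f%%\n", array_sum($times) / count($times), rstdev($times));
```

Raising revs spreads the timer overhead across more calls, which matters for very fast subjects such as a single expression, while raising iterations gives a better picture of the spread between measurements.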

Those are my thoughts about adding phpbench to the project. In the past I made a few attempts to use it in other projects, and what I wrote here is pretty much a summary of those experiences. I would love to hear your thoughts on it or other proposals.

@stloyd force-pushed the feature/phpbench-intro branch 2 times, most recently from 3132377 to 341bf2c, October 15, 2023 09:43
@github-actions bot added size: S and removed size: M labels Oct 15, 2023
@stloyd force-pushed the feature/phpbench-intro branch from 341bf2c to 97c3346, October 15, 2023 10:00
@stloyd force-pushed the feature/phpbench-intro branch from 97c3346 to c0a8258, October 15, 2023 10:40
@norberttech merged commit 68916ab into flow-php:1.x Oct 16, 2023
@stloyd deleted the feature/phpbench-intro branch October 16, 2023 08:37