@PragTob PragTob commented Dec 13, 2023

Should combat memory consumption issues caused by sending too much data to processes.

edit: To highlight the severity of this:

  • there is a benchmark I'm working on that I almost couldn't run before this, as my 32GB of RAM would not be enough
  • it also hung forever before calculating statistics and running the formatters
  • in that benchmark I used :save to compare performance across Elixir versions; before this PR each individual save file was 200+MB. With this PR, it's 1MB (as it removes the data from formatters, and :save is a formatter under the hood). The file size isn't the major thing, but it illustrates how much memory we save and how much time copying memory we avoid.

I wanna write up all of this in like 2 blog posts but as per usual my speed is abysmal 😁
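A minimal sketch of why sending less helps (hypothetical map shapes, not Benchee's actual structs): BEAM messages are deep-copied, so any large field still attached to a scenario gets duplicated into every process that receives it, while a stripped-down map travels cheaply.

```elixir
# Hypothetical data shapes for illustration only.
big_input = String.duplicate("x", 1_000_000)
scenario = %{name: "flat_map", input: big_input, average: 4.02}

# Sending `scenario` to another process would copy the ~1MB input along.
# Dropping the large field first means only the small numbers travel.
lean = %{scenario | input: nil}

task = Task.async(fn -> lean.average end)
Task.await(task)
```

Dropping the large field before the send is the whole trick; the receiving process only ever needed the small numbers it crunches.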

Done to avoid copying potentially huge inputs and functions to
the processes when all they need to do is crunch some silly small
numbers. HUGE performance gain while reducing memory usage a lot
for benchmarks with bigger data in use.

Also use Task.async_stream instead of our good old `Parallel.map`,
as with the dawn of `inputs` I have seen people do 30+ scenarios
(10 benchmarks with 3 inputs will do that to you), where if we're
running on a 4 core system we might be doing too much in parallel,
also potentially skyrocketing memory consumption.
This includes a slight workaround to preserve the names of inputs
as people may be relying on those. Doesn't feel great but...
somehow reasonable?
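The Task.async_stream switch can be sketched like this (illustrative workload only, not Benchee's actual code): `max_concurrency` defaults to the number of schedulers online, which is what keeps 30+ scenarios from all running at once on a 4 core box.

```elixir
# Bounded-concurrency mapping: at most `schedulers_online` items are
# processed at a time, and results come back in input order.
results =
  1..30
  |> Task.async_stream(fn n -> n * n end,
    max_concurrency: System.schedulers_online(),
    ordered: true
  )
  |> Enum.map(fn {:ok, squared} -> squared end)
```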

PragTob commented Dec 14, 2023

Mental notes:

  1. We can be even more efficient for the most common case of just 1 formatter if we just don't do any parallel processing in that case; the difference is small but exists and should be an easy fix:
```
Name                        ips        average  deviation         median         99th %
sequential_output        240.79        4.15 ms     ±5.70%        4.13 ms        4.34 ms
format & write           240.00        4.17 ms     ±3.71%        4.13 ms        4.82 ms
Formatter.output         228.23        4.38 ms    ±16.58%        4.07 ms        6.34 ms

Comparison:
sequential_output        240.79
format & write           240.00 - 1.00x slower +0.0138 ms
Formatter.output         228.23 - 1.06x slower +0.23 ms
```
  2. might want to provide "input_names" to use instead of the broken list 👀

  3. adjust docs & Changelog
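Note 1 could look roughly like this (hypothetical module and formatter shape, not the actual implementation): pattern-match on a single-formatter list and run it inline, falling back to bounded parallelism otherwise.

```elixir
defmodule FormatterFastPath do
  # With a single formatter there is nothing to parallelize:
  # run it inline, skipping process spawning and data copying.
  def output_all(suite, [formatter]), do: [formatter.(suite)]

  # With several formatters, format in parallel with bounded concurrency.
  def output_all(suite, formatters) do
    formatters
    |> Task.async_stream(fn formatter -> formatter.(suite) end)
    |> Enum.map(fn {:ok, output} -> output end)
  end
end
```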


PragTob commented Dec 14, 2023

🥳

Now, that scenario is finally fast/fastest in that lil benchmark:

```
Name                        ips        average  deviation         median         99th %
Formatter.output         248.71        4.02 ms     ±4.58%        4.03 ms        4.59 ms
sequential_output        248.38        4.03 ms     ±3.16%        4.00 ms        4.53 ms
format & write           247.77        4.04 ms     ±6.76%        4.02 ms        4.35 ms

Comparison:
Formatter.output         248.71
sequential_output        248.38 - 1.00x slower +0.00539 ms
format & write           247.77 - 1.00x slower +0.0153 ms
```

PragTob commented Dec 14, 2023

I definitely should get exguard working again... the number of force pushes due to minor credo things is embarrassing ._.


PragTob commented Dec 14, 2023

👋 Hey @hrzndhrn @NickNeck, as you have some of the most Benchee plugins I'd appreciate it if you took a look here, mainly from the PoV of the impact on plugins. If you want to review the PR as a whole, of course I'm happy too 😁

TL;DR:

  • sending a lot of data between processes: VERY BAD
  • formatters lose access to Scenario.function and Scenario.input; Configuration.inputs will only retain the names, not the values

Thanks!
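For plugin authors, the practical change can be sketched like this (the map literal is illustrative, and treat the exact `input_name` field as an assumption mirroring the retained-names idea): read the retained name instead of the now-absent value.

```elixir
# Roughly what a scenario arriving at a formatter looks like after this
# change: the heavy fields are gone, only names and stats remain.
scenario = %{name: "flat_map", function: nil, input: nil, input_name: "Medium"}

# A plugin should label its output via the retained name, not the value:
label = "#{scenario.name} (#{scenario.input_name})"
```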

@NickNeck
Contributor

@PragTob thanks for the hint. Thanks for Benchee and all your work. I will take a look during the weekend.


PragTob commented Dec 15, 2023

@NickNeck thanks for all your great OSS work, and specifically for creating Benchee plugins. It was designed to facilitate and allow this, and seeing all the things people do warms my heart! 💚


@NickNeck NickNeck left a comment


LGTM 👍 I think the necessary changes for plugins are not too hard and the benefit is worth it.


PragTob commented Dec 17, 2023

@NickNeck thanks a lot! 💚


@PragTob PragTob merged commit 59f9886 into main Dec 17, 2023
@PragTob PragTob deleted the memory branch December 17, 2023 08:30
PragTob added a commit that referenced this pull request Dec 19, 2023
This reverts commit dfa6987.

It turned out that the parallel processing wasn't the issue
with memory consumption we faced but instead copying data
to processes.

See:
* #408
* #414
* https://pragtob.wordpress.com/2023/12/18/careful-what-data-you-send-or-how-to-tank-your-performance-with-task-async/