Allow to use a dedicated process for each job #3563

Closed

yguedidi wants to merge 6 commits into rectorphp:main from yguedidi:improve-parallel

Conversation

@yguedidi
Contributor

@yguedidi yguedidi commented Apr 4, 2023

Disclaimer: as the title says, this is a proof of concept, meant to open the discussion. It is of course not intended to be merged as is!

The story

I'm running Rector on a big project, and it often breaks because it runs out of memory...
It's impossible to run the full configuration on the project to clean it entirely based on the rules we decided to use.
One could argue that, as a refactoring tool, Rector aims to be run in one shot to apply just a few rule migrations, and that actually works.
But I still find myself obliged to close PhpStorm and Chrome to free some memory for Rector.

The challenge

I challenged myself to run Rector to the end without memory issues, with PhpStorm and Chrome still open!
Here is the actual system monitoring before running Rector (other things the project needs, like Docker, are running too):

[Image: system monitor screenshot taken before running Rector]

The investigation

I realized that Rector starts the configured number of processes and keeps reusing them for each job.
So I had the idea of killing them once they finish their job, so that one process = one parallel job.
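In pseudocode terms, the difference between the two strategies looks roughly like this (a sketch with hypothetical names like `fetchNextJob()` and `handleJob()`, not Rector's actual internals):

```php
<?php

// Current behavior: long-lived workers keep taking jobs until the queue
// is empty, so each worker's memory (and the caches it builds up)
// accumulates over the whole run.
//
// worker loop (sketch):
while (($job = fetchNextJob()) !== null) {
    handleJob($job); // caches grow; memory is never returned to the OS
}

// Proposed behavior: one process per job; the worker handles exactly one
// job and exits, so its memory is reclaimed immediately, at the cost of
// paying process startup for every job.
//
// worker lifecycle (sketch):
handleJob(fetchNextJob());
exit(0); // the scheduler spawns a fresh process for the next job
```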

The first results

On rector-src at master, on my machine, with a configuration using 3 threads and a job size of 10, it takes around 5m 10s to run.
But it consumes a lot of memory!

On this branch, with the same configuration, it takes around 15m 20s!!! Three times the initial time!
But with the job size changed to 150, it's down to around 5m 20s, which is not that bad to me, given that on master it takes around 4m 30s.

Starting processes has a cost!

The balance

This alternative doesn't fit a parallel configuration with lots of small jobs, as that leads to lots of processes being created, which explodes the run time.
But tweaked with a bigger job size, and therefore fewer processes to run, it's not that bad from a time perspective.

So the parallel configuration could depend on the project size, the number of rules configured, and the intentions of the developer running it.

The proposal

Make this alternative way of running parallel jobs available through configuration, so it's up to developers to decide what best fits their project and intent.

Random ideas

  • Make it configurable with a fourth boolean parameter $useSeparateProcessPerJob that defaults to false in ->parallel()
  • Show a warning when it's true and the number of jobs is high
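Under that first idea, the opt-in could look something like this (a hedged config sketch: the parameter name comes from this PR's description, and the first three arguments mirror the existing timeout / process count / job size defaults mentioned later in the thread):

```php
<?php

use Rector\Config\RectorConfig;

return static function (RectorConfig $rectorConfig): void {
    // Keep the defaults for timeout, process count and job size, and
    // opt in to the proposed one-process-per-job mode via the new
    // fourth parameter (hypothetical until the PR is finalized):
    $rectorConfig->parallel(120, 16, 20, useSeparateProcessPerJob: true);
};
```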

@TomasVotruba @samsonasik what do you think?
With rectorphp/rector#7806 @keulinho you may be interested in this too

@yguedidi
Contributor Author

yguedidi commented Apr 4, 2023

Challenge results: I wrote this PR on Chrome while running Rector from PhpStorm 😎 🎉

@samsonasik
Member

@yguedidi
Contributor Author

yguedidi commented Apr 4, 2023

Could you fix the phpstan notice https://github.com/rectorphp/rector-src/actions/runs/4613330600/jobs/8155197723?pr=3563?

@samsonasik I'm not sure it's worth it for now: this is a proof of concept, it will certainly not be merged as is, and the implementation will completely change if the idea gets accepted 🙂
I'd prefer to first validate the idea with you maintainers.

@samsonasik
Member

when using the default $rectorConfig->parallel(), it should be similarly, or at least equally, fast; if it requires changing the job size, that will be bad for DX.

@yguedidi
Contributor Author

yguedidi commented Apr 5, 2023

when using the default $rectorConfig->parallel(), it should be similarly, or at least equally, fast; if it requires changing the job size, that will be bad for DX.

Of course! I meant a fourth parameter $useSeparateProcessPerJob that defaults to false, to keep the current behavior.
I updated the description to mention that.
Users will have to opt in to use the alternative.

@keulinho
Contributor

keulinho commented Apr 5, 2023

Wow really cool 💙

When I wasn't able to run Rector on our project in one go (after my performance and memory fixes), I thought about the same approach to keep memory consumption stable in runs on larger codebases, but didn't have the time to actually try it out. So it's really cool that you had the idea as well and were able to get it working 👍

Starting processes has a cost!

It's not only the cost of starting a process: under the hood, Rector and PHPStan also use a lot of internal caching and memoization, and with each new process those caches are empty again. That's why it's all about finding the balance between reusing a process and starting a new one.

My initial idea was to not tie the behaviour to the number of files per job, but to configure a maxMemoryPerProcess in the parallel() configuration, so that after each job is handled we check how much memory the process has already consumed; if it is over the configured threshold, the process is killed and a new one is spawned. In theory this should limit starting new processes to when it's actually necessary (given that you configure a sensible max memory limit).

So in the case of a machine with 12 cores and 16 GiB of memory, you could configure a maxNumberOfProcesses of 12 and a maxMemoryPerProcess of 1G.

$rectorConfig->parallel(maxNumberOfProcesses: 12, maxMemoryPerProcess: '1G', jobSize: 20);

By default you could set maxMemoryPerProcess to 0, meaning the feature is basically off and everything works like before, so you actively have to opt in.

What do you think?
From glancing at the code I have no idea how easy or hard it would be to implement, but maybe you can give the idea a try and see how it performs?
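The worker-side check for that idea could be as small as this (a sketch: memory_get_usage() is PHP's built-in, but the surrounding loop and the fetchNextJob()/handleJob() helpers are illustrative, not Rector's actual code):

```php
<?php

// Hypothetical worker job loop with the maxMemoryPerProcess idea: after
// each job, the worker checks its own memory footprint and exits instead
// of asking for another job once the threshold is crossed; the main
// process then spawns a fresh replacement with empty caches.
$maxMemoryPerProcess = 1 * 1024 * 1024 * 1024; // '1G' parsed to bytes; 0 = feature off

while (($job = fetchNextJob()) !== null) { // fetchNextJob() is illustrative
    handleJob($job);

    if ($maxMemoryPerProcess > 0 && memory_get_usage(true) > $maxMemoryPerProcess) {
        exit(0); // the scheduler detects the exit and starts a new worker
    }
}
```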

@yguedidi
Contributor Author

yguedidi commented Apr 5, 2023

Thank you for your kind words!

What do you think?

To be honest, I hadn't thought about this idea!
It's good to have another alternative in the discussion! 👍🏼

My first impression: usually when you set a max memory limit, it's because you want memory usage to never go above it.
If I understood your idea correctly, each process would actually be allowed to go above it for its last processed file.
An edge case could be that this file is a very big one.
So it's a bit confusing to me as is.

Maybe it's just a matter of parameter wording to avoid this confusion. $newProcessThreshold maybe?

The next point is that we would need userland memory usage management; is that a good thing? No strong opinion. I know about the Symfony Stopwatch component, which could be used for this.

Finally, your idea looks like it changes the concept of a "job": the current one is file-list based, this new one is memory-usage based.

@yguedidi
Contributor Author

yguedidi commented Apr 5, 2023

@TomasVotruba @samsonasik to mention a real business case for being able to run Rector on a big project in one go: being able to set up Rector as an automated code review tool on an existing project.
On CI machines it breaks too, so it's impossible to complete a first run that would build the cache for the following PRs (which run on changed files only, so they're OK).

@yguedidi
Contributor Author

yguedidi commented Apr 12, 2023

@TomasVotruba @samsonasik any feedback on the approach here?

@TomasVotruba
Member

Hi @yguedidi ,

thanks for this feature proposal! There is quite a long discussion to follow, so I'll only share my 2 cents: we mimic PHPStan's parallelism, as it has to match its workflow and is easier to maintain in case of PHPStan architecture changes, past and future.

Before going into code, I'd like to ask you 3 things:

  • make the CI pass, so we can test it in case of merge without any further reviews or delays
  • could you share 2-3 bullet points on the bottlenecks this solves?
  • what are the before/after numbers on one exact project?

Thank you

@yguedidi
Contributor Author

make the CI pass, so we can test it in case of merge without any further reviews or delays

I'll make it pass if this alternative way of running in parallel gets accepted.
Let's assume it's accepted: what would be the best way to configure Rector to use this alternative instead of the current one that mimics PHPStan, so I can finalize the PR?
I would have thought about configuring it with a new config method on ParallelConfig from #3602, but that got rejected.
A fourth boolean parameter $useSeparateProcessPerJob in ->parallel(), as suggested in the description, maybe?
There is of course no goal to remove the current way of doing parallelism like PHPStan!

could you share 2-3 bullet points on the bottlenecks this solves?

from description:

  • no more excessive memory usage
  • allows running Rector fully in one go on big projects

what are the before/after numbers on one exact project?

it will depend on the parallel configuration, so with the following one, ->parallel(300, 3, 200), on rector-src:

  • on main, each persistent process exceeded 2G (between 2G and 2.5G) of memory consumption, with a total execution time of ~4m 30s
  • on this branch, memory consumption remains stable below 1.5G per process, at the cost of moving to ~5m 10s of execution time

with job size of 100 (->parallel(300, 3, 100)):

  • on main, same memory behavior with execution time ~4m 20s (-10s from the previous run: less communication with the main process)
  • on this branch, each process doesn't exceed 1G, with execution time ~5m 30s (+20s from the previous run: more processes to create)

This alternative way of parallelism better fits big projects configured with a high job size.

@keulinho
Contributor

👍 from my side for this proposal; it basically solves the issues described in rectorphp/rector#7806 as well.

@yguedidi
Contributor Author

@TomasVotruba would you accept this if I finalize it by making the alternative way configurable with a 4th optional parameter $useSeparateProcesses = false in ->parallel()?
It may help in rectorphp/rector#7891 :)

@Flyingmana
Contributor

Make it configurable with a fourth boolean parameter $useSeparateProcessPerJob that defaults to false in ->parallel()

Instead of having one more parameter, I would choose something more complex with more flexibility.
Working with parallelization and configurable restart conditions gets complex fast.
As soon as you have it, there will be a desire to restart on reaching a memory threshold (as was already suggested; but please don't call it a limit in that case, if it's not setting an actual PHP limit for the process).
Restarting after $n iterations is also not uncommon.
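The restart-after-$n-iterations variant mentioned here would be an equally small change on the worker side (again a sketch with illustrative names, not Rector's actual code):

```php
<?php

// Hypothetical worker loop that restarts after a fixed number of handled
// files instead of after a memory threshold: deterministic, and easy to
// reason about when tuning job size against restart cost.
$filesPerRestart = 500;
$handledFiles = 0;

while (($job = fetchNextJob()) !== null) { // fetchNextJob() is illustrative
    $handledFiles += handleJob($job);      // assume it returns the file count

    if ($handledFiles >= $filesPerRestart) {
        exit(0); // the scheduler spawns a fresh process for the remaining jobs
    }
}
```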

I tested the PR with one of the smaller, cleaner code paths I have.
The number of files is 3114.

|                  |              | Run A        | Run B   |
| ---------------- | ------------ | ------------ | ------- |
| version          | last release | last release | this PR |
| parallelization  | 1            | 3            | 3       |
| Time Real        | 2m 17s       | 1m 15s       | 12m 58s |
| Time User        | 2m 02s       | 2m 43s       | 32m 23s |
| total memory max | ~4G          | ~6G          | ~0.5G   |

Memory usage is estimated by looking at total memory usage (while other apps are also running).

So in my case, this PR really has an immense benefit on memory consumption, but it also loses a lot of efficiency by discarding everything it loaded too often.

I was not able to figure out the added code at first look. If you point me to a place for a simple counter, so it only restarts after 500 parsed files, I would like to do another test run.

@yguedidi
Contributor Author

@Flyingmana I agree with you! I proposed #3602 a few days back to prepare for a potential increase of parallelism configuration options, but it was rejected.

@yguedidi yguedidi force-pushed the improve-parallel branch 3 times, most recently from 525b3b9 to 01d03bb on May 1, 2023 16:34
@yguedidi yguedidi changed the title from "[PoC] An alternative way to do parallelism" to "Allow to use a dedicated process for each job" on May 1, 2023
@yguedidi yguedidi force-pushed the improve-parallel branch from 01d03bb to fc36aa1 on May 1, 2023 16:49
@yguedidi yguedidi marked this pull request as ready for review on May 1, 2023 16:58
@yguedidi yguedidi requested a review from TomasVotruba as a code owner on May 1, 2023 16:58
@yguedidi
Contributor Author

yguedidi commented May 1, 2023

@TomasVotruba @samsonasik this is now ready for review! The new behavior is now configurable, and disabled by default.
@keulinho @Flyingmana if you want to have a look too :)

If this gets accepted, I plan to work on a follow-up PR to improve DX by warning when the provided configuration leads to lots of processes being created, impacting performance a lot, as you experienced @Flyingmana.
I just need to find a sane threshold for the warning.

@yguedidi
Contributor Author

yguedidi commented May 1, 2023

@Flyingmana to mitigate the performance impact when using a separate process per job, I also increase the job size in the parallel configuration, to have fewer processes created, and adapt the timeout accordingly, for example ->parallel(1200, 16, 200) (10 times the defaults).

@yguedidi
Contributor Author

yguedidi commented May 1, 2023

@TomasVotruba @samsonasik this is better reviewed commit by commit; I did my best to integrate the new logic into the existing code as much as possible, to minimize the diff.

Comment on lines +119 to +123
$jobs = $jobByProcessIdentifier;
$jobIdentifiers = array_keys($jobByProcessIdentifier);
Contributor Author


When "use separate process per job" is enabled, $jobs becomes persistent, with each job indexed by its process identifier.
The new $jobIdentifiers is the stack from which each process pops the next job to run on exit.

}

if ($jobs === []) {
if ($useSeparateProcessPerJob || $jobs === []) {
Contributor Author

@yguedidi yguedidi May 1, 2023


When "use separate process per job" is enabled, always quit the current process execution after its job is done; don't take another job to work on.

Comment on lines +186 to +215
if (! $useSeparateProcessPerJob) {
    return;
}

if ($jobIdentifiers === []) {
    return;
}

$nextProcessIdentifier = array_pop($jobIdentifiers);
$nextParallelProcess = $this->processPool->getProcess($nextProcessIdentifier);

$onDataCallable = $onDataCallableProvider($nextProcessIdentifier, $nextParallelProcess);
$onExitCallable = $onExitCallableProvider($nextProcessIdentifier);

$nextParallelProcess->start(
    // 1. callable on data
    $onDataCallable,

    // 2. callable on error
    $handleErrorCallable,

    // 3. callable on exit
    $onExitCallable
);
Contributor Author

@yguedidi yguedidi May 1, 2023


When "use separate process per job" is enabled, here is the magic: start the next available job, by its process identifier, when the current one exits.

@yguedidi yguedidi force-pushed the improve-parallel branch from fc36aa1 to de3ae27 on May 1, 2023 19:40
): array {
$jobs = array_reverse($schedule->getJobs());
$jobIdentifiers = [];
$jobProcesses = [];
Contributor Author


In the last force-push diff I had to introduce this because I found an issue when a process times out: $this->processPool->quitAll(); is called, and it breaks when there are attached processes that haven't started yet.
So I can't use the process pool to store the job processes.

@TomasVotruba
Member

I'm going through this, but I'm unable to review/accept as there are many concepts changed at once.

Seeing #3563 (comment), it also seems the performance got 6× slower, which is a no-go.

On the other hand, a series of well-targeted and verified improvements is now released in Rector 0.16.
Please check those PRs as inspiration for further PRs; they are a good example for review and merge 👍

Thanks for understanding

@Flyingmana
Contributor

@TomasVotruba the draft at that time did have this performance penalty, but with the increased job size described later, it should in theory get similar results to my approach.

I now had time to do a proper retest with some open source code as an example.

Retest

Setup:
WSL2, Linux, php 8.1
cpu: physical cores 6, logical 12
Ram: 32 GB (25GB allowed for WSL2)

used config:

return static function (RectorConfig $rectorConfig): void {
    $rectorConfig->paths([
        __DIR__ . '/src',
        __DIR__ . '/../symfony',
    ]);

    // usual config
    //$rectorConfig->parallel(120 * 30, 12, 200);
    // config for yguedidi
    $rectorConfig->parallel(120 * 30, 12, 200, true);

    $rectorConfig->rule(\Rector\Php74\Rector\Ternary\ParenthesizeNestedTernaryRector::class);
};

On a current symfony/symfony checkout, 9629 files (without installed vendor)
last release: rector 0.16.0

|                   | Run A        | Run B        | Run C                          | Run D        | Run E       | Run F        | Run G        | Run H        |
| ----------------- | ------------ | ------------ | ------------------------------ | ------------ | ----------- | ------------ | ------------ | ------------ |
| version           | last release | last release | last release                   | last release | pr-3563     | pr-3563      | pr-3722      | pr-3722      |
| special commands  |              |              | --no-diffs --no-progress-bar   |              |             |              |              |              |
| parallelization   | 16           | 6            | 6                              | 12           | 12          | 12           | 12           | 12           |
| job size          | 20           | 20           | 20                             | 500          | 500         | 200          | 20           | 10           |
| files per restart | x            | x            | x                              | x            | x           | x            | 200          | 100          |
| Time Real         | 02m 28s      | 02m 04s      | 02m 06s                        | 01m 40s      | 01m 54s     | 02m 12s      | 01m 55s      | 02m 21s      |
| Time User         | 26m 49s      | 11m 48s      | 12m 02s                        | 14m 44s      | 17m 28s     | 22m 39s      | 21m 24s      | 26m 12s      |
| process memory    | 2G           | 2G ~ 4G      | 2G ~ 4G                        | 0.6G ~ 3G    | 0.3G ~ 3G   | 0.2G ~ 2.8G  | 0.2G ~ 1.8G  | 0.2G ~ 0.7G  |
| total memory max  | ~24G         | ~18G         | ~18G                           | ~14G         | ~9G         | ~5G          | ~5G          | ~4G          |

So the performance degradation is no longer there for a big enough job size.
Total memory usage still improves a lot compared to the latest release; also, the increased job size seems to make little difference compared with my PR #3722, which instead limits by iterations.
