Handle slowdown of GPSampler due to L-BFGS in SciPy v1.15 #6191
nabenabe0928 merged 9 commits into optuna:master
Conversation
@kAIto47802 force-pushed from b1976ce to 0f7dc17
Sure! I’ve conducted benchmarks and updated the comment.
@fusawa-yugo Could you review this PR?
nabenabe0928 left a comment
@kAIto47802
Thank you for the benchmarking experiments!
I confirmed that the slowdown caused by limiting OpenBLAS to a single thread is very limited :)
Hello, do I understand correctly that the OpenBLAS thread limit resolves the issue? It's still not nice to have this requirement, and apologies for this strange issue hitting your workflows. I am also a bit confused about why this is happening, but I just want to make sure that this is 100% OpenBLAS threading behavior and not some silly mistake I made in the translations.
@ilayn The problem probably comes from instabilities in our optimization, which uses `scipy.optimize.minimize` with L-BFGS-B for kernel fitting. Up to now, we have observed slowdowns when we get huge gradients (> 2**32), and the precision handling depends on the OS.
Thank you for your kind words. When I am in the thick of it, I can't see all the possible repercussions of this type of change, hence it is a lot of learning for me next to the straightforward translation work.

Regarding the large gradients and other details: I am trying to figure out whether some kind of equilibration step can help the conditioning of the problems in both L-BFGS-B and SLSQP. These are, as you know, very archaic algorithms, written at a time when things were very restricted (the late '80s). Now we are pushing them to places outside their comfort zone, hence each step brings in another display of their shortcomings. So if there is a local-optimization expert who knows what needs to be done, I'll be happy to implement those changes; I don't find much time to go back into paper-reading mode, unfortunately.

Regarding why this is happening now: the original versions had their LAPACK copies baked in as additional code, hence they had no library dependencies and were self-contained. Now I took out those parts and linked to the SciPy BLAS/LAPACK mechanism (be it MKL or OpenBLAS or something else), and folks started to hit the shortcomings of these numerical libraries. That's why my responses to these problems are slower than usual: I don't quite know where to look yet and will try to understand why this might be happening. Maybe it's a simple answer because libc is doing something that Fortran does not, or something is lurking in OpenBLAS. Not sure yet. You might give it a go with MKL, if you have the time and the patience, through a conda flow.
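As a toy illustration of the equilibration idea mentioned above (purely a sketch, not SciPy code): rescaling a badly scaled variable can bring a huge gradient down to O(1) before the optimizer ever sees it.

```python
# Toy sketch of equilibration. f(x) = a * x**2 has gradient 2*a*x; substituting
# x = y / sqrt(a) gives the equilibrated objective g(y) = y**2 with gradient
# 2*y, independent of the (huge) scale a.
a = 2.0**40  # badly scaled coefficient


def grad_f(x: float) -> float:
    """Gradient of the original, badly scaled objective f(x) = a * x**2."""
    return 2.0 * a * x


def grad_g(y: float) -> float:
    """Gradient of the equilibrated objective g(y) = f(y / sqrt(a)) = y**2."""
    return 2.0 * y


print(grad_f(1.0))  # huge gradient, ~2**41
print(grad_g(1.0))  # O(1) gradient
```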
Hey @gen740, could you review this PR?
LGTM! Thank you for your PR!!
Apologies for this request, but could any of you folks report whether you see improved or worsened performance as well? We are trying to get to the bottom of it so that you don't need to jump through hoops like this, but it is quite difficult to replicate this behavior, hence my request.
I am checking right now and will get back to you once the results are out.

```python
import optuna


def objective(trial: optuna.Trial) -> float:
    return sum((trial.suggest_float("x", -5, 5) - 2) ** 2 for _ in range(10))


sampler = optuna.samplers.GPSampler(seed=0)
study = optuna.create_study(sampler=sampler)
study.optimize(objective, n_trials=30)
print((study.trials[-1].datetime_complete - study.trials[0].datetime_start).total_seconds())
```
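For reference, the final print in the snippet above relies on `datetime` arithmetic; in isolation it behaves like this (a minimal sketch with hand-made timestamps standing in for the real trial attributes):

```python
from datetime import datetime, timedelta

# Stand-ins for study.trials[0].datetime_start and study.trials[-1].datetime_complete.
start = datetime(2025, 1, 1, 12, 0, 0)
complete = start + timedelta(seconds=90, milliseconds=500)

# Subtracting two datetimes yields a timedelta; total_seconds() flattens it to a float.
elapsed = (complete - start).total_seconds()
print(elapsed)  # 90.5
```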
@ilayn
Perfect, thank you very much. Also, Ralf posted some instructions in the SciPy issue. I think this is because of a bad interaction between PyTorch and SciPy.
This pull request has not seen any recent activity. |
@kAIto47802 Could you resolve the conflicts? |
Motivation

This PR is related to:

- `scipy.optimize.minimize` with L-BFGS-B slowing down since `scipy>=1.15.0` on Ubuntu (scipy/scipy#23191)

It appears that L-BFGS-B in `scipy.optimize.minimize` has become slower in some cases since SciPy v1.15.0. This affects the performance of Optuna's GPSampler, which uses it for kernel fitting.

According to the issue above, setting the `OPENBLAS_NUM_THREADS` environment variable to 1 mitigates this slowdown.

Description of the changes
- Set `OPENBLAS_NUM_THREADS` to 1 when `scipy>=1.15.0`.

Benchmarks
I benchmarked the original slowdown and the update introduced in this PR.
Benchmarking Settings
Objective functions. I use the following functions available in OptunaHub:
Implementation details. I use dimensions 5 and 20, with 100 trials each. Each experiment is run with 5 different random seeds. The benchmarking codes and the visualization codes I use are as follows:
Benchmarking code
Visualization code
Environment. The benchmarking is done on a machine running Arch Linux with Intel® Core™ i9-14900HX processor (24 cores, 32 threads, up to 5.8GHz), and Python 3.11.0.
Results
Figure 1 and Figure 2 show the results for the elapsed time at the end of each trial and the best objective values, respectively.
Figure 1 shows that GPSampler becomes substantially slower in SciPy v1.15.0. The patch introduced in this PR mitigates the slowdown. This patch also slightly speeds up the original version with SciPy v1.14.0 in some cases.
Figure 2 confirms that neither the version change nor the modifications introduced in this PR affect the final objective values.
(a) function 2, dimension 5
(b) function 2, dimension 20
(c) function 3, dimension 5
(d) function 3, dimension 20
(e) function 16, dimension 5
(f) function 16, dimension 20
(g) function 17, dimension 5
(h) function 17, dimension 20
(i) function 20, dimension 5
(j) function 20, dimension 20
(k) function 22, dimension 5
(l) function 22, dimension 20
Figure 1. The elapsed time at the end of each trial. The solid lines denote the mean, and the translucent areas denote the standard error, both computed over five independent runs with different random seeds.
(a) function 2, dimension 5
(b) function 2, dimension 20
(c) function 3, dimension 5
(d) function 3, dimension 20
(e) function 16, dimension 5
(f) function 16, dimension 20
(g) function 17, dimension 5
(h) function 17, dimension 20
(i) function 20, dimension 5
(j) function 20, dimension 20
(k) function 22, dimension 5
(l) function 22, dimension 20
Figure 2. The best objective values. The solid lines denote the mean, and the translucent areas denote the standard error, both computed over five independent runs with different random seeds.