Speed up GPSampler by Batching Acquisition Function Evaluations #6268
Conversation
not522 left a comment
Thank you for your PR! Could you check my small comments?
Co-authored-by: Naoto Mizuno <naotomizuno@preferred.jp>
not522 left a comment
Thank you for your update. LGTM!
I confirmed the speed improvement in my environment.
optuna/_gp/optim_mixed.py (Outdated)
```python
normalized_params[: len(scaled_x), continuous_indices] = scaled_x * lengthscales
x_tensor = torch.from_numpy(normalized_params[: len(scaled_x), :]).requires_grad_(True)
```
This slicing is incorrect when we have discrete parameters.
For example, consider `normalized_params = np.array([[1.0, 0.0], [0.0, 1.0]])`, where `normalized_params[:, 1]` is the discrete dimension.
Say the first batch has converged and the second batch is still in progress: the current slicing then uses the discrete parameter value 0.0, which belongs to the first batch.
But we want the discrete parameter from the second batch, so we need the unconverged batch indices from `batched_l_bfgs_b` in order to assign the discrete parameter values to the correct rows.
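To make the failure mode concrete, here is a small NumPy reproduction of the example above. The variable names mirror the snippet; `unconverged` is an illustrative stand-in for the unconverged batch indices that `batched_l_bfgs_b` would need to expose, not an existing Optuna variable.

```python
import numpy as np

# Reproduction of the reviewer's example: two batches, column 1 is discrete.
normalized_params = np.array([[1.0, 0.0], [0.0, 1.0]])
continuous_indices = [0]

# Suppose batch 0 has converged and only batch 1 is still optimizing,
# so the optimizer hands back one row of new continuous values.
scaled_x = np.array([[0.5]])

# Buggy slicing: writes into row 0 (the converged batch), pairing the new
# continuous value with row 0's discrete value 0.0 instead of row 1's 1.0.
buggy = normalized_params.copy()
buggy[: len(scaled_x), continuous_indices] = scaled_x

# Fixed: address the rows by the indices of the still-unconverged batches.
unconverged = np.array([1])
fixed = normalized_params.copy()
fixed[np.ix_(unconverged, continuous_indices)] = scaled_x
```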
Thanks for pointing out the bug! You are right.
I added batch indices as an argument to the fun_and_grad function, as well as the corresponding code. This change appears to be working; the runtime and accuracy seem fine.
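A rough sketch of the shape of that fix, with a NumPy stand-in for the torch-based acquisition function. All names here are illustrative, not the actual `optim_mixed.py` code: the key point is that `fun_and_grad` receives `batch_indices` and uses them to address the unconverged rows.

```python
import numpy as np

def toy_acqf_and_grad(x: np.ndarray):
    # Stand-in for the batched acquisition function: f(x) = -||x||^2 per row.
    return -(x ** 2).sum(axis=-1), -2.0 * x

def fun_and_grad(scaled_x, batch_indices, normalized_params, continuous_indices, lengthscales):
    # Write the new continuous values into the rows of the *unconverged*
    # batches only, so each row keeps its own batch's discrete values.
    normalized_params[np.ix_(batch_indices, continuous_indices)] = scaled_x * lengthscales
    x = normalized_params[batch_indices]
    fvals, grad_x = toy_acqf_and_grad(x)
    # Chain rule back to scaled_x (the continuous part of x is scaled_x * lengthscales).
    grad = grad_x[:, continuous_indices] * lengthscales
    return fvals, grad
```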
Codecov Report: ❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##           master    #6268      +/-   ##
==========================================
- Coverage   89.24%   89.15%   -0.10%
==========================================
  Files         208      209       +1
  Lines       13821    13908      +87
==========================================
+ Hits        12334    12399      +65
- Misses       1487     1509      +22
```

☔ View full report in Codecov by Sentry.
nabenabe0928 left a comment
LGTM, let's work on follow-ups in (an)other PR(s).
Motivation
The multi-start optimization of the acquisition function in the `GPSampler` is a significant performance bottleneck, accounting for 50-60% of its total execution time. The current implementation uses 10 initial points (`n_local_search=10`) and performs the optimization from each point sequentially.

This pull request aims to significantly speed up this multi-start optimization by batching the evaluation of the acquisition function and its gradient. Benchmark results show that this change can reduce the execution time by approximately 40-50% without compromising the optimization accuracy on the objective function.
Description of the Changes
Previous Sequential Approach
The previous implementation runs an optimization loop for each starting point individually.
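Schematically, the sequential scheme looks like the toy sketch below, which minimizes the negative of a made-up acquisition function with SciPy's L-BFGS-B. None of these names come from the Optuna source; the point is that each starting point gets its own complete optimizer run, one after another.

```python
import numpy as np
import scipy.optimize

def neg_acqf_and_grad(x: np.ndarray):
    # Toy stand-in for the negative acquisition function and its gradient.
    return float((x ** 2).sum()), 2.0 * x

def sequential_multistart(x0s: np.ndarray):
    best_x, best_f = None, np.inf
    # One full L-BFGS-B run per starting point, strictly one after another:
    # the acquisition function is evaluated for a single point at a time.
    for x0 in x0s:
        res = scipy.optimize.minimize(neg_acqf_and_grad, x0, jac=True, method="L-BFGS-B")
        if res.fun < best_f:
            best_x, best_f = res.x, res.fun
    return best_x, best_f
```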
Proposed Batched Approach
The proposed approach executes each optimization step for all initial points in a single batch.
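The cooperative pattern can be illustrated with plain Python generators standing in for `greenlet` (the PR actually suspends and resumes `scipy` L-BFGS-B calls via `greenlet`; the gradient-descent searches and the objective below are toys invented for this sketch). Each local search yields its current point, the coordinator evaluates the gradient for all active searches in one batch, and each search is then resumed by one step.

```python
import numpy as np

def local_search(x0: np.ndarray, n_steps: int = 50, lr: float = 0.1):
    # One local search: yield the current point, receive its gradient,
    # advance one step, repeat. `yield` plays the role of a greenlet switch.
    x = x0.copy()
    for _ in range(n_steps):
        grad = yield x
        x = x - lr * grad

def batched_multistart(x0s: np.ndarray):
    searches = [local_search(x0) for x0 in x0s]
    points = np.array([next(s) for s in searches])  # first query points
    alive = list(range(len(searches)))
    while alive:
        # ONE batched gradient evaluation for every still-active search
        # (toy objective f(x) = ||x||^2, so grad = 2x).
        grads = 2.0 * points[alive]
        still_alive = []
        for g, i in zip(grads, alive):
            try:
                points[i] = searches[i].send(g)  # resume search i by one step
                still_alive.append(i)
            except StopIteration:
                pass  # search i exhausted its step budget
        alive = still_alive
    fvals = (points ** 2).sum(axis=1)
    best = int(fvals.argmin())
    return points[best], fvals[best]
```

Searches that finish early simply drop out of `alive`, so later batches shrink instead of blocking on the slowest search.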
Evaluations of the acquisition function (`acqf`) and its gradient (`acqf_grad`) are performed at once using vectorized operations on `torch.Tensor`. The iterative optimization from each starting point is managed cooperatively using `greenlet`. In each iteration, the function values for all starting points are evaluated as a single batch; each optimization is then advanced by one step to determine the next set of points, which forms the input for the following batch evaluation cycle.

For backward compatibility, a fallback mechanism has been implemented: if `greenlet` fails to import, the logic reverts to the original sequential execution.

Dependencies

This change utilizes `greenlet`, which is already included as a dependency of `sqlalchemy`, a core Optuna dependency. Therefore, this PR does not introduce any new library dependencies.

Performance Evaluation (Benchmark)
To validate the effectiveness of the proposed method, we compared the performance in the following three modes:
- This PR (Batched): the implementation in this PR with batch evaluation enabled.
- Original: the current implementation on the `master` branch.
- This PR (Fallback): the implementation in this PR with batch evaluation intentionally disabled to activate the sequential fallback logic. This helps confirm that the new implementation's overhead is acceptable and that the fallback functions correctly.

Experimental Setup
- `rotated_ellipsoid` (10 dimensions)
- `f6` from `optunahub/benchmarks/bbob` (20 dimensions)
- `wfg` (2 objectives, 3 dimensions, k=1) from `optunahub/benchmarks/wfg`

Results
(Plots comparing This PR (Batched), original, and This PR (Fallback) on `f6` (D=20), `rotated_ellipsoid` (D=10), and `wfg` (D=3).)

The plots show that the hypervolume values remain consistent across all modes, confirming that the change does not negatively impact search performance.
Analysis
Objective Value Progression

The following tables show that the objective values remain consistent across all modes, confirming that the change does not negatively impact search performance.

`f6` (D=20, seed=42)

`rotated_ellipsoid` (D=10, seed=42)

Benchmark Code
Tests
Unit tests must be added.