Speed up GPSampler by Batching Acquisition Function Evaluations #6268

Merged
nabenabe0928 merged 71 commits into optuna:master from Kaichi-Irie:add-batched-acqf-eval-greenlet-for-PR
Sep 11, 2025

Conversation

@Kaichi-Irie (Contributor) commented Sep 8, 2025

Motivation

The multi-start optimization of the acquisition function in the GPSampler is a significant performance bottleneck, accounting for 50-60% of its total execution time. The current implementation uses 10 initial points (n_local_search=10) and performs optimization from each point sequentially.

This pull request aims to significantly speed up this multi-start optimization by batching the evaluation of the acquisition function and its gradient. Benchmark results show that this change can reduce the execution time by approximately 40-50% without compromising the optimization accuracy of the objective function.

Description of the Changes

Previous Sequential Approach

The previous implementation runs an optimization loop for each starting point individually.

# Previous pseudo-code
def multistart_optimize(acqf, x0_list, ...):
    ...
    min_fvals = ...
    min_xs = ...
    for i, x0 in enumerate(x0_list):
        # optimize() performs iterative optimization for a single initial point x0
        min_fval, min_x = optimize(acqf, x0, ...)
        min_fvals[i] = min_fval
        min_xs[i] = min_x
    best_idx = np.argmin(min_fvals)
    return min_xs[best_idx], min_fvals[best_idx]
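To make the baseline concrete, here is a minimal, self-contained sketch of the sequential pattern above. The quadratic `acqf`, the plain gradient-descent `optimize`, and all names are illustrative stand-ins; the real code minimizes the GP-based acquisition function with L-BFGS-B:

```python
import numpy as np

def acqf(x):
    # Stand-in acquisition function to minimize; the real one is the
    # GP-based acquisition value.
    return float((x ** 2).sum())

def acqf_grad(x):
    # Gradient of the stand-in quadratic.
    return 2.0 * x

def optimize(x0, lr=0.1, n_steps=100):
    # One local search from a single starting point (the real code uses
    # L-BFGS-B rather than plain gradient descent).
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - lr * acqf_grad(x)
    return acqf(x), x

def multistart_optimize(x0_list):
    # Sequential multi-start: one full local search per starting point.
    min_fvals = np.empty(len(x0_list))
    min_xs = np.empty((len(x0_list), len(x0_list[0])))
    for i, x0 in enumerate(x0_list):
        min_fvals[i], min_xs[i] = optimize(x0)
    best_idx = int(np.argmin(min_fvals))
    return min_xs[best_idx], min_fvals[best_idx]
```

Each call to `optimize` runs to completion before the next start is touched, which is exactly the serialization the PR removes.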

Proposed Batched Approach

The proposed approach executes each optimization step for all initial points in a single batch.

Evaluations of the acquisition function (acqf) and its gradient (acqf_grad) for all starting points are performed at once using vectorized operations on torch.Tensor. The iterative optimization from each starting point is managed cooperatively using greenlet: in each iteration, the function values for all starting points are evaluated as a single batch, then each local search advances by one step to determine its next candidate point, and those points form the input for the following batch evaluation.
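The batched evaluation itself is straightforward with autograd: stack the candidate points into a (batch, dim) tensor, run one forward pass, and a single backward pass recovers the gradient at every point. A minimal sketch with a stand-in quadratic acquisition function (the real acqf is the GP-based acquisition value):

```python
import torch

def acqf_batched(x: torch.Tensor) -> torch.Tensor:
    # Stand-in acquisition function; x has shape (batch, dim),
    # the output has shape (batch,).
    return (x ** 2).sum(dim=1)

# Ten starting points in five dimensions, evaluated in one forward pass.
x = torch.linspace(-1.0, 1.0, 50).reshape(10, 5).requires_grad_(True)
fvals = acqf_batched(x)

# Each fvals[i] depends only on row i of x, so summing before backward()
# yields the per-point gradients in a single backward pass.
fvals.sum().backward()
grads = x.grad  # shape (10, 5): the gradient of acqf at every start point
```

Because each output element depends only on its own row, the gradient of the sum decomposes row-wise, so `x.grad` holds the per-start-point gradients with no loop.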

# New pseudo-code
def multistart_optimize(acqf, x0_list, ...):
    ...
    # optimize_batched manages optimization for multiple initial points x0_list
    min_xs, min_fvals = optimize_batched(acqf, x0_list, ...)
    best_idx = np.argmin(min_fvals)
    return min_xs[best_idx], min_fvals[best_idx]
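The cooperative scheduling behind `optimize_batched` can be sketched without Optuna's internals. The version below uses plain Python generators in place of greenlet to show the same yield-and-resume pattern (the actual PR needs greenlet, presumably because the real per-start optimizer, scipy's L-BFGS-B, evaluates the objective from inside its own loop and cannot yield); the toy update rule and all names are illustrative:

```python
import numpy as np

def f_batched(xs):
    # Stand-in for a vectorized acquisition function: evaluates every
    # candidate point in a single call (shape (batch, dim) -> (batch,)).
    return (np.asarray(xs, dtype=float) ** 2).sum(axis=1)

def local_search(x0, n_steps):
    # One local search written as a generator: it yields the point it wants
    # evaluated and is resumed with the batched function value.
    x = np.asarray(x0, dtype=float)
    fval = yield x                 # request evaluation of the starting point
    for _ in range(n_steps):
        x = 0.5 * x                # toy update standing in for an optimizer step
        fval = yield x             # request evaluation of the new point
    return x, fval

def multistart_optimize(x0_list, n_steps=3):
    searches = [local_search(x0, n_steps) for x0 in x0_list]
    pending = [s.send(None) for s in searches]   # prime: first candidate points
    results, active = {}, list(range(len(searches)))
    while active:
        # ONE batched evaluation per round, covering all active searches.
        fvals = f_batched([pending[i] for i in active])
        still = []
        for fv, i in zip(fvals, active):
            try:
                pending[i] = searches[i].send(fv)  # resume with its own value
                still.append(i)
            except StopIteration as done:
                results[i] = done.value            # search finished: (x, fval)
        active = still
    return min(results.values(), key=lambda r: r[1])
```

Each round performs exactly one batched objective call regardless of how many searches are still running, which is where the speedup comes from; searches that converge early simply drop out of the batch.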

For backward compatibility, a fallback mechanism has been implemented. If greenlet fails to import, the logic will revert to the original sequential execution.
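The guard itself can be as simple as a try/except around the import. The flag and helper names below are illustrative, not Optuna's actual internals, and the helper bodies are placeholders:

```python
try:
    import greenlet  # noqa: F401  -- ships with sqlalchemy, so usually present
    _BATCHED_AVAILABLE = True
except ImportError:
    _BATCHED_AVAILABLE = False

def _optimize_sequential(acqf, x0_list):
    # Placeholder for the original per-start loop.
    results = [(acqf(x0), x0) for x0 in x0_list]
    return min(results)

def _optimize_batched(acqf, x0_list):
    # Placeholder for the greenlet-driven batched path.
    return _optimize_sequential(acqf, x0_list)

def multistart_optimize(acqf, x0_list):
    if _BATCHED_AVAILABLE:
        return _optimize_batched(acqf, x0_list)
    return _optimize_sequential(acqf, x0_list)  # sequential fallback
```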

Dependencies

This change utilizes greenlet, which is already included as a dependency of sqlalchemy, a core Optuna dependency. Therefore, this PR does not introduce any new library dependencies.

Performance Evaluation (Benchmark)

To validate the effectiveness of the proposed method, we compared the performance in the following three modes:

  1. This PR (Batched): The implementation in this PR with batch evaluation enabled.
  2. original: The current implementation on the master branch.
  3. This PR (Fallback): The implementation in this PR with batch evaluation intentionally disabled to activate the sequential fallback logic. This helps confirm that the new implementation's overhead is acceptable and that the implementation is functioning correctly.

Experimental Setup

  • CPU: Apple M4 (10-core)
  • Trials: 300 trials per study
  • Seed: 42, 43, 44 (results are averaged over three runs)
  • Objective Functions:
    • rotated_ellipsoid (10 dimensions)
    • f6 from optunahub/benchmarks/bbob (20 dimensions)
    • wfg (2 objectives, 3 dimensions, k=1) from optunahub/benchmarks/wfg

Results

| Objective Function (Dimension) | Mode | Avg. Execution Time (s) | Avg. Best Objective Value |
| --- | --- | --- | --- |
| f6 (D=20) | This PR (Batched) | 19.0 | 133.2 |
| f6 (D=20) | original | 35.7 | 155.9 |
| f6 (D=20) | This PR (Fallback) | 31.9 | 159.2 |
| rotated_ellipsoid (D=10) | This PR (Batched) | 19.4 | 3.64e-4 |
| rotated_ellipsoid (D=10) | original | 39.6 | 3.75e-4 |
| rotated_ellipsoid (D=10) | This PR (Fallback) | 34.0 | 2.28e-4 |
| wfg (D=3) | This PR (Batched) | 35.3 | - |
| wfg (D=3) | original | 74.4 | - |
| wfg (D=3) | This PR (Fallback) | 66.9 | - |

(Figure: speed_comparison_plot)

The following plot shows that the hypervolume values remain consistent across all modes, confirming that the change does not negatively impact search performance.

(Figure: hypervolume_history)

Analysis

  • The implementation with batched evaluation is approximately 1.8x to 2.0x faster than the current implementation.
  • The best objective values obtained are comparable across all modes. The observed variations are likely due to changes in the order of floating-point operations and do not indicate a degradation in the sampler's search performance.
  • The fallback mode is roughly as fast as the original implementation (slightly faster in these runs), suggesting that the structural overhead of the proposed changes is minimal.

Objective Value Progression:

f6 (D=20, seed=42)

The following tables show that the objective values remain consistent across all modes, confirming that the change does not negatively impact search performance.

| mode \ Trial | Trial 9 | Trial 10 | Trial 11 | Trial 12 | Trial 13 | Trial 14 |
| --- | --- | --- | --- | --- | --- | --- |
| This PR (Batched) | 1193964.5576430012 | 435258.7193614113 | 429479.3338986842 | 328199.63425374887 | 541828.1312516633 | 571560.8492535071 |
| This PR (Fallback) | 1193964.5576430012 | 435258.7193614113 | 429479.33389868634 | 328199.63425374887 | 541828.1312516204 | 571560.8492535099 |
| original | 1193964.5576430012 | 435258.7193614113 | 429479.3338986842 | 328199.6342537497 | 541828.1312516493 | 571560.8492535143 |
rotated_ellipsoid (D=10, seed=42)

| mode \ Trial | Trial 9 | Trial 10 | Trial 11 | Trial 12 | Trial 13 | Trial 14 |
| --- | --- | --- | --- | --- | --- | --- |
| This PR (Batched) | 12.130935341373624 | 6.497476295603841 | 9.159600880097619 | 11.184451388408986 | 13.400053021276653 | 10.290407711547488 |
| This PR (Fallback) | 12.130935341373624 | 6.497476295603834 | 9.159600880097601 | 11.184451388408993 | 13.400053021279028 | 10.290407711541484 |
| original | 12.130935341373624 | 6.497476295603843 | 9.159600880097617 | 11.18445138840904 | 13.40005302128083 | 10.290407711537135 |
Benchmark Code
import json
import os
from itertools import product

import numpy as np
import optunahub

import optuna

np.random.seed(42)

NUM_CONTINUOUS_PARAMS = 10
ROTATION_MATRIX, _ = np.linalg.qr(np.random.rand(NUM_CONTINUOUS_PARAMS, NUM_CONTINUOUS_PARAMS))

CONDITIONING = 10
WEIGHTS = np.array([CONDITIONING**(i / (NUM_CONTINUOUS_PARAMS - 1)) for i in range(NUM_CONTINUOUS_PARAMS)])

def rotated_ellipsoid(trial):
    xs = np.array([trial.suggest_float(f"x_{i}", -1, 1) for i in range(NUM_CONTINUOUS_PARAMS)])
    rotated_xs = ROTATION_MATRIX @ xs
    return np.sum(WEIGHTS * (rotated_xs**2))

bbob = optunahub.load_module("benchmarks/bbob")
f6 = bbob.Problem(function_id=6, dimension=20, instance_id=1)

def execute_benchmark(
    mode: str,
    objective_type: str,
    n_trials: int,
    seed: int,
    results_file="results.jsonl",
    output_dir="output",
):
    sampler = optuna.samplers.GPSampler(seed=seed)
    name = f"{objective_type}_{seed}_{mode}_{n_trials}"
    study = optuna.create_study(study_name=name, sampler=sampler)

    study.optimize(func=f6 if objective_type == "f6" else rotated_ellipsoid, n_trials=n_trials)
    print(study.best_trial.params, study.best_trial.value)
    elapsed = (study.trials[-1].datetime_complete - study.trials[0].datetime_start).total_seconds() # type: ignore
    print(f"{mode} took {elapsed:f} seconds. ")

    result = {
        "objective_type": objective_type,
        "seed": seed,
        "mode": mode,
        "elapsed": round(elapsed, 2),
        "n_trials": n_trials,
        "best_value": study.best_trial.value,
    }

    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, results_file), "a") as f:
        f.write(json.dumps(result) + "\n")

seeds = [42,43,44]
n_trials = 300

# To test different modes, you need to edit the source code of the GPSampler directly.
# This script runs with the batched evaluation enabled by default.
modes: list[str] = [
    "This PR (Batched)",
    # "This PR (Fallback)",
    # "original",
]

objective_types = [
    "rotated_ellipsoid",
    "f6",
]

for seed, mode, objective_type in product(seeds, modes, objective_types):
    execute_benchmark(
        mode=mode,
        objective_type=objective_type,
        n_trials=n_trials,
        seed=seed,
    )

Tests

Unit tests must be added.

@not522 (Member) left a comment:

Thank you for your PR! Could you check my small comments?

@not522 (Member) left a comment:
Thank you for your update. LGTM!
I confirmed the speed improvement in my environment.

Comment on lines +60 to +61
normalized_params[: len(scaled_x), continuous_indices] = scaled_x * lengthscales
x_tensor = torch.from_numpy(normalized_params[: len(scaled_x), :]).requires_grad_(True)
A contributor commented:

This slicing is incorrect when there are discrete parameters.
For example, consider the case where normalized_params = np.array([[1.0, 0.0], [0.0, 1.0]]) and normalized_params[:, 1] is the discrete dimension.
Say the first batch has converged and the second is still in progress; the current slicing then uses the discrete parameter value 0.0, which actually belongs to the first batch.
But we want the discrete parameter from the second batch, so we need the unconverged batch indices from batched_l_bfgs_b in order to assign the discrete parameter values correctly.
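The indexing problem can be reproduced in a few lines. The array names follow the snippet above, and active_indices stands in for the unconverged-batch indices the fix passes around:

```python
import numpy as np

# One row per local search; column 1 is the discrete dimension.
normalized_params = np.array([[1.0, 0.0],
                              [0.0, 1.0]])
# Search 0 has converged; only search 1 is still active.
active_indices = np.array([1])
scaled_x = np.array([[0.3]])  # continuous values for the active searches only

# Buggy slicing: rows 0..len(scaled_x) are the *first* searches, not the
# active ones, so the discrete value 0.0 from converged search 0 is used.
wrong = normalized_params[: len(scaled_x), :].copy()
wrong[:, 0] = scaled_x[:, 0]

# Fixed: select rows by the still-active indices, so the discrete value 1.0
# from search 1 is paired with its own continuous value.
right = normalized_params[active_indices, :].copy()
right[:, 0] = scaled_x[:, 0]
```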

@Kaichi-Irie (Contributor, Author) replied on Sep 11, 2025:
Thanks for pointing out the bug! You are right.
I added batch indices as an argument to the fun_and_grad function, as well as the corresponding code. This change appears to be working; the runtime and accuracy seem fine.

codecov bot commented Sep 11, 2025

Codecov Report

❌ Patch coverage is 96.36364% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.15%. Comparing base (57fade1) to head (6624800).
⚠️ Report is 536 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| optuna/_gp/batched_lbfgsb.py | 96.29% | 2 Missing ⚠️ |
| optuna/_gp/optim_mixed.py | 96.42% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6268      +/-   ##
==========================================
- Coverage   89.24%   89.15%   -0.10%     
==========================================
  Files         208      209       +1     
  Lines       13821    13908      +87     
==========================================
+ Hits        12334    12399      +65     
- Misses       1487     1509      +22     


@nabenabe0928 (Contributor) left a comment:
LGTM, let's work on followups in (an)other PR(s)


Labels

enhancement Change that does not break compatibility and not affect public interfaces, but improves performance.
