Add constant liar strategy to GPSampler (#6430)
Conversation
Benchmark
Benchmark setup
(This benchmark follows the evaluation procedure used in #2964.) I compared several different strategies to fix #6392:
Tested settings
Evaluation
For each (benchmark problem, n_trials, batch_size) setting, I compared the strategies pairwise. If strategy i is significantly better than strategy j (j != i), I count it as a “win” for i. Total wins are then aggregated over all (n_trials, batch_size, problems).
NOTE: The benchmark script is available here. The Constant Liar (CL) results were collected using this implementation, and the Kriging Believer (KB) results were collected using this implementation.
Results (total wins)
NOTE: Most of the wins for the master branch come from BBOB function 1. Since this function has a very simple landscape, the sampler likely reaches near-optimal values within the first few trials. After that, continuing to sample locally around the current best tends to produce slightly better values, which likely explains the advantage observed here. Based on these results, I propose using CL(best) as the default strategy. This result also aligns with existing research on parallel BO (e.g., Ginsbourger et al., "Kriging is well-suited to parallelize optimization." Computational Intelligence in Expensive Optimization Problems. Berlin, Heidelberg: Springer, 2010. 131-162.)
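For readers unfamiliar with the compared strategies, here is a minimal sketch of how the pseudo objective value ("liar" value) for running trials could be chosen. The function name, the strategy labels, and the maximization convention are my assumptions for illustration, not the benchmark code itself:

```python
import numpy as np

def liar_value(y_completed: np.ndarray, strategy: str) -> float:
    """Pseudo objective value assigned to each RUNNING trial.

    Assumes the internal objective is maximized (Optuna's _gp module
    maximizes the negated loss), so CL(best) takes the max.
    """
    if strategy == "cl_best":
        return float(np.max(y_completed))
    if strategy == "cl_worst":
        return float(np.min(y_completed))
    if strategy == "cl_mean":
        return float(np.mean(y_completed))
    raise ValueError(f"unknown strategy: {strategy}")

y = np.array([0.2, 0.8, 0.5])
assert liar_value(y, "cl_best") == 0.8
```

CL(best) is optimistic: pending points are assumed to succeed, which pushes the acquisition function away from them and encourages exploration elsewhere in the batch.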
@y0z Could you review this PR?
| self.noise_var = self.noise_var.detach() |
| self.noise_var.grad = None |
| def append_running_data(self, X_running: torch.Tensor, y_running: torch.Tensor) -> None: |
I think we can compute this more efficiently by reusing self._cov_Y_Y_chol, which we already have.
This would reduce the computational cost from the current $O((n_\mathrm{train} + n_\mathrm{running})^3)$ to $O(n_\mathrm{train}^2\, n_\mathrm{running} + n_\mathrm{train}\, n_\mathrm{running}^2 + n_\mathrm{running}^3)$.
Considering the fact that $n_\mathrm{running}$ is typically much smaller than $n_\mathrm{train}$, this is a substantial saving.
This will also eliminate the need for storing self._cov_Y_Y.
However, it is also acceptable to leave the current one and address this in a follow-up PR, since the current implementation itself is not incorrect.
We can calculate this as follows:
Let $K$ be the train-train kernel matrix (self._cov_Y_Y) and define its Cholesky decomposition as $K = L L^\top$ (self._cov_Y_Y_chol). Let the augmented kernel matrix be

$$K' = \begin{pmatrix} K & B^\top \\ B & C \end{pmatrix} \quad (1)$$

(in this case, $B$ is self.kernel(X_running, X_train) and $C$ is self.kernel(X_running, X_running)).
Then, the Cholesky decomposition of $K'$ is

$$L' = \begin{pmatrix} L & 0 \\ M & N \end{pmatrix},$$

where $M = B L^{-\top}$ and $N$ is the Cholesky factor of $C - M M^\top$.
(Note that we can compute $M = B L^{-\top}$ with a triangular solve, e.g. scipy.linalg.solve_triangular.)
Proof.
Assume that $L' = \begin{pmatrix} L & 0 \\ M & N \end{pmatrix}$. Then

$$L' L'^\top = \begin{pmatrix} L L^\top & L M^\top \\ M L^\top & M M^\top + N N^\top \end{pmatrix}.$$

Comparing this to the blocks of $K'$ in Equation (1) gives $M L^\top = B$.
Thus, it follows that $M = B L^{-\top}$.
Substituting this into the Equation (1) yields $N N^\top = C - M M^\top = C - B K^{-1} B^\top$.
Since $C - B K^{-1} B^\top$ is the Schur complement of $K$ in $K'$ and hence positive definite, its Cholesky factor $N$ exists.
This completes the proof. □
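This block update can be checked numerically with NumPy/SciPy. In the sketch below, a random SPD matrix stands in for the kernel matrix, and the variable names mirror the discussion ($K$, $B$, $C$, $L$, $M$, $N$):

```python
import numpy as np
import scipy.linalg

rng = np.random.default_rng(0)
n_train, n_running = 5, 2
n = n_train + n_running
A = rng.standard_normal((n, n))
K_all = A @ A.T + n * np.eye(n)          # SPD stand-in for the full kernel matrix

K = K_all[:n_train, :n_train]            # cov(Y, Y), cached as self._cov_Y_Y_chol's source
B = K_all[n_train:, :n_train]            # kernel(X_running, X_train)
C = K_all[n_train:, n_train:]            # kernel(X_running, X_running)

L = np.linalg.cholesky(K)                # cached Cholesky factor of the train block
# M = B L^{-T}, computed with a triangular solve instead of forming L^{-T}
M = scipy.linalg.solve_triangular(L, B.T, lower=True).T
N = np.linalg.cholesky(C - M @ M.T)      # Cholesky of the Schur complement

L_all = np.zeros_like(K_all)
L_all[:n_train, :n_train] = L
L_all[n_train:, :n_train] = M
L_all[n_train:, n_train:] = N

# The assembled block factor reproduces the full Cholesky decomposition.
assert np.allclose(L_all @ L_all.T, K_all)
```

Only the $M$ and $N$ blocks are computed per update, which is where the cost reduction comes from.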
Thank you so much for the precise explanation! I made the update accordingly.
| normalized_params_of_running_trials = torch.from_numpy( |
| normalized_params_of_running_trials |
| ) |
| constant_liar_value = self._gpr._y_train.max() |
Could you add an inline comment about this design choice?
Namely, we could use the mean (posterior mean), median, min, or any other operator, right?
Plus, as you may already know, we often integrate out the value, because we can calculate the posterior distribution for each running trial.
Thank you for the comment. I have added a few inline comments. I also left some notes on issue #6392 regarding the future implementation plan.
optuna/_gp/gp.py
Outdated
| self._squared_X_diff[..., self._is_categorical] = ( |
| self._squared_X_diff[..., self._is_categorical] > 0.0 |
| ).type(torch.float64) |
| self._cov_Y_Y: np.ndarray | None = None |
What about renaming it to one of the following, because the other data is stored as tensors?
- self._cov_Y_Y: np.ndarray | None = None
+ self._cov_Y_Y_array: np.ndarray | None = None
Or
- self._cov_Y_Y: np.ndarray | None = None
+ self._cov_Y_Y_np: np.ndarray | None = None
optuna/_gp/acqf.py
Outdated
| gpr: GPRegressor, |
| search_space: SearchSpace, |
| threshold: float, |
| normalized_params_of_running_trials: np.ndarray = np.array([]), |
This syntax is standard in Python.
- normalized_params_of_running_trials: np.ndarray = np.array([]),
+ normalized_params_of_running_trials: np.ndarray | None = None,
The response from Gemini follows:
Gemini
Why use None for default arguments?
Avoids Unwanted Behavior: Using None as a sentinel value ensures that a fresh, independent list (or other mutable object) is created each time the function is called without a specific argument.
Encapsulation and Predictability: It helps in writing more stable, predictable, and testable code by ensuring function calls are isolated and do not interact with a shared, mutable default object.
Follows Official Style Guides: This approach aligns with recommendations from the official Python style guide, PEP 8, for handling mutable default arguments.
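A minimal illustration of the pitfall Gemini describes (hypothetical helper functions, not code from this PR):

```python
def bad_append(x, acc=[]):        # one shared list is created once, at definition time
    acc.append(x)
    return acc

def good_append(x, acc=None):     # None sentinel: a fresh list per call
    if acc is None:
        acc = []
    acc.append(x)
    return acc

assert bad_append(1) == [1]
assert bad_append(2) == [1, 2]    # surprising: state leaks between calls
assert good_append(1) == [1]
assert good_append(2) == [2]      # independent calls
```

The `np.array([])` default in the original signature is evaluated once in the same way, so the `None` sentinel is the safer, more conventional choice.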
I was thinking about the same thing
optuna/samplers/_gp/sampler.py
Outdated
| trials = study._get_trials(deepcopy=False, states=states, use_cache=True) |
| states = (TrialState.COMPLETE, TrialState.RUNNING) |
| trials = study._get_trials(deepcopy=False, states=states, use_cache=False) |
| complete_trials = [t for t in trials if t.state == TrialState.COMPLETE] |
nit: completed_trials might be more natural.
- complete_trials = [t for t in trials if t.state == TrialState.COMPLETE]
+ completed_trials = [t for t in trials if t.state == TrialState.COMPLETE]
optuna/_gp/gp.py
Outdated
| # NOTE(nabenabe): Here we use NumPy to guarantee the reproducibility from the past. |
| self._cov_Y_Y_chol = torch.from_numpy(cov_Y_Y_chol) |
| self._cov_Y_Y_inv_Y = torch.from_numpy(cov_Y_Y_inv_Y) |
| self._X_train = torch.cat([self._X_train, X_running], dim=0) |
I don't think we should update the training dataset, because we didn't use X_running in the training, i.e., in fit_kernel_params.
- self._X_train = torch.cat([self._X_train, X_running], dim=0)
Maybe, it's even better to remove X_train and y_train after the call of _cache_matrix.
In other words, define X_all or something and use it instead of X_train.
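The suggested split could look roughly like this (a sketch with hypothetical attribute names, using NumPy arrays in place of torch tensors): the fitted training data stays untouched, while posterior computations read from the augmented `*_all` arrays.

```python
import numpy as np

class GPRegressorSketch:
    """Sketch: keep the data used for fitting separate from the augmented data."""

    def __init__(self, X_train: np.ndarray, y_train: np.ndarray) -> None:
        self._X_train = X_train   # consumed only by kernel-parameter fitting
        self._y_train = y_train
        self._X_all = X_train     # consumed by posterior computations
        self._y_all = y_train

    def append_running_data(self, X_running: np.ndarray, y_running: np.ndarray) -> None:
        # Augment only the posterior-side data; the fitted data is untouched.
        self._X_all = np.concatenate([self._X_all, X_running], axis=0)
        self._y_all = np.concatenate([self._y_all, y_running], axis=0)

gpr = GPRegressorSketch(np.zeros((3, 2)), np.zeros(3))
gpr.append_running_data(np.ones((2, 2)), np.ones(2))
assert gpr._X_train.shape == (3, 2) and gpr._X_all.shape == (5, 2)
```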
optuna/_gp/gp.py
Outdated
| self._cov_Y_Y_chol = torch.from_numpy(cov_Y_Y_chol) |
| self._cov_Y_Y_inv_Y = torch.from_numpy(cov_Y_Y_inv_Y) |
| self._X_train = torch.cat([self._X_train, X_running], dim=0) |
| self._y_train = torch.cat([self._y_train, y_running], dim=0) |
- self._y_train = torch.cat([self._y_train, y_running], dim=0)

Thank you all for taking the time to review this PR. Your feedback was very helpful, and I’ve updated the implementation accordingly.
optuna/_gp/gp.py
Outdated
| ) |
| with torch.no_grad(): |
| cov_Y_Y = self.kernel().detach().cpu().numpy() |
| _cov_Y_Y = self.kernel().detach().cpu().numpy() |
nit: Now that it is no longer stored as a member variable, there is no need to prefix it with an underscore.
- _cov_Y_Y = self.kernel().detach().cpu().numpy()
+ cov_Y_Y = self.kernel().detach().cpu().numpy()
optuna/_gp/gp.py
Outdated
| cov_Y_Y_chol = np.zeros((n_total, n_total), dtype=np.float64) |
| cov_Y_Y_chol[:n_train, :n_train] = self._cov_Y_Y_chol.numpy() |
| with torch.no_grad(): |
| kernel_train_running = self.kernel(X_running).detach().cpu().numpy() |
Wouldn't kernel_running_train be more natural, since it is kernel(X_running, X_train)?
- kernel_train_running = self.kernel(X_running).detach().cpu().numpy()
+ kernel_running_train = self.kernel(X_running).detach().cpu().numpy()
| - cov_Y_Y_chol[n_train:, :n_train] @ cov_Y_Y_chol[n_train:, :n_train].T |
| ) |
| self._y_all = torch.cat([self._y_train, y_running], dim=0) |
| cov_Y_Y_inv_Y = scipy.linalg.solve_triangular( |
I think we can also compute this a little more efficiently by reusing self._cov_Y_Y_inv_Y.
However, this is not very important, and we can leave it as it is, since the overall complexity of append_running_data is bottlenecked by computing cov_Y_Y_chol.
Thank you for the helpful suggestion! I agree that the forward substitution, scipy.linalg.solve_triangular(cov_Y_Y_chol, self._y_all.cpu().numpy(), lower=True) can likely be reduced to O(n_train * n_running + n_running**2) by caching the corresponding train-only term,
scipy.linalg.solve_triangular(cov_Y_Y_chol, self._y_train.cpu().numpy(), lower=True) in _cache_matrix.
What I was less certain about was whether the backward substitution could also be reduced to the same complexity. As you mentioned, I think this would be a good optimization to consider in a future improvement.
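The caching idea above can be sketched numerically (names are illustrative): for the block triangular factor, the forward substitution splits into a train-only part, which can be cached in `_cache_matrix`, and a cheap running part computed per update.

```python
import numpy as np
import scipy.linalg

rng = np.random.default_rng(1)
n_train, n_running = 6, 2
n = n_train + n_running
A = rng.standard_normal((n, n))
L_all = np.linalg.cholesky(A @ A.T + n * np.eye(n))  # stand-in for cov_Y_Y_chol

L = L_all[:n_train, :n_train]   # train-train block
M = L_all[n_train:, :n_train]   # running-train block
N = L_all[n_train:, n_train:]   # running-running block
y_train = rng.standard_normal(n_train)
y_running = rng.standard_normal(n_running)

# Cacheable once per fit: O(n_train^2).
z1 = scipy.linalg.solve_triangular(L, y_train, lower=True)
# Per append_running_data: O(n_train * n_running + n_running^2).
z2 = scipy.linalg.solve_triangular(N, y_running - M @ z1, lower=True)

# Matches the full forward substitution over the whole factor.
z_full = scipy.linalg.solve_triangular(
    L_all, np.concatenate([y_train, y_running]), lower=True
)
assert np.allclose(np.concatenate([z1, z2]), z_full)
```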
Thank you for the reply. Indeed, append_running_data is already bottlenecked by the computation of cov_Y_Y_chol, which we cannot improve further.
Therefore, we can achieve only a constant-factor improvement here, which is why I felt this is less important than the one I mentioned above. Also, I'm not certain how much this constant-factor improvement would contribute to the overall runtime without benchmarking it.
optuna/samplers/_gp/sampler.py
Outdated
| return constraint_vals, is_feasible |
| def get_params(trial: FrozenTrial) -> dict[str, Any]: |
Now that this function is only used in this module, we should make this internal.
- def get_params(trial: FrozenTrial) -> dict[str, Any]:
+ def _get_params(trial: FrozenTrial) -> dict[str, Any]:
Thank you for the update! It almost looks good to me, with only some minor naming issues.
I’ve updated the PR to address the comments so far. I’d appreciate it if you could take another look.
kAIto47802
left a comment
Thank you for the update! LGTM
I confirmed the improvement in the BBOB benchmark. LGTM except for the above comments.
Thank you for the check. I’ve updated the code accordingly. PTAL.
Motivation
Fixes #6392.
This PR adds a constant liar strategy to GPSampler. While some trials are still RUNNING, the sampler assigns them a pseudo objective value, namely the best value among completed trials, and includes them in the GP update. This helps batch optimization by reducing the likelihood of proposing the same (or very similar) parameters when multiple evaluations are pending.
At this stage, the feature is limited to single-objective, unconstrained optimization.
Description of the changes
Added a constant liar strategy (cf. TPESampler with multivariate and constant_liar, #6189).
Note