Handle slowdown of GPSampler due to L-BFGS in SciPy v1.15 #6191

Merged
nabenabe0928 merged 9 commits into optuna:master from kAIto47802:scipy-v1.15-handle-slow-in-l-bfgs-b on Aug 1, 2025
Conversation

@kAIto47802 (Collaborator) commented Jul 2, 2025

Motivation

This PR is related to:

It appears that L-BFGS-B in scipy.optimize.minimize has become slower in some cases since SciPy version 1.15.0.
This affects the performance of Optuna's GPSampler, which uses it for kernel fitting.

According to the issue above, setting the OPENBLAS_NUM_THREADS environment variable to 1 mitigates this slowdown.

Description of the changes

  • Add a context manager that sets OPENBLAS_NUM_THREADS to 1 when scipy>=1.15.0 is installed.
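The change can be sketched as a small context manager (a hypothetical, simplified version for illustration, not the exact code merged in this PR; note that OpenBLAS reads the variable at library load time, so in practice the limit must take effect before OpenBLAS initializes its thread pool):

```python
import os
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def single_openblas_thread() -> Iterator[None]:
    """Temporarily set OPENBLAS_NUM_THREADS=1, restoring the old value on exit."""
    old = os.environ.get("OPENBLAS_NUM_THREADS")
    os.environ["OPENBLAS_NUM_THREADS"] = "1"
    try:
        yield
    finally:
        if old is None:
            # The variable was not set before; remove it again.
            os.environ.pop("OPENBLAS_NUM_THREADS", None)
        else:
            os.environ["OPENBLAS_NUM_THREADS"] = old
```

Kernel fitting would then run inside `with single_openblas_thread(): ...`, applied only when the installed SciPy version is 1.15.0 or newer.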

Benchmarks

I benchmarked both the original slowdown and the fix introduced in this PR.

Benchmarking Settings

Objective functions. I use the following functions available in OptunaHub:

Implementation details. I use dimensions 5 and 20, with 100 trials each. Each experiment is run with 5 different random seeds. The benchmarking and visualization code I used is as follows:

Benchmarking code
from argparse import ArgumentParser, Namespace
from datetime import datetime
import subprocess
from typing import cast

import numpy as np
import optuna
import optunahub
import scipy


def _extract_elapsed_time(study: optuna.study.Study) -> list[float]:
    return [
        (
            cast(datetime, t.datetime_complete) - cast(datetime, study.trials[0].datetime_start)
        ).total_seconds()
        for t in study.trials
    ]


def _extract_objective_value(study: optuna.study.Study) -> list[float | None]:
    return [t.value for t in study.trials]


def _get_git_commit_hash() -> str:
    return subprocess.check_output(["git", "rev-parse", "HEAD"]).strip().decode("utf-8")


def experiment_once(
    objective: optunahub.benchmarks.BaseProblem,
    n_trials: int,
    seed: int,
) -> tuple[list[float], list[float | None]]:
    sampler = optuna.samplers.GPSampler(seed=seed)
    study = optuna.create_study(sampler=sampler, directions=objective.directions)
    study.optimize(objective, n_trials=n_trials)
    times = _extract_elapsed_time(study)
    values = _extract_objective_value(study)
    return times, values


def main(args: Namespace) -> None:
    bbob = optunahub.load_module("benchmarks/bbob")

    objective = bbob.Problem(function_id=args.function_id, dimension=args.dimension)

    times, values = zip(
        *[experiment_once(objective, args.n_trials, seed) for seed in range(args.n_seeds)]
    )
    np.savez(
        f"results/bbob_fn{args.function_id}_dim{args.dimension}_scipy{scipy.__version__}_{_get_git_commit_hash()}_trial{args.n_trials}.npz",
        times=np.array(times),
        values=np.array(values),
        function_id=args.function_id,
        dimension=args.dimension,
    )


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument(
        "--function_id",
        type=int,
        default=3,
        help="Function ID (1-24) for the BBOB benchmark.",
    )
    parser.add_argument(
        "--dimension",
        type=int,
        default=5,
        help="Dimension (2, 3, 5, 10, 20, 40, 60) for the BBOB benchmark.",
    )
    parser.add_argument(
        "--n_trials",
        type=int,
        default=100,
        help="Number of trials for the optimization.",
    )
    parser.add_argument(
        "--n_seeds",
        type=int,
        default=5,
        help="Number of random seeds for the optimization.",
    )
    args = parser.parse_args()

    main(args)

Run script
#!/bin/bash


function_ids=(2 3 16 17 20 22)
dimensions=(5 20)
versions=('v1.14' 'v1.15')


for function_id in "${function_ids[@]}"; do
  for dimension in "${dimensions[@]}"; do
    for version in "${versions[@]}"; do
      source "scipy-${version}/bin/activate"
      echo "Running function $function_id with dimension $dimension in SciPy version $version"
      python benchmark_scipy.py --function_id "$function_id" --dimension "$dimension"
    done
  done
done

Visualization code
from argparse import ArgumentParser, Namespace

import matplotlib.pyplot as plt
from matplotlib.figure import Figure
import numpy as np


def _prepare_data(data: dict[str, np.ndarray | int]) -> tuple[np.ndarray, np.ndarray]:
    times = data["times"]
    values = data["values"]
    assert isinstance(times, np.ndarray) and isinstance(values, np.ndarray)
    assert times.shape == values.shape
    return times, values


def plot_results(
    data: dict[str, np.ndarray],
    colors: dict[str, str],
    markers: dict[str, str],
    labels: dict[str, str],
    ylabel: str,
    accumulate: bool = False,
    legend_loc: str = "upper left",
) -> Figure:

    fig, ax = plt.subplots()
    for name, d in data.items():
        if accumulate:
            d = np.minimum.accumulate(d, axis=-1)
        mean = np.mean(d, axis=0)
        std = np.std(d, axis=0)
        dx = np.arange(mean.shape[0]) + 1
        ax.plot(
            dx,
            mean,
            colors[name],
            label=labels[name],
            marker=markers[name],
            markevery=4,
            markerfacecolor="none",
        )
        ax.fill_between(
            dx,
            mean - std,
            mean + std,
            alpha=0.2,
            color=colors[name],
        )
    ax.legend(
        loc=legend_loc,
        fontsize=8,
    )
    ax.set_xlabel("Number of Trials")
    ax.set_ylabel(ylabel)
    ax.grid(which="major", color="black")

    return fig


def main(args: Namespace) -> None:
    comments = {
        f"{v}_{h}": f"{v} {c} ({h})"
        for h, c in zip(args.commits, args.comments)
        for v in args.versions
    }
    colors = dict(
        zip(
            comments.keys(),
            [
                "#0072B2",
                "#CC79A7",
                "#F0E442",
                "#D55E00",
                "#009E73",
                "#E69F00",
                "#56B4E9",
                "#000000",
            ][: len(comments)],
        )
    )
    markers = dict(zip(comments.keys(), ["o", "s", "D", "^", "v", "P", "*", "X"][: len(comments)]))

    data = [
        np.load(
            f"results/bbob_fn{args.function_id}_dim{args.dimension}_scipy{v[1:]}_trial{args.n_trials}.npz",
        )
        for v in comments.keys()
    ]

    times, values = zip(*[_prepare_data(d) for d in data])

    fig = plot_results(
        dict(zip(comments.keys(), times)),
        colors=colors,
        markers=markers,
        labels=comments,
        ylabel="Elapsed Time / s",
        accumulate=False,
    )
    fig.savefig(
        f"results/times_bbob_fn{args.function_id}_dim{args.dimension}_{'-'.join(args.versions)}_{'-'.join(args.commits)}.png",
        bbox_inches="tight",
    )

    fig = plot_results(
        dict(zip(comments, values)),
        colors=colors,
        markers=markers,
        labels=comments,
        ylabel="Function Value",
        accumulate=True,
        legend_loc="upper right",
    )
    fig.savefig(
        f"results/values_bbob_fn{args.function_id}_dim{args.dimension}_{'-'.join(args.versions)}_{'-'.join(args.commits)}.png",
        bbox_inches="tight",
    )


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument(
        "--function_id",
        type=int,
        default=3,
        help="Function ID (1-24) for the BBOB benchmark.",
    )
    parser.add_argument(
        "--dimension",
        type=int,
        default=5,
        help="Dimension (2, 3, 5, 10, 20, 40, 60) for the BBOB benchmark.",
    )
    parser.add_argument(
        "--n_trials",
        type=int,
        default=100,
        help="Number of trials for the optimization.",
    )
    parser.add_argument(
        "--versions",
        type=str,
        nargs="+",
        default=["v1.14.0", "v1.15.0"],
        help="List of scipy versions to compare.",
    )
    parser.add_argument(
        "--commits",
        type=str,
        nargs="+",
        help="Git commit hash to append to the output filenames.",
    )
    parser.add_argument(
        "--comments",
        type=str,
        nargs="+",
        help="Additional comments corresponding to the git commit hash.",
    )
    args = parser.parse_args()
    main(args)

Plotting script
#!/bin/bash


function_ids=(2 3 16 17 20 22)
dimensions=(5 20)


# plot_scipy.py compares the SciPy versions internally (via --versions),
# so there is no need for a per-version loop here.
for function_id in "${function_ids[@]}"; do
  for dimension in "${dimensions[@]}"; do
    python plot_scipy.py \
      --commits \
        2bfc0bb5290c7ab44a0bcdbfe7aea4b7626307e5 \
        a6487ff321891c0936e8bb5e18d0dd710f2e08f4 \
      --comments "This PR" "Original" \
      --function_id "$function_id" \
      --dimension "$dimension"
  done
done

Environment. The benchmarking is done on a machine running Arch Linux with Intel® Core™ i9-14900HX processor (24 cores, 32 threads, up to 5.8GHz), and Python 3.11.0.

Results

Figure 1 and Figure 2 show the results for the elapsed time at the end of each trial and the best objective values, respectively.

Figure 1 shows that GPSampler becomes substantially slower in SciPy v1.15.0. The patch introduced in this PR mitigates the slowdown. This patch also slightly speeds up the original version with SciPy v1.14.0 in some cases.

Figure 2 confirms that neither the version change nor the modifications introduced in this PR affect the final objective values.

Panels: (a) function 2, dimension 5; (b) function 2, dimension 20; (c) function 3, dimension 5; (d) function 3, dimension 20; (e) function 16, dimension 5; (f) function 16, dimension 20; (g) function 17, dimension 5; (h) function 17, dimension 20; (i) function 20, dimension 5; (j) function 20, dimension 20; (k) function 22, dimension 5; (l) function 22, dimension 20.

Figure 1. The elapsed time at the end of each trial. The solid lines denote the mean, and the translucent areas denote the standard deviation, both computed over five independent runs with different random seeds.



Panels: (a) function 2, dimension 5; (b) function 2, dimension 20; (c) function 3, dimension 5; (d) function 3, dimension 20; (e) function 16, dimension 5; (f) function 16, dimension 20; (g) function 17, dimension 5; (h) function 17, dimension 20; (i) function 20, dimension 5; (j) function 20, dimension 20; (k) function 22, dimension 5; (l) function 22, dimension 20.

Figure 2. The best objective values. The solid lines denote the mean, and the translucent areas denote the standard deviation, both computed over five independent runs with different random seeds.

@nabenabe0928 nabenabe0928 self-assigned this Jul 2, 2025
@nabenabe0928 nabenabe0928 added the enhancement Change that does not break compatibility and not affect public interfaces, but improves performance. label Jul 2, 2025
@nabenabe0928 nabenabe0928 added this to the v4.5.0 milestone Jul 2, 2025
@nabenabe0928 (Contributor)

@kAIto47802
Thank you for the PR.
The PR looks basically good to me.
Just in case, could you please conduct speed benchmarking?

@kAIto47802 kAIto47802 force-pushed the scipy-v1.15-handle-slow-in-l-bfgs-b branch from b1976ce to 0f7dc17 Compare July 2, 2025 09:11
@nabenabe0928 nabenabe0928 changed the title Handle slowdown in GPSampler with L-BFGS in Scipy v1.15 Handle slowdown of GPSampler due to L-BFGS in SciPy v1.15 Jul 2, 2025
@kAIto47802 (Collaborator, Author)

Sure! I’ve conducted the benchmarks and updated the comment above.

@y0z (Member) commented Jul 4, 2025

@fusawa-yugo Could you review this PR?

@nabenabe0928 (Contributor) left a comment

@kAIto47802
Thank you for the benchmarking experiments!
I confirmed that the overhead of restricting OpenBLAS to a single thread is very limited :)

@nabenabe0928 nabenabe0928 removed their assignment Jul 8, 2025
@ilayn commented Jul 11, 2025

Hello, do I understand correctly that the OpenBLAS thread limit resolves the issue? It is still not nice to have this requirement, and apologies for this strange issue hitting your workflows. I am also a bit confused about why this is happening, but I just want to make sure that this is 100% OpenBLAS thread behavior and not some silly mistake I made in the translations.

@nabenabe0928 (Contributor) commented Jul 11, 2025

@ilayn
Hey, thank you for the comment and for your great work on SciPy!!
Yes, (we believe) your understanding is correct, but I cannot guarantee that the problem goes away with our change either.
We've noticed that the slowdown happens mostly in scipy.optimize.minimize in our case.
But that doesn't mean l_bfgs_b is safe, because they share the same code as far as we can see.

The problem probably comes from instabilities in our optimization. Indeed, the optimization using scipy.optimize.minimize in our package is, by nature, very sensitive and likely to produce huge gradients.

So far, we have observed slowdowns when we get huge gradients (> 2**32), and the precision handling depends on the OS, but that doesn't explain why the slowdown goes away when we limit the thread count.
Anyway, there are still many mysteries.
For now, this PR should resolve the issue in most cases (we hope 🤞).

@ilayn commented Jul 11, 2025

Thank you for your kind words. When I am in the thick of it, I can't see all the possible repercussions of this type of change, so it is a lot of learning for me alongside the straightforward translation work.

Regarding the large gradients and other details, I am trying to figure out whether some kind of equilibration step can help the conditioning of the problems in both L-BFGS-B and SLSQP. These are, as you know, very archaic algorithms written at a time when things were very restricted (the late 80s). Now we are pushing them to places outside their comfort zone, so each step brings in another display of their shortcomings. So if there is a local optimization expert who knows what needs to be done, I'll be happy to implement those changes. Unfortunately, I don't find much time to go back into paper-reading mode.

Regarding why this is happening now: the original versions had their LAPACK copies baked in as additional code, so they had no library dependencies and were self-contained. I took out those parts and linked to the SciPy BLAS/LAPACK mechanism (be it MKL, OpenBLAS, or something else), and folks started to hit the shortcomings of these numerical libraries. That's why my response to these problems is slower than usual: I don't quite know where to look yet and will try to understand why this might be happening. Maybe it's a simple answer, because libc is doing something that Fortran does not. Or something is lurking in OpenBLAS. Not sure yet.

You might give it a go with MKL if you have the time and the patience, through a conda flow.

@nabenabe0928 (Contributor)

Hey @gen740 , could you review this PR?

@fusawa-yugo (Contributor)

LGTM! Thank you for your PR!!
I also confirmed the speedup in my environment!
Sorry for the late response.

@ilayn commented Jul 16, 2025

Apologies for this request but could any of you folks report whether you have improved or worsened performance also with OPENBLAS_NUM_THREADS set to 2 or 4? A simple yes/no will suffice.

We are trying to get to the bottom of it so that you don't need to jump through hoops like this, but it is quite difficult to replicate this behavior, hence my request.

@nabenabe0928 (Contributor) commented Jul 16, 2025

@ilayn

I am checking right now and will get back to you once the results are out.

import optuna


def objective(trial: optuna.Trial) -> float:
    return sum((trial.suggest_float("x", -5, 5) - 2) ** 2 for _ in range(10))


sampler = optuna.samplers.GPSampler(seed=0)
study = optuna.create_study(sampler=sampler)
study.optimize(objective, n_trials=30)
print((study.trials[-1].datetime_complete - study.trials[0].datetime_start).total_seconds())

@nabenabe0928 (Contributor) commented Jul 16, 2025

@ilayn
OK, so the results are out.
I can clearly see the slowdown as the number of OpenBLAS threads increases.
I hope it helps!

| OPENBLAS_NUM_THREADS | Runtime (s) |
| --- | --- |
| 1 | 5.275556 |
| 2 | 5.761794 |
| 3 | 6.064222 |
| 4 | 20.362522 |
| 5 | 25.854847 |
| None (unset) | 28.160928 |

@ilayn commented Jul 16, 2025

Perfect, thank you very much. Also, Ralf posted some instructions in the SciPy issue. I think this is caused by a bad interaction between PyTorch and SciPy.

@github-actions (bot)

This pull request has not seen any recent activity.

@github-actions github-actions bot added the stale Exempt from stale bot labeling. label Jul 29, 2025
@nabenabe0928 nabenabe0928 removed the stale Exempt from stale bot labeling. label Jul 30, 2025
@gen740 (Member) left a comment

Sorry for the late reply.

LGTM!

@gen740 (Member) commented Aug 1, 2025

@kAIto47802 Could you resolve the conflicts?

@nabenabe0928 nabenabe0928 enabled auto-merge August 1, 2025 07:19
@nabenabe0928 nabenabe0928 merged commit d15f4b9 into optuna:master Aug 1, 2025
13 of 14 checks passed