Handle slowdown of GPSampler due to L-BFGS in SciPy v1.15 #6191

Merged
nabenabe0928 merged 9 commits into optuna:master from kAIto47802:scipy-v1.15-handle-slow-in-l-bfgs-b on Aug 1, 2025
Conversation

@kAIto47802 (Collaborator) commented Jul 2, 2025

Motivation

This PR is related to:

It appears that L-BFGS-B in scipy.optimize.minimize has become slower in some cases since SciPy version 1.15.0.
This affects the performance of Optuna's GPSampler, which uses it for kernel fitting.

According to the issue above, setting the OPENBLAS_NUM_THREADS environment variable to 1 mitigates this slowdown.

Description of the changes

  • Add a context manager that sets OPENBLAS_NUM_THREADS to 1 when scipy>=1.15.0 is installed.
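The change can be sketched as a small context manager (a hypothetical, simplified version for illustration, not the exact code merged in this PR; note that OpenBLAS reads the variable at library load time, so in practice the limit must take effect before OpenBLAS initializes its thread pool):

```python
import os
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def single_openblas_thread() -> Iterator[None]:
    """Temporarily set OPENBLAS_NUM_THREADS=1, restoring the old value on exit."""
    old = os.environ.get("OPENBLAS_NUM_THREADS")
    os.environ["OPENBLAS_NUM_THREADS"] = "1"
    try:
        yield
    finally:
        if old is None:
            # The variable was not set before; remove it again.
            os.environ.pop("OPENBLAS_NUM_THREADS", None)
        else:
            os.environ["OPENBLAS_NUM_THREADS"] = old
```

Kernel fitting would then run inside `with single_openblas_thread(): ...`, applied only when the installed SciPy version is 1.15.0 or newer.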

Benchmarks

I benchmarked both the original slowdown and the fix introduced in this PR.

Benchmarking Settings

Objective functions. I use the following functions available in OptunaHub:

Implementation details. I use dimensions 5 and 20, with 100 trials each. Each experiment is run with 5 different random seeds. The benchmarking and visualization code I used is as follows:

Benchmarking code
from argparse import ArgumentParser, Namespace
from datetime import datetime
import subprocess
from typing import cast

import numpy as np
import optuna
import optunahub
import scipy


def _extract_elapsed_time(study: optuna.study.Study) -> list[float]:
    return [
        (
            cast(datetime, t.datetime_complete) - cast(datetime, study.trials[0].datetime_start)
        ).total_seconds()
        for t in study.trials
    ]


def _extract_objective_value(study: optuna.study.Study) -> list[float | None]:
    return [t.value for t in study.trials]


def _get_git_commit_hash() -> str:
    return subprocess.check_output(["git", "rev-parse", "HEAD"]).strip().decode("utf-8")


def experiment_once(
    objective: optunahub.benchmarks.BaseProblem,
    n_trials: int,
    seed: int,
) -> tuple[list[float], list[float | None]]:
    sampler = optuna.samplers.GPSampler(seed=seed)
    study = optuna.create_study(sampler=sampler, directions=objective.directions)
    study.optimize(objective, n_trials=n_trials)
    times = _extract_elapsed_time(study)
    values = _extract_objective_value(study)
    return times, values


def main(args: Namespace) -> None:
    bbob = optunahub.load_module("benchmarks/bbob")

    objective = bbob.Problem(function_id=args.function_id, dimension=args.dimension)

    times, values = zip(
        *[experiment_once(objective, args.n_trials, seed) for seed in range(args.n_seeds)]
    )
    np.savez(
        f"results/bbob_fn{args.function_id}_dim{args.dimension}_scipy{scipy.__version__}_{_get_git_commit_hash()}_trial{args.n_trials}.npz",
        times=np.array(times),
        values=np.array(values),
        function_id=args.function_id,
        dimension=args.dimension,
    )


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument(
        "--function_id",
        type=int,
        default=3,
        help="Function ID (1-24) for the BBOB benchmark.",
    )
    parser.add_argument(
        "--dimension",
        type=int,
        default=5,
        help="Dimension (2, 3, 5, 10, 20, 40, 60) for the BBOB benchmark.",
    )
    parser.add_argument(
        "--n_trials",
        type=int,
        default=100,
        help="Number of trials for the optimization.",
    )
    parser.add_argument(
        "--n_seeds",
        type=int,
        default=5,
        help="Number of random seeds for the optimization.",
    )
    args = parser.parse_args()

    main(args)

Run script
#!/bin/bash


function_ids=(2 3 16 17 20 22)
dimensions=(5 20)
versions=('v1.14' 'v1.15')


for function_id in "${function_ids[@]}"; do
  for dimension in "${dimensions[@]}"; do
    for version in "${versions[@]}"; do
      source "scipy-${version}/bin/activate"
      echo "Running function $function_id with dimension $dimension in SciPy version $version"
      python benchmark_scipy.py --function_id "$function_id" --dimension "$dimension"
    done
  done
done

Visualization code
from argparse import ArgumentParser, Namespace

import matplotlib.pyplot as plt
from matplotlib.figure import Figure
import numpy as np


def _prepare_data(data: dict[str, np.ndarray | int]) -> tuple[np.ndarray, np.ndarray]:
    times = data["times"]
    values = data["values"]
    assert isinstance(times, np.ndarray) and isinstance(values, np.ndarray)
    assert times.shape == values.shape
    return times, values


def plot_results(
    data: dict[str, np.ndarray],
    colors: dict[str, str],
    markers: dict[str, str],
    labels: dict[str, str],
    ylabel: str,
    accumulate: bool = False,
    legend_loc: str = "upper left",
) -> Figure:

    fig, ax = plt.subplots()
    for name, d in data.items():
        if accumulate:
            d = np.minimum.accumulate(d, axis=-1)
        mean = np.mean(d, axis=0)
        std = np.std(d, axis=0)
        dx = np.arange(mean.shape[0]) + 1
        ax.plot(
            dx,
            mean,
            colors[name],
            label=labels[name],
            marker=markers[name],
            markevery=4,
            markerfacecolor="none",
        )
        ax.fill_between(
            dx,
            mean - std,
            mean + std,
            alpha=0.2,
            color=colors[name],
        )
    ax.legend(
        loc=legend_loc,
        fontsize=8,
    )
    ax.set_xlabel("Number of Trials")
    ax.set_ylabel(ylabel)
    ax.grid(which="major", color="black")

    return fig


def main(args: Namespace) -> None:
    comments = {
        f"{v}_{h}": f"{v} {c} ({h})"
        for h, c in zip(args.commits, args.comments)
        for v in args.versions
    }
    colors = dict(
        zip(
            comments.keys(),
            [
                "#0072B2",
                "#CC79A7",
                "#F0E442",
                "#D55E00",
                "#009E73",
                "#E69F00",
                "#56B4E9",
                "#000000",
            ][: len(comments)],
        )
    )
    markers = dict(zip(comments.keys(), ["o", "s", "D", "^", "v", "P", "*", "X"][: len(comments)]))

    data = [
        np.load(
            f"results/bbob_fn{args.function_id}_dim{args.dimension}_scipy{v[1:]}_trial{args.n_trials}.npz",
        )
        for v in comments.keys()
    ]

    times, values = zip(*[_prepare_data(d) for d in data])

    fig = plot_results(
        dict(zip(comments.keys(), times)),
        colors=colors,
        markers=markers,
        labels=comments,
        ylabel="Elapsed Time / s",
        accumulate=False,
    )
    fig.savefig(
        f"results/times_bbob_fn{args.function_id}_dim{args.dimension}_{'-'.join(args.versions)}_{'-'.join(args.commits)}.png",
        bbox_inches="tight",
    )

    fig = plot_results(
        dict(zip(comments, values)),
        colors=colors,
        markers=markers,
        labels=comments,
        ylabel="Function Value",
        accumulate=True,
        legend_loc="upper right",
    )
    fig.savefig(
        f"results/values_bbob_fn{args.function_id}_dim{args.dimension}_{'-'.join(args.versions)}_{'-'.join(args.commits)}.png",
        bbox_inches="tight",
    )


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument(
        "--function_id",
        type=int,
        default=3,
        help="Function ID (1-24) for the BBOB benchmark.",
    )
    parser.add_argument(
        "--dimension",
        type=int,
        default=5,
        help="Dimension (2, 3, 5, 10, 20, 40, 60) for the BBOB benchmark.",
    )
    parser.add_argument(
        "--n_trials",
        type=int,
        default=100,
        help="Number of trials for the optimization.",
    )
    parser.add_argument(
        "--versions",
        type=str,
        nargs="+",
        default=["v1.14.0", "v1.15.0"],
        help="List of scipy versions to compare.",
    )
    parser.add_argument(
        "--commits",
        type=str,
        nargs="+",
        help="Git commit hash to append to the output filenames.",
    )
    parser.add_argument(
        "--comments",
        type=str,
        nargs="+",
        help="Additional comments corresponding to the git commit hash.",
    )
    args = parser.parse_args()
    main(args)

Plotting script
#!/bin/bash


function_ids=(2 3 16 17 20 22)
dimensions=(5 20)


# plot_scipy.py compares the SciPy versions internally (via --versions),
# so there is no need for a per-version loop here.
for function_id in "${function_ids[@]}"; do
  for dimension in "${dimensions[@]}"; do
    python plot_scipy.py \
      --commits \
        2bfc0bb5290c7ab44a0bcdbfe7aea4b7626307e5 \
        a6487ff321891c0936e8bb5e18d0dd710f2e08f4 \
      --comments "This PR" "Original" \
      --function_id "$function_id" \
      --dimension "$dimension"
  done
done

Environment. The benchmarking is done on a machine running Arch Linux with Intel® Core™ i9-14900HX processor (24 cores, 32 threads, up to 5.8GHz), and Python 3.11.0.

Results

Figure 1 and Figure 2 show the results for the elapsed time at the end of each trial and the best objective values, respectively.

Figure 1 shows that GPSampler becomes substantially slower in SciPy v1.15.0. The patch introduced in this PR mitigates the slowdown. This patch also slightly speeds up the original version with SciPy v1.14.0 in some cases.

Figure 2 confirms that neither the version change nor the modifications introduced in this PR affect the final objective values.

Panels: (a) function 2, dimension 5; (b) function 2, dimension 20; (c) function 3, dimension 5; (d) function 3, dimension 20; (e) function 16, dimension 5; (f) function 16, dimension 20; (g) function 17, dimension 5; (h) function 17, dimension 20; (i) function 20, dimension 5; (j) function 20, dimension 20; (k) function 22, dimension 5; (l) function 22, dimension 20.

Figure 1. The elapsed time at the end of each trial. The solid lines denote the mean, and the translucent areas denote the standard deviation, both computed over five independent runs with different random seeds.



Panels: (a) function 2, dimension 5; (b) function 2, dimension 20; (c) function 3, dimension 5; (d) function 3, dimension 20; (e) function 16, dimension 5; (f) function 16, dimension 20; (g) function 17, dimension 5; (h) function 17, dimension 20; (i) function 20, dimension 5; (j) function 20, dimension 20; (k) function 22, dimension 5; (l) function 22, dimension 20.

Figure 2. The best objective values. The solid lines denote the mean, and the translucent areas denote the standard deviation, both computed over five independent runs with different random seeds.

@nabenabe0928 nabenabe0928 self-assigned this Jul 2, 2025
@nabenabe0928 nabenabe0928 added the enhancement Change that does not break compatibility and not affect public interfaces, but improves performance. label Jul 2, 2025
@nabenabe0928 nabenabe0928 added this to the v4.5.0 milestone Jul 2, 2025
@nabenabe0928 (Contributor)

@kAIto47802
Thank you for the PR.
The PR looks basically good to me.
Just in case, could you please conduct speed benchmarking?

@kAIto47802 kAIto47802 force-pushed the scipy-v1.15-handle-slow-in-l-bfgs-b branch from b1976ce to 0f7dc17 Compare July 2, 2025 09:11
@nabenabe0928 nabenabe0928 changed the title Handle slowdown in GPSampler with L-BFGS in Scipy v1.15 Handle slowdown of GPSampler due to L-BFGS in SciPy v1.15 Jul 2, 2025
@kAIto47802 (Collaborator, Author)

Sure! I’ve conducted the benchmarks and updated the comment above.

@y0z (Member) commented Jul 4, 2025

@fusawa-yugo Could you review this PR?

@nabenabe0928 (Contributor) left a comment

@kAIto47802
Thank you for the benchmarking experiments!
I confirmed that the overhead of restricting OpenBLAS to a single thread is very limited :)

@nabenabe0928 nabenabe0928 removed their assignment Jul 8, 2025
@ilayn commented Jul 11, 2025

Hello, do I understand correctly that the OpenBLAS thread limit resolves the issue? It is still not nice to have this requirement, and apologies for this strange issue hitting your workflows. I am also a bit confused about why this is happening, but I just want to make sure that this is 100% OpenBLAS thread behavior and not some silly mistake I made in the translations.

@nabenabe0928 (Contributor) commented Jul 11, 2025

@ilayn
Hey, thank you for the comment and for your great work on SciPy!!
Yes, (we believe) your understanding is correct, but I cannot guarantee that the problem goes away with our change either.
We've noticed that the slowdown happens mostly in scipy.optimize.minimize in our case.
But that doesn't mean l_bfgs_b is safe, because they share the same code as far as we can see.

The problem probably comes from instabilities in our optimization. Indeed, the optimization using scipy.optimize.minimize in our package is, by nature, very sensitive and likely to produce huge gradients.

So far, we have observed slowdowns when we get huge gradients (> 2**32), and the precision handling depends on the OS, but that doesn't explain why the slowdown goes away when we limit the thread count.
Anyway, there are still many mysteries.
For now, this PR should resolve the issue in most cases (we hope 🤞).

@ilayn commented Jul 11, 2025

Thank you for your kind words. When I am in the thick of it, I can't see all the possible repercussions of this type of change, so it is a lot of learning for me alongside the straightforward translation work.

Regarding the large gradients and other details, I am trying to figure out whether some kind of equilibration step can help the conditioning of the problems in both L-BFGS-B and SLSQP. These are, as you know, very archaic algorithms written at a time when things were very restricted (the late 80s). Now we are pushing them to places outside their comfort zone, so each step brings in another display of their shortcomings. So if there is a local optimization expert who knows what needs to be done, I'll be happy to implement those changes. Unfortunately, I don't find much time to go back into paper-reading mode.

Regarding why this is happening now: the original versions had their LAPACK copies baked in as additional code, so they had no library dependencies and were self-contained. I took out those parts and linked to the SciPy BLAS/LAPACK mechanism (be it MKL, OpenBLAS, or something else), and folks started to hit the shortcomings of these numerical libraries. That's why my response to these problems is slower than usual: I don't quite know where to look yet and will try to understand why this might be happening. Maybe it's a simple answer, because libc is doing something that Fortran does not. Or something is lurking in OpenBLAS. Not sure yet.

You might give it a go with MKL if you have the time and the patience, through a conda flow.

@nabenabe0928 (Contributor)

Hey @gen740 , could you review this PR?

@fusawa-yugo (Contributor)

LGTM! Thank you for your PR!!
I also confirmed the speedup in my environment!
Sorry for the late response.

@ilayn commented Jul 16, 2025

Apologies for this request but could any of you folks report whether you have improved or worsened performance also with OPENBLAS_NUM_THREADS set to 2 or 4? A simple yes/no will suffice.

We are trying to get to the bottom of it so that you don't need to jump through hoops like this, but it is quite difficult to replicate this behavior, hence my request.

@nabenabe0928 (Contributor) commented Jul 16, 2025

@ilayn

I am checking right now and will get back to you once the results are out.

import optuna


def objective(trial: optuna.Trial) -> float:
    return sum((trial.suggest_float("x", -5, 5) - 2) ** 2 for _ in range(10))


sampler = optuna.samplers.GPSampler(seed=0)
study = optuna.create_study(sampler=sampler)
study.optimize(objective, n_trials=30)
print((study.trials[-1].datetime_complete - study.trials[0].datetime_start).total_seconds())

@nabenabe0928 (Contributor) commented Jul 16, 2025

@ilayn
OK, so the results are out.
I can clearly see the slowdown as the number of OpenBLAS threads increases.
I hope it helps!

| OPENBLAS_NUM_THREADS | Runtime (s) |
| --- | --- |
| 1 | 5.275556 |
| 2 | 5.761794 |
| 3 | 6.064222 |
| 4 | 20.362522 |
| 5 | 25.854847 |
| None (unset) | 28.160928 |

@ilayn commented Jul 16, 2025

Perfect, thank you very much. Also, Ralf posted some instructions in the SciPy issue. I think this is caused by a bad interaction between PyTorch and SciPy.

@github-actions (bot)

This pull request has not seen any recent activity.

@github-actions github-actions bot added the stale Exempt from stale bot labeling. label Jul 29, 2025
@nabenabe0928 nabenabe0928 removed the stale Exempt from stale bot labeling. label Jul 30, 2025
@gen740 (Member) left a comment

Sorry for the late reply.

LGTM!

@gen740 (Member) commented Aug 1, 2025

@kAIto47802 Could you resolve the conflicts?

@nabenabe0928 nabenabe0928 enabled auto-merge August 1, 2025 07:19
@nabenabe0928 nabenabe0928 merged commit d15f4b9 into optuna:master Aug 1, 2025
13 of 14 checks passed