
Add a module to preprocess solutions for hypervolume improvement calculation #6039

Merged
nabenabe0928 merged 33 commits into optuna:master from nabenabe0928:feat/add-preproc-for-hv-improvement on Apr 28, 2025

Conversation

@nabenabe0928
Contributor

@nabenabe0928 nabenabe0928 commented Apr 10, 2025

Motivation

This PR adds a module that preprocesses solutions for hypervolume improvement calculation, based on the papers A Box Decomposition Algorithm to Compute the Hypervolume Indicator and Towards Efficient Multiobjective Optimization: Multiobjective Statistical Criterions.

_get_upper_bound_set

The main algorithm (_get_upper_bound_set) is based on the following pseudocode, quoted from Algorithm 2 of [1]:

[Image: pseudocode of Algorithm 2 in [1]]

Figure 1 of [1] visualizes upper_bound_set as the black dots and the Pareto solutions (lower is better) as crosses.

[Image: Figure 1 of [1]]
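For intuition, in the two-objective minimization case the upper bound set can be read directly off the sorted Pareto front: the black dots in Figure 1 are the "inner corners" of the staircase formed by the Pareto points and the reference point. A minimal sketch of this 2D special case (the function name and signature are illustrative assumptions, not this module's API):

```python
import numpy as np

def upper_bound_set_2d(pareto_sols: np.ndarray, ref_point: np.ndarray) -> np.ndarray:
    """Illustrative 2D special case of the upper bound set (minimization).

    Sorting the Pareto points by the first objective makes the second
    objective strictly decreasing, so the local upper bounds are the
    corners formed by consecutive points plus the two corners that
    involve the reference point.
    """
    pts = pareto_sols[np.argsort(pareto_sols[:, 0])]
    f0 = np.append(pts[:, 0], ref_point[0])  # each point's f0, then r0
    f1 = np.append(ref_point[1], pts[:, 1])  # r1, then each point's f1
    return np.stack([f0, f1], axis=-1)

# Pareto front {(1, 3), (2, 2), (3, 1)} with reference point (4, 4)
# yields the corners (1, 4), (2, 3), (3, 2), (4, 1).
pareto = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
ubs = upper_bound_set_2d(pareto, np.array([4.0, 4.0]))
```

The general M-dimensional case requires the incremental update of Algorithm 2; only the 2D staircase admits this one-liner.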

_get_hyper_rectangle_bounds

This function calculates the set of diagonal points of the gray rectangles in the figure above, based on Equation (2) in [1]:

[Image: Equation (2) of [1]]

Each $B(u)$ corresponds to one gray rectangle.
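In 2D, the decomposition of the dominated region into such rectangles can be written down explicitly; a sketch under the same minimization convention (the helper name is hypothetical, for illustration only):

```python
import numpy as np

def dominated_boxes_2d(pareto_sols: np.ndarray, ref_point: np.ndarray):
    """Lower/upper corners of the gray rectangles covering the dominated
    region D(P) in 2D (minimization). Illustrative 2D special case."""
    pts = pareto_sols[np.argsort(pareto_sols[:, 0])]
    lbs = pts  # each Pareto point is the lower corner of one rectangle
    ub1 = np.append(ref_point[1], pts[:-1, 1])  # previous point's f1 (or r1)
    ubs = np.stack([np.full(len(pts), ref_point[0]), ub1], axis=-1)
    return lbs, ubs

# The summed rectangle areas equal the hypervolume of the Pareto set.
pareto = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
lbs, ubs = dominated_boxes_2d(pareto, np.array([4.0, 4.0]))
hv = float(np.sum(np.prod(ubs - lbs, axis=-1)))  # -> 6.0 for this front
```

Because the rectangles partition $D(P)$, summing their volumes recovers the hypervolume, which is what makes the decomposition useful.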

_get_non_dominated_hyper_rectangle_bounds

This calculates the set of diagonal points of the white rectangles in Figure 1 of [2].
To do so, we perform _get_upper_bound_set twice: once for $l_k$ below and once for $u_k$ below.

[Image: definitions of $l_k$ and $u_k$ in [2]]

Note

Unlike Optuna and [1], [2] considers a maximization problem.

[1] A Box Decomposition Algorithm to Compute the Hypervolume Indicator
[2] Differentiable Expected Hypervolume Improvement for Parallel Multi-Objective Bayesian Optimization
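Putting the pieces together, once the non-dominated boxes are available, the hypervolume improvement of a candidate follows in closed form as the minimization version of Eq. (1) in [2]. A self-contained 2D sketch (function names are illustrative, not this module's API; the global lower bound is assumed to be the origin):

```python
import numpy as np

def non_dominated_boxes_2d(pareto_sols, ref_point):
    """Lower/upper corners of the boxes covering the non-dominated region
    in 2D (minimization), assuming the origin as the global lower bound."""
    pts = pareto_sols[np.argsort(pareto_sols[:, 0])]
    ub0 = np.append(pts[:, 0], ref_point[0])
    ub1 = np.append(ref_point[1], pts[:, 1])
    ubs = np.stack([ub0, ub1], axis=-1)  # upper corners = upper bound set
    lb0 = np.append(0.0, pts[:, 0])
    lbs = np.stack([lb0, np.zeros_like(lb0)], axis=-1)
    return lbs, ubs

def hvi(lbs, ubs, new_sol):
    # Volume of the part of each box dominated by new_sol, summed over boxes.
    diff = np.maximum(0.0, ubs - np.maximum(new_sol, lbs))
    return float(np.sum(np.prod(diff, axis=-1)))

pareto = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
lbs, ubs = non_dominated_boxes_2d(pareto, np.array([4.0, 4.0]))
# (0.5, 0.5) dominates the whole front: HVI = 3.5 * 3.5 - 6.0 = 6.25.
assert abs(hvi(lbs, ubs, np.array([0.5, 0.5])) - 6.25) < 1e-9
```

Note how `hvi` needs no inclusion-exclusion over the Pareto set: each box contributes independently, which is exactly what makes the box decomposition cheap to evaluate for many candidates at once.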

@nabenabe0928 nabenabe0928 marked this pull request as draft April 11, 2025 07:53
@nabenabe0928 nabenabe0928 marked this pull request as ready for review April 15, 2025 06:15
@nabenabe0928 nabenabe0928 added the feature Change that does not break compatibility, but affects the public interfaces. label Apr 16, 2025
@nabenabe0928 nabenabe0928 added this to the v4.4.0 milestone Apr 16, 2025
@c-bata
Member

c-bata commented Apr 17, 2025

@kAIto47802 @HideakiImamura Could you review this PR?

Member

@HideakiImamura HideakiImamura left a comment


Thanks for the PR. I've just started the review, but have several comments about unit tests. PTAL.

@nabenabe0928 nabenabe0928 force-pushed the feat/add-preproc-for-hv-improvement branch from 0039885 to 586c002 on April 23, 2025 05:42
@HideakiImamura
Member

Could you give us some benchmark results for the hypervolume calculation, comparing the existing WFG implementation with the newly introduced one?

@nabenabe0928
Contributor Author

nabenabe0928 commented Apr 23, 2025

Benchmarking Code
from __future__ import annotations

import time
from typing import Protocol

import numpy as np

from optuna._hypervolume.box_decomposition import get_non_dominated_box_bounds
from optuna._hypervolume.wfg import compute_hypervolume
from optuna.study._multi_objective import _is_pareto_front


_EPS = 1e-12


class InstanceGenerator(Protocol):
    def __call__(self, n_trials: int, n_objectives: int, seed: int) -> np.ndarray:
        raise NotImplementedError


def _extract_pareto_sols(loss_values: np.ndarray) -> np.ndarray:
    sorted_loss_vals = np.unique(loss_values, axis=0)
    on_front = _is_pareto_front(sorted_loss_vals, assume_unique_lexsorted=True)
    return sorted_loss_vals[on_front]


def _generate_uniform_samples(n_trials: int, n_objectives: int, seed: int) -> np.ndarray:
    rng = np.random.RandomState(seed)
    return np.maximum(_EPS, rng.random((n_trials, n_objectives)))


def _generate_concave_instances(n_trials: int, n_objectives: int, seed: int) -> np.ndarray:
    """See Section 4.2 of https://arxiv.org/pdf/1510.01963"""
    points = _generate_uniform_samples(n_trials, n_objectives, seed)
    loss_values = points / np.maximum(_EPS, np.sum(points**2, axis=-1))[:, np.newaxis]
    return loss_values


def _generate_convex_instances(n_trials: int, n_objectives: int, seed: int) -> np.ndarray:
    """See Section 4.2 of https://arxiv.org/pdf/1510.01963"""
    points = _generate_uniform_samples(n_trials, n_objectives, seed)
    loss_values = 1 - points / np.maximum(_EPS, np.sum(points**2, axis=-1))[:, np.newaxis]
    return loss_values


def _generate_linear_instances(n_trials: int, n_objectives: int, seed: int) -> np.ndarray:
    """See Section 4.2 of https://arxiv.org/pdf/1510.01963"""
    points = _generate_uniform_samples(n_trials, n_objectives, seed)
    loss_values = points / np.maximum(_EPS, np.sum(points, axis=-1))[:, np.newaxis]
    return loss_values


def _calculate_hypervolume_improvement(
    lbs: np.ndarray, ubs: np.ndarray, new_sols: np.ndarray
) -> np.ndarray:
    diff = np.maximum(0.0, ubs - np.maximum(new_sols[..., np.newaxis, :], lbs))
    # The minimization version of Eq. (1) in https://arxiv.org/pdf/2006.05078.
    return np.sum(np.prod(diff, axis=-1), axis=-1)


def benchmark_wfg(
    pareto_sols: np.ndarray, new_sols: np.ndarray, ref_point: np.ndarray
) -> np.ndarray:
    hv = compute_hypervolume(pareto_sols, ref_point, assume_pareto=True)
    hvis = np.empty(len(new_sols))
    for i, new_sol in enumerate(new_sols):
        sols = np.vstack([pareto_sols, new_sol[np.newaxis, :]])
        hvis[i] = compute_hypervolume(sols, ref_point, assume_pareto=False) - hv

    return hvis


def benchmark_box_decomposition(
    pareto_sols: np.ndarray, new_sols: np.ndarray, ref_point: np.ndarray
) -> np.ndarray:
    lbs, ubs = get_non_dominated_box_bounds(pareto_sols, ref_point)
    return _calculate_hypervolume_improvement(lbs, ubs, new_sols)


def main(
    gen: InstanceGenerator, n_objectives: int, n_trials: int, n_calls: int, seed: int
) -> tuple[float, float, float]:
    pareto_sols = _extract_pareto_sols(gen(n_trials=n_trials, n_objectives=n_objectives, seed=seed))
    new_sols = gen(n_trials=n_calls, n_objectives=n_objectives, seed=seed + 100)
    ref_point = np.max(np.vstack([pareto_sols, new_sols]), axis=0)
    ref_point = np.maximum(ref_point * 0.9, ref_point * 1.1)

    start = time.time()
    out1 = benchmark_box_decomposition(pareto_sols, new_sols, ref_point)
    time_box_decomp = (time.time() - start) * 1000
    print(f"Box Decomp.: {time_box_decomp:.3e} [ms]")

    start = time.time()
    out2 = benchmark_wfg(pareto_sols, new_sols, ref_point)
    time_wfg = (time.time() - start) * 1000
    print(f"WFG: {time_wfg:.3e} [ms]")

    rtol = 1e-5  # 1e-5 is the default rtol in np.allclose.
    while not np.allclose(out1, out2, rtol=rtol):
        rtol *= 10.0

    return time_box_decomp, time_wfg, rtol


if __name__ == "__main__":
    import itertools

    import pandas as pd

    rows = []
    for (gen, n_objectives, n_trials, n_calls, seed) in itertools.product(*(
        [_generate_concave_instances, _generate_convex_instances, _generate_linear_instances],
        range(2, 5),
        [10, 30, 100],
        [1000, 3000, 10000],
        range(5),
    )):
        print(gen.__name__, f"{n_objectives=}, {n_trials=}, {n_calls=}, {seed=}")
        t_bd, t_wfg, rtol = main(gen, n_objectives, n_trials, n_calls, seed=seed)
        row = {
            "instance_name": gen.__name__,
            "n_objectives": n_objectives,
            "n_trials": n_trials,
            "n_calls": n_calls,
            "runtime_wfg": t_wfg,
            "runtime_bd": t_bd,
            "rtol": rtol,
            "seed": seed,
        }
        rows.append(row)

    pd.DataFrame(rows).to_json("hvi-benchmarking-results.json")

Note

The result file is available here:
hvi-benchmarking-results.json

@nabenabe0928
Contributor Author

nabenabe0928 commented Apr 24, 2025

@HideakiImamura

I believe that n_calls refers to the number of candidate points sampled via Monte Carlo when optimizing the acquisition function.

Yes and no, indeed.
n_calls is intended to be n_qmc_samples * n_preliminary_samples, which becomes 262144 when using the default n_qmc_samples in BoTorch (128) and the default n_preliminary_samples in Optuna (2048), not just n_qmc_samples.

In this sense, my benchmark is already much milder than the real running environment, where we actually receive n_calls=262144.
But if you prefer, I can add some experiments for the box decomposition only.
As can be seen in the benchmarking results, the runtime of WFG grows superlinearly, so I am pretty sure it does not make sense to benchmark WFG in that setup: it is obvious that WFG would no longer be practical.

Member

@HideakiImamura HideakiImamura left a comment


Thanks for the update. I checked the logic. Basically, LGTM. Could you take a look at my comments?

nabenabe0928 and others added 4 commits April 24, 2025 10:55
Co-authored-by: Hideaki Imamura <38826298+HideakiImamura@users.noreply.github.com>
Co-authored-by: Hideaki Imamura <38826298+HideakiImamura@users.noreply.github.com>
Member

@HideakiImamura HideakiImamura left a comment


Thanks for the update. Almost, LGTM. I have a comment about the _get_non_dominated_box_bounds. PTAL.

Collaborator

@kAIto47802 kAIto47802 left a comment


Thank you for your PR! I'm still reviewing the details, but leaving my current comments for now. PTAL :octocat:

Collaborator

@kAIto47802 kAIto47802 left a comment


I've reviewed the unit tests, leaving a comment. PTAL :octocat:

Collaborator

@kAIto47802 kAIto47802 left a comment


I left other comments. PTAL :octocat:

nabenabe0928 and others added 2 commits April 25, 2025 14:11
Co-authored-by: kAIto47802 <115693559+kAIto47802@users.noreply.github.com>
Co-authored-by: kAIto47802 <115693559+kAIto47802@users.noreply.github.com>
@nabenabe0928
Contributor Author

nabenabe0928 commented Apr 26, 2025

@kAIto47802
Ok, so I could prove the validity of this part:

Here, I will describe why this processing is valid.

=========================================

Here, we use the following notations:

  1. $r \in \mathbb{R}^M$, a reference point,
  2. $x \in [l, r)$, a vector defined in the objective space,
  3. $x \preceq y$, the weak dominance of $x$ over $y$, i.e., $x_m \leq y_m$ for all $m \in [M] \coloneqq \{1,\dots,M\}$,
  4. $x \prec y$, the strong dominance of $x$ over $y$, i.e., $x \preceq y$ and $x_m < y_m$ for some $m \in [M]$,
  5. $P \coloneqq \{x_n\}_{n=1}^N$, the Pareto set, i.e., $x_i \npreceq x_j$ for any pair $(i, j) \in [N] \times [N]$ with $i \neq j$, where $x_n \in [l, r)$ for all $n \in [N]$,
  6. $D(P) \coloneqq \{z \in [l, r) \mid \exists x \in P, x \preceq z\}$, the dominated space of $P$,
  7. $\partial D(P)$, the boundary of $D(P)$ except for the boundary of $[l, r)$, and
  8. $ND(P) \coloneqq \partial D(P) \cup ([l, r) \setminus D(P))$, the non-dominated space of $P$.
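The dominance notations (items 3 and 4 above) translate directly into code; a small sketch for checking concrete cases (the helper names are illustrative, not Optuna's API):

```python
import numpy as np

def weakly_dominates(x, y) -> bool:
    # Notation 3: x ⪯ y iff x_m <= y_m for all m in [M].
    return bool(np.all(np.asarray(x) <= np.asarray(y)))

def strongly_dominates(x, y) -> bool:
    # Notation 4: x ≺ y iff x ⪯ y and x_m < y_m for some m.
    return weakly_dominates(x, y) and bool(np.any(np.asarray(x) < np.asarray(y)))

# (1, 2) strongly dominates (1, 3); a point weakly dominates itself
# but never strongly dominates itself.
```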

Important

We assume that if we enumerate all the Pareto solutions in a space $X$, we can perform the box decomposition of $X$, and that Algorithm 2 of the original paper identifies the upper bound set $U(P)$ exactly.

To perform the box decomposition of the non-dominated space $ND(P)$, we need to enumerate all the Pareto solutions in $ND(P)$, leading to the statement below.

Statement

Given a Pareto set $P$, the Pareto maximal set in the non-dominated space $ND(P)$ is a subset of $U(P)$.

Proof

[Image: definition of the upper bound set $U(P)$ via conditions (P1) and (P2), quoted from [1]]

Since (P1) states that the upper bound set is composed of the points that either touch $D(P)$ or are not in $D(P)$, the set of points satisfying (P1) is exactly $ND(P)$.

Hence, if we can say that (P2) is equivalent to the Pareto maximality in $ND(P)$, the proof will be completed.

The Pareto maximal set in $ND(P)$ is $\{x \in ND(P) \mid \nexists z \in ND(P), x \prec z\}$.

(P2) claims that any $u^\prime \in [l, r)$ with $u^\prime \geq u$ and $u^\prime \neq u$ fails (P1); that is, no point of $ND(P)$ strictly exceeds $u$, so each $u \in U(P)$ is Pareto maximal in $ND(P)$.

Therefore, since $U(P)$ satisfies (P1) and (P2), $U(P)$ includes all the Pareto maximal solutions in $ND(P)$, which completes the proof.

=========================================

Member

@HideakiImamura HideakiImamura left a comment


Thanks for the great work! LGTM!

@HideakiImamura HideakiImamura removed their assignment Apr 28, 2025
@kAIto47802
Collaborator

kAIto47802 commented Apr 28, 2025

Thank you for your update and the proof!
I confirmed that the statement holds by carefully following the definitions of the upper bound set and the non-dominated space. Thus, with commit 9aae8c1, which restricts the input to Pareto solutions only, the validity concern I mentioned above has been resolved.
Now, LGTM!

@nabenabe0928 nabenabe0928 merged commit 48c4d95 into optuna:master Apr 28, 2025
14 checks passed
