
Refactor and speed up HV3D #6124

Merged
nabenabe0928 merged 12 commits into optuna:master from nabenabe0928:code-fix/refactor-hv-3d
Jun 6, 2025

Conversation

@nabenabe0928
Contributor

@nabenabe0928 nabenabe0928 commented Jun 3, 2025

Note

This branch completed the script below in 31.02 seconds, while the latest HEAD took 36.84 seconds, a 15.8% reduction in runtime.

Motivation

This PR is a follow-up of:

Description of the changes

  • Replace inline # comments with """ docstrings
  • Use more NumPy vectorization for speed

In essence, here:

z_delta[np.arange(n), y_indices] = reference_point[2] - sorted_pareto_sols[:, 2]
  • Tweak slightly to reduce the number of NumPy method calls

In essence, here:

z_delta = np.maximum.accumulate(np.maximum.accumulate(z_delta, axis=0), axis=1)

and here:

x_delta = np.concatenate([x_vals[1:], reference_point[:1]]) - x_vals
y_delta = np.concatenate([y_vals[1:], reference_point[1:2]]) - y_vals
  • Use more efficient NumPy methods, e.g., np.vdot(A, B) instead of np.sum(A * B), and np.concatenate instead of np.append.
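The scatter-assignment pattern from the vectorization bullet can be illustrated in isolation. The sketch below uses made-up toy data (n, y_indices, and values are hypothetical, not Optuna's actual variables) and checks that the single fancy-indexing assignment matches an explicit Python loop:

```python
import numpy as np

n = 4
y_indices = np.array([2, 0, 3, 1])        # hypothetical compressed y ranks
values = np.array([0.5, 1.5, 0.25, 1.0])  # hypothetical z-axis deltas

# Loop version: one scalar assignment per row.
loop_ver = np.zeros((n, n))
for i in range(n):
    loop_ver[i, y_indices[i]] = values[i]

# Vectorized version, as in the PR: a single fancy-indexing assignment.
vec_ver = np.zeros((n, n))
vec_ver[np.arange(n), y_indices] = values

assert np.array_equal(loop_ver, vec_ver)
```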
Benchmarking Code
import optuna


def objective(trial: optuna.Trial) -> tuple[float, float, float]:
    x = trial.suggest_float("x", -5, 5)
    y = trial.suggest_float("y", -5, 5)
    return x**2 + y**2, (x - 2)**2 + (y - 2)**2, (x + 2)**2 + (y + 2)**2


sampler = optuna.samplers.TPESampler(seed=42)
study = optuna.create_study(sampler=sampler, directions=["minimize"]*3)
study.optimize(objective, n_trials=1000)
trials = study.trials
print((trials[-1].datetime_complete - trials[0].datetime_start).total_seconds())

@nabenabe0928 nabenabe0928 marked this pull request as ready for review June 3, 2025 22:31
@y0z y0z assigned not522, sawa3030 and kAIto47802 and unassigned sawa3030 Jun 4, 2025
@y0z
Member

y0z commented Jun 4, 2025

@not522 @kAIto47802 Could you review this PR?

@y0z y0z added the enhancement Change that does not break compatibility and not affect public interfaces, but improves performance. label Jun 4, 2025
x_delta = np.concatenate([x_vals[1:], reference_point[:1]]) - x_vals
y_delta = np.concatenate([y_vals[1:], reference_point[1:2]]) - y_vals
# NOTE(nabenabe): `np.vdot(A, B)` is a faster calculation of `np.sum(A * B)`.
return np.vdot(x_delta[:, np.newaxis] * y_delta, z_delta)
Contributor Author

@nabenabe0928 nabenabe0928 Jun 4, 2025


It seems sorted_pareto_sols.shape[0] is less than 25 due to the default_gamma.

In [1]: import numpy as np
   ...: 
   ...: A = np.random.random((30, 30))
   ...: %timeit np.sum(A * A)
   ...: %timeit np.vdot(A, A)
2.08 μs ± 32.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
479 ns ± 6.95 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

y_indices, y_vals = _compress_coordinate(sorted_pareto_sols[:, 1])
z_delta = np.zeros((n, n), dtype=float)
z_delta[np.arange(n), y_indices] = reference_point[2] - sorted_pareto_sols[:, 2]
z_delta = np.maximum.accumulate(np.maximum.accumulate(z_delta, axis=0), axis=1)
Contributor Author


In [2]: import numpy as np
   ...: 
   ...: 
   ...: A = np.random.random((30, 30))
   ...: %timeit 1.0 - np.minimum.accumulate(np.minimum.accumulate(A, axis=0), axis=1)
   ...: %timeit np.maximum.accumulate(np.maximum.accumulate(A, axis=0), axis=1)
7.16 μs ± 52.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
6.61 μs ± 54.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
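For what it's worth, the identity presumably behind this change can be checked numerically (my own sketch, not from the PR): a running maximum of A equals one minus the running minimum of 1 - A, so the trailing subtraction can be folded away:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 6))

# Running max over a prefix rectangle of A equals 1 minus the running min
# over the same rectangle of the complement 1 - A.
old = 1.0 - np.minimum.accumulate(np.minimum.accumulate(1.0 - A, axis=0), axis=1)
new = np.maximum.accumulate(np.maximum.accumulate(A, axis=0), axis=1)

assert np.allclose(old, new)
```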

Comment on lines +51 to +52
z_delta = np.zeros((n, n), dtype=float)
z_delta[np.arange(n), y_indices] = reference_point[2] - sorted_pareto_sols[:, 2]
Contributor Author


In [4]: import numpy as np
   ...: 
   ...: 
   ...: n = 30
   ...: a = np.random.random(n)
   ...: 
   ...: def original():
   ...:     A = np.full((n, n), 1.0)
   ...:     for i in range(n):
   ...:         A[i, i] = a[i]
   ...:     A = 1.0 - A
   ...: 
   ...: def this_pr():
   ...:     inds = np.arange(n)
   ...:     A = np.zeros((n, n), dtype=float)
   ...:     A[inds, inds] = 1.0 - a
   ...: 
   ...: %timeit original()
   ...: %timeit this_pr()
4.26 μs ± 41 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.63 μs ± 17.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
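A quick correctness check (toy data of my own, not from the PR) confirms that the two constructions benchmarked above build the same matrix, with 1 - a on the diagonal and zeros elsewhere:

```python
import numpy as np

n = 5
a = np.random.default_rng(1).random(n)

# original(): fill with 1.0, set the diagonal, then complement everything.
A_orig = np.full((n, n), 1.0)
for i in range(n):
    A_orig[i, i] = a[i]
A_orig = 1.0 - A_orig  # off-diagonal entries become 0.0

# this_pr(): start from zeros and write the complemented diagonal directly.
inds = np.arange(n)
A_pr = np.zeros((n, n), dtype=float)
A_pr[inds, inds] = 1.0 - a

assert np.allclose(A_orig, A_pr)
```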

z_delta = np.maximum.accumulate(np.maximum.accumulate(z_delta, axis=0), axis=1)
# The x axis is already sorted, so no need to compress this coordinate.
x_vals = sorted_pareto_sols[:, 0]
x_delta = np.concatenate([x_vals[1:], reference_point[:1]]) - x_vals
Contributor Author

@nabenabe0928 nabenabe0928 Jun 4, 2025


This change yields only a slight speedup (about 2%).

In [5]: import numpy as np
   ...: 
   ...: 
   ...: a = np.random.random(30)
   ...: 
   ...: def original():
   ...:     b = np.concatenate([a, [1.0]])
   ...:     b[1:] - b[:-1]
   ...: 
   ...: def this_pr():
   ...:     np.concatenate([a[1:], [1.0]]) - a
   ...: 
   ...: %timeit original()
   ...: %timeit this_pr()   

972 ns ± 13.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
952 ns ± 6.55 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
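Both variants compute the same forward differences with the reference value appended at the end; a small check on toy data (mine, not from the PR) confirms it:

```python
import numpy as np

a = np.sort(np.random.default_rng(2).random(8))

# original(): append the reference value 1.0, then take pairwise differences.
b = np.concatenate([a, [1.0]])
orig = b[1:] - b[:-1]

# this_pr(): shift-and-append in one step, skipping the intermediate array.
pr = np.concatenate([a[1:], [1.0]]) - a

assert np.allclose(orig, pr)
```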

Member

@not522 not522 left a comment


Could you check my comments? They should further improve performance.

nabenabe0928 and others added 4 commits June 4, 2025 07:23
Co-authored-by: Naoto Mizuno <naotomizuno@preferred.jp>
Co-authored-by: Naoto Mizuno <naotomizuno@preferred.jp>
x_delta = np.concatenate([x_vals[1:], reference_point[:1]]) - x_vals
y_delta = np.concatenate([y_vals[1:], reference_point[1:2]]) - y_vals
# NOTE(nabenabe): Below is a faster calculation of `np.sum(A * B)`.
return np.dot(np.dot(z_delta, y_delta), x_delta)
Contributor Author


In [3]: import numpy as np
   ...: 
   ...: 
   ...: a = np.random.random(30)
   ...: b = np.random.random(30)
   ...: C = np.random.random((30, 30))
   ...: %timeit np.dot(np.dot(C, b), a)
   ...: %timeit np.vdot(a[:, None] * b, C)
   ...: %timeit np.sum(C * a[:, None] * b[None, :], axis=(0, 1))
761 ns ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
1.99 μs ± 28.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
4.31 μs ± 4.69 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
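All three benchmarked expressions compute the same bilinear form, the sum over i, j of C[i, j] * a[i] * b[j]; a quick numerical check (my own sketch) confirms they agree:

```python
import numpy as np

rng = np.random.default_rng(3)
a, b, C = rng.random(30), rng.random(30), rng.random((30, 30))

v1 = np.dot(np.dot(C, b), a)              # fastest per the timings above
v2 = np.vdot(a[:, None] * b, C)           # earlier version of this PR
v3 = np.sum(C * a[:, None] * b[None, :])  # fully broadcast baseline

assert np.isclose(v1, v2) and np.isclose(v1, v3)
```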

Member

@not522 not522 left a comment


LGTM!

@not522 not522 removed their assignment Jun 4, 2025
@kAIto47802
Collaborator

kAIto47802 commented Jun 6, 2025

Thank you for your PR!
I've confirmed that this PR, along with the original one:

indeed improves the benchmarking time.

I used the same objective function as you described in the initial comment.
The benchmarking result is as follows:

[Figure] The result of the benchmarking. "Original Master" refers to the master branch before any changes, "Latest Master" refers to the current master branch including #6112, and "This PR" refers to the branch introduced in this pull request. The solid lines denote the mean, and the translucent areas denote the standard error, both computed over five independent runs with different random seeds.

Collaborator

@kAIto47802 kAIto47802 left a comment


LGTM!

@nabenabe0928 nabenabe0928 merged commit 7228f77 into optuna:master Jun 6, 2025
14 checks passed
@nabenabe0928 nabenabe0928 added this to the v4.4.0 milestone Jun 9, 2025
@kAIto47802 kAIto47802 removed their assignment Jun 11, 2025