Skip to content

Enhance SBXCrossover#6008

Merged
y0z merged 8 commits intooptuna:masterfrom
hrntsm:update-sbx
Apr 7, 2025
Merged

Enhance SBXCrossover#6008
y0z merged 8 commits intooptuna:masterfrom
hrntsm:update-sbx

Conversation

@hrntsm
Copy link
Copy Markdown
Contributor

@hrntsm hrntsm commented Mar 12, 2025

Motivation

I would like to reduce the difference in the distribution of children between the reference paper and optuna's SBX as shown in this discussion

Description of the changes

The fixed values of “establishment” and “probability” are used as arguments to reproduce the same distribution as in the paper.

I have two questions.

  1. Would this change be better for Optunahub?
  2. Although establishment and probability are names that originally appeared in comments in the code, I think that expressions such as variable_crossover_prob, for example, would be easier to understand.

@c-bata
Copy link
Copy Markdown
Member

c-bata commented Mar 17, 2025

@sawa3030 @y0z @HideakiImamura Could you review this PR?

@c-bata c-bata added the feature Change that does not break compatibility, but affects the public interfaces. label Mar 17, 2025
@sawa3030
Copy link
Copy Markdown
Collaborator

@hrntsm Thank you for your PR! I’ve left some comments—please take a look when you have time!

@hrntsm
Copy link
Copy Markdown
Contributor Author

hrntsm commented Mar 20, 2025

@sawa3030
Thanks for the review.
There does not appear to be any comments specifically attached, where can I check?

else:
child_params_list.append(c2_i)
if rng.rand() < self._establishment:
options = (c1_i, c2_i) if rng.rand() < self._probability else (c2_i, c1_i)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
options = (c1_i, c2_i) if rng.rand() < self._probability else (c2_i, c1_i)
if index_prob < self._probability:
child_params_list.append(c1_i)
else:
child_params_list.append(c2_i)

The variable options and the function select_parameter seems unnecessary here since we could directly append c1_i or c2_i based on index_prob.

Copy link
Copy Markdown
Contributor Author

@hrntsm hrntsm Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sawa3030

Once I accepted the suggestion and committed. But it did not work, so I am commenting on it.

The proposed method would not result in the combination of individuals I thought it would unless I branched out with each of the following probabilities.

        for c1_i, c2_i, x1_i, x2_i in zip(c1, c2, parents_params[0], parents_params[1]):
            if rng.rand() < self._probability:
                if rng.rand() < self._establishment:
                    if index_prob < 0.5:
                        child_params_list.append(c1_i)
                    else:
                        child_params_list.append(c2_i)
                else:
                    if index_prob < 0.5:
                        child_params_list.append(c2_i)
                    else:
                        child_params_list.append(c1_i)
            else:
                if rng.rand() < self._establishment:
                    if index_prob < 0.5:
                        child_params_list.append(x1_i)
                    else:
                        child_params_list.append(x2_i)
                else:
                    if index_prob < 0.5:
                        child_params_list.append(x2_i)
                    else:
                        child_params_list.append(x1_i)

Instead, I believe that it would be better to create the combination as it is now and then determine whether to select one or the other created after the combination is created, which would reduce duplication of code.

Copy link
Copy Markdown
Member

@y0z y0z Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my understanding, the code presented in #6008 (comment) is equivalent to the code below.
Then, establishment is converted to 1 - establishment with probability 0.5 (i.e., index_prob <= 0.5).
How can I interpret this behavior? I feel that it is not intuitive to understand it from the description in the docstring " establishment is the probability of uniform crossover between two individuals selected as candidate child individuals."

        index_prob = rng.rand()
        child_params_list = []
        establishment = self._establishment if index_prob >= 0.5 else 1 - self._establishment
        for c1_i, c2_i, x1_i, x2_i in zip(c1, c2, parents_params[0], parents_params[1]):
            if rng.rand() < self._probability:
                if rng.rand() < establishment:
                    child_params_list.append(c1_i)
                else:
                    child_params_list.append(c2_i)
            else:
                if rng.rand() < establishment:
                    child_params_list.append(x1_i)
                else:
                    child_params_list.append(x2_i)
        child_params = np.array(child_params_list)

Copy link
Copy Markdown
Contributor Author

@hrntsm hrntsm Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@y0z
As you point out, this implementation is confusing because it is simultaneously doing the change for each variable and choosing between the two candidate child individuals.
The final behavior is the same as pointed out, but it would be easier to understand the intent if it were reimplemented as follows.

  1. Generate two children c1, c2 based on equation(4) & (5) in SBX paper.
  2. Check that the generated children's variable will be used.(_probability)
  3. Check that uniform crossover applies each children's variable.(_establishment)
  4. This is repeated for each variable to create two candidate child individuals.(child1_params_list, child2_params_list)
  5. Finally, Optuna can only return one individual, so choose one of them.
    • Equivalent to index_prob but used only once, so omitted.
        child1_params_list = []
        child2_params_list = []

        for c1_i, c2_i, x1_i, x2_i in zip(c1, c2, parents_params[0], parents_params[1]):
            if rng.rand() < self._probability:
                # If the probability of "applying" uniform crossover is `_establishment`, then the reverse of the equal sign may be more appropriate.
                if rng.rand() < self._establishment:
                    child1_params_list.append(c1_i)
                    child2_params_list.append(c2_i)
                else:
                    child1_params_list.append(c2_i)
                    child2_params_list.append(c1_i)
            else:
                if rng.rand() < self._establishment:
                    child1_params_list.append(x1_i)
                    child2_params_list.append(x2_i)
                else:
                    child1_params_list.append(x2_i)
                    child2_params_list.append(x1_i)

        child_params_list = child1_params_list if rng.rand() < 0.5 else child2_params_list
        child_params = np.array(child_params_list)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! It is more understandable!
It also sounds good to rename the establishment uniform_crossover_prob as suggested.

@sawa3030
Copy link
Copy Markdown
Collaborator

@hrntsm Sorry about that! Can you check them now?

Copy link
Copy Markdown
Member

@HideakiImamura HideakiImamura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I have two comments. PTAL.

@y0z
Copy link
Copy Markdown
Member

y0z commented Mar 21, 2025

@hrntsm Thank you for your contribution.

I would like to share my opinion.

SBXCrossOver generates two children in the original. However, Optuna's Operator must generate only one child. Therefore, the existing implementation generates a child that is a mix of the two children. To get closer to the original implementation, it seems to be better to randomly generate one of the two children.

Also, how about changing the default value of eta to 1 for single-objective optimization? Currently, the setting is 2, but I couldn't find a reason for it.
https://github.com/optuna/optuna/blob/master/optuna/samplers/nsgaii/_crossovers/_sbx.py#L57

I tried such an implementation and it seems to work well.

FYI: I also fixed vSBXCrossOver (#6000 (reply in thread)).

    def crossover(
        self,
        parents_params: np.ndarray,
        rng: np.random.RandomState,
        study: Study,
        search_space_bounds: np.ndarray,
    ) -> np.ndarray:
        # https://www.researchgate.net/profile/M-M-Raghuwanshi/publication/267198495_Simulated_Binary_Crossover_with_Lognormal_Distribution/links/5576c78408ae7536375205d7/Simulated-Binary-Crossover-with-Lognormal-Distribution.pdf
        # Section 2 Simulated Binary Crossover (SBX)

        # To avoid generating solutions that violate the box constraints,
        # alpha1, alpha2, xls and xus are introduced, unlike the reference.
        xls = search_space_bounds[..., 0]
        xus = search_space_bounds[..., 1]

        xs_min = np.min(parents_params, axis=0)
        xs_max = np.max(parents_params, axis=0)
        if self._eta is None:
            eta = 20.0 if study._is_multi_objective() else 1.0  # Suggestion 2: change default eta to 1.0
        else:
            eta = self._eta

        xs_diff = np.clip(xs_max - xs_min, 1e-10, None)
        beta1 = 1 + 2 * (xs_min - xls) / xs_diff
        beta2 = 1 + 2 * (xus - xs_max) / xs_diff
        alpha1 = 2 - np.power(beta1, -(eta + 1))
        alpha2 = 2 - np.power(beta2, -(eta + 1))

        us = rng.rand(len(search_space_bounds))
        mask1 = us > 1 / alpha1  # Equation (3).
        betaq1 = np.power(us * alpha1, 1 / (eta + 1))  # Equation (3).
        betaq1[mask1] = np.power((1 / (2 - us * alpha1)), 1 / (eta + 1))[mask1]  # Equation (3).

        mask2 = us > 1 / alpha2  # Equation (3).
        betaq2 = np.power(us * alpha2, 1 / (eta + 1))  # Equation (3)
        betaq2[mask2] = np.power((1 / (2 - us * alpha2)), 1 / (eta + 1))[mask2]  # Equation (3).

        c1 = 0.5 * ((xs_min + xs_max) - betaq1 * xs_diff)  # Equation (4).
        c2 = 0.5 * ((xs_min + xs_max) + betaq2 * xs_diff)  # Equation (5).

        # SBX applies crossover with establishment 0.5, and with probability 0.5,
        # the gene of the parent individual is the gene of the child individual.
        # The original SBX creates two child individuals,
        # but optuna's implementation creates only one child individual.
        # Therefore, when there is no crossover,
        # the gene is selected with equal probability from the parent individuals x1 and x2.

        # child_params_list = []
        # for c1_i, c2_i, x1_i, x2_i in zip(c1, c2, parents_params[0], parents_params[1]):
        #     if rng.rand() < 0.5:
        #         if rng.rand() < 0.5:
        #             child_params_list.append(c1_i)
        #         else:
        #             child_params_list.append(c2_i)
        #     else:
        #         if rng.rand() < 0.5:
        #             child_params_list.append(x1_i)
        #         else:
        #             child_params_list.append(x2_i)
        # child_params = np.array(child_params_list)

        return c1 if rng.rand() < 0.5 else c2  # Suggestion 1: return one of the two children randomly.

Figure_1

@hrntsm
Copy link
Copy Markdown
Contributor Author

hrntsm commented Mar 24, 2025

Also, how about changing the default value of eta to 1 for single-objective optimization? Currently, the setting is 2, but I couldn't find a reason for it. https://github.com/optuna/optuna/blob/master/optuna/samplers/nsgaii/_crossovers/_sbx.py#L57

Here, in K.Deb's SBX paper below, it is mentioned that n=2~5 is close to a single-point crossover in equation 19. (In this paper, n is the equivalent of eta.)
There is no direct mention of good or bad, but I am wondering if it is safe to leave it at 2 for continuity with previous implementations.

Simulated Binary Crossover for Continuous Search Space

This point is subject to the judgment of OptunaTeam.

@HideakiImamura
Copy link
Copy Markdown
Member

This is not strong opinion, but I think it is reasonable to leave the value of eta as it is (i.e., 2) since it is suggested to set the value in the range of [2, 5] according to the Simulated Binary Crossover for Continuous Search Space.

@y0z
Copy link
Copy Markdown
Member

y0z commented Mar 25, 2025

Thanks for providing the reference.

I think it is reasonable to leave the value of eta as it is (i.e., 2)

+1

@hrntsm
Copy link
Copy Markdown
Contributor Author

hrntsm commented Mar 25, 2025

I have been looking into which individuals to return. I have looked at several papers, but I feel that they only mention returning two generated individuals c1 and c2, as y0z mentioned.

On the other hand, I checked the implementation of some optimization libraries.

lib name comment
DEAP The only variable is eta, and the two individuals generated are returned as is.
pymoo The variables are eta, prob_bin(establishment), prob_var(probability), similar to the implementation as in this PR.
jMetal Same as the current implementation of Optuna, with establishment and probability fixed at 0.5
PlatEMO The only variable is eta, and the two individuals generated are returned.

After all, the two arguments other than eta were not specified in any of the papers cited in any of the implementations. Please let me know if I am missing something.

As I understand, establishment is the probability of binary crossover (uniform crossover) among 2 parents and 2 children, probability is the probability of applying the SBX crossover (1 - probability of using the parent values as it is)

This is my opinion, but considering the current implementation of Optuna and the reproduction of the paper, I think the arguments should be eta, establishment, and probability.
On the other hand, I thought we need to discuss what the default values should be.

I am thinking either of the following two.

  1. The value as it is in the current implimentation (eta=2, establishment=0.5, probability=0.5)
  2. A value that reproduces the distribution of the paper (eta=2, establishment=0.0, probability=1.0)

@y0z
Copy link
Copy Markdown
Member

y0z commented Mar 31, 2025

@hrntsm

Thank you for your survey.

Based on the discussions, introducing establishment and probability parameters with default values that are the same as in the current implementation, i.e., not changing the current Optuna behavior, seems good.

@hrntsm
Copy link
Copy Markdown
Contributor Author

hrntsm commented Mar 31, 2025

@sawa3030 @HideakiImamura
Thank you for all your comments!

The all checks passed, Could you please review?.
The implementation is as follows. The default has been the same result so far.

default(eta=2,establishment=probability=0.5) reproduct ref paper(eta=1,establishment=0.0,probability=1)
Screenshot 2025-03-31 at 14 58 26 Screenshot 2025-03-31 at 14 57 58

@sawa3030
Copy link
Copy Markdown
Collaborator

sawa3030 commented Apr 2, 2025

Since a probability of 0 means the child is never selected, I think it's better to ensure that 0 < _probability ≤ 1.

@sawa3030
Copy link
Copy Markdown
Collaborator

sawa3030 commented Apr 3, 2025

@hrntsm Thank you for your updates! I have a few suggestions and clarifications I'd like to share:

  1. As discussed in this comment, the implementation of this PR behaves the same when _establishment is set to 0 or 1, which suggests that the index_prob variable may be unnecessary. I suggest simplifying the logic as shown below. This approach preserves the current Optuna behavior, as noted here, and could also align with the distribution described in the original paper, as discussed here.

    for c1_i, c2_i, x1_i, x2_i in zip(c1, c2, parents_params[0], parents_params[1]):
        if rng.rand() < self._probability:
            if rng.rand() < self._establishment:
                child_params_list.append(c1_i)
            else:
                child_params_list.append(c2_i)
        else:
            if rng.rand() < self._establishment:
                child_params_list.append(x1_i)
            else:
                child_params_list.append(x2_i)
    
  2. In both the current implementation and the code above, c1 is implicitly associated with x1, and c2 with x2. To be more precise, it might be better to decouple these associations and consider introducing separate control variable for selecting between the two child candidates (c1 and c2) and between the two parent values (x1 and x2).

  3. As pointed out in the description of this PR, the variable name establishment doesn't clearly reflect its actual role in the crossover operation. I would suggest renaming it to uniform_crossover_prob, and likewise renaming probability to use_child_gene_prob to better communicate their intent. Of course, these are just suggestions — I'm happy to discuss other naming ideas if anyone has alternatives.

@y0z
Copy link
Copy Markdown
Member

y0z commented Apr 3, 2025

@sawa3030

The code in this PR and the one you suggested seem to have different behavior.
With establishment=0.1 and probability=0.5, we find the difference.
So, this change is not suitable for preserving the current behavior.

As discussed in #6008 (comment), the implementation of this PR behaves the same when _establishment is set to 0 or 1, which suggests that the index_prob variable may be unnecessary. I suggest simplifying the logic as shown below. This approach preserves the current Optuna behavior, as noted #6008 (comment), and could also align with the distribution described in the original paper, as discussed #6000 (reply in thread).

for c1_i, c2_i, x1_i, x2_i in zip(c1, c2, parents_params[0], parents_params[1]):
   if rng.rand() < self._probability:
       if rng.rand() < self._establishment:
           child_params_list.append(c1_i)
       else:
           child_params_list.append(c2_i)
   else:
       if rng.rand() < self._establishment:
           child_params_list.append(x1_i)
       else:
           child_params_list.append(x2_i)

Figure_1
Figure_2

@hrntsm
Copy link
Copy Markdown
Contributor Author

hrntsm commented Apr 3, 2025

@sawa3030

As for 1 & 2, please check #6008 (comment) for a summary of the intent of the implementation.

As for 3(the variable name), it more accurately describes the behavior than the one I gave at the beginning, and I agree with your suggestion.

@hrntsm
Copy link
Copy Markdown
Contributor Author

hrntsm commented Apr 3, 2025

Based on the comments, I have modified the docstring and implementation for clarity.

Co-authored-by: Yoshihiko Ozaki <30489874+y0z@users.noreply.github.com>
Copy link
Copy Markdown
Member

@y0z y0z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your careful discussion.
LGTM.

@y0z y0z removed their assignment Apr 3, 2025
@sawa3030
Copy link
Copy Markdown
Collaborator

sawa3030 commented Apr 4, 2025

Thank you so much for your clear classification and explanation — it really helped me understand the logic.
I have one more question: since the behavior is essentially the same when uniform_crossover_prob is set to 0 or 1, and it’s a bit unclear how users should choose the value, would it make sense to restrict the range to [0.0, 0.5]? This isn’t a strong preference, as values greater than 0.5 don’t cause any issues in practice.

@hrntsm
Copy link
Copy Markdown
Contributor Author

hrntsm commented Apr 4, 2025

@sawa3030

would it make sense to restrict the range to [0.0, 0.5]?

I once thought the same thing, but the “probability” range of [0, 0.5] seemed counter-intuitive, so I decided to use [0,1].
I will add followings to the docstring to make it more understandable.

If the uniform_crossover_prob exceeds 0.5, the result is equivalent to 1-uniform_crossover_prob, because it returns one of the two individuals of the crossover result.

Copy link
Copy Markdown
Collaborator

@sawa3030 sawa3030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all the discussions and clarifications. LGTM!

@y0z y0z merged commit 19296c3 into optuna:master Apr 7, 2025
14 checks passed
@y0z y0z added this to the v4.3.0 milestone Apr 7, 2025
@hrntsm hrntsm deleted the update-sbx branch April 7, 2025 03:50
@hrntsm hrntsm mentioned this pull request Apr 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature Change that does not break compatibility, but affects the public interfaces.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants