Conversation
|
@sawa3030 @y0z @HideakiImamura Could you review this PR? |
|
@hrntsm Thank you for your PR! I’ve left some comments—please take a look when you have time! |
|
@sawa3030 |
| else: | ||
| child_params_list.append(c2_i) | ||
| if rng.rand() < self._establishment: | ||
| options = (c1_i, c2_i) if rng.rand() < self._probability else (c2_i, c1_i) |
There was a problem hiding this comment.
| options = (c1_i, c2_i) if rng.rand() < self._probability else (c2_i, c1_i) | |
| if index_prob < self._probability: | |
| child_params_list.append(c1_i) | |
| else: | |
| child_params_list.append(c2_i) |
The variable options and the function select_parameter seems unnecessary here since we could directly append c1_i or c2_i based on index_prob.
There was a problem hiding this comment.
Once I accepted the suggestion and committed. But it did not work, so I am commenting on it.
The proposed method would not result in the combination of individuals I thought it would unless I branched out with each of the following probabilities.
for c1_i, c2_i, x1_i, x2_i in zip(c1, c2, parents_params[0], parents_params[1]):
if rng.rand() < self._probability:
if rng.rand() < self._establishment:
if index_prob < 0.5:
child_params_list.append(c1_i)
else:
child_params_list.append(c2_i)
else:
if index_prob < 0.5:
child_params_list.append(c2_i)
else:
child_params_list.append(c1_i)
else:
if rng.rand() < self._establishment:
if index_prob < 0.5:
child_params_list.append(x1_i)
else:
child_params_list.append(x2_i)
else:
if index_prob < 0.5:
child_params_list.append(x2_i)
else:
child_params_list.append(x1_i)Instead, I believe that it would be better to create the combination as it is now and then determine whether to select one or the other created after the combination is created, which would reduce duplication of code.
There was a problem hiding this comment.
Based on my understanding, the code presented in #6008 (comment) is equivalent to the code below.
Then, establishment is converted to 1 - establishment with probability 0.5 (i.e., index_prob <= 0.5).
How can I interpret this behavior? I feel that it is not intuitive to understand it from the description in the docstring " establishment is the probability of uniform crossover between two individuals selected as candidate child individuals."
index_prob = rng.rand()
child_params_list = []
establishment = self._establishment if index_prob >= 0.5 else 1 - self._establishment
for c1_i, c2_i, x1_i, x2_i in zip(c1, c2, parents_params[0], parents_params[1]):
if rng.rand() < self._probability:
if rng.rand() < establishment:
child_params_list.append(c1_i)
else:
child_params_list.append(c2_i)
else:
if rng.rand() < establishment:
child_params_list.append(x1_i)
else:
child_params_list.append(x2_i)
child_params = np.array(child_params_list)There was a problem hiding this comment.
@y0z
As you point out, this implementation is confusing because it is simultaneously doing the change for each variable and choosing between the two candidate child individuals.
The final behavior is the same as pointed out, but it would be easier to understand the intent if it were reimplemented as follows.
- Generate two children
c1,c2based on equation(4) & (5) in SBX paper. - Check that the generated children's variable will be used.(
_probability) - Check that uniform crossover applies each children's variable.(
_establishment) - This is repeated for each variable to create two candidate child individuals.(
child1_params_list,child2_params_list) - Finally, Optuna can only return one individual, so choose one of them.
- Equivalent to
index_probbut used only once, so omitted.
- Equivalent to
child1_params_list = []
child2_params_list = []
for c1_i, c2_i, x1_i, x2_i in zip(c1, c2, parents_params[0], parents_params[1]):
if rng.rand() < self._probability:
# If the probability of "applying" uniform crossover is `_establishment`, then the reverse of the equal sign may be more appropriate.
if rng.rand() < self._establishment:
child1_params_list.append(c1_i)
child2_params_list.append(c2_i)
else:
child1_params_list.append(c2_i)
child2_params_list.append(c1_i)
else:
if rng.rand() < self._establishment:
child1_params_list.append(x1_i)
child2_params_list.append(x2_i)
else:
child1_params_list.append(x2_i)
child2_params_list.append(x1_i)
child_params_list = child1_params_list if rng.rand() < 0.5 else child2_params_list
child_params = np.array(child_params_list)There was a problem hiding this comment.
Thank you! It is more understandable!
It also sounds good to rename the establishment uniform_crossover_prob as suggested.
|
@hrntsm Sorry about that! Can you check them now? |
HideakiImamura
left a comment
There was a problem hiding this comment.
Thanks for the PR. I have two comments. PTAL.
|
@hrntsm Thank you for your contribution. I would like to share my opinion.
Also, how about changing the default value of eta to I tried such an implementation and it seems to work well. FYI: I also fixed def crossover(
self,
parents_params: np.ndarray,
rng: np.random.RandomState,
study: Study,
search_space_bounds: np.ndarray,
) -> np.ndarray:
# https://www.researchgate.net/profile/M-M-Raghuwanshi/publication/267198495_Simulated_Binary_Crossover_with_Lognormal_Distribution/links/5576c78408ae7536375205d7/Simulated-Binary-Crossover-with-Lognormal-Distribution.pdf
# Section 2 Simulated Binary Crossover (SBX)
# To avoid generating solutions that violate the box constraints,
# alpha1, alpha2, xls and xus are introduced, unlike the reference.
xls = search_space_bounds[..., 0]
xus = search_space_bounds[..., 1]
xs_min = np.min(parents_params, axis=0)
xs_max = np.max(parents_params, axis=0)
if self._eta is None:
eta = 20.0 if study._is_multi_objective() else 1.0 # Suggestion 2: change default eta to 1.0
else:
eta = self._eta
xs_diff = np.clip(xs_max - xs_min, 1e-10, None)
beta1 = 1 + 2 * (xs_min - xls) / xs_diff
beta2 = 1 + 2 * (xus - xs_max) / xs_diff
alpha1 = 2 - np.power(beta1, -(eta + 1))
alpha2 = 2 - np.power(beta2, -(eta + 1))
us = rng.rand(len(search_space_bounds))
mask1 = us > 1 / alpha1 # Equation (3).
betaq1 = np.power(us * alpha1, 1 / (eta + 1)) # Equation (3).
betaq1[mask1] = np.power((1 / (2 - us * alpha1)), 1 / (eta + 1))[mask1] # Equation (3).
mask2 = us > 1 / alpha2 # Equation (3).
betaq2 = np.power(us * alpha2, 1 / (eta + 1)) # Equation (3)
betaq2[mask2] = np.power((1 / (2 - us * alpha2)), 1 / (eta + 1))[mask2] # Equation (3).
c1 = 0.5 * ((xs_min + xs_max) - betaq1 * xs_diff) # Equation (4).
c2 = 0.5 * ((xs_min + xs_max) + betaq2 * xs_diff) # Equation (5).
# SBX applies crossover with establishment 0.5, and with probability 0.5,
# the gene of the parent individual is the gene of the child individual.
# The original SBX creates two child individuals,
# but optuna's implementation creates only one child individual.
# Therefore, when there is no crossover,
# the gene is selected with equal probability from the parent individuals x1 and x2.
# child_params_list = []
# for c1_i, c2_i, x1_i, x2_i in zip(c1, c2, parents_params[0], parents_params[1]):
# if rng.rand() < 0.5:
# if rng.rand() < 0.5:
# child_params_list.append(c1_i)
# else:
# child_params_list.append(c2_i)
# else:
# if rng.rand() < 0.5:
# child_params_list.append(x1_i)
# else:
# child_params_list.append(x2_i)
# child_params = np.array(child_params_list)
return c1 if rng.rand() < 0.5 else c2 # Suggestion 1: return one of the two children randomly. |
Here, in K.Deb's SBX paper below, it is mentioned that n=2~5 is close to a single-point crossover in equation 19. (In this paper, n is the equivalent of eta.) Simulated Binary Crossover for Continuous Search Space This point is subject to the judgment of OptunaTeam. |
|
This is not strong opinion, but I think it is reasonable to leave the value of |
|
Thanks for providing the reference.
+1 |
|
I have been looking into which individuals to return. I have looked at several papers, but I feel that they only mention returning two generated individuals c1 and c2, as y0z mentioned. On the other hand, I checked the implementation of some optimization libraries.
After all, the two arguments other than eta were not specified in any of the papers cited in any of the implementations. Please let me know if I am missing something. As I understand, This is my opinion, but considering the current implementation of Optuna and the reproduction of the paper, I think the arguments should be eta, establishment, and probability. I am thinking either of the following two.
|
|
Thank you for your survey. Based on the discussions, introducing |
|
@sawa3030 @HideakiImamura The all checks passed, Could you please review?.
|
|
Since a probability of 0 means the child is never selected, I think it's better to ensure that 0 < |
|
@hrntsm Thank you for your updates! I have a few suggestions and clarifications I'd like to share:
|
|
The code in this PR and the one you suggested seem to have different behavior.
|
|
As for 1 & 2, please check #6008 (comment) for a summary of the intent of the implementation. As for 3(the variable name), it more accurately describes the behavior than the one I gave at the beginning, and I agree with your suggestion. |
|
Based on the comments, I have modified the docstring and implementation for clarity. |
y0z
left a comment
There was a problem hiding this comment.
Thank you for your careful discussion.
LGTM.
|
Thank you so much for your clear classification and explanation — it really helped me understand the logic. |
I once thought the same thing, but the “probability” range of [0, 0.5] seemed counter-intuitive, so I decided to use [0,1].
|
sawa3030
left a comment
There was a problem hiding this comment.
Thank you for all the discussions and clarifications. LGTM!





Motivation
I would like to reduce the difference in the distribution of children between the reference paper and optuna's SBX as shown in this discussion
Description of the changes
The fixed values of “establishment” and “probability” are used as arguments to reproduce the same distribution as in the paper.
I have two questions.
establishmentandprobabilityare names that originally appeared in comments in the code, I think that expressions such asvariable_crossover_prob, for example, would be easier to understand.