The basic example here is too complicated for what is a simple optimization of one parameter.
I would advocate for a minimal working example like this, where the main advantage is that you just `import smac` and that's it: no need to mess with anything else, to know about scenarios, or to know about ConfigSpace. It's also a good starting point to add complexity onto. Some of the most widely used libraries are just `import x as y`, e.g. how often have you done `from pandas.dataframes.concatenator import ConcatenarStrategy`?
```python
import numpy as np
import smac
from sklearn.ensemble import RandomForestClassifier

X_train, y_train = np.random.randint(2, size=(20, 2)), np.random.randint(2, size=20)
X_val, y_val = np.random.randint(2, size=(5, 2)), np.random.randint(2, size=5)

def train_random_forest(depth: int) -> float:
    model = RandomForestClassifier(max_depth=depth)
    model.fit(X_train, y_train)
    # The return value is the evaluation metric to minimize
    return 1 - model.score(X_val, y_val)

if __name__ == "__main__":
    best_config = smac.optimize(
        train_random_forest,
        config_space={"depth": [2, 100]},
        n_runs=10,
    )
```

This could be strapped onto SMAC fairly easily, but it requires creating new functionality in ConfigSpace.
However, I think a simple implementation of this in ConfigSpace could look like this:
| Shorthand | Meaning |
| --- | --- |
| `(cat_1, cat_2, cat_3)` | ordinal categorical, given as a tuple |
| `{cat_1, cat_2, cat_3}` | choice categorical, given as a set |
| `[0, 1]` | numerical int with lower and upper bound |
| `[0.0, 1.0]` | numerical float with lower and upper bound |
This leaves out some of the power of ConfigSpace, like log-scale parameters, conditionals, etc., but the goal is to accommodate as much as possible within the constraints of this simplicity. A sketch of how the shorthand could be parsed follows below.
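
To make the mapping concrete, here is a rough sketch of how such a shorthand could be translated into a `ConfigurationSpace` using only the existing public hyperparameter classes. The helper name `parse_config_space` is made up for illustration; nothing like it exists in ConfigSpace today:

```python
from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import (
    CategoricalHyperparameter,
    OrdinalHyperparameter,
    UniformFloatHyperparameter,
    UniformIntegerHyperparameter,
)

def parse_config_space(space: dict) -> ConfigurationSpace:
    """Translate a {name: shorthand} dict into a ConfigurationSpace."""
    cs = ConfigurationSpace()
    for name, spec in space.items():
        if isinstance(spec, tuple):
            # (cat_1, cat_2, cat_3): ordinal categorical, order preserved
            cs.add_hyperparameter(OrdinalHyperparameter(name, list(spec)))
        elif isinstance(spec, set):
            # {cat_1, cat_2, cat_3}: unordered choice categorical
            # (sorted only to make the resulting space deterministic)
            cs.add_hyperparameter(CategoricalHyperparameter(name, sorted(spec, key=str)))
        elif isinstance(spec, list) and all(isinstance(v, int) for v in spec):
            # [0, 1]: numerical int with lower and upper bound
            cs.add_hyperparameter(UniformIntegerHyperparameter(name, spec[0], spec[1]))
        elif isinstance(spec, list):
            # [0.0, 1.0]: numerical float with lower and upper bound
            cs.add_hyperparameter(UniformFloatHyperparameter(name, float(spec[0]), float(spec[1])))
        else:
            raise TypeError(f"Unsupported shorthand for {name!r}: {spec!r}")
    return cs
```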
Current version:
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import UniformIntegerHyperparameter
from smac.facade.smac_bb_facade import SMAC4BB
from smac.scenario.scenario import Scenario

X_train, y_train = np.random.randint(2, size=(20, 2)), np.random.randint(2, size=20)
X_val, y_val = np.random.randint(2, size=(5, 2)), np.random.randint(2, size=5)

def train_random_forest(config):
    model = RandomForestClassifier(max_depth=config["depth"])
    model.fit(X_train, y_train)
    # The return value is the evaluation metric to minimize
    return 1 - model.score(X_val, y_val)

if __name__ == "__main__":
    # Define your hyperparameters
    configspace = ConfigurationSpace()
    configspace.add_hyperparameter(UniformIntegerHyperparameter("depth", 2, 100))

    # Provide meta data for the optimization
    scenario = Scenario({
        "run_obj": "quality",     # Optimize quality (alternatively runtime)
        "runcount-limit": 10,     # Max number of function evaluations (the more the better)
        "cs": configspace,
    })

    smac = SMAC4BB(scenario=scenario, tae_runner=train_random_forest)
    best_found_config = smac.optimize()
```
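
For completeness, here is a rough sketch of how the proposed `smac.optimize` could be bolted onto the current facade. The function and its keyword names (`config_space`, `n_runs`) are the hypothetical API from this proposal, not anything SMAC ships today, and it reuses the `parse_config_space` helper sketched above:

```python
from smac.facade.smac_bb_facade import SMAC4BB
from smac.scenario.scenario import Scenario

def optimize(target_function, config_space: dict, n_runs: int = 10):
    """Run SMAC on a plain function whose arguments are the hyperparameters."""
    scenario = Scenario({
        "run_obj": "quality",
        "runcount-limit": n_runs,
        "cs": parse_config_space(config_space),
    })

    # Unpack the sampled Configuration into keyword arguments, so the target
    # function can be written as e.g. `def train_random_forest(depth: int)`.
    def tae(config):
        return target_function(**config.get_dictionary())

    smac = SMAC4BB(scenario=scenario, tae_runner=tae)
    return smac.optimize()
```

With something like this in place, the minimal example at the top of this issue should work unchanged.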