Conversation
- use super() in BaseSuccessiveHalving
- allow random_state for subsampling data
- more efficient subsampling
missing `from collections import defaultdict` and `from itertools import product`, I think?
in the random search it probably shouldn't be `n_iter` but `n_candidates`? Also there should be some minimum size for the data for the first iteration.
I wonder if we should be using a bigger validation set in the cross-validation... hm... I guess large validation sets could also slow things down. Have you checked how other implementations do this?
fml/search.py
Outdated
    n_candidates = len(candidate_params)
    n_samples_iter = floor(n_samples_total /
                           (n_candidates * n_iterations))
    indices = rng.choice(n_samples_total, n_samples_iter,
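For context, a self-contained sketch of the subsampling scheme in the diff above: the total sample budget is divided evenly across candidates and halving iterations, and each subset is drawn with a seeded RNG so the subsampling is reproducible (the `random_state` point from the commit list). All concrete values here are illustrative, not taken from the actual fml code.

```python
from math import ceil, floor, log2

import numpy as np

# Hypothetical inputs, mirroring the names in the diff above.
n_samples_total = 1000
candidate_params = [{"C": c} for c in (0.1, 1, 10, 100)]

n_candidates = len(candidate_params)
n_iterations = int(ceil(log2(n_candidates)))   # 4 candidates -> 2 iterations
# Even split of the sample budget per candidate per iteration.
n_samples_iter = floor(n_samples_total / (n_candidates * n_iterations))

rng = np.random.RandomState(0)                 # seeded for reproducible subsampling
indices = rng.choice(n_samples_total, n_samples_iter, replace=False)
```

With `replace=False` the drawn indices are distinct, so each iteration trains on a genuine subset rather than a bootstrap sample.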
    indices = rng.choice(n_samples_total, n_samples_iter,

Suggested change:

    # this could be outside the for-loop
    # 2 is a magic number. I found 10 too slow and 2 seems to work fine?
    # basically lower bound on the test set size
    cv = check_cv(self.cv, y, classifier=is_classifier(self.estimator))
    min_n_samples = cv.get_n_splits(X, y) * n_classes * 2
    if is_classifier(self.estimator):
    n_samples_iter = max(n_samples_iter, min_n_samples)
    indices = rng.choice(n_samples_total, n_samples_iter,
this makes this more robust for small datasets.
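A minimal standalone sketch of that lower bound, assuming the reviewer's intent: keep at least roughly two samples per class in each CV test fold, so the first halving iteration doesn't starve small datasets. The numbers below are placeholders, not values from the real code.

```python
# Hypothetical stand-ins for what the real code gets from cv and y.
n_splits = 5        # cv.get_n_splits(X, y) in the suggestion above
n_classes = 3
n_samples_iter = 20  # the budget-derived subsample size before clamping

# 2 is the "magic number" lower-bound factor from the suggestion:
# ~2 samples per class per CV split.
min_n_samples = n_splits * n_classes * 2
n_samples_iter = max(n_samples_iter, min_n_samples)
```

Here the budget-derived 20 samples would be bumped up to 30, since 5 splits x 3 classes x 2 = 30.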
ok the is_classifier here is useless and my indentation is wrong... don't commit this lol.
fml/search.py
Outdated
    candidate_params = list(self._generate_candidate_params())
    n_iterations = int(ceil(log2(len(candidate_params))))
    n_samples_total = X.shape[0]
ok so you basically set the budget to n_samples, right? where did you get that from?
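As a side note, a toy sketch of what `ceil(log2(n_candidates))` implies for the schedule, assuming the candidate field is halved every round (illustrative only, not the fml code):

```python
from math import ceil, log2

def halving_schedule(n_candidates):
    """Toy sketch: candidates evaluated at each successive-halving round,
    assuming the field is halved after every iteration."""
    n_iterations = int(ceil(log2(n_candidates)))
    schedule = []
    remaining = n_candidates
    for _ in range(n_iterations):
        schedule.append(remaining)
        remaining = max(1, remaining // 2)
    return schedule

halving_schedule(4)   # 2 rounds: 4 candidates, then 2
```

So with the budget fixed at `n_samples`, each of those rounds gets an equal slice of the data, which is what the `n_samples_total / (n_candidates * n_iterations)` line divides up.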
tests are passing now on master :)
btw if you can make this green I'll merge it and we can iterate. That way I can play with it on my flight more easily.
CI is queued but it should be green now (obligatory "it works on my machine" ;))
thanks!

Still very WIP, need feedback on the interface :)

I'm now overriding `_run_search`, taking advantage of the proposed changes in `BaseGridSearch` from scikit-learn/scikit-learn#13145. Note that this forces us to remove the `refit` parameter, which has to be set to a custom callable.

`cv_results_` looks good and it's still useful: for e.g. 4 candidates it's going to have 4 (first iter) + 2 (second and last iter) rows.

If you're OK with this new design I'll start to write tests :)