Conversation
- use super() in BaseSuccessiveHalving
- allow random_state for subsampling data
- more efficient subsampling
missing `from collections import defaultdict` and `from itertools import product`, I think?
in the random search it probably shouldn't be `n_iter` but `n_candidates`? Also there should be some minimum size for the data for the first iteration.
I wonder if we should be using a bigger validation set in the cross-validation... hm... I guess large validation sets could also slow things down. Have you checked how other implementations do this?
fml/search.py
Outdated
    n_candidates = len(candidate_params)
    n_samples_iter = floor(n_samples_total /
                           (n_candidates * n_iterations))
    indices = rng.choice(n_samples_total, n_samples_iter,
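For context, a self-contained sketch of the subsampling scheme in the diff above: the total sample budget is divided evenly across candidates and halving iterations, and each subset is drawn with a seeded RNG so the subsampling is reproducible (the `random_state` point from the commit list). All concrete values here are illustrative, not taken from the actual fml code.

```python
from math import ceil, floor, log2

import numpy as np

# Hypothetical inputs, mirroring the names in the diff above.
n_samples_total = 1000
candidate_params = [{"C": c} for c in (0.1, 1, 10, 100)]

n_candidates = len(candidate_params)
n_iterations = int(ceil(log2(n_candidates)))   # 4 candidates -> 2 iterations
# Even split of the sample budget per candidate per iteration.
n_samples_iter = floor(n_samples_total / (n_candidates * n_iterations))

rng = np.random.RandomState(0)                 # seeded for reproducible subsampling
indices = rng.choice(n_samples_total, n_samples_iter, replace=False)
```

With `replace=False` the drawn indices are distinct, so each iteration trains on a genuine subset rather than a bootstrap sample.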
    indices = rng.choice(n_samples_total, n_samples_iter,

Suggested change:

    # this could be outside the for-loop
    # 2 is a magic number. I found 10 too slow and 2 seems to work fine?
    # basically lower bound on the test set size
    cv = check_cv(self.cv, y, classifier=is_classifier(self.estimator))
    min_n_samples = cv.get_n_splits(X, y) * n_classes * 2
    if is_classifier(self.estimator):
    n_samples_iter = max(n_samples_iter, min_n_samples)
    indices = rng.choice(n_samples_total, n_samples_iter,
this makes this more robust for small datasets.
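A minimal standalone sketch of that lower bound, assuming the reviewer's intent: keep at least roughly two samples per class in each CV test fold, so the first halving iteration doesn't starve small datasets. The numbers below are placeholders, not values from the real code.

```python
# Hypothetical stand-ins for what the real code gets from cv and y.
n_splits = 5        # cv.get_n_splits(X, y) in the suggestion above
n_classes = 3
n_samples_iter = 20  # the budget-derived subsample size before clamping

# 2 is the "magic number" lower-bound factor from the suggestion:
# ~2 samples per class per CV split.
min_n_samples = n_splits * n_classes * 2
n_samples_iter = max(n_samples_iter, min_n_samples)
```

Here the budget-derived 20 samples would be bumped up to 30, since 5 splits x 3 classes x 2 = 30.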
ok the is_classifier here is useless and my indentation is wrong... don't commit this lol.
fml/search.py
Outdated
    candidate_params = list(self._generate_candidate_params())
    n_iterations = int(ceil(log2(len(candidate_params))))
    n_samples_total = X.shape[0]
ok so you basically set the budget to n_samples, right? where did you get that from?
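As a side note, a toy sketch of what `ceil(log2(n_candidates))` implies for the schedule, assuming the candidate field is halved every round (illustrative only, not the fml code):

```python
from math import ceil, log2

def halving_schedule(n_candidates):
    """Toy sketch: candidates evaluated at each successive-halving round,
    assuming the field is halved after every iteration."""
    n_iterations = int(ceil(log2(n_candidates)))
    schedule = []
    remaining = n_candidates
    for _ in range(n_iterations):
        schedule.append(remaining)
        remaining = max(1, remaining // 2)
    return schedule

halving_schedule(4)   # 2 rounds: 4 candidates, then 2
```

So with the budget fixed at `n_samples`, each of those rounds gets an equal slice of the data, which is what the `n_samples_total / (n_candidates * n_iterations)` line divides up.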
tests are passing now on master :)
btw if you can make this green I'll merge it and we can iterate. That way I can play with it on my flight more easily.
CI is queued but it should be green now (obligatory "it works on my machine" ;))
thanks!

Still very WIP, need feedback on the interface :)

I'm now overriding `_run_search`, taking advantage of the proposed changes in `BaseGridSearch` from scikit-learn/scikit-learn#13145. Note that this forces us to remove the `refit` parameter, which has to be set to a custom callable.

`cv_results_` looks good and it's still useful: for e.g. 4 candidates it's going to have 4 (first iter) + 2 (second and last iter) rows.

If you're OK with this new design I'll start to write tests :)