This allows fitting a (random) subset of the data on each partial_fit call. This class fulfills the 'dataset subsampling' use case mentioned on page 10 of the Hyperband paper (https://arxiv.org/pdf/1603.06560.pdf).
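As we read Algorithm 1 of the paper, each Hyperband bracket starts n configurations on a small resource r, then repeatedly keeps roughly 1/eta of them while multiplying the resource by eta. Under dataset subsampling, that resource is the amount of training data. The sketch below computes the schedule. The function name hyperband_schedule is hypothetical and not part of dask-ml. It uses integer arithmetic, so resources are rounded down to whole units.

```python
def hyperband_schedule(max_resource, eta=3):
    """Return Hyperband's brackets as lists of (n_configs, resource) rounds.

    Sketch of Algorithm 1 in arXiv:1603.06560. Under dataset subsampling,
    "resource" is read as units of training data. Integer arithmetic is used
    throughout, so resources are rounded down to whole units.
    """
    # s_max = floor(log_eta(max_resource)), found by counting to avoid float error
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # n = ceil(B / R * eta**s / (s + 1)) with total budget B = (s_max + 1) * R
        n = -(-((s_max + 1) * eta ** s) // (s + 1))
        rounds = []
        for i in range(s + 1):
            n_i = n // eta ** i                        # configurations still running in round i
            r_i = max_resource * eta ** i // eta ** s  # resource given to each of them
            rounds.append((n_i, r_i))
        brackets.append(rounds)
    return brackets


for bracket in hyperband_schedule(81, eta=3):
    print(bracket)  # first line printed: [(81, 1), (27, 3), (9, 9), (3, 27), (1, 81)]
```

For example, with max_resource=81 and a dataset of 81,000 rows, one unit could stand for 1,000 rows. The first bracket would then train 81 models on 1,000 rows each and keep the best 27 for a round on 3,000 rows.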
What does this PR implement?
This allows Hyperband to work with any estimator, not only those that implement partial_fit. It does this by fitting a (random) subset of the data passed to partial_fit. Because the full dataset is still passed to each partial_fit call, this saves computation rather than memory. I would expect this class to be used with Hyperband (or other adaptive searches) when computationally constrained, not memory constrained.
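The class itself is not shown here, so the following is a hypothetical sketch of the idea, not the PR's implementation. The name SubsampledEstimator, the chunk_size parameter, and the rule that the subset grows by chunk_size rows per call are all assumptions. It wraps any object with fit(X, y) and score(X, y) and gives it a partial_fit that refits on a random subset. Hyperband's budget of partial_fit calls therefore translates into training-set size.

```python
import random


class SubsampledEstimator:
    """Hypothetical wrapper giving a fit-only estimator a partial_fit method.

    Assumption: the k-th partial_fit call refits the wrapped estimator from
    scratch on a random subset of min(len(X), k * chunk_size) rows.
    """

    def __init__(self, estimator, chunk_size, random_state=None):
        self.estimator = estimator
        self.chunk_size = chunk_size
        self.random_state = random_state

    def partial_fit(self, X, y):
        # Lazily create the RNG and call counter on the first call.
        if not hasattr(self, "_rng"):
            self._rng = random.Random(self.random_state)
            self.n_calls_ = 0
        self.n_calls_ += 1
        n = min(len(X), self.n_calls_ * self.chunk_size)
        # Sample row indices without replacement so X and y stay aligned.
        idx = self._rng.sample(range(len(X)), n)
        self.estimator.fit([X[i] for i in idx], [y[i] for i in idx])
        return self

    def score(self, X, y):
        return self.estimator.score(X, y)


class MeanRegressor:
    """Toy fit-only estimator used only for illustration (hypothetical)."""

    def fit(self, X, y):
        self.n_seen_ = len(y)
        self.mean_ = sum(y) / len(y)
        return self

    def score(self, X, y):
        # Negative mean squared error, so higher is better.
        return -sum((v - self.mean_) ** 2 for v in y) / len(y)


if __name__ == "__main__":
    X = [[i] for i in range(100)]
    y = [float(i) for i in range(100)]
    model = SubsampledEstimator(MeanRegressor(), chunk_size=30, random_state=0)
    sizes = []
    for _ in range(5):
        model.partial_fit(X, y)
        sizes.append(model.estimator.n_seen_)
    print(sizes)  # prints [30, 60, 90, 100, 100]
```

Refitting from scratch on every call keeps the wrapper compatible with estimators that only implement fit, at the cost of repeating work each round. With NumPy arrays one would more naturally index X[idx] directly instead of building lists.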
A good use case for this is finding the hyper-parameters for embedding data into a low-dimensional space. Embedding tends to be computationally expensive even with moderately sized data, and it has many hyper-parameters that matter (see "Using t-SNE effectively").
Related work
This builds on the documentation reorganization in dask#221.
TODO