Include model fit times in learning curves #745
Merged
Last year, scikit-learn added functionality to include model fit times when computing learning curves. In addition to the model's performance, it's also quite useful to know how long the model takes to train as more training data is added. This PR now adds the same functionality to SKLL.
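The idea of a learning curve with fit times can be sketched in plain Python. This is not SKLL's or scikit-learn's implementation. It is a minimal illustration, assuming a hypothetical `MajorityClassifier` toy learner and a hypothetical `learning_curve_with_times` helper. For each training size `n`, it fits a fresh model on the first `n` examples, times only the fit, and then scores the model on a held-out set.

```python
import time
from collections import Counter


class MajorityClassifier:
    """Hypothetical toy learner: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def learning_curve_with_times(make_model, X_train, y_train, X_test, y_test, train_sizes):
    """Hypothetical helper returning (n_train, test_score, fit_seconds) per training size."""
    rows = []
    for n in train_sizes:
        model = make_model()  # fresh model for every training size
        start = time.perf_counter()
        model.fit(X_train[:n], y_train[:n])  # only the fit itself is timed
        fit_seconds = time.perf_counter() - start
        score = accuracy(y_test, model.predict(X_test))
        rows.append((n, score, fit_seconds))
    return rows


X_train = list(range(10))
y_train = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
X_test, y_test = [0, 1, 2, 3], [1, 1, 1, 0]
rows = learning_curve_with_times(MajorityClassifier, X_train, y_train, X_test, y_test, [2, 10])
```

With 2 training examples, the toy model only sees label `0` and scores 0.25 on the test set. With all 10 examples, it predicts `1` and scores 0.75. The third value in each row is the measured fit time, which is what a time curve plots against the training size.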
- The `skll.utils.train_and_score()` function now measures the model fit time for every model trained as part of a learning curve experiment.
- Two plots are now generated for each feature set in a `learning_curve` experiment. The first is the usual "score curve" that shows the training and cross-validation scores as more training data is added. The newly-added second plot is a "time curve" that shows how the model fit times change as more training data is added. The format for this new curve's name is: `<experiment>_<featureset>_times.png`.
- The `skll.experiments.output.generate_learning_curve_plots` function has been refactored. It now only pre-processes the score and time data to create data frames. The two curves (score and time) are now generated by two private functions: `skll.experiments.output._generate_learning_curve_score_plots` and `skll.experiments.output._generate_learning_curve_time_plots`.

As always, the best way to review is to try this out in the examples. As a starting point, if you want to replicate the same example, you can modify the Titanic example's `learning_curve.cfg` file as shown below and then look at the `Titanic_Learning_Curve_all.png` and `Titanic_Learning_Curve_all_times.png` files in the `output` directory.

This PR closes #556.
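Conceptually, turning raw per-fold fit times into a time curve means grouping the timings by training-set size and summarizing each group. Here is a minimal stdlib sketch, assuming that the curve shows the mean and standard deviation of fit times across cross-validation folds. Both `summarize_fit_times` and `time_curve_filename` are hypothetical helpers for illustration only. They are not SKLL functions, and SKLL itself builds data frames rather than tuples.

```python
import statistics
from collections import defaultdict


def summarize_fit_times(records):
    """Hypothetical helper: records are (n_train, fit_seconds) pairs, one per CV fold.

    Returns (n_train, mean_seconds, stdev_seconds) tuples sorted by training size.
    """
    by_size = defaultdict(list)
    for n_train, seconds in records:
        by_size[n_train].append(seconds)
    summary = []
    for n_train in sorted(by_size):
        times = by_size[n_train]
        # The sample standard deviation needs at least two points.
        spread = statistics.stdev(times) if len(times) > 1 else 0.0
        summary.append((n_train, statistics.mean(times), spread))
    return summary


def time_curve_filename(experiment, featureset):
    """Hypothetical helper following the <experiment>_<featureset>_times.png format."""
    return f"{experiment}_{featureset}_times.png"


summary = summarize_fit_times([(10, 1.0), (10, 3.0), (20, 2.0)])
name = time_curve_filename("Titanic_Learning_Curve", "all")
```

For the Titanic example, `name` is `Titanic_Learning_Curve_all_times.png`, which matches the file named above.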