Include model fit times in learning curves #745

desilinguist · 2023-06-26T16:15:18Z

Last year, scikit-learn added functionality to include model fit times when computing learning curves since, in addition to the model's performance, it's also quite useful to know how long the model takes to train as more training data is added. This PR now adds the same functionality to SKLL.

The skll.utils.train_and_score() function now measures the model fit time for every model trained as part of a learning curve experiment.
We now generate two plots for each featureset for a learning_curve experiment. The first is the usual "score curve" that shows the training and cross-validation scores as more training data is added. The newly-added second plot is a "time curve" that shows how the model fit times change as more training data is added. The format for this new curve's name is: <experiment>_<featureset>_times.png.
The model fit times shown in the time curve are first averaged over all runs with the same training data size and then averaged over all output metrics (if multiple ones are specified), making the estimates a bit smoother.
While the score curve is faceted across both rows (output metrics) and columns (learners), the time curve is only faceted along columns (learners) since we already averaged over the metrics.
I refactored the skll.experiments.output.generate_learning_curve_plots function. It now only pre-processes the score and time data to create data frames. The two curves (score and time) are now generated by two private functions: skll.experiments.output._generate_learning_curve_score_plots and skll.experiments.output._generate_learning_curve_time_plots.
Updated existing tests to allow for the refactoring and to ensure that the new plots are checked.
Documentation has been updated to show the time curve in addition to the score curve. I modified the existing plot to show a more realistic example.
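To make the two-stage averaging described above concrete, here is a minimal standalone sketch. It is not the code in this PR: the helper names `timed_fit` and `average_fit_times`, the record layout, and the sample timings are all hypothetical and only illustrate the idea of timing each fit, averaging over runs with the same training size, and then averaging over metrics.

```python
import time
from collections import defaultdict
from statistics import mean


def timed_fit(fit_fn, *args, **kwargs):
    """Call ``fit_fn`` and return ``(result, elapsed_seconds)``.

    Hypothetical helper: it only illustrates wrapping a fit call in a timer.
    """
    start = time.perf_counter()
    result = fit_fn(*args, **kwargs)
    return result, time.perf_counter() - start


def average_fit_times(records):
    """Average fit times over runs first, then over output metrics.

    ``records`` is an iterable of ``(learner, metric, train_size, fit_time)``
    tuples (an assumed layout). Returns ``{(learner, train_size): mean_time}``.
    """
    # Stage 1: group all runs that share learner, metric, and training size.
    per_metric = defaultdict(list)
    for learner, metric, size, fit_time in records:
        per_metric[(learner, metric, size)].append(fit_time)

    # Stage 2: average each group, then collect the per-metric means
    # under (learner, training size) so the metric dimension collapses.
    per_learner = defaultdict(list)
    for (learner, _metric, size), times in per_metric.items():
        per_learner[(learner, size)].append(mean(times))

    return {key: mean(values) for key, values in per_learner.items()}


# Made-up timings: two runs per metric at one training size.
records = [
    ("LogisticRegression", "accuracy", 100, 0.10),
    ("LogisticRegression", "accuracy", 100, 0.14),
    ("LogisticRegression", "f1", 100, 0.12),
    ("LogisticRegression", "f1", 100, 0.16),
]
averaged = average_fit_times(records)
```

Because the metric dimension is collapsed in the second stage, the result has one value per learner and training size, which is why the time curve only needs to be faceted by learner.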

As always, the best way to review is to try this out in the examples. As a starting point, if you want to replicate the same example, you can modify the Titanic example's learning_curve.cfg file as shown below and then look at the Titanic_Learning_Curve_all.png and Titanic_Learning_Curve_all_times.png files in the output directory.

This PR closes #556.

codecov · 2023-06-26T22:03:58Z

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.05 🎉

Comparison is base (143ff09) 95.19% compared to head (10469c3) 95.24%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #745      +/-   ##
==========================================
+ Coverage   95.19%   95.24%   +0.05%     
==========================================
  Files          29       29              
  Lines        3538     3578      +40     
==========================================
+ Hits         3368     3408      +40     
  Misses        170      170

Impacted Files	Coverage Δ
skll/experiments/__init__.py	`94.69% <100.00%> (ø)`
skll/experiments/output.py	`97.86% <100.00%> (+0.40%)`	⬆️
skll/learner/__init__.py	`97.18% <100.00%> (ø)`
skll/learner/utils.py	`93.37% <100.00%> (+0.05%)`	⬆️
skll/learner/voting.py	`98.54% <100.00%> (ø)`

desilinguist added 8 commits June 26, 2023 09:31

feat: capture model fit times for learning curves

9d5cc33

feat: Learner.learning_curve() outputs fit times

7f41477

feat: voting learners also updated

2b32f8f

feat: compute average fit times for each experiment

d132e8f

feat: generate new times plot as output

dbc6b11

test: update learning curve tests

88ac4c8

docs: update learning curve documentation

a79738a

fix: more tests and docstrings

eab95bc

desilinguist requested review from Frost45, damien2012eng, dblandan, mulhod and tamarl08 June 26, 2023 16:15

dblandan reviewed Jun 26, 2023

skll/experiments/output.py (outdated, resolved)

dblandan approved these changes Jun 26, 2023

fix: simplify computation of rotate_labels

10469c3

Frost45 approved these changes Jun 27, 2023

desilinguist merged commit 9e501a9 into main Jun 27, 2023

delete-merged-branch bot deleted the 556-include-fit-times-for-learning-curves branch June 27, 2023 17:55
