Refactor code in preparation for voting learners #622

desilinguist · 2020-06-19T14:19:47Z

As I was working on implementing a VotingLearner class (for #488), I realized that this implementation would end up sharing a lot of code with methods of the Learner class. One way to handle this would be to make VotingLearner inherit from Learner, but that's not really right because VotingLearner actually uses multiple underlying Learner instances. So, the solution I came up with is to refactor as much of this shared code as possible into functions in learner.utils that can then be used by both Learner instances and VotingLearner instances (and, hopefully, StackingLearner instances too, when we get to those).
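A toy illustration of the design choice described above (composition plus shared module-level helpers instead of inheritance). Everything here is hypothetical: `ToyLearner`, `ToyVotingLearner`, and `compute_accuracy()` are stand-ins invented for this sketch and are not SKLL's classes or its `learner.utils` API.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Example = Tuple[str, ...]


def compute_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Shared module-level helper, usable by any learner-like class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need equally sized, non-empty label sequences")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class ToyLearner:
    """Predicts the most frequent training label seen for one feature's value."""

    def __init__(self, feature_index: int) -> None:
        self.feature_index = feature_index
        self.table: Dict[str, str] = {}
        self.default = ""

    def train(self, xs: Sequence[Example], ys: Sequence[str]) -> None:
        counts: Dict[str, Counter] = {}
        for x, y in zip(xs, ys):
            counts.setdefault(x[self.feature_index], Counter())[y] += 1
        self.table = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # fall back to the overall majority label for unseen feature values
        self.default = Counter(ys).most_common(1)[0][0]

    def predict(self, xs: Sequence[Example]) -> List[str]:
        return [self.table.get(x[self.feature_index], self.default) for x in xs]

    def evaluate(self, xs: Sequence[Example], ys: Sequence[str]) -> float:
        return compute_accuracy(ys, self.predict(xs))


class ToyVotingLearner:
    """Holds several ToyLearner instances (composition), not a ToyLearner subclass."""

    def __init__(self, learners: List[ToyLearner]) -> None:
        self.learners = learners

    def train(self, xs: Sequence[Example], ys: Sequence[str]) -> None:
        for learner in self.learners:
            learner.train(xs, ys)

    def predict(self, xs: Sequence[Example]) -> List[str]:
        all_preds = [learner.predict(xs) for learner in self.learners]
        # zip(*all_preds) regroups the learners' predictions per example;
        # the most common label among them wins
        return [Counter(votes).most_common(1)[0][0] for votes in zip(*all_preds)]

    def evaluate(self, xs: Sequence[Example], ys: Sequence[str]) -> float:
        # same shared helper as ToyLearner.evaluate(), no inheritance required
        return compute_accuracy(ys, self.predict(xs))
```

The voting class reuses `compute_accuracy()` directly, which is the same kind of sharing the PR gets by moving code into `learner.utils`.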

Specifically, this PR refactors:

- code that computes evaluation metrics from Learner.evaluate() into learner.utils.compute_evaluation_metrics().
- code that writes predictions to files from Learner.predict() into learner.utils.write_predictions().
- code that merges any unseen test set labels with training labels from Learner.evaluate() into learner.utils.add_unseen_labels().
- code that computes the number of folds based on the number of examples for each label from Learner._compute_num_folds_from_example_counts() into learner.utils.compute_num_folds_from_example_counts().
- code that computes various types of predictions (raw, class labels, class indices, probabilities) from Learner.predict() into learner.utils.get_predictions().
- code that sets up the cross-validation fold iterator from Learner.cross_validate() into learner.utils.setup_cv_fold_iterator().
- code that sets up the cross-validation split iterator from Learner.learning_curve() into learner.utils.setup_cv_split_iterator().
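The bodies of these helpers aren't shown on this page. As a rough, stdlib-only sketch of what two of them are described as doing: the `_sketch` names, signatures, and error message are hypothetical, and the rule of capping the fold count at the size of the rarest label (for classifiers only) is an assumption about how "based on the number of examples for each label" is interpreted.

```python
import logging
from collections import Counter
from typing import Dict, Hashable, Iterable, Sequence

logger = logging.getLogger(__name__)


def compute_num_folds_sketch(cv_folds: int, labels: Sequence[Hashable], model_type: str) -> int:
    """Cap the number of CV folds at the count of the rarest label (assumed rule)."""
    if model_type == "regressor":
        # continuous targets impose no per-label constraint
        return cv_folds
    min_examples_per_label = min(Counter(labels).values())
    if min_examples_per_label <= 1:
        # capping would leave fewer than 2 folds, which is not a valid CV setup
        raise ValueError(
            f"The training set has only {min_examples_per_label} example for a label."
        )
    if min_examples_per_label < cv_folds:
        logger.warning("Reducing folds from %d to %d", cv_folds, min_examples_per_label)
    return min(cv_folds, min_examples_per_label)


def add_unseen_labels_sketch(train_label_dict: Dict[str, int],
                             test_labels: Iterable[str]) -> Dict[str, int]:
    """Extend a label -> index mapping with test labels never seen in training."""
    merged = dict(train_label_dict)  # copy so the caller's mapping is not mutated
    next_index = max(merged.values(), default=-1) + 1
    # sorted() makes the assigned indices deterministic across runs
    for label in sorted(set(test_labels) - set(merged)):
        merged[label] = next_index
        next_index += 1
    return merged
```

Keeping these as plain functions means a voting learner can call them on its own label set without needing a Learner instance.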

In addition, this PR closes #621 and adds a test to make sure that the predictions returned and written out by Learner.predict() match expectations. As part of this fix, the default value of the class_labels keyword argument for Learner.predict() is now True instead of False, since it doesn't make sense to return class indices by default.
💥 This will be an API-breaking change: to get class probabilities as outputs, class_labels will now need to be explicitly set to False. 💥
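To make the new default concrete, here is a hypothetical, stdlib-only sketch of how the class_labels switch could map per-class probabilities to outputs. `get_predictions_sketch()` and its `probability` flag are assumptions for illustration, not the actual signature of `learner.utils.get_predictions()`.

```python
from typing import List, Sequence


def get_predictions_sketch(prob_rows: Sequence[Sequence[float]],
                           classes: Sequence[str],
                           probability: bool = True,
                           class_labels: bool = True) -> List:
    """Illustrative only: choose the output type from the two flags."""
    # index of the most probable class per example (max() keeps the first on ties)
    indices = [max(range(len(row)), key=row.__getitem__) for row in prob_rows]
    if class_labels:
        # new default: return the class labels themselves
        return [classes[i] for i in indices]
    if probability:
        # class_labels=False with a probabilistic model: return the probabilities
        return [list(row) for row in prob_rows]
    # class_labels=False without probabilities: return class indices
    return indices
```

Under this sketch, a caller who previously relied on the old default to get probabilities has to pass class_labels=False explicitly, which is the breaking change described above.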

Finally, this PR improves many docstrings, replaces single quotes with double quotes in some places, and replaces the old-style format strings with new-style ones in some of the code that was touched.
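For reference, the same string built with old-style printf formatting and with both new-style options. The PR text doesn't say whether "new-style" means str.format() or f-strings, so both are shown; the variable names are illustrative.

```python
name, score = "accuracy", 0.87654

old_style = "%s: %.3f" % (name, score)            # printf-style (old-style)
format_method = "{}: {:.3f}".format(name, score)  # str.format() (new-style)
f_string = f"{name}: {score:.3f}"                 # f-string (Python 3.6+)

print(f_string)  # accuracy: 0.877
```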

pep8speaks · 2020-06-19T14:19:59Z

Hello @desilinguist! Thanks for updating this PR.

In the file tests/test_classification.py:

Line 1802:101: E501 line too long (107 > 100 characters)

Comment last updated at 2020-07-01 16:43:16 UTC

desilinguist · 2020-06-19T14:57:25Z

Not sure why it doesn't show here, but the Travis builds are actually complete.

desilinguist · 2020-06-19T15:43:16Z

Actually, this is not done - just found another simplification I can make. Please don't review it yet.

codecov · 2020-06-19T22:40:55Z

Codecov Report

Merging #622 into master will decrease coverage by 0.04%.
The diff coverage is 93.98%.

@@            Coverage Diff             @@
##           master     #622      +/-   ##
==========================================
- Coverage   95.15%   95.10%   -0.05%     
==========================================
  Files          26       27       +1     
  Lines        3031     3083      +52     
==========================================
+ Hits         2884     2932      +48     
- Misses        147      151       +4

Impacted Files	Coverage Δ
skll/learner/utils.py	`94.94% <93.52%> (-0.94%)`	⬇️
skll/learner/__init__.py	`96.24% <94.73%> (+0.13%)`	⬆️
skll/__init__.py	`100.00% <100.00%> (ø)`
skll/experiments/__init__.py	`95.19% <100.00%> (ø)`
skll/utils/commandline/generate_predictions.py	`98.59% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7e413e8...e876bbb.

desilinguist · 2020-06-19T22:49:31Z

@bndgyawali @mulhod @aoifecahill this is now ready for review. The minor code coverage decrease is expected due to the refactoring.

mulhod

Looks great! I love the restructuring. It will make development easier when the source files are shorter.

I had a few very minor, nitpicky suggestions for docstrings, etc.

skll/learner/utils.py

skll/utils/commandline/generate_predictions.py

aoifecahill

Looks good to me, thanks!

desilinguist added 18 commits June 18, 2020 11:23

Fix inaccurate docstring.

bb702ec

Factor out prediction writing into separate function

67de90a

- This is useful since it can then be used by both regular learners and voting learners.

Factor out common code for evaluation

c590149

Some more tweaks.

d3a2e1e

Refactor some CV code into new functions.

b0f5bbf

Refactor some inefficient code.

6d92f94

Change single quotes to double quotes.

1a33017

Refactor some more code

2f63ac5

Use refactored functions

b7ba55b

Pass beta as keyword argument to avoid sklearn warning.

954fca7
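Background for this commit, hedged: around the time of this PR (scikit-learn 0.23), passing beta positionally to sklearn.metrics.fbeta_score emitted a FutureWarning asking for keyword arguments, which is presumably what the commit avoids. The stdlib sketch below is a hypothetical binary F-beta helper (`fbeta_sketch` is not a library function) that uses a keyword-only beta in the same style, so positional use fails immediately.

```python
from typing import Hashable, Sequence


def fbeta_sketch(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], *,
                 beta: float, pos_label: Hashable = 1) -> float:
    """Binary F-beta; the bare `*` makes beta keyword-only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    beta_sq = beta ** 2
    denominator = beta_sq * precision + recall
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), defined as 0 when P = R = 0
    return (1 + beta_sq) * precision * recall / denominator if denominator else 0.0
```

Calling fbeta_sketch(y_true, y_pred, 0.5) raises TypeError; fbeta_sketch(y_true, y_pred, beta=0.5) is the intended form.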

Add missing check.

34bc40b

Add missing argument

8d728de

Fix docstring

94fdfac

Fix buggy Learner.predict().

fac5216

Improve predict docstring.

f94528b

Add new test for predictions

9afa349

Improve the test.

14148af

Set newline appropriately when writing predictions.

e2bbc0f

- Otherwise things break on Windows.
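The diff for this commit isn't shown here. A likely cause of this class of Windows bug: Python's csv documentation says files used with csv.writer should be opened with newline="", because otherwise the "\r\n" the writer emits is translated again on Windows into "\r\r\n", which reads back as blank rows. A minimal sketch, assuming predictions are written as TSV with the csv module; `write_predictions_sketch()` and its two-column layout are hypothetical.

```python
import csv
import os
import tempfile
from typing import Sequence


def write_predictions_sketch(ids: Sequence[str], predictions: Sequence[str], path: str) -> None:
    """Write an id/prediction TSV file (hypothetical layout)."""
    # newline="" lets csv.writer control line endings itself on every platform
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["id", "prediction"])
        for ex_id, pred in zip(ids, predictions):
            writer.writerow([ex_id, pred])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "predictions.tsv")
        write_predictions_sketch(["EX1", "EX2"], ["pos", "neg"], out_path)
        # read back with newline="" as well, as the csv docs recommend
        with open(out_path, newline="") as fh:
            print(list(csv.reader(fh, delimiter="\t")))
            # [['id', 'prediction'], ['EX1', 'pos'], ['EX2', 'neg']]
```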

desilinguist requested review from a user, aoifecahill and mulhod June 19, 2020 14:19

Reorder functions in learner.utils.

6531361

desilinguist changed the title ~~Refactor code in preparation for voting learners~~ [WIP] Refactor code in preparation for voting learners Jun 19, 2020

desilinguist marked this pull request as draft June 19, 2020 15:43

desilinguist added 3 commits June 19, 2020 16:48

A bit more refactoring.

4e6875c

Improve docstrings and exception handling.

9910394

Fix typo.

ca14bd6

desilinguist added 3 commits June 19, 2020 18:19

Set class_labels to False since True is now default.

56d3da8

Tweak docstrings and quotes.

77e6150

Make docstring even more explicit.

0a026d2

desilinguist changed the title ~~[WIP] Refactor code in preparation for voting learners~~ Refactor code in preparation for voting learners Jun 19, 2020

desilinguist marked this pull request as ready for review June 19, 2020 22:47

mulhod suggested changes Jun 23, 2020

aoifecahill approved these changes Jun 29, 2020

Apply suggestions from code review

e876bbb

Co-authored-by: Matt Mulholland <mulhodm@gmail.com>

mulhod approved these changes Jul 1, 2020

desilinguist merged commit d824961 into master Jul 1, 2020

delete-merged-branch bot deleted the refactor-code-for-meta-learners branch July 1, 2020 17:34
