Skip to content

Conversation

@desilinguist
Copy link
Collaborator

@desilinguist desilinguist commented Jun 19, 2020

As I was working on implementing a VotingLearner class (for #488), I realized that this implementation would end up sharing a lot of the code with methods for the Learner class. One way to handle this is to make VotingLearner inherit from Learner but that's not really right because, in fact, VotingLearner uses multiple underlying Learner instances. So, the solution I came up with is to refactor as much of this shared code into functions in learner.utils that can the be used by both Learner instances as well as VotingLearner instances (and, hopefully, StackingLearner instances too, when we get to those).

Specifically, this PR refactors:

  • code that computes evaluation metrics from Learner.evaluate() into learner.utils.compute_evaluation_metrics().
  • code that writes predictions to files from Learner.predict() into learner.utils.write_predictions().
  • code that merges any unseen test set labels with training labels from Learner.evaluate() into learner.utils.add_unseen_labels().
  • code that computes the number of folds based on the number of examples for each label from Learner._compute_num_folds_from_example_counts() into learner.utils.compute_num_folds_from_example_counts().
  • code that computes various types of predictions (raw, class labels, class indices, probabilities) from Learner.predict() into learner.utils.get_predictions().
  • code that sets up the cross-validation fold iterator from Learner.cross_validate() into learner.utils.setup_cv_fold_iterator().
  • code that sets up the cross-validation split iterator from Learner.learning_curve() into learner.utils.setup_cv_split_iterator().

In addition, This PR closes #621 and adds a test to make sure that the predictions being returned and written out via Learner.predict() match expectations. As part of this fix, the default value of the class_labels keyword argument for Learner.predict() is now set to True instead of False since it doesn't make sense to return class indices by default.
💥 This will be a an API-breaking change since to get class probabilities as outputs, class_labels will explicitly need to be set to False . 💥

Finally, this PR improves many docstrings, replaces single quotes with double quotes in some places, and replaces the old-style format strings with new-style ones in some of the code that was touched.

@desilinguist desilinguist requested review from a user, aoifecahill and mulhod June 19, 2020 14:19
@pep8speaks
Copy link

pep8speaks commented Jun 19, 2020

Hello @desilinguist! Thanks for updating this PR.

Line 1802:101: E501 line too long (107 > 100 characters)

Comment last updated at 2020-07-01 16:43:16 UTC

@desilinguist
Copy link
Collaborator Author

Not sure why it doesn't show here but the Travis builds are actually complete.

@desilinguist
Copy link
Collaborator Author

Actually, this is not done - just found another simplification I can make. Please don't review it yet.

@desilinguist desilinguist changed the title Refactor code in preparation for voting learners [WIP] Refactor code in preparation for voting learners Jun 19, 2020
@desilinguist desilinguist marked this pull request as draft June 19, 2020 15:43
@codecov
Copy link

codecov bot commented Jun 19, 2020

Codecov Report

Merging #622 into master will decrease coverage by 0.04%.
The diff coverage is 93.98%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #622      +/-   ##
==========================================
- Coverage   95.15%   95.10%   -0.05%     
==========================================
  Files          26       27       +1     
  Lines        3031     3083      +52     
==========================================
+ Hits         2884     2932      +48     
- Misses        147      151       +4     
Impacted Files Coverage Δ
skll/learner/utils.py 94.94% <93.52%> (-0.94%) ⬇️
skll/learner/__init__.py 96.24% <94.73%> (+0.13%) ⬆️
skll/__init__.py 100.00% <100.00%> (ø)
skll/experiments/__init__.py 95.19% <100.00%> (ø)
skll/utils/commandline/generate_predictions.py 98.59% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7e413e8...e876bbb. Read the comment docs.

@desilinguist desilinguist changed the title [WIP] Refactor code in preparation for voting learners Refactor code in preparation for voting learners Jun 19, 2020
@desilinguist desilinguist marked this pull request as ready for review June 19, 2020 22:47
@desilinguist
Copy link
Collaborator Author

@bndgyawali @mulhod @aoifecahill this is now ready for review. The minor code coverage decrease is expected due to the refactoring.

Copy link
Contributor

@mulhod mulhod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I love the restructuring. It will make development easier when the source files are shorter.

I had a few very minor, nitpicky suggestions for docstrings, etc.

Copy link
Collaborator

@aoifecahill aoifecahill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks!

Co-authored-by: Matt Mulholland <mulhodm@gmail.com>
@desilinguist desilinguist merged commit d824961 into master Jul 1, 2020
@delete-merged-branch delete-merged-branch bot deleted the refactor-code-for-meta-learners branch July 1, 2020 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Buggy interaction in Learner.predict() when used via API

5 participants