Conversation

@desilinguist desilinguist (Collaborator) commented Dec 3, 2020

This PR closes #648. More specifically, it:

  • Adds a warning in learning_curve() for probabilistic learners.
  • Modifies train_and_score() to handle probabilities by using argmax.
  • Modifies the learning curve implementation test to handle probabilities. The other tests do not need to be modified since they all test things downstream of the implementation.
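The argmax conversion described in the second bullet can be sketched as follows (a minimal illustration of the idea, not SKLL's actual `train_and_score()` code; the array shapes and values are assumed):

```python
import numpy as np

# Hypothetical probabilistic output: one row per example,
# one column per class (shape: n_examples x n_classes).
probabilities = np.array([
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
])

# Collapse the probabilities to hard labels by taking the most
# likely class for each example, which is what the PR does before
# the learning-curve metric is computed.
predictions = np.argmax(probabilities, axis=1)
print(predictions)  # [1 0 2]
```

Once the predictions are hard labels, the rest of the scoring pipeline can proceed unchanged, which is why the downstream tests did not need modification.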

@desilinguist desilinguist requested review from a user, aoifecahill and mulhod December 3, 2020 20:21

codecov bot commented Dec 3, 2020

Codecov Report

Merging #649 (9bf85f6) into main (26b96c2) will decrease coverage by 0.05%.
The diff coverage is 60.00%.

@@            Coverage Diff             @@
##             main     #649      +/-   ##
==========================================
- Coverage   95.11%   95.06%   -0.06%     
==========================================
  Files          27       27              
  Lines        3093     3098       +5     
==========================================
+ Hits         2942     2945       +3     
- Misses        151      153       +2     
Impacted Files Coverage Δ
skll/learner/utils.py 94.33% <33.33%> (-0.62%) ⬇️
skll/learner/__init__.py 96.28% <100.00%> (+0.01%) ⬆️


- Clean up a docstring that Sphinx complains about.
@desilinguist desilinguist (Collaborator, Author) commented:
@aoifecahill @mulhod @bndgyawali This is ready for review. Codecov is wrong about the 2 lines in train_and_score() not being covered by tests. They are the lines that make the new test pass. However, I suspect nosetests can't detect that because train_and_score() is called via joblib.parallel().

@mulhod mulhod (Contributor) left a comment


Looks great!

I tested by making a new config named learning_curve.cfg for the Boston dataset:

[General]
experiment_name = Example_LearningCurve
task = learning_curve

[Input]
train_directory = all
featuresets = [["example_boston_features"]]
featureset_names = ["example_boston"]
feature_scaling = both
learners = ["RandomForestClassifier", "SVC", "LogisticRegression"]
suffix = .jsonlines

[Tuning]

[Output]
probability = true
metrics = ['unweighted_kappa']
results = output_lc
log = output_lc

probability is set to true. The data was created by running:

mkdir all; cat {train,test}/*jsonlines > all/example_boston_features.jsonlines

(The dataset has just a smidge over the minimum of 500 examples.)

I see this in the output:

$ run_experiment learning_curve.cfg
[...]
2020-12-03 22:39:12,411 - Example_LearningCurve_example_boston_LogisticRegression - INFO - Generating learning curve(s)
2020-12-03 22:39:12,411 - Example_LearningCurve_example_boston_LogisticRegression - WARNING - Since ``probability`` is set, the most likely class will be computed via an argmax before computing the curve.

I see that warning for each learner and it's prominent enough, I think.
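The warning shown in the log output could be emitted along these lines (a hypothetical sketch, not SKLL's actual implementation; the function name and signature are invented, while the message text follows the log lines above):

```python
import logging

def generate_learning_curve(learner_name, probability, logger=None):
    # Hypothetical helper illustrating the warning added in this PR;
    # the real logic lives in SKLL's learning_curve() code path.
    logger = logger or logging.getLogger(learner_name)
    if probability:
        logger.warning(
            "Since ``probability`` is set, the most likely class will be "
            "computed via an argmax before computing the curve."
        )
    logger.info("Generating learning curve(s)")

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    generate_learning_curve("Example_LearningCurve", probability=True)
```

Emitting the warning once per learner, as the log output shows, keeps it visible without flooding the log for every training-set size on the curve.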

@desilinguist desilinguist merged commit 29614f3 into main Dec 4, 2020
@desilinguist desilinguist deleted the 648-learning-curve-fix-for-probability branch December 4, 2020 04:12

Development

Successfully merging this pull request may close these issues.

Learning curve won't work for probabilistic classifiers

4 participants