648 learning curve fix for probability #649
Codecov Report
@@ Coverage Diff @@
## main #649 +/- ##
==========================================
- Coverage 95.11% 95.06% -0.06%
==========================================
Files 27 27
Lines 3093 3098 +5
==========================================
+ Hits 2942 2945 +3
- Misses 151 153 +2
- Clean up a docstring that sphinx complains about.
@aoifecahill @mulhod @bndgyawali This is ready for review. Codecov is wrong about the 2 lines in |
mulhod
left a comment
Looks great!
I tested by making a new config named learning_curve.cfg for the Boston dataset:
[General]
experiment_name = Example_LearningCurve
task = learning_curve
[Input]
train_directory = all
featuresets = [["example_boston_features"]]
featureset_names = ["example_boston"]
feature_scaling = both
learners = ["RandomForestClassifier", "SVC", "LogisticRegression"]
suffix = .jsonlines
[Tuning]
[Output]
probability = true
metrics = ['unweighted_kappa']
results = output_lc
log = output_lc
probability is set to true. The data was created by running:
mkdir all; cat {train,test}/*jsonlines > all/example_boston_features.jsonlines
(The dataset has just a smidge over the minimum of 500 examples.)
I see this in the output:
$ run_experiment learning_curve.cfg
[...]
2020-12-03 22:39:12,411 - Example_LearningCurve_example_boston_LogisticRegression - INFO - Generating learning curve(s)
2020-12-03 22:39:12,411 - Example_LearningCurve_example_boston_LogisticRegression - WARNING - Since ``probability`` is set, the most likely class will be computed via an argmax before computing the curve.
I see that warning for each learner and it's prominent enough, I think.
This PR closes #648. More specifically, it:
- Fixes learning_curve() for probabilistic learners.
- Updates train_and_score() to handle probabilities by using argmax.
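To make the argmax step concrete, here is a minimal sketch in plain Python of turning per-class probabilities into the most likely class before a label-based metric such as unweighted_kappa is computed. This is not the code from this PR: the helper name `probabilities_to_labels`, the class names, and the sample probabilities are all hypothetical and only illustrate the idea.

```python
from typing import Hashable, List, Sequence


def probabilities_to_labels(
    probabilities: Sequence[Sequence[float]],
    classes: Sequence[Hashable],
) -> List[Hashable]:
    """Return the most likely class for each row of class probabilities.

    Hypothetical helper for illustration only; not part of SKLL.
    """
    labels = []
    for row in probabilities:
        if len(row) != len(classes):
            raise ValueError("each row needs exactly one probability per class")
        # argmax: index of the largest probability (the first index wins on ties)
        best_index = max(range(len(row)), key=row.__getitem__)
        labels.append(classes[best_index])
    return labels


# Made-up probabilities for three examples over three made-up classes.
probs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
predicted = probabilities_to_labels(probs, classes=["low", "mid", "high"])
print(predicted)  # ['mid', 'low', 'high']
```

The resulting labels can then be scored against the gold labels like any other hard predictions, which is why the warning above says the argmax is taken before the curve is computed.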