min. number of examples check for the learning_curve method #631
Conversation
Codecov Report
@@            Coverage Diff            @@
##             main     #631   +/-   ##
=========================================
  Coverage   95.10%   95.10%
=========================================
  Files          27       27
  Lines        3083     3087     +4
=========================================
+ Hits         2932     2936     +4
  Misses        151      151
Continue to review full report at Codecov.
skll/learner/__init__.py
Outdated
      cv_folds=10,
-     train_sizes=np.linspace(0.1, 1.0, 5)):
+     train_sizes=np.linspace(0.1, 1.0, 5),
+     min_training_examples=500):
Hmm, I am not sure we want to make this an argument. If we know that fewer than 500 examples lead to unreliable curves, why not simply hardcode that and add an argument like override_minimum, which is False by default but which users can set if they really want to override the check? So, in this case, we would raise a ValueError if the number of examples doesn't meet the 500 minimum. We would downgrade that error to a warning, though, if the user sets override_minimum to True.
What do you guys think @aoifecahill @mulhod @bndgyawali ?
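For illustration, here is a minimal sketch of the kind of check being proposed; the constant name, helper name, and message wording are hypothetical, not taken from the PR:

    # hypothetical sketch of the proposed behavior; names and wording are illustrative
    MIN_EXAMPLES_FOR_LEARNING_CURVE = 500


    def check_learning_curve_size(num_examples, override_minimum, logger):
        """Raise a ValueError (or just log a warning) if there are too few examples."""
        if num_examples < MIN_EXAMPLES_FOR_LEARNING_CURVE:
            msg = (f"Learning curves can be unreliable with fewer than "
                   f"{MIN_EXAMPLES_FOR_LEARNING_CURVE} examples (got {num_examples}).")
            if override_minimum:
                # the user explicitly chose to proceed anyway
                logger.warning(msg)
            else:
                raise ValueError(msg)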
BTW, whatever we end up doing, we would need to add new tests.
Yeah, override_minimum sounds good to me.
Hello @srhrshr! Thanks for updating this PR.
Comment last updated at 2020-10-28 16:20:44 UTC
skll/learner/__init__.py
Outdated
    override_minimum : bool, optional
        Should this be True, learning curve would be generated
        even with less than ideal number of `examples` (500).
        However, by default, if the number of `examples` in the FeatureSet is
        less than 500, an exception is raised, because
        learning curves can be very unreliable
        for very small sizes esp. if you have > 2 labels.
        Defaults to False.
Suggested change:

    override_minimum : bool, optional
        Learning curves can be unreliable for very small sizes,
        especially for > 2 labels. If this option is set to ``True``, the
        learning curve will be generated, along with a warning, even if
        the number of examples is less than 500. If ``False``,
        the curve is not generated and an exception is raised instead.
        Defaults to ``False``.
tests/test_custom_learner.py
Outdated
    # this must throw an error because `examples` has less than 500 items
    _ = learner.learning_curve(examples=train_fs_less_than_500, metric='accuracy',
                               override_minimum=False)
I think we don't want to specify override_minimum here since the goal is to check that the error is raised by default.
I think we need another test, where we specify override_minimum as True, and then check that the warning was output to the log file. I believe we have other tests that already check for warnings in the log, so you should be able to adapt those.
That makes sense - I've updated this test and have also added a separate test to check that the warning is raised: https://github.com/EducationalTestingService/skll/pull/631/files#diff-d8e9ffeba7d07b6f722f054d5a627f2582ba38b8b90ea090fb38d30cea565936R261-R292
Looks great! I'm not sure why the build has not yet completed, though. Looks like it's stuck in queued status. We might need to manually run it again.
Update: Actually, it does seem to be running now.
desilinguist left a comment
We are almost there @srhrshr ! Thanks for being so receptive to our suggestions 🙇🏽.
tests/test_custom_learner.py
Outdated
    @raises(ValueError)
    def test_custom_learner_learning_curve_min_examples():
        """
        Test that the learning curve raises an error with fewer than 500 examples.
        """
        # generate a training split with fewer than 500 examples
        train_fs_less_than_500, _ = make_classification_data(num_examples=499,
                                                              train_test_ratio=1.0,
                                                              num_labels=3)

        # create an example learner
        learner = Learner('LogisticRegression')

        # this must throw an error because `examples` has fewer than 500 items
        _ = learner.learning_curve(examples=train_fs_less_than_500, metric='accuracy')


    def test_custom_learner_learning_curve_min_examples_override():
        """
        Test that the learning curve logs a warning with fewer than 500 examples
        when `override_minimum` is set to True.
        """
        # create a logger which writes to a temporary log file
        log_dir = join(_my_dir, 'log')
        log_file = NamedTemporaryFile("w", delete=False, dir=log_dir)
        logger = get_skll_logger("test_custom_learner_learning_curve_min_examples_override",
                                 filepath=log_file.name)

        # generate a training split with fewer than 500 examples
        train_fs_less_than_500, _ = make_classification_data(num_examples=499,
                                                              train_test_ratio=1.0,
                                                              num_labels=3)

        # create an example learner
        learner = Learner('LogisticRegression', logger=logger)

        # this should succeed but log a warning, because `examples` has fewer
        # than 500 items and `override_minimum` is True
        _ = learner.learning_curve(examples=train_fs_less_than_500, metric='accuracy',
                                   override_minimum=True)

        # check that the learning curve warning message is contained in the log file
        with open(log_file.name) as tf:
            log_text = tf.read()
            learning_curve_warning_re = \
                re.compile(r'Because the number of training examples provided - '
                           r'\d+ - is less than the ideal minimum - \d+ - '
                           r'learning curve generation is unreliable'
                           r' and might break')
            assert learning_curve_warning_re.search(log_text)
These look great @srhrshr ! One minor thing: is there a reason why these tests are in test_custom_learner.py rather than test_output.py, where the other learning curve tests are? These don't have anything to do with custom learners, right?
That's a good point - I didn't realize there were already learning curve tests, since I was looking for tests of the Learner* class all this while 🤦. Will move the tests to test_output.py.
Done @desilinguist !
tests/test_custom_learner.py
Outdated
    from skll.learner import Learner
    from skll.utils.constants import KNOWN_DEFAULT_PARAM_GRIDS

    from skll.utils.logging import (get_skll_logger)
Suggested change:

    from skll.utils.logging import get_skll_logger
tests/test_custom_learner.py
Outdated
            glob(join(output_dir, 'test_model_custom_learner_*'))):
        os.unlink(output_file)

    for log_file in glob(join(log_dir, '*')):
I think rather than deleting all log files, which could affect other tests, we should only delete the specific log file we are actually generating.
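For illustration, a minimal sketch of that kind of targeted cleanup; the directory and file name here are made up:

    import os
    from os.path import exists, join

    output_dir = join('tests', 'output')  # hypothetical test output directory

    # remove only the log file this test created, instead of globbing every
    # file in the log directory, so logs from other tests are left untouched
    log_file_path = join(output_dir, 'test_learning_curve_min_examples_override.log')
    if exists(log_file_path):
        os.unlink(log_file_path)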
Done! I've also cleaned up the tearDown function a bit, and instead of creating a new log dir, I've written the log file to the output dir, since log files are written to the output dir in a few other places.
Thanks for doing this. One minor nit: I am not a fan of single-letter variable names and nested list comprehensions when they aren't necessary and they sacrifice readability. I think in this case, nested for loops would work just fine.
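As a small illustration of that readability point (the directories and pattern here are made up):

    import os
    from glob import glob
    from os.path import join

    _my_dir = '.'  # hypothetical test directory

    # instead of a nested comprehension with single-letter names, e.g.
    #   [os.unlink(f) for d in ('log', 'output') for f in glob(join(_my_dir, d, 'test_*'))]
    # plain nested for loops make the same cleanup easier to read:
    for sub_dir in ('log', 'output'):
        for file_path in glob(join(_my_dir, sub_dir, 'test_*')):
            os.unlink(file_path)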
I totally get that - this practice of mine has always been a little polarizing :P
Made this change.
Thank you!
Co-authored-by: Nitin Madnani <nmadnani@gmail.com>
Force-pushed from 85fd735 to b546646 (Compare)
Seems like a missing import?

Hmm, I am seeing a partial failure on the Windows builds:
      for output_file in glob(join(output_dir, 'test_{}_*'.format(suffix))) \
-             + glob(join(output_dir, 'test_{}.log'.format(suffix))) \
-             + glob(join(output_dir, 'test_majority_class_custom_learner_*')):
+             + glob(join(output_dir, 'test_{}.log'.format(suffix))):
It looks like this isn't a glob pattern? I think it could be dealt with elsewhere. I'm not sure if this has anything to do with the build failure, but it's probably better to remove it above (or below) if it exists.
I don't think this was the issue. But line 90 (`'test_{}_*'.format(suffix)`) was also matching the log filename of my test, despite me trying to be super specific about the filename I used. I've changed it now.
Btw, is the tearDown method called after each test? Because that would explain the race condition.
Yes, I believe that's right.
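To illustrate the pattern-matching issue discussed above, a tiny sketch with a made-up suffix and made-up file names; fnmatch uses the same shell-style wildcards as glob:

    from fnmatch import fnmatch

    # with a suffix like 'logistic_custom_learner', a cleanup pattern built as
    # 'test_{}_*'.format(suffix) matches prediction files and log files alike,
    # so another test's tearDown could delete the new test's log file
    pattern = 'test_{}_*'.format('logistic_custom_learner')
    for name in ('test_logistic_custom_learner_predictions.tsv',
                 'test_logistic_custom_learner_min_examples.log'):
        print(name, fnmatch(name, pattern))  # both print True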
mulhod left a comment
The file handle thing needs to be fixed. It's causing the build to fail.
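One common cause of this kind of Windows-only failure is that a log file cannot be deleted (and sometimes not reopened) while a logging handler still holds it open. A hedged sketch of the kind of fix, not necessarily what was done in this PR:

    import logging

    def close_logger_handlers(logger):
        # close every handler so the underlying log file handle is released,
        # which lets Windows reopen or delete the file afterwards
        for handler in list(logger.handlers):
            handler.close()
            logger.removeHandler(handler)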
mulhod left a comment
LGTM!
desilinguist left a comment
Finally! Looks great! Excellent work @srhrshr !
@desilinguist, this PR closes #624 - I've added a hard minimum-number-of-examples constraint for the learning_curve method. Let me know if this is what you had in mind.