
Conversation

@srhrshr srhrshr commented Oct 27, 2020

@desilinguist, this PR closes #624. I've added a hard minimum on the number of examples for the learning_curve method. Let me know if this is what you had in mind.


codecov bot commented Oct 27, 2020

Codecov Report

Merging #631 into main will not change coverage.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##             main     #631   +/-   ##
=======================================
  Coverage   95.10%   95.10%           
=======================================
  Files          27       27           
  Lines        3083     3087    +4     
=======================================
+ Hits         2932     2936    +4     
  Misses        151      151           
Impacted Files             Coverage Δ
skll/learner/__init__.py   96.26% <100.00%> (+0.02%) ⬆️


     cv_folds=10,
-    train_sizes=np.linspace(0.1, 1.0, 5)):
+    train_sizes=np.linspace(0.1, 1.0, 5),
+    min_training_examples=500):
Collaborator

Hmm, I am not sure we want to make this an argument? If we know that fewer than 500 examples lead to unreliable curves, why not simply hardcode that and add an argument like override_minimum, which is False by default but which users can set if they really want to override the check. So, in this case, we would raise a ValueError if the number of examples doesn't meet the 500 minimum. We would downgrade that error to a warning if the user sets override_minimum to True.

What do you guys think @aoifecahill @mulhod @bndgyawali ?
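Below is a rough sketch of the hardcoded-minimum check being proposed, just to make the idea concrete. The constant name, the helper function, and the logger handling are all hypothetical; SKLL's actual implementation may differ.

import logging

# Hypothetical constant and helper illustrating the proposal above.
MIN_EXAMPLES_FOR_LEARNING_CURVE = 500


def check_learning_curve_minimum(num_examples, override_minimum=False, logger=None):
    """Raise by default; only warn when the caller explicitly overrides the check."""
    if num_examples >= MIN_EXAMPLES_FOR_LEARNING_CURVE:
        return
    message = (f"Learning curves can be unreliable with only {num_examples} "
               f"examples (fewer than {MIN_EXAMPLES_FOR_LEARNING_CURVE}).")
    if override_minimum:
        # downgrade the error to a warning if the user opted in
        (logger or logging.getLogger(__name__)).warning(message)
    else:
        raise ValueError(message)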

Collaborator

BTW, whatever we end up doing, we would need to add new tests.

Contributor

Yeah, override_minimum sounds good to me.


pep8speaks commented Oct 27, 2020

Hello @srhrshr! Thanks for updating this PR.

Line 91:28: E127 continuation line over-indented for visual indent

Comment last updated at 2020-10-28 16:20:44 UTC
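For reference, E127 means a continuation line is indented past the opening bracket it should visually align with. A generic illustration (not the exact line pep8speaks flagged):

# E127: continuation line over-indented for visual indent
value = max(10,
               20)

# fixed: align the continuation with the character after the open parenthesis
value = max(10,
            20)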

Comment on lines 1652 to 1659
override_minimum : bool, optional
Should this be True, learning curve would be generated
even with less than ideal number of `examples` (500).
However, by default, if the number of `examples` in the FeatureSet is
less than 500, an exception is raised, because
learning curves can be very unreliable
for very small sizes esp. if you have > 2 labels.
Defaults to False.
Collaborator

Suggested change
override_minimum : bool, optional
    Learning curves can be unreliable for very small sizes,
    especially for > 2 labels. If this option is set to ``True``,
    the learning curve would be generated even if the number
    of examples is less than 500, along with a warning. If ``False``,
    the curve is not generated and an exception is raised instead.
    Defaults to ``False``.


    # this must throw an error because `examples` has less than 500 items
    _ = learner.learning_curve(examples=train_fs_less_than_500, metric='accuracy',
                               override_minimum=False)
Collaborator

I think we don't want to specify override_minimum here since the goal is to check that the error is raised by default.

I think we need another test, where we specify override_minimum as True, and then check that the warning was output to the log file. I believe we have other tests that already check for warnings in the log, so you should be able to adapt those.
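For context, the generic pattern for asserting that a warning reached a log file looks roughly like this; the logger name and message below are illustrative, not SKLL's actual helpers or wording.

import logging
import re
import tempfile

# create a temporary log file and attach a file handler to a logger
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as log_file:
    log_path = log_file.name

logger = logging.getLogger("warning_check_example")
handler = logging.FileHandler(log_path)
logger.addHandler(handler)

# emit a warning, then check that it landed in the file
logger.warning("learning curve generation is unreliable with few examples")
handler.flush()

with open(log_path) as fh:
    assert re.search(r"unreliable", fh.read())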

Contributor Author

That makes sense - I've updated this test and have also added a separate test to check that the warning is raised: https://github.com/EducationalTestingService/skll/pull/631/files#diff-d8e9ffeba7d07b6f722f054d5a627f2582ba38b8b90ea090fb38d30cea565936R261-R292

Contributor

@mulhod mulhod left a comment

Looks great! I'm not sure why the build has not yet completed, though. Looks like it's stuck in queued status. We might need to manually run it again.

Update: Actually, it does seem to be running now.

Collaborator

@desilinguist desilinguist left a comment

We are almost there @srhrshr ! Thanks for being so receptive to our suggestions 🙇🏽.

Comment on lines 244 to 292
@raises(ValueError)
def test_custom_learner_learning_curve_min_examples():
    """
    Test to check learning curve raises error with less than 500 examples
    """
    # generates a training split with less than 500 examples
    train_fs_less_than_500, _ = make_classification_data(num_examples=499,
                                                          train_test_ratio=1.0,
                                                          num_labels=3)

    # creating an example learner
    learner = Learner('LogisticRegression')

    # this must throw an error because `examples` has less than 500 items
    _ = learner.learning_curve(examples=train_fs_less_than_500, metric='accuracy')


def test_custom_learner_learning_curve_min_examples_override():
    """
    Test to check learning curve displays warning with less than 500 examples
    """

    # creates a logger which writes to a temporary log file
    log_dir = join(_my_dir, 'log')
    log_file = NamedTemporaryFile("w", delete=False, dir=log_dir)
    logger = get_skll_logger("test_custom_learner_learning_curve_min_examples_override",
                             filepath=log_file.name)

    # generates a training split with less than 500 examples
    train_fs_less_than_500, _ = make_classification_data(num_examples=499,
                                                          train_test_ratio=1.0,
                                                          num_labels=3)

    # creating an example learner
    learner = Learner('LogisticRegression', logger=logger)

    # this should only log a warning (not raise) because `examples` has fewer
    # than 500 items but `override_minimum` is True
    _ = learner.learning_curve(examples=train_fs_less_than_500, metric='accuracy',
                               override_minimum=True)

    # checks that the learning_curve warning message is contained in the log file
    with open(log_file.name) as tf:
        log_text = tf.read()
        learning_curve_warning_re = \
            re.compile(r'Because the number of training examples provided - '
                       r'\d+ - is less than the ideal minimum - \d+ - '
                       r'learning curve generation is unreliable'
                       r' and might break')
        assert learning_curve_warning_re.search(log_text)
Collaborator

These look great @srhrshr ! One minor thing: is there a reason why these tests are in test_custom_learner.py rather than test_output.py, where the other learning curve tests are? These don't have anything to do with custom learners, right?

Contributor Author

That's a good point - I didn't realize there were learning curve tests as I was looking for tests of the Learner* class all this while 🤦 Will move the tests to test_output.py

Contributor Author

Done @desilinguist !

from skll.learner import Learner
from skll.utils.constants import KNOWN_DEFAULT_PARAM_GRIDS

from skll.utils.logging import (get_skll_logger)
Collaborator

Suggested change
from skll.utils.logging import (get_skll_logger)
from skll.utils.logging import get_skll_logger

glob(join(output_dir, 'test_model_custom_learner_*'))):
os.unlink(output_file)

for log_file in glob(join(log_dir, '*')):
Collaborator

I think rather than deleting all log files which could affect other tests, we should only delete the specific log file we are actually generating.
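A minimal sketch of the targeted cleanup being suggested, with a hypothetical file name; the point is to unlink only the file this test wrote rather than globbing the whole directory.

import os
from os.path import exists, join


def remove_test_log(output_dir):
    # hypothetical name for the log file that the new test creates
    log_path = join(output_dir, 'test_learning_curve_min_examples.log')
    if exists(log_path):
        os.unlink(log_path)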

Contributor Author

Done! I've also cleaned up the tearDown function a bit, and instead of creating a new log dir, I've written the log file to the output dir, since log files are already written there in a few other places.

Collaborator

Thanks for doing this. One minor nit I want to pick is that I am not a fan of single letter variable names and nested list comprehensions when they aren't necessary and sacrifice readability. I think in this case, nested for loops would work just fine.
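As a generic illustration of the readability point (not the actual SKLL code), the same flattening written both ways:

paths_by_dir = {'log': ['a.log', 'b.log'], 'output': ['c.tsv']}

# nested list comprehension with single-letter names
flattened = [f for d in paths_by_dir for f in paths_by_dir[d]]

# equivalent nested for loops, arguably easier to scan
flattened = []
for directory, filenames in paths_by_dir.items():
    for filename in filenames:
        flattened.append(filename)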

Contributor Author

I totally get that - this practice of mine has always been a little polarizing :P
Made this change.

Collaborator

Thank you!

@srhrshr srhrshr force-pushed the learning_curve_check branch from 85fd735 to b546646 on October 27, 2020 at 23:33
@desilinguist
Collaborator

Seems like a missing import?

@desilinguist
Collaborator

Hmm, I am seeing a partial failure on the Windows builds:

     for output_file in glob(join(output_dir, 'test_{}_*'.format(suffix))) \
-            + glob(join(output_dir, 'test_{}.log'.format(suffix))) \
-            + glob(join(output_dir, 'test_majority_class_custom_learner_*')):
+            + glob(join(output_dir, 'test_{}.log'.format(suffix))):
Contributor

It looks like this isn't a glob pattern? I think it could be dealt with elsewhere. I'm not sure if this has anything to do with the build failure, but it's probably better to remove it above (or below) if it exists.

Contributor Author

I don't think this was the issue. But line 90 ('test_{}_*'.format(suffix)) was also matching the log filename of my test, despite my trying to be super specific about the filename I used. I've changed it now.

Btw, is the tearDown method called after each test? Because that would explain the race condition.

Collaborator

Yes, I believe that's right.

Contributor

@mulhod mulhod left a comment

The file handle thing needs to be fixed. It's causing the build to fail.

Contributor

@mulhod mulhod left a comment

LGTM!

Collaborator

@desilinguist desilinguist left a comment

Finally! Looks great! Excellent work @srhrshr !

@desilinguist desilinguist merged commit 7a7794f into EducationalTestingService:main Oct 28, 2020


Development

Successfully merging this pull request may close these issues.

Learning curve breaks for very small data sizes
