
Conversation

@desilinguist (Collaborator) commented May 22, 2020

This PR closes #606, closes #609, and closes #610.

Custom Metrics:

  1. SKLL can now use arbitrary custom metric functions for both tuning and evaluation.
  2. How this works is described in detail in Add support for custom metrics #606. The only tricky bit not covered there is the dynamic import: essentially, we take the Python file specified by the user, import it as a sub-module of skll.metrics, and then tell SCORERS that the metric name points to skll.metrics.<filename>.<function> (a minimal sketch follows this list).
  3. Add a new file test_custom_metrics.py to test the custom metric functionality.
  4. Add a new section to the documentation dedicated to this functionality called "Using custom metrics".
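
As a minimal sketch of the dynamic-import idea mentioned in item 2 (illustrative only, not the actual SKLL code; it assumes the sklearn.metrics.SCORERS dictionary that scikit-learn exposed at the time and a user file such as custom.py containing the metric function):

# Illustrative sketch only -- not the actual SKLL implementation.
import importlib.util
import sys
from pathlib import Path

from sklearn.metrics import SCORERS, make_scorer


def register_custom_metric_sketch(metric_file, function_name):
    # Import the user's file as a sub-module of skll.metrics ...
    module_name = f"skll.metrics.{Path(metric_file).stem}"
    spec = importlib.util.spec_from_file_location(module_name, metric_file)
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)

    # ... and register the named function with scikit-learn's scoring machinery.
    metric_func = getattr(module, function_name)
    SCORERS[function_name] = make_scorer(metric_func)
    return metric_func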

Add new metrics:

  1. Added all variants of the jaccard_score metric from scikit-learn, since those can be quite useful (a short illustration follows this list).
  2. Added the non-binary variants of precision and recall to be consistent with f1_score and f0.5_score.
  3. Added all new metrics to the documentation.
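
A quick illustration of the scikit-learn function behind those jaccard variants (the exact SKLL metric-name strings are not listed here, so this only shows the underlying scikit-learn call and its averaging options):

from sklearn.metrics import jaccard_score

y_true = [0, 1, 2, 2]
y_pred = [0, 2, 2, 2]
# the different "variants" correspond to the different averaging schemes
print(jaccard_score(y_true, y_pred, average="macro"))
print(jaccard_score(y_true, y_pred, average="micro"))
print(jaccard_score(y_true, y_pred, average="weighted"))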

- Add a private set called `_CUSTOM_METRICS` that will hold the names of any custom metrics.
- Add a new function called `register_custom_metric()` that allows registering custom metric functions and making them available in SKLL.
- Modify `get_acceptable_classification_metrics()` and `get_acceptable_regression_metrics()` to always treat custom metrics as acceptable, regardless of label type (see the sketch after this list).
- Add a new keyword argument to `_parse_and_validate_metrics()` so that it automatically tries to register non-built-in metrics as custom metrics.
- Use this new keyword argument while parsing and validating the `metrics` and `objective` fields in a configuration file.
- Check for conflicts for custom metric modules as well as custom metric functions.
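
A hedged sketch of the acceptance idea from the list above; _BUILTIN_CLASSIFICATION_METRICS and the exact signature are hypothetical stand-ins, not the real SKLL code:

# illustrative only -- names and signature are assumptions
_CUSTOM_METRICS = set()
_BUILTIN_CLASSIFICATION_METRICS = {
    "binary": {"accuracy", "f1", "roc_auc"},
    "multiclass": {"accuracy", "f1_score_macro"},
}

def get_acceptable_classification_metrics(label_type="binary"):
    # built-in metrics depend on the label type, but any registered
    # custom metric is always considered acceptable
    return _BUILTIN_CLASSIFICATION_METRICS[label_type] | _CUSTOM_METRICS
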
@pep8speaks commented May 22, 2020

Hello @desilinguist! Thanks for updating this PR.

Line 183:101: E501 line too long (107 > 100 characters)

Comment last updated at 2020-05-28 20:38:15 UTC

- Add another config test.
- Use `assert_raises_regex()` instead of `assert_raises()` in all tests so that the assertions are more specific (a small example follows).
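
For context, a tiny hedged example of the assert_raises_regex() pattern (assuming nose's generated snake_case helpers; the test name and message here are made up):

from nose.tools import assert_raises_regex

def test_invalid_metric_message():
    # assert_raises() would pass for any ValueError; the regex also pins down the message
    with assert_raises_regex(ValueError, r"not a valid objective function"):
        raise ValueError("'my_metric' is not a valid objective function for SVR")
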
codecov bot commented May 26, 2020

Codecov Report

Merging #612 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master     #612   +/-   ##
=======================================
  Coverage   95.18%   95.18%           
=======================================
  Files          26       26           
  Lines        3031     3031           
=======================================
  Hits         2885     2885           
  Misses        146      146           


@aoifecahill (Collaborator) left a comment

Very nice, thanks for this :)

@mulhod (Contributor) left a comment

This looks really great.

Besides my comments (especially those related to when register_custom_metric is used in _parse_and_validate_metrics), I encountered an issue when trying to use a custom metric in a job submitted to a grid engine via gridmap.

This is the error I see:

2020-05-27 14:41:36,756 - gridmap.job - ERROR - --------------------------------------------------------------------------------
2020-05-27 14:41:36,756 - gridmap.job - ERROR - GridMap job traceback for Example_CV_example_boston_RandomForestRegressor:
2020-05-27 14:41:36,756 - gridmap.job - ERROR - --------------------------------------------------------------------------------
2020-05-27 14:41:36,757 - gridmap.job - ERROR - Exception: ValueError
2020-05-27 14:41:36,757 - gridmap.job - ERROR - Job ID: 6927642
2020-05-27 14:41:36,757 - gridmap.job - ERROR - Host: loki.research.ets.org
2020-05-27 14:41:36,757 - gridmap.job - ERROR - ................................................................................
2020-05-27 14:41:36,757 - gridmap.job - ERROR - Traceback (most recent call last):
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/sklldev/lib/python3.7/site-packages/gridmap/job.py", line 249, in execute
    self.ret = self.function(*self.args, **self.kwlist)
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/skll/experiments/__init__.py", line 311, in _classify_featureset
    use_custom_folds_for_grid_search=use_folds_file_for_grid_search)
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/skll/learner/__init__.py", line 1746, in cross_validate
    shuffle=grid_search)
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/skll/learner/__init__.py", line 850, in train
    label_type.__name__))
ValueError: 'my_pearsonr' is not a valid objective function for RandomForestRegressor with labels of type float64.

2020-05-27 14:41:36,784 - gridmap.job - INFO - Encountered ValueError, so killing all jobs.
Traceback (most recent call last):
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/sklldev/bin/run_experiment", line 11, in <module>
    load_entry_point('skll', 'console_scripts', 'run_experiment')()
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/skll/utils/commandline/run_experiment.py", line 125, in main
    log_level=log_level)
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/skll/experiments/__init__.py", line 733, in run_configuration
    temp_dir=log_path)
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/sklldev/lib/python3.7/site-packages/gridmap/job.py", line 896, in process_jobs
    monitor.check(sid, jobs)
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/sklldev/lib/python3.7/site-packages/gridmap/job.py", line 452, in check
    self.check_if_alive()
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/sklldev/lib/python3.7/site-packages/gridmap/job.py", line 510, in check_if_alive
    raise job.ret
ValueError: 'my_acc' is not a valid objective function for RandomForestRegressor with labels of type float64.

To reproduce, I made a SKLL environment on a server with gridmap installed. I generated the Boston data, changed to the boston directory, and then made a custom.py file:

from scipy.stats import pearsonr

def my_pearsonr(y_true, y_pred):
    return pearsonr(y_true, y_pred)[0]

Then I modified the existing cross_val.cfg to use that custom.py file and the metric my_pearsonr as the objective:

[General]
experiment_name = Example_CV
task = cross_validate

[Input]
# this could also be an absolute path instead (and must be if you're not running things in local mode)
train_directory = train
featuresets = [["example_boston_features"]]
# there is only one set of features to try here, with one feature file in it
featureset_names = ["example_boston"]
custom_metric_path = custom.py

# when the feature values are numeric and on different scales,
# it is good to use feature scaling to put the various features on the same scale
feature_scaling = both
learners = ["RandomForestRegressor", "SVR", "LinearRegression"]
suffix = .jsonlines

[Tuning]
grid_search = true
objectives = ['my_pearsonr']

[Output]
# again, these can be absolute paths
results = output
log = output
predictions = output

Finally, I ran run_experiment cross_val.cfg, which submitted the job to the grid engine.

- The custom metric registration now happens inside `_classify_featureset()` and we also add it to `globals()` so that it gets serialized for gridmap properly.
- Unfortunately, this means that `_parse_and_validate_metrics()` can no longer recognize invalid metrics at config parsing time, so the user has to wait until the experiment starts running to find out that a metric is invalid.
- It now returns the metric function just like the custom learner loader.
- Since we are not attaching to `skll.metrics`, we can remove one of the checks.
- Add a new custom metric test for a conflicting filename, which is now allowed.
- Update regex in one of the tests to match new invalid metric finding code.
- Refactor `_cleanup_custom_metrics()` to make it cleaner.
- Make sure all `run_configuration()` calls are local.
@desilinguist (Collaborator, Author) commented

Okay, I have modified the implementation so that custom metrics are now properly serialized when using gridmap. Unfortunately, supporting gridmap meant giving up the ability to identify invalid metrics in the configuration file at config parsing time; that check is now deferred until right before the job is submitted, which shouldn't be too bad. The deferral is needed because the registration of any potential custom metrics now happens inside _classify_featureset(), and we cannot declare any metrics invalid before that. This also meant removing a couple of config parsing tests.
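
To make that concrete, here is a hedged sketch of the registration step as it might look inside _classify_featureset(); the import location, argument names, and helper are assumptions for illustration, not the actual SKLL code:

# illustrative only -- import location and argument names are assumptions
from skll.metrics import register_custom_metric

def _register_and_expose(custom_metric_path, metric_name):
    # hypothetical helper showing the two steps described above
    metric_func = register_custom_metric(custom_metric_path, metric_name)
    # adding the function to this module's globals() lets pickle/gridmap resolve
    # it by name on the worker node, instead of failing to find a dynamically
    # created module that only exists in the submitting process
    globals()[metric_name] = metric_func
    return metric_func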

@mulhod can you please re-run your gridmap tests and any other tests you can think of? Thanks! @bndgyawali if you have time to review this too, it'd be great!

@mulhod (Contributor) left a comment

Looks great! I think the cost of having the check in _classify_featureset is pretty acceptable. Chances are, the users who make use of this feature won't have issues with it anyway, and if they do, they'll be able to easily debug an issue that arises from an error they made.

@desilinguist desilinguist merged commit 68c7ec8 into master May 30, 2020
@delete-merged-branch delete-merged-branch bot deleted the add-support-for-custom-metrics branch May 30, 2020 22:51


Development

Successfully merging this pull request may close these issues.

- Add other variants of precision and recall to SKLL
- Add jaccard metrics as acceptable classification metrics
- Add support for custom metrics
