
Conversation

@desilinguist (Collaborator) commented May 22, 2020

This PR closes #606, closes #609, and closes #610.

Custom Metrics:

  1. SKLL can now use arbitrary custom metric functions for both tuning and evaluation.
  2. How this works is described in detail in Add support for custom metrics #606. The only tricky bit not covered there is the dynamic import: essentially, we take the Python file specified by the user, import it as a sub-module of skll.metrics, and then tell SCORERS that the metric name points to skll.metrics.<filename>.<function> (a minimal sketch follows this list).
  3. Add a new file test_custom_metrics.py to test the custom metric functionality.
  4. Add a new section to the documentation dedicated to this functionality called "Using custom metrics".
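
As a minimal sketch of the dynamic-import idea mentioned in item 2 (illustrative only, not the actual SKLL code; it assumes the sklearn.metrics.SCORERS dictionary that scikit-learn exposed at the time and a user file such as custom.py containing the metric function):

# Illustrative sketch only -- not the actual SKLL implementation.
import importlib.util
import sys
from pathlib import Path

from sklearn.metrics import SCORERS, make_scorer


def register_custom_metric_sketch(metric_file, function_name):
    # Import the user's file as a sub-module of skll.metrics ...
    module_name = f"skll.metrics.{Path(metric_file).stem}"
    spec = importlib.util.spec_from_file_location(module_name, metric_file)
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)

    # ... and register the named function with scikit-learn's scoring machinery.
    metric_func = getattr(module, function_name)
    SCORERS[function_name] = make_scorer(metric_func)
    return metric_func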

Add new metrics:

  1. Added all variants of the jaccard_score metric from scikit-learn, since those can be quite useful (a short illustration follows this list).
  2. Added the non-binary variants of precision and recall to be consistent with f1_score and f0.5_score.
  3. Added all new metrics to the documentation.
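
A quick illustration of the scikit-learn function behind those jaccard variants (the exact SKLL metric-name strings are not listed here, so this only shows the underlying scikit-learn call and its averaging options):

from sklearn.metrics import jaccard_score

y_true = [0, 1, 2, 2]
y_pred = [0, 2, 2, 2]
# the different "variants" correspond to the different averaging schemes
print(jaccard_score(y_true, y_pred, average="macro"))
print(jaccard_score(y_true, y_pred, average="micro"))
print(jaccard_score(y_true, y_pred, average="weighted"))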

- Add a private set called `_CUSTOM_METRICS` that will hold the names of any custom metrics.
- Add a new function called `register_custom_metric()` that allows registering custom metric functions and making them available in SKLL.
- Modify `get_acceptable_classification_metrics()` and `get_acceptable_regression_metrics()` to always treat custom metrics as acceptable, regardless of label type (see the sketch after this list).
- Add a new keyword argument to `_parse_and_validate_metrics()` so that it automatically tries to register non-built-in metrics as custom metrics.
- Use this new keyword argument while parsing and validating the `metrics` and `objective` fields in a configuration file.
- Check for conflicts for custom metric modules as well as custom metric functions.
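
A hedged sketch of the acceptance idea from the list above; _BUILTIN_CLASSIFICATION_METRICS and the exact signature are hypothetical stand-ins, not the real SKLL code:

# illustrative only -- names and signature are assumptions
_CUSTOM_METRICS = set()
_BUILTIN_CLASSIFICATION_METRICS = {
    "binary": {"accuracy", "f1", "roc_auc"},
    "multiclass": {"accuracy", "f1_score_macro"},
}

def get_acceptable_classification_metrics(label_type="binary"):
    # built-in metrics depend on the label type, but any registered
    # custom metric is always considered acceptable
    return _BUILTIN_CLASSIFICATION_METRICS[label_type] | _CUSTOM_METRICS
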
@pep8speaks commented May 22, 2020

Hello @desilinguist! Thanks for updating this PR.

Line 183:101: E501 line too long (107 > 100 characters)

Comment last updated at 2020-05-28 20:38:15 UTC

- Add another config test.
- Use `assert_raises_regex()` instead of `assert_raises()` in all tests so that the assertions are more specific (a small example follows).
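
For context, a tiny hedged example of the assert_raises_regex() pattern (assuming nose's generated snake_case helpers; the test name and message here are made up):

from nose.tools import assert_raises_regex

def test_invalid_metric_message():
    # assert_raises() would pass for any ValueError; the regex also pins down the message
    with assert_raises_regex(ValueError, r"not a valid objective function"):
        raise ValueError("'my_metric' is not a valid objective function for SVR")
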
codecov bot commented May 26, 2020

Codecov Report

Merging #612 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master     #612   +/-   ##
=======================================
  Coverage   95.18%   95.18%           
=======================================
  Files          26       26           
  Lines        3031     3031           
=======================================
  Hits         2885     2885           
  Misses        146      146           


@aoifecahill (Collaborator) left a comment

Very nice, thanks for this :)

@mulhod (Contributor) left a comment

This looks really great.

Besides my comments (especially those related to when register_custom_metric is used in _parse_and_validate_metrics), I encountered an issue when trying to use a custom metric in a job submitted to a grid engine via gridmap.

This is the error I see:

2020-05-27 14:41:36,756 - gridmap.job - ERROR - --------------------------------------------------------------------------------
2020-05-27 14:41:36,756 - gridmap.job - ERROR - GridMap job traceback for Example_CV_example_boston_RandomForestRegressor:
2020-05-27 14:41:36,756 - gridmap.job - ERROR - --------------------------------------------------------------------------------
2020-05-27 14:41:36,757 - gridmap.job - ERROR - Exception: ValueError
2020-05-27 14:41:36,757 - gridmap.job - ERROR - Job ID: 6927642
2020-05-27 14:41:36,757 - gridmap.job - ERROR - Host: loki.research.ets.org
2020-05-27 14:41:36,757 - gridmap.job - ERROR - ................................................................................
2020-05-27 14:41:36,757 - gridmap.job - ERROR - Traceback (most recent call last):
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/sklldev/lib/python3.7/site-packages/gridmap/job.py", line 249, in execute
    self.ret = self.function(*self.args, **self.kwlist)
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/skll/experiments/__init__.py", line 311, in _classify_featureset
    use_custom_folds_for_grid_search=use_folds_file_for_grid_search)
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/skll/learner/__init__.py", line 1746, in cross_validate
    shuffle=grid_search)
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/skll/learner/__init__.py", line 850, in train
    label_type.__name__))
ValueError: 'my_pearsonr' is not a valid objective function for RandomForestRegressor with labels of type float64.

2020-05-27 14:41:36,784 - gridmap.job - INFO - Encountered ValueError, so killing all jobs.
Traceback (most recent call last):
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/sklldev/bin/run_experiment", line 11, in <module>
    load_entry_point('skll', 'console_scripts', 'run_experiment')()
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/skll/utils/commandline/run_experiment.py", line 125, in main
    log_level=log_level)
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/skll/experiments/__init__.py", line 733, in run_configuration
    temp_dir=log_path)
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/sklldev/lib/python3.7/site-packages/gridmap/job.py", line 896, in process_jobs
    monitor.check(sid, jobs)
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/sklldev/lib/python3.7/site-packages/gridmap/job.py", line 452, in check
    self.check_if_alive()
  File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/sklldev/lib/python3.7/site-packages/gridmap/job.py", line 510, in check_if_alive
    raise job.ret
ValueError: 'my_acc' is not a valid objective function for RandomForestRegressor with labels of type float64.

To reproduce, I made a SKLL environment on a server with gridmap installed. I generated the Boston data, changed to the boston directory, and then made a custom.py file:

from scipy.stats import pearsonr

def my_pearsonr(y_true, y_pred):
    return pearsonr(y_true, y_pred)[0]

Then I modified the existing cross_val.cfg to use that custom.py file and the metric my_pearsonr as the objective:

[General]
experiment_name = Example_CV
task = cross_validate

[Input]
# this could also be an absolute path instead (and must be if you're not running things in local mode)
train_directory = train
featuresets = [["example_boston_features"]]
# there is only one set of features to try here, with one feature file in it
featureset_names = ["example_boston"]
custom_metric_path = custom.py

# when the feature values are numeric and on different scales,
# it is good to use feature scaling to put the various features on the same scale
feature_scaling = both
learners = ["RandomForestRegressor", "SVR", "LinearRegression"]
suffix = .jsonlines

[Tuning]
grid_search = true
objectives = ['my_pearsonr']

[Output]
# again, these can be absolute paths
results = output
log = output
predictions = output

Finally, I ran run_experiment cross_val.cfg, which submitted the job to the grid engine.

- The custom metric registration now happens inside `_classify_featureset()` and we also add it to `globals()` so that it gets serialized for gridmap properly.
- Unfortunately, this means that `_parse_and_validate_metrics()` can no longer recognize invalid metrics at config parsing time, so the user has to wait until the experiment starts running to find out that a metric is invalid.
- It now returns the metric function just like the custom learner loader.
- Since we are not attaching to `skll.metrics`, we can remove one of the checks.
- Add a new custom metric test for a conflicting filename, which is now allowed.
- Update regex in one of the tests to match new invalid metric finding code.
- Refactor `_cleanup_custom_metrics()` to make it cleaner.
- Make sure all `run_configuration()` calls are local.
@desilinguist (Collaborator, Author) commented

Okay, I have modified the implementation so that custom metrics are now properly serialized when using gridmap. Unfortunately, supporting gridmap meant giving up the ability to identify invalid metrics in the configuration file at config parsing time; that check is now deferred until right before the job is submitted, which shouldn't be too bad. The deferral is needed because the registration of any potential custom metrics now happens inside _classify_featureset(), and we cannot declare any metrics invalid before that. This also meant removing a couple of config parsing tests.
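
To make that concrete, here is a hedged sketch of the registration step as it might look inside _classify_featureset(); the import location, argument names, and helper are assumptions for illustration, not the actual SKLL code:

# illustrative only -- import location and argument names are assumptions
from skll.metrics import register_custom_metric

def _register_and_expose(custom_metric_path, metric_name):
    # hypothetical helper showing the two steps described above
    metric_func = register_custom_metric(custom_metric_path, metric_name)
    # adding the function to this module's globals() lets pickle/gridmap resolve
    # it by name on the worker node, instead of failing to find a dynamically
    # created module that only exists in the submitting process
    globals()[metric_name] = metric_func
    return metric_func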

@mulhod can you please re-run your gridmap tests and any other tests you can think of? Thanks! @bndgyawali if you have time to review this too, it'd be great!

@mulhod (Contributor) left a comment

Looks great! I think the cost of having the check in _classify_featureset is pretty acceptable. Chances are, the users who make use of this feature won't have issues with it anyway, and if they do, they'll be able to easily debug an issue that arises from an error they made.

@desilinguist desilinguist merged commit 68c7ec8 into master May 30, 2020
@delete-merged-branch delete-merged-branch bot deleted the add-support-for-custom-metrics branch May 30, 2020 22:51


Development

Successfully merging this pull request may close these issues.

- Add other variants of precision and recall to SKLL
- Add jaccard metrics as acceptable classification metrics
- Add support for custom metrics
