Add support for custom metrics #612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add a private set called `_CUSTOM_METRICS` that will hold the names of any custom metrics.
- Add a new function called `register_custom_metric()` that allows registering custom metric functions and making them available in SKLL (a rough sketch of such a helper follows this list).
- Modify `get_acceptable_classification_metrics()` and `get_acceptable_regression_metrics()` to consider all custom metrics acceptable no matter what.
- Add a new keyword argument to `_parse_and_validate_metrics()` so that it automatically tries to register non-built-in metrics as custom metrics.
- Use this new keyword argument while parsing and validating the `metrics` and `objective` fields in a configuration file.
- Check for conflicts for custom metric modules as well as custom metric functions, for completeness.
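A minimal sketch of what a `register_custom_metric()` helper along these lines could look like. This is an illustration of the approach described above, not the actual SKLL implementation; the module name `custom_metrics_module` and the use of `sklearn.metrics.make_scorer()` are assumptions here.

```python
import importlib.util

from sklearn.metrics import make_scorer

# private registry of custom metric names, as described in the first bullet
_CUSTOM_METRICS = set()


def register_custom_metric(custom_metric_path, custom_metric_name):
    """Load the named function from the given Python file and register it."""
    spec = importlib.util.spec_from_file_location("custom_metrics_module",
                                                  custom_metric_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    metric_func = getattr(module, custom_metric_name)

    # remember the name so that get_acceptable_*_metrics() can accept it
    _CUSTOM_METRICS.add(custom_metric_name)

    # wrap the plain (y_true, y_pred) function as a scikit-learn scorer so
    # that it can also be used as a grid-search objective
    scorer = make_scorer(metric_func)
    return metric_func, scorer
```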
Hello @desilinguist! Thanks for updating this PR.
Comment last updated at 2020-05-28 20:38:15 UTC
- Add another config test.
- Use `assert_raises_regex()` instead of `assert_raises()` for all tests to be more specific (illustrated briefly below).
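A small, hypothetical test (not from the PR) illustrating why `assert_raises_regex()` is more specific than `assert_raises()`: it also checks the error message against a regular expression, not just the exception type.

```python
from nose.tools import assert_raises, assert_raises_regex


def load_invalid_metric():
    raise ValueError("'my_pearsonr' is not a valid objective function")


# only checks that a ValueError is raised
assert_raises(ValueError, load_invalid_metric)

# also checks that the message matches the expected pattern
assert_raises_regex(ValueError,
                    r"not a valid objective function",
                    load_invalid_metric)
```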
Codecov Report

@@           Coverage Diff           @@
##           master     #612   +/-   ##
=======================================
  Coverage   95.18%   95.18%
=======================================
  Files          26       26
  Lines        3031     3031
=======================================
  Hits         2885     2885
  Misses        146      146
=======================================
aoifecahill left a comment
Very nice, thanks for this :)
This looks really great.
Besides my comments (especially those related to when `register_custom_metric` is used in `_parse_and_validate_metrics`), I encountered an issue when trying to use a custom metric in a job submitted to a grid engine via gridmap.
This is the error I see:
2020-05-27 14:41:36,756 - gridmap.job - ERROR - --------------------------------------------------------------------------------
2020-05-27 14:41:36,756 - gridmap.job - ERROR - GridMap job traceback for Example_CV_example_boston_RandomForestRegressor:
2020-05-27 14:41:36,756 - gridmap.job - ERROR - --------------------------------------------------------------------------------
2020-05-27 14:41:36,757 - gridmap.job - ERROR - Exception: ValueError
2020-05-27 14:41:36,757 - gridmap.job - ERROR - Job ID: 6927642
2020-05-27 14:41:36,757 - gridmap.job - ERROR - Host: loki.research.ets.org
2020-05-27 14:41:36,757 - gridmap.job - ERROR - ................................................................................
2020-05-27 14:41:36,757 - gridmap.job - ERROR - Traceback (most recent call last):
File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/sklldev/lib/python3.7/site-packages/gridmap/job.py", line 249, in execute
self.ret = self.function(*self.args, **self.kwlist)
File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/skll/experiments/__init__.py", line 311, in _classify_featureset
use_custom_folds_for_grid_search=use_folds_file_for_grid_search)
File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/skll/learner/__init__.py", line 1746, in cross_validate
shuffle=grid_search)
File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/skll/learner/__init__.py", line 850, in train
label_type.__name__))
ValueError: 'my_pearsonr' is not a valid objective function for RandomForestRegressor with labels of type float64.
2020-05-27 14:41:36,784 - gridmap.job - INFO - Encountered ValueError, so killing all jobs.
Traceback (most recent call last):
File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/sklldev/bin/run_experiment", line 11, in <module>
load_entry_point('skll', 'console_scripts', 'run_experiment')()
File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/skll/utils/commandline/run_experiment.py", line 125, in main
log_level=log_level)
File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/skll/experiments/__init__.py", line 733, in run_configuration
temp_dir=log_path)
File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/sklldev/lib/python3.7/site-packages/gridmap/job.py", line 896, in process_jobs
monitor.check(sid, jobs)
File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/sklldev/lib/python3.7/site-packages/gridmap/job.py", line 452, in check
self.check_if_alive()
File "/RHGS/mountpoints/home/nlp-text/dynamic/mmulholland/skll_dev/skll/sklldev/lib/python3.7/site-packages/gridmap/job.py", line 510, in check_if_alive
raise job.ret
ValueError: 'my_acc' is not a valid objective function for RandomForestRegressor with labels of type float64.
To reproduce, I made a SKLL environment on a server with gridmap installed. I generated the Boston data, changed to the boston directory, and then made a `custom.py` file:

from scipy.stats import pearsonr

def my_pearsonr(y_true, y_pred):
    return pearsonr(y_true, y_pred)[0]

Then I modified the existing cross_val.cfg to use that `custom.py` file and the metric `my_pearsonr` as the objective:
[General]
experiment_name = Example_CV
task = cross_validate
[Input]
# this could also be an absolute path instead (and must be if you're not running things in local mode)
train_directory = train
featuresets = [["example_boston_features"]]
# there is only one set of features to try, with one feature file in it here.
featureset_names = ["example_boston"]
# when the feature values are numeric and on different scales
# it is good to have feature scaling to put various features in same scale
custom_metric_path = custom.py
feature_scaling = both
learners = ["RandomForestRegressor", "SVR", "LinearRegression"]
suffix = .jsonlines
[Tuning]
grid_search = true
objectives = ['my_pearsonr']
[Output]
# again, these can be absolute paths
results = output
log = output
predictions = output
Finally, I ran run_experiment cross_val.cfg, which submitted the job to the grid engine.
- The custom metric registration now happens inside `_classify_featureset()` and we also add it to `globals()` so that it gets serialized properly for gridmap (see the sketch after this list).
- Unfortunately, this means that `_parse_and_validate_metrics()` can no longer recognize invalid metrics at config parsing time, so the user needs to wait until the experiment starts running to find out that a metric is invalid.
- It now returns the metric function just like the custom learner loader.
- Since we are not attaching to `skll.metrics`, we can remove one of the checks.
- Add a new custom metric test for a conflicting filename, which is now fine.
- Update the regex in one of the tests to match the new invalid-metric detection code.
- Refactor `_cleanup_custom_metrics()` to make it cleaner.
- Make sure all `run_configuration()` calls are local.
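A rough sketch (not the actual SKLL code) of the change described above: the custom metric is registered inside the per-job function and also added to `globals()` so that the job can be pickled and re-imported correctly on a gridmap worker node. The argument handling and `register_custom_metric()` are assumptions carried over from the sketch earlier in this thread.

```python
def _classify_featureset(args):
    custom_metric_path = args.get("custom_metric_path")
    objective = args.get("objective")

    # register the custom metric at job-execution time, not at config-parsing
    # time, so that it is available in the worker process as well
    if custom_metric_path and objective not in globals():
        metric_func, _ = register_custom_metric(custom_metric_path, objective)
        # expose the function at module level so it survives serialization
        globals()[objective] = metric_func

    # ... the rest of the featureset processing would follow here ...
```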
Okay, I have modified the implementation such that custom metrics are now properly serialized when using gridmap. @mulhod, can you please re-run your experiment?
mulhod left a comment
Looks great! I think the cost of having the check in `_classify_featureset` is pretty acceptable. Chances are, the users who make use of this feature will be the type of users who won't have issues with it anyway. Or, if they do, they'll be able to easily debug an issue that arises from an error they made.
This PR closes #606, closes #609, and closes #610.
Custom Metrics:
- Attach custom metric functions to `skll.metrics` and then tell `SCORERS` that the function that the name points to is `skll.metrics.<filename>.<function>`.
- Add `test_custom_metrics.py` to test the custom metric functionality.

Add new metrics:
- Add the `jaccard_score` metric from scikit-learn since those can be quite useful.
- Add `precision` and `recall` to be consistent with `f1_score` and `f0.5_score`.
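A hedged end-to-end sketch of how the custom-metric support described in this PR might be used from the Python API. The `(path, name)` signature and the import location of `register_custom_metric()` shown here are assumptions, not confirmed by this thread.

```python
from skll.metrics import register_custom_metric

# custom.py defines my_pearsonr(y_true, y_pred), as in the reproduction above
register_custom_metric("custom.py", "my_pearsonr")

# once registered, the metric can be referenced by name like any built-in
# metric, e.g. as a grid-search objective in a configuration file:
#
#     [Tuning]
#     grid_search = true
#     objectives = ['my_pearsonr']
```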