Skip to content

Conversation

@desilinguist
Copy link
Collaborator

@desilinguist desilinguist commented Feb 3, 2021

This PR closes #653.

It pretty much works out of the box except for two changes:

  • Setting the new keyword argument error_score to "raises" in the GridSearchCV() call made in Learner.train() since we want to raise an exception if there was any problem with fitting the estimator. This change is necessary because the new scikit-learn default is to simply return a nan as the fit score in case of a problem which does not work for us.

  • LinearRegression models in scikit-learn now support a new keyword argument positive which can be set to True to use Non-negative Least Squares (NNLS) regression. This is probably something we want to enable in SKLL since it could be useful in RSMTool. This required the fix for Learner._check_input_formatting() does not work for dense featuresets #656 which has already been merged.

  • Add a test for this new non-negative regression.

  • Other minor changes:

    • Since Python 3.6 is so long in the tooth, I have changed the Linux builds (on Travis) to use Python 3.7 and the Windows builds (on Azure) to use Python 3.8. I am using Python 3.9 locally.

    • Update both requirements.txt and conda_requirements.txt to use the new version of scikit-learn.

- `GridSearchCV` now does not explicitly raise an error if there was an error in fitting the estimator. Rather, it simply returns `nan` as the score. This is not what we want in SKLL, so we set the `error_score` parameter to `raises` which will behave as expected.
- This requires setting `_use_dense_features` since sklearn requires non-sparse features for this type of model.
@codecov
Copy link

codecov bot commented Feb 3, 2021

Codecov Report

Merging #659 (1b4b870) into main (a9321a4) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #659   +/-   ##
=======================================
  Coverage   95.09%   95.09%           
=======================================
  Files          27       27           
  Lines        3101     3101           
=======================================
  Hits         2949     2949           
  Misses        152      152           
Impacted Files Coverage Δ
skll/learner/__init__.py 96.47% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a9321a4...1b4b870. Read the comment docs.

Copy link
Contributor

@mulhod mulhod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

@desilinguist desilinguist merged commit 07de429 into main Feb 4, 2021
@delete-merged-branch delete-merged-branch bot deleted the 653-update-sklearn-to-0-24-1 branch February 4, 2021 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Upgrade scikit-learn to 0.24.1

4 participants