
Conversation

@desilinguist
Collaborator

This PR closes #699.

This change is pretty straightforward, but it is definitely backwards incompatible, which we will reflect in the version number when we put together the release.

The changes specifically motivated by the upgrade are:

  • Update requirements.txt and conda_requirements.txt to point to the latest scikit-learn (v1.0.1) and allow up to v1.0.2.
  • Use squared_error as the default value of the loss parameter for RANSACRegressor.
  • Use toarray() for converting scipy sparse matrices to dense arrays instead of todense(), since the latter returns an np.matrix instead of an np.ndarray. Scikit-learn is planning to drop support for np.matrix inputs and is already displaying FutureWarnings. (Both this and the loss rename above are illustrated in the short sketch after this list.)
  • Force normalize to False for Lars models
    • Scikit-learn v1.0 will be deprecating the normalize attribute for linear models
    • This attribute is set to False by default in most scikit-learn linear models anyway and so no warnings are surfaced in SKLL.
    • However, for Lars, the default value of normalize is still set to True and so we need to force it to False to avoid FutureWarning instances.
    • The if statement added will actually lead to an exception in the _create_estimator() method when the normalize attribute doesn't exist, so that will be the perfect reminder to excise it entirely when the time comes.
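
The two scikit-learn-facing points above can be illustrated with a short, self-contained sketch. This is illustrative only and not code from this PR; it simply shows the todense()/toarray() return types and the renamed loss value:

```python
# Minimal illustration (not SKLL code) of the two scikit-learn points above.
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.linear_model import RANSACRegressor

X = csr_matrix(np.eye(3))
print(type(X.todense()))  # <class 'numpy.matrix'>  -- the input type scikit-learn is phasing out
print(type(X.toarray()))  # <class 'numpy.ndarray'> -- the dense form we switch to

# "squared_loss" was deprecated and renamed in scikit-learn v1.0;
# "squared_error" is the value SKLL now uses as its default for RANSACRegressor.
ransac = RANSACRegressor(loss="squared_error")
```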

Other changes include:

  • Update Python versions in CI builds (3.9 for Windows/Azure and 3.8 for Linux/GitLab)
  • Update pre-commit and pre-commit hooks to their latest versions

- Use 3.8 for Linux
- Use 3.9 for Windows
- scikit-learn is dropping support for `np.matrix`, which is what we get from `todense()`, so we need to use `toarray()` instead.
- The old default `squared_loss` has been deprecated and renamed to `squared_error`.
- Scikit-learn v1.0 will be deprecating the `normalize` attribute for linear models
- This attribute is set to `False` by default in most scikit-learn linear models anyway and so no warnings are surfaced in SKLL.
- However, for `Lars`, the default value of `normalize` is still set to `True` and so we need to force it to False to avoid deprecation warnings.
- This code will actually lead to an exception in the `_create_estimator()` method when the `normalize` attribute doesn't exist, so that will be the perfect reminder to excise this if block entirely when the time comes (see the sketch below).
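
As a rough sketch only (the helper below is hypothetical and is not SKLL's actual `_create_estimator()` code), the Lars workaround described above amounts to explicitly passing `normalize=False`:

```python
from sklearn.linear_model import Lars

def make_lars(**kwargs):
    # Hypothetical helper, not SKLL's _create_estimator(): force normalize=False
    # so that scikit-learn v1.0 does not surface a FutureWarning for the
    # deprecated `normalize` parameter of Lars (whose default is still True).
    kwargs.setdefault("normalize", False)
    # Once scikit-learn removes `normalize` from Lars entirely, this call will
    # raise a TypeError, which is the reminder to excise the workaround.
    return Lars(**kwargs)

model = make_lars()  # no FutureWarning under scikit-learn v1.0.x
```

In SKLL itself the equivalent guard lives in `_create_estimator()`, as described above.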
@codecov

codecov bot commented Nov 30, 2021

Codecov Report

Merging #702 (afad8de) into main (4ead823) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #702      +/-   ##
==========================================
+ Coverage   96.85%   96.89%   +0.03%     
==========================================
  Files          63       63              
  Lines        9098     9197      +99     
==========================================
+ Hits         8812     8911      +99     
  Misses        286      286              
Impacted Files | Coverage Δ
tests/test_preprocessing.py | 100.00% <ø> (ø)
skll/data/writers.py | 94.17% <100.00%> (+0.06%) ⬆️
skll/learner/__init__.py | 97.13% <100.00%> (+0.04%) ⬆️
skll/learner/utils.py | 94.62% <100.00%> (+0.03%) ⬆️
tests/test_commandline_utils.py | 99.66% <100.00%> (+<0.01%) ⬆️
tests/test_featureset.py | 99.78% <100.00%> (+<0.01%) ⬆️
tests/test_regression.py | 99.64% <100.00%> (+<0.01%) ⬆️
tests/test_cv.py | 100.00% <0.00%> (ø)
tests/test_output.py | 100.00% <0.00%> (ø)
tests/test_classification.py | 100.00% <0.00%> (ø)
... and 14 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ead823...afad8de.

@mulhod
Contributor

mulhod commented Nov 30, 2021

Should the scikit-learn version also be updated in the Conda recipe? It is currently >=0.24.1,<=0.24.2. Would this version include 1.0, e.g. >=1.0,<=1.0.1?

@desilinguist
Collaborator Author

> Should the scikit-learn version also be updated in the Conda recipe? It is currently >=0.24.1,<=0.24.2. Would this version include 1.0, e.g. >=1.0,<=1.0.1?

Good catch! I will modify it to be the same as the requirements files.

Contributor

@Frost45 left a comment


Looks great 🎉

Contributor

@mulhod left a comment


LGTM

@desilinguist desilinguist merged commit a5d5b3f into main Nov 30, 2021
@delete-merged-branch delete-merged-branch bot deleted the 699-upgrade-scikit-learn-to-v1 branch November 30, 2021 19:46