
Conversation

@desilinguist
Collaborator

This PR closes #699.

This change is pretty straightforward, but it is definitely backwards incompatible, which we will reflect in the version number when we put together the release.

The changes specifically motivated by the upgrade are:

  • Update requirements.txt and conda_requirements.txt to point to the latest scikit-learn (v1.0.1) and allow up to v1.0.2.
  • Use squared_error as the default value of the loss parameter for RANSACRegressor.
  • Use toarray() for converting scipy sparse matrices to dense arrays instead of todense(), since the latter returns an np.matrix instead of an np.ndarray. Scikit-learn is planning to drop support for np.matrix inputs and is already displaying FutureWarnings. (Both this and the loss rename above are illustrated in the short sketch after this list.)
  • Force normalize to False for Lars models
    • Scikit-learn v1.0 will be deprecating the normalize attribute for linear models
    • This attribute is set to False by default in most scikit-learn linear models anyway and so no warnings are surfaced in SKLL.
    • However, for Lars, the default value of normalize is still set to True and so we need to force it to False to avoid FutureWarning instances.
    • The if statement added will actually lead to an exception in the _create_estimator() method when the normalize attribute doesn't exist, so that will be the perfect reminder to excise it entirely when the time comes.
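
The two scikit-learn-facing points above can be illustrated with a short, self-contained sketch. This is illustrative only and not code from this PR; it simply shows the todense()/toarray() return types and the renamed loss value:

```python
# Minimal illustration (not SKLL code) of the two scikit-learn points above.
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.linear_model import RANSACRegressor

X = csr_matrix(np.eye(3))
print(type(X.todense()))  # <class 'numpy.matrix'>  -- the input type scikit-learn is phasing out
print(type(X.toarray()))  # <class 'numpy.ndarray'> -- the dense form we switch to

# "squared_loss" was deprecated and renamed in scikit-learn v1.0;
# "squared_error" is the value SKLL now uses as its default for RANSACRegressor.
ransac = RANSACRegressor(loss="squared_error")
```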

Other changes include:

  • Update Python versions in CI builds (3.9 for Windows/Azure and 3.8 for Linux/GitLab)
  • Update pre-commit and pre-commit hooks to their latest versions

- Use 3.8 for Linux
- Use 3.9 for Windows
- scikit-learn is dropping support for `np.matrix`, which is what we get from `todense()`, so we need to use `toarray()` instead.
- The old default `squared_loss` has been deprecated and renamed to `squared_error`.
- Scikit-learn v1.0 will be deprecating the `normalize` attribute for linear models
- This attribute is set to `False` by default in most scikit-learn linear models anyway and so no warnings are surfaced in SKLL.
- However, for `Lars`, the default value of `normalize` is still set to `True` and so we need to force it to False to avoid deprecation warnings.
- This code will actually lead to an exception in the `_create_estimator()` method when the `normalize` attribute doesn't exist, so that will be the perfect reminder to excise this if block entirely when the time comes (see the sketch below).
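
As a rough sketch only (the helper below is hypothetical and is not SKLL's actual `_create_estimator()` code), the Lars workaround described above amounts to explicitly passing `normalize=False`:

```python
from sklearn.linear_model import Lars

def make_lars(**kwargs):
    # Hypothetical helper, not SKLL's _create_estimator(): force normalize=False
    # so that scikit-learn v1.0 does not surface a FutureWarning for the
    # deprecated `normalize` parameter of Lars (whose default is still True).
    kwargs.setdefault("normalize", False)
    # Once scikit-learn removes `normalize` from Lars entirely, this call will
    # raise a TypeError, which is the reminder to excise the workaround.
    return Lars(**kwargs)

model = make_lars()  # no FutureWarning under scikit-learn v1.0.x
```

In SKLL itself the equivalent guard lives in `_create_estimator()`, as described above.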
@codecov

codecov bot commented Nov 30, 2021

Codecov Report

Merging #702 (afad8de) into main (4ead823) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #702      +/-   ##
==========================================
+ Coverage   96.85%   96.89%   +0.03%     
==========================================
  Files          63       63              
  Lines        9098     9197      +99     
==========================================
+ Hits         8812     8911      +99     
  Misses        286      286              
Impacted Files | Coverage Δ
tests/test_preprocessing.py | 100.00% <ø> (ø)
skll/data/writers.py | 94.17% <100.00%> (+0.06%) ⬆️
skll/learner/__init__.py | 97.13% <100.00%> (+0.04%) ⬆️
skll/learner/utils.py | 94.62% <100.00%> (+0.03%) ⬆️
tests/test_commandline_utils.py | 99.66% <100.00%> (+<0.01%) ⬆️
tests/test_featureset.py | 99.78% <100.00%> (+<0.01%) ⬆️
tests/test_regression.py | 99.64% <100.00%> (+<0.01%) ⬆️
tests/test_cv.py | 100.00% <0.00%> (ø)
tests/test_output.py | 100.00% <0.00%> (ø)
tests/test_classification.py | 100.00% <0.00%> (ø)
... and 14 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ead823...afad8de.

@mulhod
Contributor

mulhod commented Nov 30, 2021

Should the scikit-learn version also be updated in the Conda recipe? It is currently >=0.24.1,<=0.24.2. Would this version include 1.0, e.g. >=1.0,<=1.0.1?

@desilinguist
Collaborator Author

> Should the scikit-learn version also be updated in the Conda recipe? It is currently >=0.24.1,<=0.24.2. Would this version include 1.0, e.g. >=1.0,<=1.0.1?

Good catch! I will modify it to be the same as the requirements files.

Contributor

@Frost45 left a comment


Looks great 🎉

Contributor

@mulhod left a comment


LGTM

@desilinguist desilinguist merged commit a5d5b3f into main Nov 30, 2021
@delete-merged-branch delete-merged-branch bot deleted the 699-upgrade-scikit-learn-to-v1 branch November 30, 2021 19:46