@desilinguist
Collaborator

The following changes were made to upgrade scikit-learn to v1.1.2 and seaborn to v0.12.0 (which was released today and is unpinned):

  • Update scikit-learn to latest version in requirements.txt.
  • Replace public SCORERS with private _SCORERS.
    • Scikit-learn has modified its metrics interface to deprecate the SCORERS dictionary and make it private, so we need to start using the private interface. This is a bit risky since the private interface can change at any time, but because we pin scikit-learn, this is less of an issue.
  • Handle future deprecations in scikit-learn planned for v1.2 and v1.3.
    • Replace the deprecated log value of the loss parameter for SGDClassifier with log_loss.
    • Replace the deprecated auto value of max_features for tree models with sqrt.
    • Use get_feature_names_out() instead of deprecated get_feature_names().
    • Use 'estimator' instead of deprecated 'base_estimator' for RANSACRegressor.
    • Explicitly specify min_samples when using RANSACRegressor with an estimator that is not LinearRegression.
  • Update Boston voting regressor results.
  • Update learning curve code for seaborn v0.12.0
    • Seaborn v0.12.0 has some breaking API changes that now require the following changes.
    • Hue levels and keywords should be handled by pointplot and not FacetGrid.
    • We need to make sure that the variable in the data frame that maps to hue levels is categorical.
    • It is now recommended to explicitly assign palette colors to hue levels.

Scikit-learn has modified their metrics interface to deprecate the
`SCORERS` dictionary and made it private instead so we need to start
using the private interface. This is a bit risky as the private
interface can change at any time but since we pin scikit-learn, this is
less of an issue.
- Replace the deprecated `log` value of the `loss` parameter for `SGDClassifier` with `log_loss`.
- Replace the deprecated `auto` value of `max_features` for tree models with `sqrt`.
- Use `get_feature_names_out()` instead of `get_feature_names()`.
- Explicitly specify `min_samples` when using RANSACRegressor with an
  estimator that is not LinearRegression.
- Use 'estimator' instead of deprecated 'base_estimator' for `RANSACRegressor`.
- Seaborn v0.12.0 has some breaking API changes that now require the
  following changes.
- Hue levels and keywords should be handled by `pointplot` and
  not `FacetGrid`.
- We need to make sure that the variable in the data frame that maps to
  hue levels is categorical.
- It is now recommended to explicitly assign palette colors to hue
  levels.
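The three seaborn points above could be sketched as follows. This is an illustrative example and not the actual learning curve code: the column names `metric`, `train_score_mean`, and `test_score_mean` mirror the snippet under review in this PR, while `training_set_size`, the score values, and the `Set1` palette are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import pandas as pd
import seaborn as sns

# Made-up learning curve results for two metrics.
df = pd.DataFrame({
    "metric": ["accuracy"] * 3 + ["f1"] * 3,
    "training_set_size": [100, 200, 300] * 2,
    "train_score_mean": [0.90, 0.92, 0.95, 0.85, 0.88, 0.90],
    "test_score_mean": [0.70, 0.78, 0.82, 0.65, 0.72, 0.77],
})

id_vars = [c for c in df.columns
           if c not in ("train_score_mean", "test_score_mean")]
df_melted = pd.melt(df, id_vars=id_vars)

# Make the column that maps to hue levels categorical.
df_melted["variable"] = df_melted["variable"].astype("category")
hue_levels = list(df_melted["variable"].cat.categories)

# Explicitly assign one palette color to each hue level.
colors = sns.color_palette("Set1", n_colors=len(hue_levels))
palette = dict(zip(hue_levels, colors))

# Hue is passed to pointplot (via map_dataframe), not to FacetGrid itself.
grid = sns.FacetGrid(df_melted, col="metric", sharey=False)
grid.map_dataframe(sns.pointplot, x="training_set_size", y="value",
                   hue="variable", hue_order=hue_levels, palette=palette)
grid.add_legend()
```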
@codecov

codecov bot commented Sep 7, 2022

Codecov Report

Base: 96.90% // Head: 96.90% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (0b3ca9b) compared to base (1502fe8).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #716   +/-   ##
=======================================
  Coverage   96.90%   96.90%           
=======================================
  Files          63       63           
  Lines        9263     9271    +8     
=======================================
+ Hits         8976     8984    +8     
  Misses        287      287           
| Impacted Files | Coverage Δ |
|---|---|
| skll/utils/constants.py | 100.00% <ø> (ø) |
| skll/__init__.py | 100.00% <100.00%> (ø) |
| skll/experiments/__init__.py | 94.63% <100.00%> (ø) |
| skll/experiments/output.py | 97.38% <100.00%> (+0.01%) ⬆️ |
| skll/learner/__init__.py | 97.15% <100.00%> (+0.02%) ⬆️ |
| skll/metrics.py | 97.89% <100.00%> (+0.02%) ⬆️ |
| tests/test_commandline_utils.py | 99.67% <100.00%> (ø) |
| tests/test_custom_metrics.py | 100.00% <100.00%> (ø) |
| tests/test_featureset.py | 99.78% <100.00%> (ø) |
| tests/test_regression.py | 99.64% <100.00%> (ø) |


@desilinguist
Collaborator Author

GitHub is being dumb and still displaying an old codecov result. There's no issue with coverage, as indicated by codecov's comment on the PR.

num_metrics = len(df['metric'].unique())
df_melted = pd.melt(df, id_vars=[c for c in df.columns
                                 if c not in ['train_score_mean', 'test_score_mean']])
# make sure the "variable" column is cateogrical since it will be
Contributor

Suggested change
- # make sure the "variable" column is cateogrical since it will be
+ # make sure the "variable" column is categorical since it will be

Collaborator Author

Thanks for catching this, @Frost45! I don't want to run the entire build again for this typo so I promise I will fix it in the next PR :)

@desilinguist desilinguist merged commit c49147b into main Sep 8, 2022
@desilinguist desilinguist deleted the 713-upgrade-scikit-learn branch September 8, 2022 15:50