[MRG] Fixes to test_pprint.py breaking after changes to the tested API by scouvreur · Pull Request #13529 · scikit-learn/scikit-learn

scouvreur · 2019-03-27T13:56:07Z

Reference Issues/PRs

This PR is an improvement for issue #13508.

What does this implement/fix? Explain your changes.

This change attempts to address the pain point found in issue #13470: when the constructor parameters of the estimators used in sklearn/utils/tests/test_pprint.py change, the test fails.

As a solution, this PR adds minimal examples of the classes listed below to the test itself.
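The brittleness described above can be sketched with the standard library alone. This is a minimal illustration, not the scikit-learn implementation: `ParamReprBase` and the two factory functions are hypothetical names, and the assumption is only that an estimator's printed form lists every constructor parameter.

```python
import inspect


class ParamReprBase:
    """Hypothetical stand-in for a base class whose repr lists every
    constructor parameter (an assumption made for this illustration)."""

    def __repr__(self):
        params = inspect.signature(type(self).__init__).parameters
        names = sorted(name for name in params if name != "self")
        args = ", ".join(f"{name}={getattr(self, name)!r}" for name in names)
        return f"{type(self).__name__}({args})"


def library_class_before_change():
    # The class as it might look in the library when the test was written.
    class LogisticRegression(ParamReprBase):
        def __init__(self, C=1.0, tol=1e-4):
            self.C = C
            self.tol = tol
    return LogisticRegression


def library_class_after_change():
    # The same class after the library gains a new constructor parameter.
    class LogisticRegression(ParamReprBase):
        def __init__(self, C=1.0, l1_ratio=None, tol=1e-4):
            self.C = C
            self.l1_ratio = l1_ratio
            self.tol = tol
    return LogisticRegression


# Expected string hard-coded in a test written against the old signature.
EXPECTED = "LogisticRegression(C=999, tol=0.0001)"

old_repr = repr(library_class_before_change()(C=999))
new_repr = repr(library_class_after_change()(C=999))

print(old_repr == EXPECTED)  # the test passes against the old signature
print(new_repr == EXPECTED)  # and breaks once a parameter is added
```

A copy of the class kept inside the test file plays the role of `library_class_before_change()`: its signature only changes when the test itself is edited, so the expected string stays valid.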

To-do

Example constructors needing to be excerpted to test_pprinting:

jnothman · 2019-03-27T21:01:00Z

You haven't imported BaseEstimator

scouvreur · 2019-03-29T03:36:28Z

When I use pytest test_pprint.py to run tests on my changes, I seem to get an error at the make_pipeline(StandardScaler(), LogisticRegression(C=999)) stage, due to a string not being 'passthrough'. Do you have any idea why that might be?

Full stacktrace here:

$ pytest test_pprint.py 
================================================ test session starts ================================================
platform darwin -- Python 3.6.8, pytest-4.3.0, py-1.8.0, pluggy-0.9.0
rootdir: /Users/stephane.couvreur/Documents/Open_Source/scikit-learn, inifile: setup.cfg
plugins: remotedata-0.3.1, openfiles-0.3.2, cov-2.6.1
collected 9 items                                                                                                   

test_pprint.py ..F......                                                                                      [100%]

===================================================== FAILURES ======================================================
___________________________________________________ test_pipeline ___________________________________________________

    def test_pipeline():
        # Render a pipeline object
>       pipeline = make_pipeline(StandardScaler(), LogisticRegression(C=999))

test_pprint.py:94: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../pipeline.py:604: in make_pipeline
    return Pipeline(_name_estimators(steps), memory=memory)
../../pipeline.py:119: in __init__
    self._validate_steps()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = Pipeline(memory=None,
         steps=[('standardscaler',
                 StandardScaler(copy=True, with_mean=True, wi...                         solver='warn', tol=0.0001, verbose=0,
                                    warm_start=False))])

    def _validate_steps(self):
        names, estimators = zip(*self.steps)
    
        # validate names
        self._validate_names(names)
    
        # validate estimators
        transformers = estimators[:-1]
        estimator = estimators[-1]
    
        for t in transformers:
            if t is None or t == 'passthrough':
                continue
            if (not (hasattr(t, "fit") or hasattr(t, "fit_transform")) or not
                    hasattr(t, "transform")):
                raise TypeError("All intermediate steps should be "
                                "transformers and implement fit and transform "
                                "or be the string 'passthrough' "
                                "'%s' (type %s) doesn't" % (t, type(t)))
    
        # We allow last estimator to be None as an identity transformation
        if (estimator is not None and estimator != 'passthrough'
                and not hasattr(estimator, "fit")):
            raise TypeError(
                "Last step of Pipeline should implement fit "
                "or be the string 'passthrough'. "
>               "'%s' (type %s) doesn't" % (estimator, type(estimator)))
E           TypeError: Last step of Pipeline should implement fit or be the string 'passthrough'. 'LogisticRegression(C=999, class_weight=None, dual=False, fit_intercept=True,
E                              intercept_scaling=1, l1_ratio=None, max_iter=100,
E                              multi_class='warn', n_jobs=None, penalty='l2',
E                              random_state=None, solver='warn', tol=0.0001, verbose=0,
E                              warm_start=False)' (type <class 'sklearn.utils.tests.test_pprint.LogisticRegression'>) doesn't

../../pipeline.py:176: TypeError
======================================== 1 failed, 8 passed in 0.87 seconds =======================================

jnothman · 2019-03-30T23:35:53Z

I see. Define fit(self, X, y): return self
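A small sketch of why that suggestion resolves the traceback, using only plain Python. The class names and the `last_step_is_accepted` helper are hypothetical; the helper restates the final condition of `_validate_steps` shown in the traceback as a boolean instead of raising.

```python
class LogisticRegressionNoFit:
    """Hypothetical excerpt that keeps only the constructor."""

    def __init__(self, C=1.0):
        self.C = C


class LogisticRegressionWithFit(LogisticRegressionNoFit):
    """Same excerpt plus the no-op fit suggested above."""

    def fit(self, X, y):
        # Nothing is learned; the method only needs to exist.
        return self


def last_step_is_accepted(estimator):
    # Negation of the condition that raises TypeError in the traceback:
    # the last step must be None, 'passthrough', or have a fit attribute.
    return (estimator is None or estimator == 'passthrough'
            or hasattr(estimator, "fit"))


print(last_step_is_accepted(LogisticRegressionNoFit(C=999)))
print(last_step_is_accepted(LogisticRegressionWithFit(C=999)))
```

The check only looks for the attribute, so a `fit` that returns `self` without doing any work is enough for a test that merely prints the pipeline.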

scouvreur · 2019-04-02T13:11:46Z

Apologies @jnothman @NicolasHug - is code coverage reduction inevitable as we excerpt more classes to test_pprint.py ?

NicolasHug · 2019-04-02T13:14:48Z

Don't worry about code coverage for now

scouvreur · 2019-04-02T13:27:52Z

Thanks @NicolasHug got it - I will keep working through the list then

scouvreur · 2019-04-05T07:10:51Z

Should I also excerpt the following @NicolasHug ?

sklearn/utils/tests/test_pprint.py

NicolasHug

A few comments, this looks good so far

Should I also excerpt the following @NicolasHug ?

SelectKBestP: No (signature is unlikely to change)
chi2: No (same)
SVC: Yes please
LinearSVC: No, feel free to replace its use with SVC instead (you'll need to update the expected strings ;)). Else, yes
PCA: Yes please
NMF: Yes
SimpleImputer: Yes

In general, try inheriting from as few classes as possible. BaseEstimator is often enough.

hermidalc · 2019-04-08T12:44:41Z

Sorry guys - I will just wait on some changes to OpenMP (#13390, #13543) to be able to build and run tests locally

Hi @scouvreur - are you using Anaconda? Create an independent conda environment just for sklearn development. You just need to run this below and it will have everything to build and test:

conda create -n sklearn-dev docutils matplotlib numpydoc cython joblib pandas pillow pyamg pytest python scipy sphinx
conda activate sklearn-dev

scouvreur · 2019-04-08T19:32:12Z

Thanks a lot for the tip @hermidalc! I still could not build on a macOS machine, but I was able to on Ubuntu. In any case it is better practice to work within a clean env.

NicolasHug · 2019-04-08T20:00:44Z

Have you tried https://scikit-learn.org/dev/developers/advanced_installation.html#mac-osx ?

Merging #13543 might take some time

jnothman

Otherwise LGTM.
This should certainly make these tests less brittle

scouvreur · 2019-04-09T09:47:51Z

Have you tried https://scikit-learn.org/dev/developers/advanced_installation.html#mac-osx ?
Merging #13543 might take some time

Thanks @NicolasHug - that worked !

NicolasHug

diff is a bit strange but LGTM anyway

scouvreur · 2019-04-09T12:35:21Z

Thanks @NicolasHug @hermidalc @jnothman for your feedback and support - again, a lot of learning from solving this! I will keep going, see you in the next one.

jnothman · 2019-04-09T12:46:20Z

Thanks @scouvreur!

scouvreur force-pushed the pprint-API-changes-fix branch from 5a6731e to ea8c2aa on April 1, 2019 07:06

scouvreur changed the title from "[WIP] Fixes to test_pprint.py breaking after changes to the tested API" to "[MRG] Fixes to test_pprint.py breaking after changes to the tested API" on Apr 3, 2019

hermidalc added a commit to hermidalc/scikit-learn that referenced this pull request Apr 5, 2019

Update relevant test_pprint to be in line with pr scikit-learn#13529

eb19a96

hermidalc reviewed Apr 5, 2019

hermidalc mentioned this pull request Apr 5, 2019

RFE/RFECV step enhancements #13470

Closed

NicolasHug reviewed Apr 5, 2019

scouvreur added 11 commits April 6, 2019 20:53

PR setup

1d07295

Added BaseEstimator from sklearn.base, added LogisticRegression class attributes. All tests are passing except test_pipeline()

1171f9e

Added fit() method to excerpted LogisticRegression class

b9d0ea9

Removed extra whitespace

9e5979d

Added RFE class to test [ci skip]

af636d3

Added _get_support_mask method delegated from SelectorMixin to excerpted RFE estimator, tests pass on local build

3a85a05

Fixed formatting and linting issues [ci skip]

88c56b9

Added excerpted GridSearchCV estimator, tests pass on local build [ci skip]

bea872b

Added excerpted CountVectorizer estimator, tests pass on local build [ci skip]

443dd7a

Added excerpted Pipeline estimator, tests pass on local build [ci skip]

8a6d7c5

Added excerpted StandardScaler estimator, tests pass on local build [ci skip]

6c98465

scouvreur force-pushed the pprint-API-changes-fix branch from 31b90a7 to 6c98465 on April 6, 2019 20:00

scouvreur added 2 commits April 6, 2019 21:05

Removed docstrings from excerpted constructors [ci skip]

73f33f4

Removed input checking from CountVectorizer constructor [ci skip]

87dff68

hermidalc added a commit to hermidalc/scikit-learn that referenced this pull request Apr 6, 2019

Update class defs to be in line with PR scikit-learn#13529

8f4e057

aditya1702 and others added 2 commits April 7, 2019 07:56

Use fixed random seed for generating X in test_mlp.test_gradient() (scikit-learn#13585)

3ce6237

DOC fix typo in comments for svm/classes.py (scikit-learn#13589)

95470d4

scouvreur added 4 commits April 8, 2019 20:35

Changed RFE class constructor to only inherit from BaseEstimator base class

1d02c54

Removed VectorizerMixin inheritance from CountVectorizer class constructor

5775895

Removed _validate_steps method from Pipeline class constructor

9270e85

Refactored GridSearchCV class constructor to only inherit from BaseEstimator base class

e2d9168

scouvreur added 8 commits April 8, 2019 21:22

Excerpted PCA class constructor

69b3326

Excerpted NMF class constructor

73392c4

Excerpted SimpleImputer class constructor

d6dd05e

Cleanup of libraries no longer required

16b4d16

Changed Pipeline constructor to inherit from BaseEstimator for simplicity, removed _BaseComposition class import

024e52b

Removed SelectorMixin base class import as no longer required

aa5f636

Excerpted SVC class constructor

90dce9d

Replaced LinearSVC with SVC in test_gridsearch_pipeline(), adapted expected string using pytest test_pprint -vvv and removed LinearSVC class import

19a4fe8

jnothman approved these changes Apr 9, 2019

NicolasHug approved these changes Apr 9, 2019

NicolasHug approved these changes Apr 9, 2019

Removed unneeded nu class attribute in SVC class constructor

bbe61f3

jnothman merged commit 0e3c187 into scikit-learn:master Apr 9, 2019

hermidalc added a commit to hermidalc/scikit-learn that referenced this pull request Apr 9, 2019

Update test_pprint.py to be in line with scikit-learn#13529

82efb27

jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Apr 25, 2019

TST Fixes to make test_pprint.py more resilient to change (scikit-learn#13529)

7558d50

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

TST Fixes to make test_pprint.py more resilient to change (scikit-learn#13529)

64cd9db

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "TST Fixes to make test_pprint.py more resilient to change (scikit-learn#13529)" This reverts commit 64cd9db.

54f0cf5

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "TST Fixes to make test_pprint.py more resilient to change (scikit-learn#13529)" This reverts commit 64cd9db.

fcca429

NicolasHug mentioned this pull request Jun 1, 2019

sklearn/utils/tests/test_pprint.py is too brittle to API changes #13508

Closed

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

TST Fixes to make test_pprint.py more resilient to change (scikit-learn#13529)

6ce2b05
