API Change default value of as_frame in fetch_openml to 'auto' by fujiaxiang · Pull Request #17610 · scikit-learn/scikit-learn

fujiaxiang · 2020-06-16T11:06:40Z

This is a follow up to #17396.

As discussed in #17396, I'm changing the default value of as_frame in function fetch_openml to 'auto'.
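For readers unfamiliar with the parameter, here is a minimal sketch of what the three `as_frame` values mean. It does not call scikit-learn: `resolve_as_frame` and its `is_sparse` argument are hypothetical names used only for illustration. The rule that `'auto'` yields a DataFrame unless the dataset is stored in sparse format, and the error for `as_frame=True` on sparse data, are assumptions based on the `fetch_openml` documentation, not on this PR's diff.

```python
def resolve_as_frame(as_frame, is_sparse):
    """Return True if a DataFrame would be returned, False for arrays.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    if isinstance(as_frame, str):
        if as_frame != "auto":
            raise ValueError("as_frame must be True, False or 'auto'")
        # Assumption: 'auto' behaves like True unless the data is sparse.
        return not is_sparse
    if not isinstance(as_frame, bool):
        raise ValueError("as_frame must be True, False or 'auto'")
    if as_frame and is_sparse:
        # Assumption: a DataFrame cannot be built from sparse data.
        raise ValueError("Cannot return a DataFrame for sparse data")
    return as_frame


print(resolve_as_frame("auto", is_sparse=False))  # dense data: DataFrame
print(resolve_as_frame("auto", is_sparse=True))   # sparse data: arrays
```

With the old default (`False`), both calls above would have resolved to arrays, which is the behavior change this PR is about.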

fujiaxiang · 2020-06-16T11:19:19Z

I didn't add any warnings for this change of behavior because fetch_openml is an experimental feature: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_openml.html

Let me know if you guys think we should add a warning.

sklearn/datasets/_openml.py

fujiaxiang · 2020-06-16T13:26:13Z

@glemaitre thanks for review, will follow the guidelines in the doc.

Sorry I am not familiar with sklearn development policies. Will read the docs more.

glemaitre · 2020-06-16T14:00:50Z

Don't worry. That's why I give pointers :)

fujiaxiang · 2020-06-17T12:45:07Z

hi @glemaitre I added the future warning and corresponding test, could you review?
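The warning-plus-test approach mentioned here can be sketched roughly as follows. This is not the code from the PR: `fetch_dataset`, the `"warn"` sentinel, and the message text are hypothetical, and the sketch only shows the general pattern of warning when a parameter is left unspecified during a deprecation cycle.

```python
import warnings

_UNSET = "warn"  # hypothetical sentinel: the caller did not pass as_frame


def fetch_dataset(name, as_frame=_UNSET):
    """Hypothetical loader; only the warning logic is of interest."""
    if as_frame == _UNSET:
        warnings.warn(
            "The default value of as_frame will change from False to "
            "'auto'. Pass as_frame explicitly to silence this warning.",
            FutureWarning,
        )
        as_frame = False  # keep the old behavior during the cycle
    return {"name": name, "as_frame": as_frame}


def warning_categories():
    """Collect the warning categories raised by two representative calls."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        fetch_dataset("iris")                  # unspecified: should warn
        fetch_dataset("iris", as_frame=False)  # explicit: should not warn
    return [w.category for w in caught]


print(warning_categories())
```

A test for this pattern would typically assert that exactly one `FutureWarning` is recorded, and that none is raised when `as_frame` is passed explicitly.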

glemaitre

Only nitpicks. Otherwise looks good

sklearn/datasets/_openml.py

sklearn/datasets/tests/test_openml.py

lesteve · 2020-06-23T15:23:59Z

@fujiaxiang there was a conflict in the whats_new; I took the liberty of fixing it, hopefully you don't mind too much.

fujiaxiang · 2020-06-24T02:43:32Z

@glemaitre thanks for review, have updated the test and docstring as requested.

@lesteve Not at all, thanks! I also took the liberty of reordering the whatsnew entries a bit by their labels (i.e., Enhancement comes before API). Could you review?

lesteve

A few comments

sklearn/datasets/_openml.py

doc/whats_new/v0.24.rst

jnothman

I think forcing the user to always set as_frame is excessively conservative, and creates unnecessary as_frame= where the dataset would fail to load with as_frame=False (e.g. string columns).

I think we should either:

1. return a dataframe for any dataset that fails with as_frame=False (e.g. non-numeric datasets); or
2. given the experimental label, simply adopt as_frame='auto' but raise a ChangedBehaviorWarning in the case that as_frame=False would have returned an array, but now will return a dataframe (e.g. all-numeric data)
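The second option amounts to warning only when the result type actually changes for the caller. A minimal sketch of that idea follows, with hypothetical names throughout: `BehaviorChangeWarning` stands in for the warning class discussed here, `fetch_dataset` and `all_numeric` are illustrative, `None` is used as a stand-in for "not specified", and sparse data is ignored for brevity.

```python
import warnings


class BehaviorChangeWarning(UserWarning):
    """Hypothetical stand-in for the warning class discussed above."""


def fetch_dataset(all_numeric, as_frame=None):
    """Hypothetical loader returning the name of the container type."""
    if as_frame is None:
        # Old default (False) returned an array only for all-numeric data,
        # so that is the only case where the result type silently changes.
        if all_numeric:
            warnings.warn(
                "as_frame now defaults to 'auto': a DataFrame is returned "
                "where an array was returned before.",
                BehaviorChangeWarning,
            )
        as_frame = "auto"
    return "dataframe" if as_frame in ("auto", True) else "array"


def result_and_categories(**kwargs):
    """Run fetch_dataset and report what it returned and what it warned."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = fetch_dataset(**kwargs)
    return result, [w.category for w in caught]


print(result_and_categories(all_numeric=True))
print(result_and_categories(all_numeric=False))
```

Non-numeric datasets stay silent in this sketch because, under the old default, they would have failed rather than returned an array.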

fujiaxiang · 2020-07-06T04:04:17Z

> I think we should either:
>
> 1. return a dataframe for any dataset that fails with as_frame=False (e.g. non-numeric datasets); or
> 2. given the experimental label, simply adopt as_frame='auto' but raise a ChangedBehaviorWarning in the case that as_frame=False would have returned an array, but now will return a dataframe (e.g. all-numeric data)

Hi @jnothman, thanks for the two suggestions.

I am not particularly fond of Option 1. As per the discussions in #14888 and #17396, it confuses users if the return type depends on the data content itself, and it can cause unwanted exceptions. It also defeats the purpose of having 'auto' as one of the options for as_frame.

I think Option 2 is OK. However, as @glemaitre and @lesteve pointed out, we have already compensated for the experimental label by shortening the deprecation cycle by one version. Do you think this is still too conservative? Another issue we now face is that ChangedBehaviorWarning itself is deprecated in 0.24. If we go with Option 2, is there an alternative to it (perhaps UserWarning)? I didn't find such information in the deprecation message.

glemaitre · 2020-07-06T07:44:02Z

> given the experimental label, simply adopt as_frame='auto' but raise a ChangedBehaviorWarning in the case that as_frame=False would have returned an array, but now will return a dataframe (e.g. all-numeric data)

I am +1 on option 2. I would be in favor of keeping ChangedBehaviorWarning for this specific case.

fujiaxiang · 2020-07-07T05:58:03Z

@glemaitre @jnothman I have updated the code to simply adopt the new default value as_frame='auto' in 0.24 and to raise ChangedBehaviorWarning in cases where it used to return an array but now returns a DataFrame.

For now I have to include a filter to ignore the deprecation warning of ChangedBehaviorWarning in the test.
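The kind of filter described here can be sketched with the standard `warnings` module. This is an illustration under stated assumptions, not the test from the PR: `OldBehaviorWarning` is a hypothetical class that emits a `FutureWarning` when instantiated (assumed to mimic how a deprecated warning class behaves), and `fetch_dataset` is a hypothetical function that raises it.

```python
import warnings


class OldBehaviorWarning(UserWarning):
    """Hypothetical warning class that is itself deprecated."""

    def __init__(self, *args):
        # Assumption: instantiating the deprecated class emits a FutureWarning.
        warnings.warn("OldBehaviorWarning is deprecated.", FutureWarning)
        super().__init__(*args)


def fetch_dataset():
    """Hypothetical function that reports a behavior change."""
    warnings.warn(OldBehaviorWarning("default value of as_frame changed"))
    return "dataframe"


def collect_categories():
    """Record warnings while ignoring the deprecation noise."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Filters added later take precedence, so this one wins for
        # FutureWarning while everything else is still recorded.
        warnings.simplefilter("ignore", category=FutureWarning)
        fetch_dataset()
    return [w.category for w in caught]


print(collect_categories())
```

In a pytest-based suite the same effect is usually achieved with a `filterwarnings` mark; the stdlib version is used here only to keep the sketch self-contained.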

@adrinjalali also mentioned in #17804 that perhaps we shouldn't even raise a warning for experimental features.
This seems to be in line with similar projects such as pandas and TensorFlow, which follow, or loosely follow, Semantic Versioning 2.0.
I couldn't find any scikit-learn documentation that explicitly discusses the versioning policy for experimental features, so I looked at how those two projects handle it.

In the TensorFlow version policy (https://www.tensorflow.org/guide/versions)

Some parts of TensorFlow can change in backward incompatible ways at any point. These include:

Experimental APIs:

In the TensorFlow documentation (https://www.tensorflow.org/probability/api_docs/python/tfp/experimental)

tfp.experimental has no API stability guarantee. The public footprint of tfp.experimental code may change without notice or warning.

Also in TensorFlow (https://www.tensorflow.org/api_docs/python/tf/autograph/experimental/Feature)

These conversion options are experimental. They are subject to change without notice and offer no guarantees.

In the pandas version policy (https://pandas.pydata.org/docs/development/policies.html)

Pandas may change the behavior of experimental features at any time.

Also in pandas (https://pandas.pydata.org/pandas-docs/version/1.0.0/whatsnew/v1.0.0.html)

Experimental: the behaviour of pd.NA can still change without warning.

StringDtype is currently considered experimental. The implementation and parts of the API may change without warning.

Both allow experimental features to change at any time, possibly without warning.

So I think it is OK to make this change without any warning. It is a matter of how you guys (the sklearn core development team) want to set the policy. I myself prefer the flexibility to change experimental features without warning, as this allows developers to quickly test new features and receive community feedback. Another small downside of raising ChangedBehaviorWarning arises when a behavior is changed a second time while the feature is still experimental: users may be confused about what is actually going on.

Perhaps you guys can have a discussion and decide what kind of policy/guideline you want to set?

adrinjalali · 2020-07-07T08:03:00Z

Yes, I would be much happier if we treated the experimental features as experimental and changed them when needed without a warning. The warnings unnecessarily clutter the user's space and prevent us from raising warnings in places where we actually need them (bad default params and intercept scaling being just two examples).

@fujiaxiang I know you really mean no harm, but it would be much nicer if you could avoid using "you guys" (https://youguys.club/). Thanks :)

fujiaxiang · 2020-07-12T03:19:29Z

@adrinjalali @glemaitre @jnothman @lesteve I removed the warning message. Could you review?

fujiaxiang · 2020-07-12T03:19:53Z

> I know you really mean no harm, but it would be much nicer if you could avoid using "you guys"

@adrinjalali Sure, will avoid using that phrase.

fujiaxiang · 2020-07-14T10:21:58Z

ping

fujiaxiang · 2020-08-08T03:34:36Z

@adrinjalali @glemaitre @jnothman @lesteve Can anyone help review?

sklearn/datasets/_openml.py

glemaitre · 2020-08-10T07:19:25Z

doc/whats_new/v0.24.rst

  a pandas Series. :pr:`17491` by :user:`Alex Liang <tianchuliang>`.

+- |API| The default value of `as_frame` in :func:`datasets.fetch_openml` will
+  change from False to 'auto' in 0.25. It now issues a `FutureWarning` when


It does not issue FutureWarning anymore.

jnothman · 2020-08-15T11:06:46Z

Thanks @fujiaxiang

changed default value of as_frame in fetch_openml to 'auto'

51b18e8

github-actions bot added the module:datasets label Jun 16, 2020

added whatsnew entry

e0ec3ae

fujiaxiang changed the title ~~[WIP] Change default value of as_frame in fetch_openml to 'auto'~~ [MRG] Change default value of as_frame in fetch_openml to 'auto' Jun 16, 2020

glemaitre reviewed Jun 16, 2020

sklearn/datasets/_openml.py

fujiaxiang changed the title ~~[MRG] Change default value of as_frame in fetch_openml to 'auto'~~ [WIP] Change default value of as_frame in fetch_openml to 'auto' Jun 16, 2020

fujiaxiang added 5 commits June 17, 2020 15:43

added FutureWarning in fetch_openml when as_frame is not specified

323dc11

minor update in warnings filter

d6ef7c6

Merge remote-tracking branch 'upstream/master' into fetch_openml_make…

e798ca3

…_asframe_default_to_auto

added one more warnings filter

736e375

updated docstring

bb30bb5

fujiaxiang requested a review from glemaitre June 17, 2020 12:44

fujiaxiang changed the title ~~[WIP] Change default value of as_frame in fetch_openml to 'auto'~~ [MRG] Change default value of as_frame in fetch_openml to 'auto' Jun 17, 2020

glemaitre approved these changes Jun 23, 2020

sklearn/datasets/_openml.py (outdated)

sklearn/datasets/tests/test_openml.py (outdated)

Merge branch 'master' into fetch_openml_make_asframe_default_to_auto

04c492c

fujiaxiang and others added 2 commits June 24, 2020 10:21

Apply suggestions from code review

b4f4b37

updated docstring Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

updated warnings test; cleaned up whatsnew; reformatted docstring

4396e0a

reformatted docstring

d94feb6

lesteve reviewed Jun 24, 2020

sklearn/datasets/_openml.py (outdated)

sklearn/datasets/_openml.py (outdated)

doc/whats_new/v0.24.rst (outdated)

fujiaxiang added 4 commits June 25, 2020 14:38

Merge remote-tracking branch 'upstream/master' into fetch_openml_make…

5067210

…_asframe_default_to_auto

added more explanation in warning message

acf47e6

updated warning message in test

bf82f20

updated docstring to give more details

0a78438

fujiaxiang requested a review from lesteve June 25, 2020 10:40

jnothman requested changes Jun 25, 2020

Merge remote-tracking branch 'upstream/master' into fetch_openml_make…

2d1e860

…_asframe_default_to_auto # Conflicts: # doc/whats_new/v0.24.rst # sklearn/datasets/_openml.py

fujiaxiang requested a review from jnothman July 6, 2020 05:28

glemaitre mentioned this pull request Jul 6, 2020

MNT deprecate ChangedBehaviorWarning and NonBLASDotWarning #17804

Merged

fujiaxiang added 3 commits July 7, 2020 10:42

changed default value of as_frame in fetch_openml to 'auto'

7b93b19

added as_frame=False in some existing tests

fabcbd4

added as_frame=False in an existing tests

66f6c9d

fujiaxiang added 3 commits July 12, 2020 10:07

Merge remote-tracking branch 'upstream/master' into fetch_openml_make…

81190ba

…_asframe_default_to_auto

removed ChangedBehaviorWarning and relevant test

588f3ab

removed unused imports

e22c1ed

cmarmo added the Waiting for Reviewer label Jul 19, 2020

glemaitre reviewed Aug 10, 2020

glemaitre removed the Waiting for Reviewer label Aug 10, 2020

fujiaxiang and others added 4 commits August 15, 2020 09:05

Update sklearn/datasets/_openml.py

3f77c38

added type hint for parameter `as_frame` Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

updated whatsnew entry

80830b4

Merge remote-tracking branch 'upstream/master' into fetch_openml_make…

1e14748

…_asframe_default_to_auto

explicity type conversion to satisfy mypy requirement

2033cd7

jnothman approved these changes Aug 15, 2020

jnothman changed the title ~~[MRG] Change default value of as_frame in fetch_openml to 'auto'~~ API Change default value of as_frame in fetch_openml to 'auto' Aug 15, 2020

jnothman merged commit bdf2ff5 into scikit-learn:master Aug 15, 2020

jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020

API Change default value of as_frame in fetch_openml to 'auto' (sciki…

a4b4258

…t-learn#17610)

6 participants
