API Change default value of as_frame in fetch_openml to 'auto'#17610
Conversation
|
I didn't add any warnings for this change of behavior because Let me know if you guys think we should add a warning. |
|
@glemaitre thanks for review, will follow the guidelines in the doc. Sorry I am not familiar with sklearn development policies. Will read the docs more. |
|
Don't worry. That's why I give pointer :) |
|
hi @glemaitre I added the future warning and corresponding test, could you review? |
glemaitre
left a comment
There was a problem hiding this comment.
Only nitpicks. Otherwise looks good
|
@fujiaxiang there was a conflict in the whats_new I took the liberty of fixing it, hopefully you don't mind too much. |
updated docstring Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
|
@glemaitre thanks for review, have updated the test and docstring as requested. @lesteve Not at all, thanks! I also took the liberty to reorder the whatsnew entries a bit by their labels (i.e |
jnothman
left a comment
There was a problem hiding this comment.
I think forcing the user to always set as_frame is excessively conservative, and creates unnecessary as_frame= where the dataset would fail to load with as_frame=False (e.g. string columns).
I think we should either:
- return a dataframe for any dataset that fails with as_frame=False (e.g. non-numeric datasets); or
- given the experimental label, simply adopt
as_frame='auto'but raise a ChangedBehaviourWarning in the case that as_frame=False would have returned an array, but now will return a dataframe (e.g. all-numeric data)
…_asframe_default_to_auto # Conflicts: # doc/whats_new/v0.24.rst # sklearn/datasets/_openml.py
Hi @jnothman, thanks for the two suggestions. I am not particularly fond of Option 1. As per discussions in #14888 and #17396, it causes confusion to users if the return type depends on the data content itself, and potentially causes unwanted exceptions. It also defeats the purpose of having I think Option 2 is ok. Although, as @glemaitre and @lesteve pointed out, we have already compensated for the |
I am + 1 with option 2. I would be in favor to keep ChangedBehaviour for these specific case. |
|
@glemaitre @jnothman I have updated the code to simply adopt new default value For now I have to include a filter to ignore the deprecation warning of @adrinjalali also mentioned in #17804 that perhaps we shouldn't even raise a warning for experimental features. In Tensorflow version policy (https://www.tensorflow.org/guide/versions)
In Tensorflow documentation (https://www.tensorflow.org/probability/api_docs/python/tfp/experimental)
Also in Tensorflow (https://www.tensorflow.org/api_docs/python/tf/autograph/experimental/Feature)
In Pandas version policy (https://pandas.pydata.org/docs/development/policies.html)
Also in pandas (https://pandas.pydata.org/pandas-docs/version/1.0.0/whatsnew/v1.0.0.html)
Both allows experimental features to change any time, and may do so without warning. So I think it is ok to make this change without any warning. It is a matter of how you guys (sklearn core development team) want to set the policy. I myself kind of prefer the flexibility to change experimental features without warning, as this allows developers to quickly test new features among receive community feedback. Another small downside of giving Perhaps you guys can have a discussion and decide what kind of policy/guideline you want to set? |
|
Yes, I would be very much happier if we treat the experimental features as experimental and change them when needed without a warning. The warnings unnecessarily clutter user's space and prevent us from raising warnings in places where we actually need to raise a warning (bad default param and intercept scaling just two examples). @fujiaxiang I know you really mean no harm, but it would be much nicer if you could avoid using "you guys" (https://youguys.club/). Thanks :) |
|
@adrinjalali @glemaitre @jnothman @lesteve I removed the warning message. Could you review? |
@adrinjalali Sure, will avoid using that phrase. |
|
ping |
|
@adrinjalali @glemaitre @jnothman @lesteve Anyone can help review? |
doc/whats_new/v0.24.rst
Outdated
| a pandas Series. :pr:`17491` by :user:`Alex Liang <tianchuliang>`. | ||
|
|
||
| - |API| The default value of `as_frame` in :func:`datasets.fetch_openml` will | ||
| change from False to 'auto' in 0.25. It now issues a `FutureWarning` when |
There was a problem hiding this comment.
It does not issue FutureWarning anymore.
added type hint for parameter `as_frame` Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…_asframe_default_to_auto
|
Thanks @fujiaxiang |
This is a follow up to #17396.
As discussed in #17396, I'm changing the default value of
as_framein functionfetch_openmlto'auto'.