Skip to content

Add as_frame='auto' option in datasets.fetch_openml#17396

Merged
rth merged 5 commits intoscikit-learn:masterfrom
fujiaxiang:add_as_frame_auto_option_in_fetch_openml
Jun 10, 2020
Merged

Add as_frame='auto' option in datasets.fetch_openml#17396
rth merged 5 commits intoscikit-learn:masterfrom
fujiaxiang:add_as_frame_auto_option_in_fetch_openml

Conversation

@fujiaxiang
Copy link
Copy Markdown
Contributor

@fujiaxiang fujiaxiang commented May 31, 2020

Reference Issues/PRs

Closes #14888

What does this implement/fix? Explain your changes.

This PR adds an 'auto' option for argument as_frame in function datasets.fetch_openml.
When as_frame is set to be 'auto', returned data will be converted to pandas DataFrame or Series unless data is sparse.

Any other comments?

This PR doesn't fully close #14888. The team still need to discuss and decide whether to make 'auto' the default option.

@fujiaxiang fujiaxiang changed the title [WIP] Added 'auto' option for argument as_frame in datasets.fetch_openml [MRG] Added 'auto' option for argument as_frame in datasets.fetch_openml May 31, 2020
@fujiaxiang
Copy link
Copy Markdown
Contributor Author

ping

Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Copy Markdown
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me aside from a few comments below. I would also be +1 to make "auto" the default since fetch_openml is experimental.

data. If ``return_X_y`` is True, then ``(data, target)`` will be pandas
DataFrames or Series as describe above.
If as_frame is 'auto', the data and target will be converted to
DataFrame or Series as if as_frame is set to True, unless the dataset
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always a DataFrame I think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data.target may be a Series. For example:

data_id = 61  # iris dataset version 1
data = fetch_openml(data_id=data_id, as_frame=True)

>>> type(data.target)
<class 'pandas.core.series.Series'>

Copy link
Copy Markdown
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I would also be +1 to make "auto" the default since fetch_openml is experimental.

Let's see what other reviewers think about doing that.

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Jun 9, 2020 via email

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Jun 9, 2020

Could you propose that change to default in another pull request?

@fujiaxiang
Copy link
Copy Markdown
Contributor Author

Sure. Do you want to merge this first? Or do I branch out from here?

@rth rth changed the title [MRG] Added 'auto' option for argument as_frame in datasets.fetch_openml Add as_frame='auto' option in datasets.fetch_openml Jun 10, 2020
@rth rth merged commit 77a3429 into scikit-learn:master Jun 10, 2020
@rth
Copy link
Copy Markdown
Member

rth commented Jun 10, 2020

Thanks! Please open a new PR for changing the default.

@fujiaxiang fujiaxiang deleted the add_as_frame_auto_option_in_fetch_openml branch June 16, 2020 11:15
viclafargue pushed a commit to viclafargue/scikit-learn that referenced this pull request Jun 26, 2020
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add fetch_openml(..., as_frame='auto') option

3 participants