[WIP] fetch_openml: ability to return DataFrame #11875
jorisvandenbossche wants to merge 6 commits into scikit-learn:master from

@jorisvandenbossche are you still working on this? Shall I take over?

I think it mainly needs some review. Although, actually, docs and tests could probably already be added. Feel free to do that / push this forward!

Great, I will make some time for this. Can I push to your PR, or should I fetch your branch and open a new PR?

I added you as a collaborator on my fork, so you should be able to push to this PR.

    X = data.iloc[:, col_slice_x]
    X.columns = data_columns

    all_numeric = all(features_dict[feature]['data_type'] == 'numeric'

We should also consider 'real' and 'integer'.
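
A minimal sketch of the all-numeric check extended to the other numeric type names, assuming `features_dict` maps each feature name to a dict with a `'data_type'` key as in the snippet above. The `NUMERIC_TYPES` set, the `is_all_numeric` helper and the example features are hypothetical, for illustration only.

```python
# Hypothetical set of type names treated as numeric: ARFF declares numeric
# attributes as NUMERIC, REAL or INTEGER.
NUMERIC_TYPES = {'numeric', 'real', 'integer'}


def is_all_numeric(features_dict, data_columns):
    """Return True if every column in data_columns has a numeric data_type."""
    return all(features_dict[feature]['data_type'] in NUMERIC_TYPES
               for feature in data_columns)


# Example input shaped like the snippet's features_dict (assumed structure).
features_dict = {
    'sepal_length': {'data_type': 'real'},
    'n_petals': {'data_type': 'integer'},
    'species': {'data_type': 'nominal'},
}
print(is_all_numeric(features_dict, ['sepal_length', 'n_petals']))
print(is_all_numeric(features_dict, ['sepal_length', 'species']))
```

Using a set membership test keeps the check to a single line and makes it easy to add further type names later.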
    for feature in data_columns:
        data_type = features_dict[feature]['data_type']
        if data_type == 'numeric':

We should add the case for 'real', which should also map to np.float64.

We also need an elif for 'integer' to produce an integer column.
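
One way the requested branches could look, as a sketch assuming the same `data_type` strings as above. The helper name `column_dtype` is hypothetical, and mapping 'integer' to `np.int64` only works for columns without missing values.

```python
import numpy as np


def column_dtype(data_type):
    """Map a data_type string to a target dtype (hypothetical helper)."""
    if data_type in ('numeric', 'real'):
        # 'real' is stored as a float, just like 'numeric'.
        return np.float64
    elif data_type == 'integer':
        # A plain NumPy integer dtype cannot represent missing values.
        return np.int64
    elif data_type == 'nominal':
        # Nominal columns become pandas categoricals.
        return 'category'
    else:
        # Anything else (e.g. string) is kept as generic Python objects.
        return object


for t in ('numeric', 'real', 'integer', 'nominal', 'string'):
    print(t, column_dtype(t))
```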

In the case of integer, it will be tricky to handle missing values, though.
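
To illustrate why missing values complicate integer columns, here is a small sketch with made-up values. It shows pandas upcasting to float64, a float-to-int64 cast failing on NaN, and pandas' nullable 'Int64' extension dtype (added in pandas 0.24) as one possible way out; whether that dtype suits this PR is an open assumption.

```python
import numpy as np
import pandas as pd

values = [1, 2, None, 4]  # an integer column with one missing entry

# Without an explicit dtype, pandas upcasts to float64 so the gap can be NaN.
as_float = pd.Series(values)
print(as_float.dtype)

# Casting that float column to int64 fails: NaN has no int64 representation.
try:
    as_float.astype(np.int64)
    int64_failed = False
except ValueError:
    int64_failed = True
print('int64 cast failed:', int64_failed)

# The nullable 'Int64' extension dtype keeps integer values and marks the
# gap as missing instead of upcasting the column to float.
as_nullable = pd.Series(values, dtype='Int64')
print(as_nullable.dtype, as_nullable.isna().tolist())
```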

To think about:

Is anyone working on this right now?

It's on my mental to-do list. Apparently the feature is there (by Joris), but it needs test cases.

    nominal_attributes = dict(nominal_attributes)

    data = pd.DataFrame(arff_data)

Are there any risks in pandas doing type inference here before we set the dtypes below?
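
A small sketch of where the inference bites, assuming the ARFF reader returns rows as a list of lists with None for missing values. The column names, values and the `declared` dtype mapping are made up for illustration. The integer column with a gap is already float64 after construction, before any declared dtypes are applied.

```python
import pandas as pd

# Rows as an ARFF reader might return them (assumed shape: list of lists,
# None for missing values). Names and values are made up.
arff_data = [
    [5.1, 3, 'setosa'],
    [6.2, None, 'virginica'],
    [5.9, 2, 'versicolor'],
]
columns = ['sepal_length', 'n_leaves', 'species']

# pandas infers one dtype per column at construction time: 'n_leaves'
# becomes float64 because of the None, 'species' stays object.
inferred = pd.DataFrame(arff_data, columns=columns)
print(inferred.dtypes.to_dict())

# Applying the dtypes declared in the (hypothetical) ARFF header afterwards.
declared = {'sepal_length': 'float64', 'n_leaves': 'Int64',
            'species': 'category'}
data = inferred.astype(declared)
print(data.dtypes.to_dict())
```

Here the round trip through float64 is harmless for small integers, but large integer values could lose precision; building each column directly with its target dtype would avoid the intermediate inference.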

@jorisvandenbossche, could we try to finish this during the sprint?

I don't mind getting it right in terms of use of pd.Categorical. I'm sure
this will be helpful to users.
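
One detail of getting `pd.Categorical` right, sketched under the assumption that `nominal_attributes` maps each nominal column to the list of values declared in the ARFF header (as liac-arff represents nominal types). The column and its values are made up. Passing the declared categories keeps values that never occur in the data and fixes their order.

```python
import pandas as pd

# Hypothetical mapping from nominal column name to its declared values.
nominal_attributes = {'species': ['setosa', 'versicolor', 'virginica']}

column = ['virginica', 'setosa', 'virginica']  # 'versicolor' never occurs

# Inferring categories from the data drops 'versicolor'.
inferred = pd.Categorical(column)
# Passing the declared categories keeps all of them, in header order.
declared = pd.Categorical(column, categories=nominal_attributes['species'])

print(list(inferred.categories))
print(list(declared.categories))
print(declared.codes.tolist())
```

With declared categories, the integer codes stay stable across subsets of the same dataset, which matters if users later encode the column.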

Tests are failing, in case you weren't aware.

Let me know when this is ready for review.

Any progress on this? I'd really like to have it ;)

Continued and resolved in #13902. Closing.
Just a proof of concept I quickly tried out, no docs or tests yet.
Fixes #11818