[RFC] return Series for target when using as_frame and return_X_y #16012
This is a general question regarding the API of as_frame and return_X_y in our loaders/fetchers. We have an inconsistent behaviour which could be solved straight away.
fetch_openml introduced as_frame, which exposes a frame attribute in the Bunch that is a pandas DataFrame. In conjunction with return_X_y=True, we expose data and target. data will always be a DataFrame, while target is supposed to be a DataFrame or a Series depending on the number of columns in the target.
X, y = fetch_openml('iris', as_frame=True, return_X_y=True)
y
0 Iris-setosa
1 Iris-setosa
2 Iris-setosa
3 Iris-setosa
4 Iris-setosa
...
145 Iris-virginica
146 Iris-virginica
147 Iris-virginica
148 Iris-virginica
149 Iris-virginica
Name: class, Length: 150, dtype: category
Categories (3, object): [Iris-setosa, Iris-versicolor, Iris-virginica]
In #15950, we introduce as_frame to fetch_california_housing. The API is the same apart from the target output: the target is a DataFrame even with a single column.
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
y
MedHouseVal
0 4.526
1 3.585
2 3.521
3 3.413
4 3.422
... ...
20635 0.781
20636 0.771
20637 0.923
20638 0.847
20639 0.894
[20640 rows x 1 columns]
So my question is: what type of target do we want when the target is 1D?
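To make one of the options concrete, here is a minimal sketch of what returning a Series for a 1D target could look like. The helper name squeeze_target is hypothetical and not part of scikit-learn, and the toy frame only mimics the first few values of the California housing target shown above. It uses plain pandas and does not download anything.

```python
import pandas as pd


def squeeze_target(target):
    """Hypothetical helper (not scikit-learn API).

    Return a Series when the target is a single-column DataFrame,
    otherwise return the target unchanged.
    """
    if isinstance(target, pd.DataFrame) and target.shape[1] == 1:
        # Select the only column; its name becomes the Series name.
        return target.iloc[:, 0]
    return target


# Toy stand-in for the single-column target shown above.
y_frame = pd.DataFrame({"MedHouseVal": [4.526, 3.585, 3.521]})
y = squeeze_target(y_frame)
print(type(y).__name__, y.name)
```

With something like this applied in the loaders, fetch_california_housing would match the fetch_openml behaviour for single-column targets, while multi-column targets would stay DataFrames.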