LabelEncoder ignores pandas CategoricalDtype order #12086
Closed
Labels: Enhancement, Moderate (anything that requires some knowledge of conventions and best practices)
Description
The category order of a pandas categorical feature, declared with CategoricalDtype(ordered=True), could be used by estimators. For example:
print(houses['quality'].unique())
[poor, fair, typical, good, excellent]
Categories (5, object): [poor < fair < typical < good < excellent]
Note how order is embedded in the data type above.
I was expecting codes like these:
0 poor
1 fair
2 typical
3 good
4 excellent
I believe estimators would give more meaningful results if this order were used.
But LabelEncoder assigns integer codes that ignore the category order:
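For reference, pandas itself already exposes codes that follow the declared order. The sketch below rebuilds a small stand-in for the `houses` frame (the sample rows and the `quality_dtype` name are assumptions made for illustration, not the original data) and reads the codes through the `.cat.codes` accessor:

```python
import pandas as pd

# Hypothetical stand-in for the issue's data: an ordered categorical column.
quality_dtype = pd.CategoricalDtype(
    categories=["poor", "fair", "typical", "good", "excellent"],
    ordered=True,
)
houses = pd.DataFrame(
    {"quality": pd.Series(["good", "poor", "excellent", "fair", "typical"],
                          dtype=quality_dtype)}
)

# .cat.codes gives each value's position in the declared category order.
codes = houses["quality"].cat.codes
print(dict(zip(houses["quality"].astype(str), codes)))
```

Here `poor` maps to 0 and `excellent` to 4, which is the encoding expected above.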
3 poor
1 fair
0 typical
4 good
2 excellent
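One possible workaround, offered as a sketch and not as something this issue proposes, is to pass the dtype's categories explicitly to `OrdinalEncoder`. In the scikit-learn versions I am aware of, `LabelEncoder` stores its `classes_` in sorted order, so string labels end up encoded alphabetically. The sample values and variable names below are assumptions for illustration:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical sample column with an ordered categorical dtype.
quality_dtype = pd.CategoricalDtype(
    categories=["poor", "fair", "typical", "good", "excellent"],
    ordered=True,
)
quality = pd.Series(
    ["poor", "fair", "typical", "good", "excellent"], dtype=quality_dtype
)

# Plain object array of strings, so neither encoder sees the pandas dtype.
raw = quality.astype(object).to_numpy()

# LabelEncoder sorts the distinct labels, so codes follow alphabetical order.
label_codes = LabelEncoder().fit_transform(raw)

# OrdinalEncoder accepts an explicit category list per feature;
# here it is taken from the pandas dtype to preserve the declared order.
encoder = OrdinalEncoder(categories=[list(quality.cat.categories)])
ordinal_codes = encoder.fit_transform(raw.reshape(-1, 1)).ravel().astype(int)

print(label_codes.tolist())
print(ordinal_codes.tolist())
```

`OrdinalEncoder` is meant for input features, while `LabelEncoder` is documented for target labels, so the former is likely the better fit for a column such as `quality` anyway.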
Thank you in advance