[ENH] categorical feature support: input checking - column type encoding by the __dataframe__ protocol #6470
Closed
Labels: API design, enhancement, module:datatypes, module:forecasting, module:transformations
Status: Done
Thinking about aspects of the design for categorical feature support (umbrella issue: #6109), I realized that we need access to the information "which columns are categorical". Given how we adapt multiple data container formats (mtypes), and the support planned in the future for, e.g., polars, we should have a central interface point where we can query this.

My suggestion, to avoid lots of case distinctions at the consumption points, is to create this central interface point at input checking time, using the `datatypes` checkers:

- the `check` fields for the `Series`, `Panel`, and `Hierarchical` mtypes are extended by a `feature_kind` field; this is a tuple of `__dataframe__` `DtypeKind` enum values (e.g., `DtypeKind.CATEGORICAL`). Reference: https://data-apis.org/dataframe-protocol/latest/API.html
- in `fit` checks, this is written to `self` as `feature_kind_in_`, similar to `feature_names_in_`
- estimators can then detect categorical features by checking whether `DtypeKind.CATEGORICAL` is in `feature_kind_in_`.

FYI @Abhay-Lejith, @pranavvp16, @yarnabrina.
I would also suggest some communication between @Abhay-Lejith and @pranavvp16: @Abhay-Lejith is focusing on categorical feature support, while @pranavvp16 is currently diving into the `datatypes` module and is probably the one with the freshest working knowledge of it.