[ENH] categorical feature support: input checking - column type encoding by the __dataframe__ protocol #6470

@fkiraly

While thinking about aspects of the design for categorical feature support (umbrella issue: #6109), I realized that we need access to the information "which columns are categorical". Given that we adapt multiple data container formats (mtypes), with support for more planned, e.g., polars, we should have a central interface point where we can query this.

My suggestion, to avoid lots of case distinctions at the consumption points, is to create this central interface point at input checking time, using the datatypes checkers:

  • the check fields for the Series, Panel, and Hierarchical mtypes are extended by a feature_kind field. This is a tuple of __dataframe__ DtypeKind enum values (e.g., DtypeKind.CATEGORICAL). Reference: https://data-apis.org/dataframe-protocol/latest/API.html
  • in relevant estimators, on fit checks, this is written to self as feature_kind_in_, similar to feature_names_in_
  • a check for "has categorical features?" can then be performed by checking whether DtypeKind.CATEGORICAL is in feature_kind_in_.
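
To make the three points above concrete, here is a minimal sketch of the intended flow. It is not the sktime implementation. The DtypeKind values are copied from the __dataframe__ protocol specification, but check_metadata, SketchEstimator, has_categorical_features, and the dtype-name lookup are hypothetical names invented for illustration. The sketch also assumes one kind per column, and uses a plain dict of dtype names as a stand-in for real container inspection:

```python
from enum import IntEnum


class DtypeKind(IntEnum):
    """Local copy of the DtypeKind values from the __dataframe__ protocol."""

    INT = 0
    UINT = 1
    FLOAT = 2
    BOOL = 20
    STRING = 21
    DATETIME = 22
    CATEGORICAL = 23


# Hypothetical lookup from dtype names to kinds. A real checker would
# inspect the container (pandas, polars, ...) instead of a dict of names.
_KIND_BY_DTYPE_NAME = {
    "int64": DtypeKind.INT,
    "uint8": DtypeKind.UINT,
    "float64": DtypeKind.FLOAT,
    "bool": DtypeKind.BOOL,
    "string": DtypeKind.STRING,
    "datetime64[ns]": DtypeKind.DATETIME,
    "category": DtypeKind.CATEGORICAL,
}


def check_metadata(column_dtypes):
    """Hypothetical checker output: column names plus one kind per column."""
    return {
        "feature_names": list(column_dtypes),
        "feature_kind": tuple(
            _KIND_BY_DTYPE_NAME[name] for name in column_dtypes.values()
        ),
    }


class SketchEstimator:
    """Hypothetical estimator that writes checker metadata to self on fit."""

    def fit(self, column_dtypes):
        metadata = check_metadata(column_dtypes)
        self.feature_names_in_ = metadata["feature_names"]
        self.feature_kind_in_ = metadata["feature_kind"]
        return self

    def has_categorical_features(self):
        # Single consumption point: no per-container case distinctions here.
        return DtypeKind.CATEGORICAL in self.feature_kind_in_


est = SketchEstimator().fit({"temperature": "float64", "weekday": "category"})
print(est.feature_kind_in_)
print(est.has_categorical_features())
```

The point of the design is that only the checker needs to know how each container encodes its dtypes. Estimators read feature_kind_in_ and never branch on the mtype.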

FYI @Abhay-Lejith, @pranavvp16, @yarnabrina.

I would also suggest some communication between @Abhay-Lejith and @pranavvp16 - @Abhay-Lejith is focusing on categorical feature support, while @pranavvp16 is currently diving into the datatypes module and is probably the one with the freshest working knowledge of it.

Labels

  • API design: API design & software architecture
  • enhancement: Adding new functionality
  • module:datatypes: datatypes module: data containers, checkers & converters
  • module:forecasting: forecasting module: forecasting, incl probabilistic and hierarchical forecasting
  • module:transformations: transformations module: time series transformation, feature extraction, pre-/post-processing

Projects

Status

Done
