
Discretizer #5778

@mblondel


Binarizer maps continuous values to one of two states (0 or 1). It would be nice to generalize this to an arbitrary number of states K.

This preprocessor would produce a scipy sparse matrix of shape (n_samples, K * n_features) using one-of-K encoding. The K - 1 thresholds separating the K bins could be spaced uniformly between the min and max of each feature, or placed at the K-quantiles of each feature.
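A minimal sketch of such a preprocessor, assuming NumPy and SciPy. The function name `discretize` and its `n_bins`/`strategy` parameters are hypothetical, not an existing scikit-learn API. Column `j * n_bins + k` of the output is 1 when feature `j` falls into bin `k`, and a value equal to an interior threshold is assigned to the upper bin.

```python
import numpy as np
import scipy.sparse as sp


def discretize(X, n_bins=3, strategy="uniform"):
    """Hypothetical one-of-K discretizer (illustrative sketch only).

    Returns a CSR matrix of shape (n_samples, n_bins * n_features).
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    rows, cols = [], []
    for j in range(n_features):
        col = X[:, j]
        if strategy == "uniform":
            # n_bins - 1 interior thresholds evenly spaced between min and max
            edges = np.linspace(col.min(), col.max(), n_bins + 1)[1:-1]
        elif strategy == "quantile":
            # n_bins - 1 interior thresholds at the K-quantiles of the feature
            edges = np.quantile(col, np.linspace(0, 1, n_bins + 1)[1:-1])
        else:
            raise ValueError("strategy must be 'uniform' or 'quantile'")
        # Bin index = number of thresholds <= value, so it lies in [0, n_bins - 1]
        bins = np.searchsorted(edges, col, side="right")
        rows.append(np.arange(n_samples))
        cols.append(j * n_bins + bins)
    rows = np.concatenate(rows)
    cols = np.concatenate(cols)
    data = np.ones(rows.shape[0])
    return sp.csr_matrix((data, (rows, cols)), shape=(n_samples, n_bins * n_features))


X = np.array([[0.0, 5.0], [0.2, 1.0], [0.5, 3.0], [0.9, 2.0], [1.0, 4.0]])
print(discretize(X, n_bins=3).toarray())
```

This sketch computes the thresholds from `X` itself; a real transformer would learn them in `fit` and reuse them in `transform`. With the quantile strategy each bin receives roughly the same number of samples, although repeated values can make some thresholds coincide and leave bins empty.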

For example, with uniformly spaced thresholds, min=0, max=1.0 and K=3, a feature value between 0 and 1/3 would be encoded as [1, 0, 0], a value between 1/3 and 2/3 as [0, 1, 0], and a value between 2/3 and 1.0 as [0, 0, 1].
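The K=3 example can be checked with a few lines of plain Python. `one_of_k` is a hypothetical helper; it assumes a value exactly on a threshold goes to the upper bin, that the maximum (1.0) goes to the last bin, and it clamps out-of-range values into the first or last bin.

```python
def one_of_k(value, lo=0.0, hi=1.0, k=3):
    """Hypothetical helper: one-of-K code of `value` using k equal-width bins on [lo, hi]."""
    width = (hi - lo) / k
    # Bin index, clamped so that value == hi (or anything outside [lo, hi]) stays in range
    idx = max(0, min(int((value - lo) / width), k - 1))
    code = [0] * k
    code[idx] = 1
    return code


for v in (0.1, 0.5, 0.9):
    print(v, one_of_k(v))
# 0.1 [1, 0, 0]
# 0.5 [0, 1, 0]
# 0.9 [0, 0, 1]
```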

My use case is that this encoding might be more meaningful than continuous values when combined with PolynomialFeatures.
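A small NumPy illustration of why this could help: when the inputs are one-of-K indicator columns, each degree-2 cross term of the kind PolynomialFeatures generates is itself an indicator, equal to 1 only when both features fall in the corresponding bins. The arrays below are made-up codes, and the cross terms are computed directly with `np.einsum` rather than through PolynomialFeatures.

```python
import numpy as np

# Made-up one-of-K codes (K=3) for two features over four samples
a = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]])  # bins of feature 1
b = np.array([[0, 1, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0]])  # bins of feature 2

# All products a[:, i] * b[:, j]: the cross-feature terms of a degree-2 expansion
cross = np.einsum("ni,nj->nij", a, b).reshape(len(a), -1)

# Each row has a single 1, at column i * 3 + j, where i and j are the active bins
print(cross.argmax(axis=1))  # [1 4 3 7]
```

Within one feature, products of two different bin columns are always 0 and squares reproduce the original column, so the informative degree-2 terms are the cross-feature ones shown here.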

Possibly related to #1062.
