
UnaryEncoder #8628

@jnothman

I'm sure we've discussed this before, but I'm not sure where, and there certainly does not appear to be an active PR. For ordinal (and discretized; see #7668) features, a "unary" encoding (is there a better name for this?) is more informative than a one-hot encoding, because it preserves the order of the values. For the k values 0, ..., k - 1 of an ordinal feature x, it creates k - 1 binary features such that the ith is active if x > i (for i = 0, ..., k - 2). Below is an initial implementation.
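In symbols (u is only notation introduced here), the encoding maps each value x in {0, ..., k - 1} to a "thermometer" code whose entries sum to x, so the L1 distance between two codes equals the distance between the underlying values:

```math
u(x) = \big(\mathbb{1}[x > 0],\ \mathbb{1}[x > 1],\ \ldots,\ \mathbb{1}[x > k-2]\big) \in \{0, 1\}^{k-1},
\qquad \sum_{i=0}^{k-2} u_i(x) = x,
\qquad \lVert u(x) - u(x') \rVert_1 = |x - x'|.
```

For example, with k = 4: u(0) = (0, 0, 0), u(1) = (1, 0, 0), u(2) = (1, 1, 0), u(3) = (1, 1, 1). Under a one-hot encoding, by contrast, any two distinct values are at L1 distance 2, so the ordering is lost.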

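```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils import check_array


class UnaryEncoder(BaseEstimator, TransformerMixin):
    def __init__(self, n_values):
        self.n_values = n_values

    def fit(self, X, y=None):
        # Stateless: n_values is supplied by the user, so nothing is learned.
        return self

    def transform(self, X):
        X = check_array(X)
        # Thresholds 0, ..., n_values - 2; column i is active iff x > i.
        values = np.arange(self.n_values - 1)
        # One block of n_values - 1 columns per input feature.
        Xt = np.hstack([values < X[:, i, None] for i in range(X.shape[1])])
        return Xt.astype(int)
```

```pycon
>>> UnaryEncoder(3).fit_transform([[0], [1], [2]])
array([[0, 0],
       [1, 0],
       [1, 1]])
```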
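The claim that the unary code is more informative than one-hot for ordinal data can be checked with a small standalone sketch in plain NumPy. The helpers `unary_encode` and `one_hot_encode` are hypothetical names for illustration, not scikit-learn API. Under the unary code the L1 (Hamming) distance between two encoded values equals their ordinal distance and the row sum decodes the value, whereas one-hot puts every pair of distinct values at distance 2.

```python
import numpy as np


def unary_encode(x, n_values):
    """Hypothetical helper: column i is 1 iff x > i, for i = 0..n_values-2."""
    x = np.asarray(x).reshape(-1, 1)
    return (np.arange(n_values - 1) < x).astype(int)


def one_hot_encode(x, n_values):
    """Hypothetical helper: column i is 1 iff x == i, for i = 0..n_values-1."""
    x = np.asarray(x).reshape(-1, 1)
    return (np.arange(n_values) == x).astype(int)


x = np.arange(4)
U = unary_encode(x, 4)
O = one_hot_encode(x, 4)

# Pairwise L1 (Hamming) distances between the codes of every pair of values.
d_unary = np.abs(U[:, None, :] - U[None, :, :]).sum(axis=-1)
d_onehot = np.abs(O[:, None, :] - O[None, :, :]).sum(axis=-1)

print(d_unary)   # d_unary[i, j] == |i - j|
print(d_onehot)  # 0 on the diagonal, 2 everywhere else
# Row sums of the unary code recover the original ordinal values.
print(U.sum(axis=1))
```

The broadcasting `U[:, None, :] - U[None, :, :]` builds all pairwise differences at once, which is fine for a handful of values but quadratic in memory for large inputs.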
