sklearn.base.clone cannot clone estimator with pandas data frame parameters #5522

@yinsong1986

I am trying to create an estimator that takes a pandas data frame as one of its parameters, and I find that sklearn.base.clone fails to clone such an estimator. The sample code is as follows:

from sklearn.base import BaseEstimator, TransformerMixin
import pandas as pd
from sklearn.base import clone

class DummyEstimator(BaseEstimator, TransformerMixin):
    """This is a dummy class for generating numerical features

    This feature extractor extracts numerical features from pandas data frame.

    Parameters
    ----------

    df: pandas data frame
        The pandas data frame parameter.

    Notes
    -----
    """
    def __init__(self, df):
        self.df = df

    def fit(self, X, y=None):
        pass

    def transform(self, X, y=None):
        pass

if __name__ == "__main__":
    # Generate a data frame
    d = {"a": [1, 2, 3],
         "b": [4, 5, 6],
         "c": [7, 8, 9]
    }
    df = pd.DataFrame(d)
    # Get an estimator instance
    de = DummyEstimator(df)
    # Clone the estimator
    ret = clone(de)

If you run the above code, you get the following error:

Traceback (most recent call last):
  File "bug.py", line 38, in <module>
    ret = clone(de)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 93, in clone
    if not equality_test:
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 730, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
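
The same ValueError can be reproduced with pandas alone, which may help narrow down the cause. This is a minimal sketch, assuming that clone compares the original and the copied parameter with `==` and then uses the result in a boolean context (the `if not equality_test:` line in the traceback suggests this, but I have not confirmed it against the sklearn source). The variable names are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# `==` on two data frames is elementwise: the result is another
# DataFrame of booleans, not a single True/False.
elementwise = df == df

# Using that result where a single boolean is expected raises the
# same ValueError as in the traceback above.
try:
    bool(elementwise)
    raised = False
except ValueError:
    raised = True

# DataFrame.equals returns one plain bool for the whole frame,
# so it is safe to use in an `if` statement.
same = df.equals(df)
```

Here `raised` ends up True and `same` is a plain Python bool, so a comparison based on `DataFrame.equals` (or on `.all().all()` of the elementwise result) would avoid the ambiguity.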
