
I am trying to fit a model using the Dask framework, but the estimator used in the example says it does not accept Dask DataFrames. Can someone help me, please?

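    import numpy as np
    import dask.dataframe as dd
    from dask_ml.model_selection import IncrementalSearchCV, train_test_split
    from sklearn.linear_model import SGDClassifier

    # X (pandas DataFrame) and y (pandas Series) are loaded earlier
    ddx = dd.from_pandas(X, chunksize=100000)
    ddy = dd.from_pandas(y, chunksize=100000)
    X_train, X_test, y_train, y_test = train_test_split(ddx, ddy)
    classes = np.unique(y)  # assumed: the full set of class labels
    model = SGDClassifier(loss='log')  # renamed 'log_loss' in scikit-learn >= 1.1
    params = {'alpha': np.logspace(-2, 1, num=1000)}
    search = IncrementalSearchCV(model, params,
                                 n_initial_parameters=10, random_state=0)
    search.fit(X_train, y_train, classes=classes)  # the TypeError is raised here
    y_pred = search.predict_proba(X_test)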

The error is:

    TypeError: This estimator does not support dask dataframes.

It is raised on the search.fit line. When I replace fit with partial_fit, that step works, but the same error then occurs on the predict_proba line.

  • Please provide more context. More specifically, how do you import SGDClassifier and IncrementalSearchCV? Also, is this the line where the error occurs: search.fit(X_train, y_train, classes=classes)? Commented Mar 25, 2020 at 10:24
  • Also, everything works fine when I use Incremental instead of IncrementalSearchCV, but the issue is that I need to optimize the hyperparameters. Commented Mar 25, 2020 at 11:09
  • Is there a chance dask_ml also provides an SGDClassifier? Maybe you should use the "dask version" if there is one, of course. Commented Mar 25, 2020 at 12:15
  • Also, it would help if you updated your question with what you told me here in the comments, so that others who read it can help you more :) Commented Mar 25, 2020 at 12:15
  • Dask uses these imports in its quick-run examples, so it should work with them. Commented Mar 25, 2020 at 12:34

1 Answer


IncrementalSearchCV currently requires Dask Arrays, not Dask DataFrames. Perhaps you can convert your data to arrays first.

I opened https://github.com/dask/dask-ml/issues/628 to support dataframes. I'd welcome help if you're interested in working on it.


1 Comment

Hello, my X_train data has several columns. Are you saying that I can't use IncrementalSearchCV to optimize the model for now? How can I help with the GitHub issue?
