This repository was archived by the owner on Nov 27, 2023. It is now read-only.

Merge dask-ml 1.6.0 #10

Merged
zachary-mcpher merged 39 commits into master from merge-dask-ml-1.6.0 on Aug 11, 2020

Conversation

@zachary-mcpher

Brings our ZEFR-INC/dask-ml fork of dask/dask-ml up to date with upstream v1.6.0.

TomAugspurger and others added 30 commits on April 2, 2020:
* Add annotations for metrics module
  Co-authored-by: dheim <physics@d-heim.de>
  Co-authored-by: Yuuichi ASAHI <asahi.yuuichi@qst.go.jp>
* DOC: make dataframe conversion more clear
* DOC: better examples
* MAINT: fix typo
* Fix XGBoost link
* basic make_classification as dask.DataFrame
* returning a tuple of (DataFrame, Series)
…pted" (dask#653)
* MAINT: better chunk warning in utils.py
* Added annotations to preprocessing
* remove mypy warnings
* Correct: import dask.dataframe
* No errors
* __typing
* Resolve more errors
* Flake8 and Isort errors
* Nits
* Nits
* Nits
* Return type added
* Nits
* Callable
* Add blockwise estimators

```python
In [1]: import sklearn.linear_model
   ...: import dask_ml.datasets
   ...: import dask_ml.ensemble
   ...:
   ...: X, y = dask_ml.datasets.make_classification(n_features=20, chunks=25)
   ...:
   ...: clf = dask_ml.ensemble.BlockwiseVotingClassifier(
   ...:     sklearn.linear_model.LogisticRegression(), voting="soft",
   ...:     classes=[0, 1]
   ...: )
   ...:
   ...: clf.fit(X, y)

In [2]: clf.estimators_
Out[2]:
[LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=100,
                    multi_class='auto', n_jobs=None, penalty='l2',
                    random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                    warm_start=False),
 LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=100,
                    multi_class='auto', n_jobs=None, penalty='l2',
                    random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                    warm_start=False),
 LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=100,
                    multi_class='auto', n_jobs=None, penalty='l2',
                    random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                    warm_start=False),
 LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=100,
                    multi_class='auto', n_jobs=None, penalty='l2',
                    random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                    warm_start=False)]
```
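With `voting="soft"`, the ensemble averages each per-block estimator's predicted class probabilities and picks the class with the highest mean. A minimal pure-Python sketch of just that averaging step (the `soft_vote` helper is hypothetical, not part of dask-ml):

```python
def soft_vote(probas):
    """Average per-estimator class-probability vectors and return the argmax.

    probas: one [P(class 0), P(class 1), ...] list per fitted estimator.
    """
    n_estimators = len(probas)
    n_classes = len(probas[0])
    avg = [sum(p[c] for p in probas) / n_estimators for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)

# Three per-block estimators scoring one sample: class 1 has the
# highest average probability ((0.8 + 0.6 + 0.7) / 3 = 0.7), so it wins.
print(soft_vote([[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]))  # 1
```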
* DOC: clarify model selection issues
* Added InverseDecaySearchCV
* DOC: 1.4.0
* fixup
* CI: Added 3.8
* bump to 38
* compat fixups
* impute n_features_in
* pca n_features_in_
* more compat
* more compat
* compat
* new URL
* require 0.23+
* require 0.23+
* fixup
* cleanup
* fix iid
* trigger ci
* linting
* xfail
* xfail
* more xfail
* DEPR: Remove previously deprecated Partials
* Log amount of found clusters in kmeans init

@jamesbsilva jamesbsilva left a comment


From what I can tell, the parts related to CV just add typing and docs, so this should be compatible, especially if all tests pass.


@tian-yi-zefr tian-yi-zefr left a comment


Are there any particular files we need to review?

@zachary-mcpher
Author

zachary-mcpher commented Aug 11, 2020

Are there any particular files we need to review?

Nothing in particular, I've looked through most of the changes and I don't suspect any to be breaking. Some notable additions:

* Added :class:`dask_ml.decomposition.IncrementalPCA` for out-of-core / distributed incremental PCA (:pr:`619`)
* Scikit-Learn 0.23.0 or newer is now required
* Improved logging and monitoring in incremental model selection (:pr:`528`)
* Compatibility with Scikit-Learn 0.23.0 (:pr:`669`)
* Added support for ``dask.dataframe`` objects in :class:`dask_ml.model_selection.HyperbandSearchCV` (:pr:`701`)
* Addition of :class:`dask_ml.ensemble.BlockwiseVotingClassifier`
* Addition of :class:`dask_ml.feature_extraction.text.CountVectorizer`
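The CountVectorizer addition follows the usual bag-of-words recipe: build a vocabulary from the training documents, then map each document to a vector of term counts. A self-contained sketch of that idea in plain Python (the `count_vectorize` helper is hypothetical, not the dask-ml API):

```python
from collections import Counter

def count_vectorize(docs):
    # Vocabulary: sorted unique tokens across all documents.
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    # One row of term counts per document, columns ordered by vocab.
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([counts.get(tok, 0) for tok in vocab])
    return vocab, rows

vocab, X = count_vectorize(["the cat sat", "the cat saw the dog"])
print(vocab)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(X)      # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

The real estimator does this lazily over dask collections; the sketch only shows the vocabulary/count mapping.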


@tian-yi-zefr tian-yi-zefr left a comment


let's try it!


@zexuan-zhou zexuan-zhou left a comment


lgtm.

@zachary-mcpher zachary-mcpher merged commit 93a6dab into master Aug 11, 2020
@zachary-mcpher zachary-mcpher deleted the merge-dask-ml-1.6.0 branch August 11, 2020 23:24
