This repository was archived by the owner on Nov 27, 2023. It is now read-only.

Merge dask-ml 1.6.0 #10

Merged
zachary-mcpher merged 39 commits into master from merge-dask-ml-1.6.0 on Aug 11, 2020

Conversation

@zachary-mcpher

Brings our ZEFR-INC/dask-ml fork of dask/dask-ml up to date with upstream v1.6.0.

TomAugspurger and others added 30 commits on April 2, 2020:
* Add annotations for metrics module
  Co-authored-by: dheim <physics@d-heim.de>
  Co-authored-by: Yuuichi ASAHI <asahi.yuuichi@qst.go.jp>
* DOC: make dataframe conversion more clear
* DOC: better examples
* MAINT: fix typo
* Fix XGBoost link
* basic make_classification as dask.DataFrame
* returning a tuple of (DataFrame, Series)
…pted" (dask#653)
* MAINT: better chunk warning in utils.py
* Added annotations to preprocessing
* remove mypy warnings
* Correct: import dask.dataframe
* No errors
* __typing
* Resolve more errors
* Flake8 and Isort errors
* Nits
* Nits
* Nits
* Return type added
* Nits
* Callable
* Add blockwise estimators

```python
In [1]: import sklearn.linear_model
   ...: import dask_ml.datasets
   ...: import dask_ml.ensemble
   ...:
   ...: X, y = dask_ml.datasets.make_classification(n_features=20, chunks=25)
   ...:
   ...: clf = dask_ml.ensemble.BlockwiseVotingClassifier(
   ...:     sklearn.linear_model.LogisticRegression(), voting="soft",
   ...:     classes=[0, 1]
   ...: )
   ...:
   ...: clf.fit(X, y)

In [2]: clf.estimators_
Out[2]:
[LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=100,
                    multi_class='auto', n_jobs=None, penalty='l2',
                    random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                    warm_start=False),
 LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=100,
                    multi_class='auto', n_jobs=None, penalty='l2',
                    random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                    warm_start=False),
 LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=100,
                    multi_class='auto', n_jobs=None, penalty='l2',
                    random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                    warm_start=False),
 LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=100,
                    multi_class='auto', n_jobs=None, penalty='l2',
                    random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                    warm_start=False)]
```
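With `voting="soft"`, the ensemble averages each per-block estimator's predicted class probabilities and picks the class with the highest mean. A minimal pure-Python sketch of just that averaging step (the `soft_vote` helper is hypothetical, not part of dask-ml):

```python
def soft_vote(probas):
    """Average per-estimator class-probability vectors and return the argmax.

    probas: one [P(class 0), P(class 1), ...] list per fitted estimator.
    """
    n_estimators = len(probas)
    n_classes = len(probas[0])
    avg = [sum(p[c] for p in probas) / n_estimators for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)

# Three per-block estimators scoring one sample: class 1 has the
# highest average probability ((0.8 + 0.6 + 0.7) / 3 = 0.7), so it wins.
print(soft_vote([[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]))  # 1
```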
* DOC: clarify model selection issues
* Added InverseDecaySearchCV
* DOC: 1.4.0
* fixup
* CI: Added 3.8
* bump to 38
* compat fixups
* impute n_features_in
* pca n_features_in_
* more compat
* more compat
* compat
* new URL
* require 0.23+
* require 0.23+
* fixup
* cleanup
* fix iid
* trigger ci
* linting
* xfail
* xfail
* more xfail
* DEPR: Remove previously deprecated Partials
* Log amount of found clusters in kmeans init

@jamesbsilva jamesbsilva left a comment


From what I can tell, the parts related to CV just add typing and docs, so this should be compatible, especially if all tests pass.


@tian-yi-zefr tian-yi-zefr left a comment


Are there any particular files we need to review?

@zachary-mcpher
Author

zachary-mcpher commented Aug 11, 2020

Are there any particular files we need to review?

Nothing in particular, I've looked through most of the changes and I don't suspect any to be breaking. Some notable additions:

* Added :class:`dask_ml.decomposition.IncrementalPCA` for out-of-core / distributed incremental PCA (:pr:`619`)
* Scikit-Learn 0.23.0 or newer is now required
* Improved logging and monitoring in incremental model selection (:pr:`528`)
* Compatibility with Scikit-Learn 0.23.0 (:pr:`669`)
* Added support for ``dask.dataframe`` objects in :class:`dask_ml.model_selection.HyperbandSearchCV` (:pr:`701`)
* Addition of :class:`dask_ml.ensemble.BlockwiseVotingClassifier`
* Addition of :class:`dask_ml.feature_extraction.text.CountVectorizer`
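The CountVectorizer addition follows the usual bag-of-words recipe: build a vocabulary from the training documents, then map each document to a vector of term counts. A self-contained sketch of that idea in plain Python (the `count_vectorize` helper is hypothetical, not the dask-ml API):

```python
from collections import Counter

def count_vectorize(docs):
    # Vocabulary: sorted unique tokens across all documents.
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    # One row of term counts per document, columns ordered by vocab.
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([counts.get(tok, 0) for tok in vocab])
    return vocab, rows

vocab, X = count_vectorize(["the cat sat", "the cat saw the dog"])
print(vocab)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(X)      # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

The real estimator does this lazily over dask collections; the sketch only shows the vocabulary/count mapping.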


@tian-yi-zefr tian-yi-zefr left a comment


let's try it!


@zexuan-zhou zexuan-zhou left a comment


lgtm.

@zachary-mcpher zachary-mcpher merged commit 93a6dab into master Aug 11, 2020
@zachary-mcpher zachary-mcpher deleted the merge-dask-ml-1.6.0 branch August 11, 2020 23:24
