[MRG] DOC: shift projects missing in related_projects from the wiki page #8297

Closed
dalmia wants to merge 4 commits into scikit-learn:master from dalmia:8266

Conversation

Contributor

@dalmia dalmia commented Feb 5, 2017

Reference Issue

Fixes #8266

What does this implement/fix? Explain your changes.

Links the related projects from the wiki page to related_projects.rst

Any other comments

I am linking everything that is currently missing. However, the issue thread mentioned that we need to include only the important projects. Please provide feedback on which ones should be included.

@dalmia changed the title from "DOC: incorporate related projects (except gists)" to "[WIP] DOC: incorporate related projects (except gists)" on Feb 5, 2017
@dalmia changed the title from "[WIP] DOC: incorporate related projects (except gists)" to "[WIP] DOC: shift projects missing in related_projects from the wiki page" on Feb 5, 2017
@dalmia changed the title from "[WIP] DOC: shift projects missing in related_projects from the wiki page" to "[MRG] DOC: shift projects missing in related_projects from the wiki page" on Feb 5, 2017
wrapper around scikit-learn that makes it easy to run machine learning
experiments with multiple learners and large feature sets.

- `sklearn-deap <https://github.com/rsteca/sklearn-deap>`_ Use evolutionary
Member

I don't think this belongs in this subsection, which is about providing API wrappers. PyMC also appears to have been placed in this section by mistake.

Contributor Author

The other possible choices seem to be Auto-ML and Model Export for Production. Do you have a preference?

- `mlxtend <https://github.com/rasbt/mlxtend>`_ Includes model visualization
utilities.

- `Fast svmlight / libsvm file loader <https://github.com/mblondel/svmlight-loader>`_
Member

I'm not certain whether this is current, i.e. still much faster than what's in sklearn.datasets. @mblondel?

Member

Yep, it is still significantly faster.

Contributor Author

It should stay in the list, then.
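For context on the comparison above, the loader that ships in sklearn.datasets can be exercised as in the sketch below. The toy data, the in-memory buffer, and the variable names are illustrative assumptions. The sketch makes no claim about relative speed, and the API of the third-party svmlight-loader project is not shown.

```python
from io import BytesIO

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Toy data (assumption): two samples, three features, mostly zeros.
X = np.array([[0.0, 1.5, 0.0], [2.0, 0.0, 3.0]])
y = np.array([1, 0])

# Write the data in svmlight / libsvm format to an in-memory binary buffer.
buf = BytesIO()
dump_svmlight_file(X, y, buf, zero_based=True)
buf.seek(0)

# Read it back; the features come back as a SciPy sparse matrix.
X_loaded, y_loaded = load_svmlight_file(buf, n_features=3, zero_based=True)
```

Timing this call against the external loader on a large file would be one way to check whether the speed difference still holds.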

Caruana et al's Ensemble Selection algorithm in Python, based on scikit-learn

- `random-output-trees <https://github.com/arjoly/random-output-trees>`_
Multi-output random forest on randomised output space
Member

I think this needs to be clearer about what it's useful for


- `libOPF <https://github.com/LibOPF/LibOPF>`_ Optimal path forest classifier

- `pyensemble <https://github.com/dclambert/pyensemble>`_ An implementation of
Member

I suspect this is a better fit for the Auto-ML section. But you would do well to check that.

Contributor Author

I felt this could go in either section, but Auto-ML may be the more natural fit. I'll make the change.

K-means and mixture of von Mises Fisher clustering routines for data on the
unit hypersphere.

- `pyIPCA <https://github.com/pickle27/pyIPCA>`_ Incremental Principal
Member

How, if at all, does this differ from our IncrementalPCA?

Contributor Author

@dalmia dalmia Feb 6, 2017

It seems to be a variant of our IncrementalPCA, for example:

from sklearn.base import BaseEstimator, TransformerMixin

class CCIPCA(BaseEstimator, TransformerMixin):
    """Candid covariance-free incremental principal component analysis (CCIPCA)
    Linear dimensionality reduction using an online incremental PCA algorithm.
    CCIPCA computes the principal components incrementally without
    estimating the covariance matrix. This algorithm was designed for high
    dimensional data and converges quickly.
    This implementation only works for dense arrays. However it should scale
    well to large data.
    """
    # Excerpt: only the class docstring is quoted here; the methods are omitted.
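For comparison, here is a minimal sketch of how scikit-learn's own IncrementalPCA is typically fed data in mini-batches through partial_fit. The random data, the batch count, and the number of components are illustrative assumptions. The sketch does not reproduce the CCIPCA algorithm quoted above.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Synthetic data (assumption): 200 samples with 10 features.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 10))

# Fit incrementally, one mini-batch at a time, instead of on the full array.
ipca = IncrementalPCA(n_components=3)
for batch in np.array_split(X, 4):
    ipca.partial_fit(batch)

# Project the data onto the learned components.
X_reduced = ipca.transform(X)
```

Running both estimators on the same batches would be one way to answer how pyIPCA differs in practice.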

- `Deep Learning <http://deeplearning.net/software_links/>`_ A curated list of deep learning
software libraries.

- `glm-sklearn <https://github.com/jcrudy/glm-sklearn>`_ scikit-learn
Member

This fits in the regression and classification section above.

- Generating data with `non-parametric Gaussian mixture models <https://gist.github.com/2011426>`_
Useful if you need "random" data that should have non-trivial structure.

- `scikit-protopy <https://github.com/dvro/scikit-protopy>`_ scikit-learn
Member

This best lives with decomposition and clustering.

---------------------

The `wiki <https://github.com/scikit-learn/scikit-learn/wiki/Third-party-projects-and-code-snippets>`_ has more!
**Gists**
Member

I think these gists can't really be considered projects, and a Wiki page might best be retained for them.

Contributor Author

Yes, I was skeptical about including these here too. Should we then rename the original wiki page to just "Code Snippets"?

Contributor Author

But since we are going to include this heading anyway, it might be better to keep the gists on this page and save the trouble of maintaining a separate wiki page?

Contributor Author

dalmia commented Feb 16, 2017

I have removed the gists, as they indeed don't belong here. We should retain the original wiki page for them. If this seems fine, I'll move on to making the change there.

codecov bot commented Feb 16, 2017

Codecov Report

Merging #8297 into master will decrease coverage by 1.44%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #8297      +/-   ##
==========================================
- Coverage   96.19%   94.75%   -1.45%     
==========================================
  Files         348      342       -6     
  Lines       64645    60809    -3836     
==========================================
- Hits        62187    57617    -4570     
- Misses       2458     3192     +734
| Impacted Files | Coverage Δ |
|---|---|
| sklearn/feature_extraction/tests/test_image.py | 4.78% <0%> (-95.22%) ⬇️ |
| sklearn/feature_extraction/image.py | 58.16% <0%> (-41.84%) ⬇️ |
| sklearn/utils/arpack.py | 42.5% <0%> (-32.5%) ⬇️ |
| sklearn/utils/random.py | 59.29% <0%> (-32.2%) ⬇️ |
| sklearn/datasets/tests/test_kddcup99.py | 29.16% <0%> (-10.84%) ⬇️ |
| sklearn/manifold/mds.py | 84.46% <0%> (-9.71%) ⬇️ |
| sklearn/utils/tests/test_estimator_checks.py | 88.23% <0%> (-8.96%) ⬇️ |
| sklearn/linear_model/tests/test_bayes.py | 83.01% <0%> (-8.16%) ⬇️ |
| sklearn/utils/tests/test_utils.py | 92.94% <0%> (-7.06%) ⬇️ |
| sklearn/manifold/spectral_embedding_.py | 85.16% <0%> (-6.25%) ⬇️ |
| ... and 260 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3e29334...ce6d7d3.

- `random-output-trees <https://github.com/arjoly/random-output-trees>`_
Randomized output tree for multilabel / multi-output regression tasks

- `fastFM <https://github.com/ibayer/fastFM>`_ Fast factorization machine
Member

"Factorization" makes me pretty sure this belongs with decomposition, not classification/regression

- `fastFM <https://github.com/ibayer/fastFM>`_ Fast factorization machine
implementation compatible with scikit-learn

- `glm-sklearn <https://github.com/jcrudy/glm-sklearn>`_ scikit-learn
Member

I'd put this higher up in the list.

software libraries.

- `sklearn-deap <https://github.com/rsteca/sklearn-deap>`_ Use evolutionary
algorithms instead of gridsearch in scikit-learn.
Member

gridsearch -> grid search.

This does not belong here. Perhaps a "model selection and evaluation" section under "other estimators and tasks"

@jnothman added the Easy, Stalled, good first issue, and help wanted labels on Feb 6, 2018
@cmarmo removed the help wanted label on May 5, 2020
@rth closed this in #17129 on May 5, 2020
Labels

Easy (well-defined and straightforward way to resolve), good first issue (easy with clear instructions to resolve), Stalled

Development

Successfully merging this pull request may close these issues.

Retire Third-party-projects wiki page?

4 participants