
New text preprocessor API based on callable #1

Closed

larsmans wants to merge 222 commits into ogrisel:master from larsmans:text-processor-api

Conversation

@larsmans
Collaborator

@larsmans larsmans commented May 2, 2011

Hi,

I'm sending this to you because I understand you're the NLP/text processing guy in the project.

Noticing how simple preprocessor objects in text.py really are, I figured we could just as well make them callables. That way, the null preprocessor is just lambda x: x, and any object with a text processing method can be made callable with a reusable higher-order function / class decorator such as:

def make_callable(cls, method):
    cls.__call__ = lambda self, x: getattr(self, method)(x)
    return cls

Regards,
Lars
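For illustration, a minimal sketch of how the proposed decorator might be applied (LowercasePreprocessor is a hypothetical example class, not part of the library):

```python
def make_callable(cls, method):
    # Forward __call__ to the named text-processing method on the class.
    cls.__call__ = lambda self, x: getattr(self, method)(x)
    return cls

class LowercasePreprocessor:
    """Hypothetical preprocessor with an ordinary named method."""
    def preprocess(self, text):
        return text.lower()

# After decoration, instances can be used anywhere a plain callable is expected.
LowercasePreprocessor = make_callable(LowercasePreprocessor, "preprocess")

p = LowercasePreprocessor()
print(p("Hello World"))  # -> hello world
```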

mblondel and others added 30 commits March 11, 2011 21:58
We now have the inheritance scheme:
Covariance <-- ShrunkCovariance <-- LedoitWolf
since LedoitWolf is a particular case of shrinkage.

Should we put LedoitWolf class within the shrunk_covariance.py file?
Fabian Pedregosa and others added 14 commits April 27, 2011 09:06
This way we avoid (or more precisely minimize) the need to deal with
partially downloaded files and the errors that arise when you
Control-C a started download.
This update mainly fixes a heisenbug in Parallel's doctests.
Do not open the output file for writing until the download is complete.
* Plug memory leaks in allocation
* Don't cast return value from malloc
* Remove unused variables
* No more register keyword; is a no-op in modern compilers
* Cosmetic changes
@larsmans
Collaborator Author

larsmans commented May 2, 2011

Darn, forgot to review the commit range. I also hadn't seen how far your text-features branch diverges from scikits-learn:master, so never mind.

@larsmans larsmans closed this May 2, 2011
@ogrisel
Owner

ogrisel commented May 2, 2011

2011/5/2 larsmans
reply@reply.github.com:

> Hi,
>
> I'm sending this to you because I understand you're the NLP/text processing guy in the project.

@mblondel and @pprett are more expert in NLP than I am :)

> Noticing how simple preprocessor objects in text.py really are, I figured we could just as well make them callables. That way, the null preprocessor is just lambda x: x and any object with a text processing method can be converted into a callable with a reusable higher-order function/class decorator such as:
>
>    def make_callable(cls, method):
>        cls.__call__ = lambda self, x: getattr(self, method)(x)
>        return cls

The problem with higher-order functions and lambda expressions is that
they are not picklable, which is a requirement (for instance, to be able
to train on one machine and predict on another, or to use multiprocessing
to perform grid search of hyperparameters in parallel).
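A minimal sketch of this picklability point (IdentityPreprocessor is a hypothetical name, not an actual scikit-learn class): a lambda cannot be pickled, while a module-level class defining `__call__` can, which is why estimator components need to be proper classes.

```python
import pickle

class IdentityPreprocessor:
    """Picklable stand-in for the null preprocessor (hypothetical name)."""
    def __call__(self, x):
        return x

# The lambda version of the null preprocessor fails to pickle.
null = lambda x: x
try:
    pickle.dumps(null)
except (pickle.PicklingError, AttributeError, TypeError):
    print("lambda is not picklable")

# The class-based version round-trips through pickle without trouble.
restored = pickle.loads(pickle.dumps(IdentityPreprocessor()))
print(restored("text"))  # -> text
```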

There is a proposal to simplify the text feature extractors
though: https://github.com/scikit-learn/scikit-learn/issues#issue/37

I have checkpointed some work in progress from last weekend in this branch:
https://github.com/ogrisel/scikit-learn/commits/text-features (it's
unfinished, the tests won't pass and it's unusable in its current
state).

Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

ogrisel pushed a commit that referenced this pull request May 30, 2011
ogrisel pushed a commit that referenced this pull request Jul 16, 2011
ogrisel pushed a commit that referenced this pull request Sep 13, 2011
ogrisel pushed a commit that referenced this pull request Sep 13, 2011
This commit includes the following list of changes:
- Documentation has been enhanced and completed.
- Examples have been added.
- The `percentage` (float) parameter has become `step` (int or float), and indicates the number of features to remove at each iteration (int), or the percentage of features to remove (float) with respect to the original number of features.
- Exactly `n_features_to_select` are now always selected. It may not always have been the case before, as too many features could have been removed at a time in the last step of the elimination.
- The `ranking_` attribute is now a proper ranking of the features (i.e., best features are ranked #1).
- The code of `RFECV` has been made simpler.
- The `cv` argument of RFECV.fit has been moved into the constructor and is now passed through `check_cv`.
- Tests.
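The `step` semantics described above can be sketched as a small helper (an illustrative reimplementation of the rule, not the actual RFE code):

```python
def features_to_remove(step, n_original_features):
    """Number of features to eliminate per RFE iteration.

    An int step removes that many features each iteration; a float step
    in (0, 1) removes that fraction of the *original* feature count,
    rounded down but never below one.
    """
    if isinstance(step, int):
        if step <= 0:
            raise ValueError("an int step must be positive")
        return step
    if not 0.0 < step < 1.0:
        raise ValueError("a float step must lie in (0, 1)")
    return max(1, int(step * n_original_features))

print(features_to_remove(5, 100))    # -> 5
print(features_to_remove(0.1, 100))  # -> 10
```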
ogrisel pushed a commit that referenced this pull request Mar 16, 2012
ogrisel pushed a commit that referenced this pull request Jul 5, 2012
ogrisel pushed a commit that referenced this pull request Sep 28, 2012
REF: hack to be able to share distutils utilities.
ogrisel pushed a commit that referenced this pull request Jul 19, 2013
ogrisel pushed a commit that referenced this pull request Jul 19, 2013
ogrisel pushed a commit that referenced this pull request May 7, 2015
ogrisel pushed a commit that referenced this pull request Dec 26, 2016
…scikit-learn#7838)

* initial commit for return_std

* initial commit for return_std

* adding tests, examples, ARD predict_std

* adding tests, examples, ARD predict_std

* a smidge more documentation

* a smidge more documentation

* Missed a few PEP8 issues

* Changing predict_std to return_std #1

* Changing predict_std to return_std #2

* Changing predict_std to return_std #3

* Changing predict_std to return_std final

* adding better plots via polynomial regression

* trying to fix flake error

* fix to ARD plotting issue

* fixing some flakes

* Two blank lines part 1

* Two blank lines part 2

* More newlines!

* Even more newlines

* adding info to the doc string for the two plot files

* Rephrasing "polynomial" for Bayesian Ridge Regression

* Updating "polynomia" for ARD

* Adding more formal references

* Another asked-for improvement to doc string.

* Fixing flake8 errors

* Cleaning up the tests a smidge.

* A few more flakes

* requested fixes from Andy

* Mini bug fix

* Final pep8 fix

* pep8 fix round 2

* Fix beta_ to alpha_ in the comments
ogrisel pushed a commit that referenced this pull request Jul 20, 2018
* Add averaging option to AMI and NMI

Leave current behavior unchanged

* Flake8 fixes

* Incorporate tests of means for AMI and NMI

* Add note about `average_method` in NMI

* Update docs from AMI, NMI changes (#1)

* Correct the NMI and AMI descriptions in docs

* Update docstrings due to averaging changes

- V-measure
- Homogeneity
- Completeness
- NMI
- AMI

* Update documentation and remove nose tests (#2)

* Update v0.20.rst

* Update test_supervised.py

* Update clustering.rst

* Fix multiple spaces after operator

* Rename all arguments

* No more arbitrary values!

* Improve handling of floating-point imprecision

* Clearly state when the change occurs

* Update AMI/NMI docs

* Update v0.20.rst

* Catch FutureWarnings in AMI and NMI
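The `average_method` option mentioned above selects how the two cluster entropies are averaged in the NMI/AMI normalization; the candidate means can be sketched as follows (an illustrative helper, not the library's actual implementation):

```python
import math

def generalized_average(u, v, method):
    """Average two nonnegative values; method names mirror average_method."""
    if method == "min":
        return min(u, v)
    if method == "geometric":
        return math.sqrt(u * v)
    if method == "arithmetic":
        return (u + v) / 2.0
    if method == "max":
        return max(u, v)
    raise ValueError("unknown method: %r" % method)

# min <= geometric <= arithmetic <= max for nonnegative inputs.
print(generalized_average(1.0, 4.0, "geometric"))   # -> 2.0
print(generalized_average(1.0, 4.0, "arithmetic"))  # -> 2.5
```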