
New text preprocessor API based on callable #1

Closed

larsmans wants to merge 222 commits into ogrisel:master from larsmans:text-processor-api

Conversation

@larsmans
Collaborator

@larsmans larsmans commented May 2, 2011

Hi,

I'm sending this to you because I understand you're the NLP/text processing guy in the project.

Noticing how simple preprocessor objects in text.py really are, I figured we could just as well make them callables. That way, the null preprocessor is just lambda x: x, and any object with a text processing method can be made callable with a reusable higher-order function / class decorator such as:

def make_callable(cls, method):
    cls.__call__ = lambda self, x: getattr(self, method)(x)
    return cls

Regards,
Lars
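For illustration, a minimal sketch of how the proposed decorator might be applied (LowercasePreprocessor is a hypothetical example class, not part of the library):

```python
def make_callable(cls, method):
    # Forward __call__ to the named text-processing method on the class.
    cls.__call__ = lambda self, x: getattr(self, method)(x)
    return cls

class LowercasePreprocessor:
    """Hypothetical preprocessor with an ordinary named method."""
    def preprocess(self, text):
        return text.lower()

# After decoration, instances can be used anywhere a plain callable is expected.
LowercasePreprocessor = make_callable(LowercasePreprocessor, "preprocess")

p = LowercasePreprocessor()
print(p("Hello World"))  # -> hello world
```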

mblondel and others added 30 commits March 11, 2011 21:58
We now have the inheritance scheme:
Covariance <-- ShrunkCovariance <-- LedoitWolf
since LedoitWolf is a particular case of shrinkage.

Should we put LedoitWolf class within the shrunk_covariance.py file?
Fabian Pedregosa and others added 14 commits April 27, 2011 09:06
This way we avoid (or more precisely minimize) the need to deal with
partially downloaded files and the errors that arise when you
Control-C a started download.
This update mainly fixes a heisenbug in Parallel's doctests.
Do not open the output file for writing until the download is complete.
* Plug memory leaks in allocation
* Don't cast return value from malloc
* Remove unused variables
* No more register keyword; is a no-op in modern compilers
* Cosmetic changes
@larsmans
Collaborator Author

larsmans commented May 2, 2011

Darn, forgot to review the commit range. I also hadn't seen how far your text-features branch diverges from scikits-learn:master, so never mind.

@larsmans larsmans closed this May 2, 2011
@ogrisel
Owner

ogrisel commented May 2, 2011

2011/5/2 larsmans
reply@reply.github.com:

> Hi,
>
> I'm sending this to you because I understand you're the NLP/text processing guy in the project.

@mblondel and @pprett are more expert in NLP than I am :)

> Noticing how simple preprocessor objects in text.py really are, I figured we could just as well make them callables. That way, the null preprocessor is just lambda x: x and any object with a text processing method can be converted into a callable with a reusable higher-order function/class decorator such as:
>
>    def make_callable(cls, method):
>        cls.__call__ = lambda self, x: getattr(self, method)(x)
>        return cls

The problem with higher-order functions and lambda expressions is that
they are not picklable, which is a requirement (for instance, to be able
to train on one machine and predict on another, or to use multiprocessing
to perform grid search of hyperparameters in parallel).
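A minimal sketch of this picklability point (IdentityPreprocessor is a hypothetical name, not an actual scikit-learn class): a lambda cannot be pickled, while a module-level class defining `__call__` can, which is why estimator components need to be proper classes.

```python
import pickle

class IdentityPreprocessor:
    """Picklable stand-in for the null preprocessor (hypothetical name)."""
    def __call__(self, x):
        return x

# The lambda version of the null preprocessor fails to pickle.
null = lambda x: x
try:
    pickle.dumps(null)
except (pickle.PicklingError, AttributeError, TypeError):
    print("lambda is not picklable")

# The class-based version round-trips through pickle without trouble.
restored = pickle.loads(pickle.dumps(IdentityPreprocessor()))
print(restored("text"))  # -> text
```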

There is a proposal to simplify the text feature extractors
though: https://github.com/scikit-learn/scikit-learn/issues#issue/37

I have checkpointed some work in progress from last weekend in this branch:
https://github.com/ogrisel/scikit-learn/commits/text-features (it's
unfinished, the tests won't pass and it's unusable in its current
state).

Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

ogrisel pushed a commit that referenced this pull request May 30, 2011
ogrisel pushed a commit that referenced this pull request Jul 16, 2011
ogrisel pushed a commit that referenced this pull request Sep 13, 2011
ogrisel pushed a commit that referenced this pull request Sep 13, 2011
This commit includes the following list of changes:
- Documentation has been enhanced and completed.
- Examples have been added.
- The `percentage` (float) parameter has become `step` (int or float), and indicates the number of features to remove at each iteration (int), or the percentage of features to remove (float) with respect to the original number of features.
- Exactly `n_features_to_select` are now always selected. It may not always have been the case before, as too many features could have been removed at a time in the last step of the elimination.
- The `ranking_` attribute is now a proper ranking of the features (i.e., best features are ranked #1).
- The code of `RFECV` has been made simpler.
- The `cv` argument of RFECV.fit has been moved into the constructor and is now passed through `check_cv`.
- Tests.
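The `step` semantics described above can be sketched as a small helper (an illustrative reimplementation of the rule, not the actual RFE code):

```python
def features_to_remove(step, n_original_features):
    """Number of features to eliminate per RFE iteration.

    An int step removes that many features each iteration; a float step
    in (0, 1) removes that fraction of the *original* feature count,
    rounded down but never below one.
    """
    if isinstance(step, int):
        if step <= 0:
            raise ValueError("an int step must be positive")
        return step
    if not 0.0 < step < 1.0:
        raise ValueError("a float step must lie in (0, 1)")
    return max(1, int(step * n_original_features))

print(features_to_remove(5, 100))    # -> 5
print(features_to_remove(0.1, 100))  # -> 10
```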
ogrisel pushed a commit that referenced this pull request Mar 16, 2012
ogrisel pushed a commit that referenced this pull request Jul 5, 2012
ogrisel pushed a commit that referenced this pull request Sep 28, 2012
REF: hack to be able to share distutils utilities.
ogrisel pushed a commit that referenced this pull request Jul 19, 2013
ogrisel pushed a commit that referenced this pull request Jul 19, 2013
ogrisel pushed a commit that referenced this pull request May 7, 2015
ogrisel pushed a commit that referenced this pull request Dec 26, 2016
…scikit-learn#7838)

* initial commit for return_std

* initial commit for return_std

* adding tests, examples, ARD predict_std

* adding tests, examples, ARD predict_std

* a smidge more documentation

* a smidge more documentation

* Missed a few PEP8 issues

* Changing predict_std to return_std #1

* Changing predict_std to return_std #2

* Changing predict_std to return_std #3

* Changing predict_std to return_std final

* adding better plots via polynomial regression

* trying to fix flake error

* fix to ARD plotting issue

* fixing some flakes

* Two blank lines part 1

* Two blank lines part 2

* More newlines!

* Even more newlines

* adding info to the doc string for the two plot files

* Rephrasing "polynomial" for Bayesian Ridge Regression

* Updating "polynomia" for ARD

* Adding more formal references

* Another asked-for improvement to doc string.

* Fixing flake8 errors

* Cleaning up the tests a smidge.

* A few more flakes

* requested fixes from Andy

* Mini bug fix

* Final pep8 fix

* pep8 fix round 2

* Fix beta_ to alpha_ in the comments
ogrisel pushed a commit that referenced this pull request Jul 20, 2018
* Add averaging option to AMI and NMI

Leave current behavior unchanged

* Flake8 fixes

* Incorporate tests of means for AMI and NMI

* Add note about `average_method` in NMI

* Update docs from AMI, NMI changes (#1)

* Correct the NMI and AMI descriptions in docs

* Update docstrings due to averaging changes

- V-measure
- Homogeneity
- Completeness
- NMI
- AMI

* Update documentation and remove nose tests (#2)

* Update v0.20.rst

* Update test_supervised.py

* Update clustering.rst

* Fix multiple spaces after operator

* Rename all arguments

* No more arbitrary values!

* Improve handling of floating-point imprecision

* Clearly state when the change occurs

* Update AMI/NMI docs

* Update v0.20.rst

* Catch FutureWarnings in AMI and NMI
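The `average_method` option mentioned above selects how the two cluster entropies are averaged in the NMI/AMI normalization; the candidate means can be sketched as follows (an illustrative helper, not the library's actual implementation):

```python
import math

def generalized_average(u, v, method):
    """Average two nonnegative values; method names mirror average_method."""
    if method == "min":
        return min(u, v)
    if method == "geometric":
        return math.sqrt(u * v)
    if method == "arithmetic":
        return (u + v) / 2.0
    if method == "max":
        return max(u, v)
    raise ValueError("unknown method: %r" % method)

# min <= geometric <= arithmetic <= max for nonnegative inputs.
print(generalized_average(1.0, 4.0, "geometric"))   # -> 2.0
print(generalized_average(1.0, 4.0, "arithmetic"))  # -> 2.5
```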