New text preprocessor API based on callables #1
larsmans wants to merge 222 commits into ogrisel:master from
Conversation
…iscriminant_analysis
We now have the inheritance scheme Covariance <-- ShrunkCovariance <-- LedoitWolf, since LedoitWolf is a particular case of shrinkage. Should we put the LedoitWolf class in the shrunk_covariance.py file?
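The "particular case" relationship above comes down to a single formula: a shrunk covariance estimate is a convex combination of the empirical covariance and a scaled identity, and Ledoit-Wolf simply estimates the shrinkage coefficient from the data instead of taking it as a user parameter. A minimal sketch (the helper name is illustrative, not scikit-learn's internal API):

```python
import numpy as np

def shrunk_covariance(emp_cov, shrinkage=0.1):
    """Convex combination of the empirical covariance and a scaled identity.

    LedoitWolf is the special case where ``shrinkage`` is estimated from
    the data rather than fixed by the caller.
    """
    n_features = emp_cov.shape[0]
    mu = np.trace(emp_cov) / n_features  # average variance
    return (1.0 - shrinkage) * emp_cov + shrinkage * mu * np.eye(n_features)

cov = np.array([[2.0, 0.5],
                [0.5, 1.0]])
# Full shrinkage collapses the estimate to the scaled identity:
print(shrunk_covariance(cov, shrinkage=1.0))  # -> 1.5 * I
```

Since the fixed-shrinkage case is strictly more general code, having LedoitWolf subclass ShrunkCovariance (and live in the same file) seems natural.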
This way we avoid (or, more precisely, minimize) the need to deal with partially downloaded files and the errors that arise when a download in progress is interrupted with Control-C.
…ng positive values)
This update mainly fixes a heisenbug in Parallel's doctests.
Do not open the output file for writing until the download is complete.
* Plug memory leaks in allocation
* Don't cast the return value from malloc
* Remove unused variables
* No more `register` keyword; it is a no-op in modern compilers
* Cosmetic changes
Darn, forgot to review the commit range. I also hadn't seen how far your
2011/5/2 larsmans
@mblondel and @pprett are more expert in NLP than I am :)
The problem with higher-order functions and lambda expressions is that

There is some proposal to simplify the text feature extractors. I have checkpointed some work in progress from last weekend in this branch:

Olivier
This commit includes the following list of changes:
- Documentation has been enhanced and completed.
- Examples have been added.
- The `percentage` (float) parameter has become `step` (int or float), and indicates the number of features to remove at each iteration (int), or the percentage of features to remove (float) with respect to the original number of features.
- Exactly `n_features_to_select` features are now always selected. This may not always have been the case before, as too many features could have been removed at a time in the last step of the elimination.
- The `ranking_` attribute is now a proper ranking of the features (i.e., the best features are ranked #1).
- The code of `RFECV` has been made simpler.
- The `cv` argument of `RFECV.fit` has been moved into the constructor and is now passed through `check_cv`.
- Tests.
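The new `step` semantics and the "exactly `n_features_to_select` survive" guarantee described in that commit message can be illustrated with a small schedule function (a hypothetical helper for exposition, not scikit-learn's implementation):

```python
def rfe_schedule(n_features, n_features_to_select, step):
    """Yield the number of features remaining after each elimination round.

    An int ``step`` removes that many features per round; a float in (0, 1)
    removes that fraction of the *original* feature count. The last round
    is truncated so exactly ``n_features_to_select`` features survive.
    """
    if 0.0 < step < 1.0:
        step = max(1, int(step * n_features))  # fraction of original count
    n = n_features
    while n > n_features_to_select:
        n = max(n - step, n_features_to_select)  # never overshoot the target
        yield n

print(list(rfe_schedule(10, 3, step=3)))    # -> [7, 4, 3]
print(list(rfe_schedule(10, 5, step=0.2)))  # -> [8, 6, 5]
```

The `max(..., n_features_to_select)` clamp is what fixes the old behavior where the final elimination step could remove too many features.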
REF: hack to be able to share distutils utilities.
…scikit-learn#7838)

* initial commit for return_std
* initial commit for return_std
* adding tests, examples, ARD predict_std
* adding tests, examples, ARD predict_std
* a smidge more documentation
* a smidge more documentation
* Missed a few PEP8 issues
* Changing predict_std to return_std #1
* Changing predict_std to return_std #2
* Changing predict_std to return_std #3
* Changing predict_std to return_std final
* adding better plots via polynomial regression
* trying to fix flake error
* fix to ARD plotting issue
* fixing some flakes
* Two blank lines part 1
* Two blank lines part 2
* More newlines!
* Even more newlines
* adding info to the doc string for the two plot files
* Rephrasing "polynomial" for Bayesian Ridge Regression
* Updating "polynomia" for ARD
* Adding more formal references
* Another asked-for improvement to doc string.
* Fixing flake8 errors
* Cleaning up the tests a smidge.
* A few more flakes
* requested fixes from Andy
* Mini bug fix
* Final pep8 fix
* pep8 fix round 2
* Fix beta_ to alpha_ in the comments
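The `return_std` option those commits add exposes the posterior-predictive standard deviation of a Bayesian linear model: noise variance plus the projection of the posterior coefficient covariance onto each test point. A self-contained sketch of that quantity (the function and argument names here are illustrative, not the library's internals):

```python
import numpy as np

def bayesian_ridge_predict(X, coef, sigma, alpha, return_std=False):
    """Posterior-predictive mean (and optionally std) for Bayesian ridge.

    ``coef``  : posterior mean of the weights
    ``sigma`` : posterior covariance of the weights
    ``alpha`` : precision (inverse variance) of the observation noise
    """
    y_mean = X @ coef
    if not return_std:
        return y_mean
    # var(y*) = 1/alpha + x* . Sigma . x*  for each test row x*
    y_var = 1.0 / alpha + np.einsum("ij,jk,ik->i", X, sigma, X)
    return y_mean, np.sqrt(y_var)

# With zero coefficient uncertainty, the std reduces to the noise level:
mean, std = bayesian_ridge_predict(
    np.eye(2), np.array([1.0, 2.0]), np.zeros((2, 2)), alpha=4.0,
    return_std=True)
# mean -> [1., 2.], std -> [0.5, 0.5]
```

In the released API the same information is obtained with `predict(X, return_std=True)` on the fitted estimator.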
* Add averaging option to AMI and NMI (leave current behavior unchanged)
* Flake8 fixes
* Incorporate tests of means for AMI and NMI
* Add note about `average_method` in NMI
* Update docs from AMI, NMI changes (#1)
* Correct the NMI and AMI descriptions in docs
* Update docstrings due to averaging changes - V-measure - Homogeneity - Completeness - NMI - AMI
* Update documentation and remove nose tests (#2)
* Update v0.20.rst
* Update test_supervised.py
* Update clustering.rst
* Fix multiple spaces after operator
* Rename all arguments
* No more arbitrary values!
* Improve handling of floating-point imprecision
* Clearly state when the change occurs
* Update AMI/NMI docs
* Update v0.20.rst
* Catch FutureWarnings in AMI and NMI
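The `average_method` option added above controls how the two cluster entropies are combined into the normalizer that turns mutual information into NMI/AMI. Each choice is a generalized mean, and they are ordered min <= geometric <= arithmetic <= max. A sketch of the normalizer, assuming the option names match scikit-learn's:

```python
import numpy as np

def generalized_average(u, v, average_method="arithmetic"):
    """Normalizer used when scaling MI into NMI/AMI.

    ``u`` and ``v`` are the entropies of the two labelings being compared.
    """
    if average_method == "min":
        return min(u, v)
    if average_method == "geometric":
        return np.sqrt(u * v)
    if average_method == "arithmetic":
        return (u + v) / 2.0
    if average_method == "max":
        return max(u, v)
    raise ValueError(f"unknown average_method: {average_method!r}")

# For entropies 1 and 4 the four means are 1, 2, 2.5, and 4.
```

A larger normalizer yields a smaller (more conservative) score, so `max` is the strictest choice and `min` the most lenient.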
Hi,
I'm sending this to you because I understand you're the NLP/text processing guy in the project.
Noticing how simple the preprocessor objects in `text.py` really are, I figured we could just as well make them callables. That way, the null preprocessor is just `lambda x: x`, and any object with a text processing method can be converted into a preprocessor with a reusable higher-order function/class decorator such as:

Regards,
Lars
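The adapter the mail refers to after "such as:" did not survive extraction. A minimal sketch of the idea, with hypothetical names, could look like this:

```python
def method_as_preprocessor(method_name):
    """Higher-order adapter: wrap any object exposing ``method_name``
    so its bound method can serve as a callable preprocessor.

    Illustrative reconstruction of the adapter the mail alludes to;
    the name is not part of any actual API.
    """
    def wrap(obj):
        return getattr(obj, method_name)
    return wrap

# The null preprocessor is simply the identity:
null_preprocessor = lambda x: x

class Lowercaser:
    def process(self, text):
        return text.lower()

preprocessor = method_as_preprocessor("process")(Lowercaser())
print(preprocessor("Hello World"))  # -> "hello world"
```

Any callable then slots into the vectorizer uniformly, with no dedicated preprocessor base class required.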