Support for token processor. Fixes #1156, #1537
wrichert wants to merge 29 commits into scikit-learn:master from wrichert:token-processor
Conversation
In the current implementation, CountVectorizer keeps at least one copy of the corpus per processing step (preprocessing, tokenizing, etc.). I think we could rewrite it using generators only, which would be more memory- and performance-friendly.
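A minimal sketch of the generator idea, with two hypothetical stage functions (these are illustrative, not CountVectorizer's actual internals): each stage yields items lazily, so no per-step copy of the corpus is ever materialized.

```python
def preprocess(docs):
    # lazily lowercase each document; no intermediate list is built
    for doc in docs:
        yield doc.lower()


def tokenize(docs):
    # lazily split each preprocessed document into tokens
    for doc in docs:
        yield doc.split()


corpus = ["The Cat", "A Dog"]
tokens = tokenize(preprocess(corpus))  # a generator; nothing computed yet
print(list(tokens))  # [['the', 'cat'], ['a', 'dog']]
```

Only consuming the final generator (e.g. while building the vocabulary) pulls documents through the whole chain one at a time.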
Hi @wrichert. Thanks for the PR. In general I think it looks good.
Right. Will move it when I have access to the repo again.
Done.
@amueller Is there anything needed from my side to get this PR into the next release?
Maybe a short example in the form of a doctest in the narrative docs or a docstring would be nice?
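A hypothetical sketch of what such a doctest might demonstrate (the names `token_processor` and `analyze` are illustrative; the PR's actual API may differ): a token processor is a callable applied to each token after tokenization.

```python
def token_processor(token):
    # toy processor: strip a plural 's' (a stand-in for stemming)
    return token[:-1] if token.endswith("s") else token


def analyze(doc):
    # preprocess -> tokenize -> apply the processor to each token
    return [token_processor(t) for t in doc.lower().split()]


print(analyze("Cats and Dogs"))  # ['cat', 'and', 'dog']
```

In practice the processor slot would let users plug in a real stemmer or lemmatizer without reimplementing the rest of the analysis chain.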
@amueller Any chance that this is included in 0.13?
If you find two devs to review and merge it. I won't have time, sorry.
I'll try to have a look soon. Sorry, I'm still pretty busy atm.
sklearn/feature_extraction/text.py
Outdated
Cosmetics: please add a blank line above this line (PEP 257).
Could you please rebase this branch on top of master (or merge master into it if you are not familiar with rebasing)? Then fix the pep8 issues reported by http://pypi.python.org/pypi/pep8
Sure. Will address your suggestions at the beginning of next week, as I won't have time before that.
[MRG] Precompute X_argsorted in AdaBoost
Accidentally committed in 1967a0b.
…transformers. Then fix all the regressors and transformers ... meh!
Prevents warning from doctest.
…learn into token-processor Conflicts: doc/modules/feature_extraction.rst sklearn/feature_extraction/text.py
Hmm, I rebased the "Support for token processor. Fixes #1537" branch and resolved all the conflicts, but I'm not sure whether I did the right thing, seeing lots of other commits appearing in this thread.
Indeed, please start a new branch off the current master and cherry-pick just the commits or files that are related to issue #1156 to make the review easier.