[MRG] tfidfvectorizer documentation #12204
blooraspberry wants to merge 2 commits into scikit-learn:master
sklearn/feature_extraction/text.py
Outdated
CountVectorizer converts a collection of text documents to a matrix of token counts.

TfidfTransformer then converts the count matrix from CountVectorizer to a normalized tf-idf representation. Tf is term frequency, and idf is inverse document frequency. This is a common way to calculate the count of a word relative to the appearance of a document.
Can you make sure to break the lines please?
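
For readers of this thread, the relationship the new docstring describes can be checked with a short sketch. The toy corpus below is invented for illustration and is not part of the PR; the sketch assumes default parameters for all three classes.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Toy corpus, invented for this illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

# Step 1: CountVectorizer builds the matrix of raw token counts.
counts = CountVectorizer().fit_transform(corpus)

# Step 2: TfidfTransformer turns the counts into normalized tf-idf values.
two_step = TfidfTransformer().fit_transform(counts)

# TfidfVectorizer performs both steps in a single call.
one_step = TfidfVectorizer().fit_transform(corpus)

# With default parameters the two routes should give the same matrix.
same = np.allclose(two_step.toarray(), one_step.toarray())
print(same)
```

If non-default options are passed (for example a custom `norm` or `smooth_idf`), the same values have to be given on both routes for the comparison to hold.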
The formula that is used to compute the tf-idf of term t is
tf-idf(d, t) = tf(t) * idf(d, t), and the idf is computed as
I think the other documentation has some formatting for these, can you make sure to copy the code, not the rendering?
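
As a reference point for the formula discussion, here is a hedged sketch of how the idf can be recomputed by hand and compared with what `TfidfTransformer` stores. It assumes the smoothed variant that scikit-learn documents for the default `smooth_idf=True`, namely idf(t) = ln((1 + n) / (1 + df(t))) + 1; this is not taken from the diff above, whose formula is cut off. Note also that the scikit-learn user guide writes the product as tf-idf(t, d) = tf(t, d) * idf(t), which differs from the argument order in the diff. The corpus is again invented for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Toy corpus, invented for this illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

counts = CountVectorizer().fit_transform(corpus)
dense_counts = counts.toarray()  # fine for a toy corpus; avoid on large data

n_docs = dense_counts.shape[0]
# Document frequency: number of documents in which each term appears.
df = (dense_counts > 0).sum(axis=0)

# Smoothed idf, assuming the formula documented for smooth_idf=True.
manual_idf = np.log((1 + n_docs) / (1 + df)) + 1

# Compare with the idf learned by the transformer.
transformer = TfidfTransformer(smooth_idf=True).fit(counts)
idf_matches = np.allclose(manual_idf, transformer.idf_)

# Without normalization, tf-idf is just the raw count times the idf.
unnormalized = TfidfTransformer(norm=None).fit_transform(counts).toarray()
tfidf_matches = np.allclose(dense_counts * manual_idf, unnormalized)

print(idf_matches, tfidf_matches)
```

The default `norm="l2"` then rescales each row to unit length, which is why the docstring speaks of a normalized tf-idf representation.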
Can you please reference the issues and PRs this is addressing in the description? Then merging this will close these.
@blooraspberry I've restarted Travis for you and there're flake8 errors. Please correct them according to https://travis-ci.org/scikit-learn/scikit-learn/jobs/435054659
Travis output is unreadable, so here is what needs to be fixed in text.py:
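Hello @blooraspberry, thank you for participating in the WiMLDS/scikit sprint. We would love to merge all the PRs that were submitted. It would be great if you could follow up on the work that you started! For the PR you submitted, would you please update and re-submit? Please include #wimlds in your PR conversation. Any questions:
- see workflow (https://github.com/WiMLDS/nyc-2018-scikit-sprint/blob/master/2_contributing_workflow.md) for reference
- ask on this PR conversation or the issue tracker
- ask on wimlds gitter (https://gitter.im/scikit-learn/wimlds) with a reference to this PR
cc: @reshamas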
Hi Sergul,
Sorry I just saw this email -- didn't realize my github is connected to
another email account. I'll take a look soon.
Sharon
On Sun, Nov 11, 2018 at 11:29 AM Sergul Aydore wrote:
Hello @blooraspberry <https://github.com/blooraspberry> ,
Thank you for participating in the WiMLDS/scikit sprint. We would love to
merge all the PRs that were submitted. It would be great if you could
follow up on the work that you started! For the PR you submitted, would you
please update and re-submit? Please include #wimlds in your PR conversation.
Any questions:
- see workflow
<https://github.com/WiMLDS/nyc-2018-scikit-sprint/blob/master/2_contributing_workflow.md>
for reference
- ask on this PR conversation or the issue tracker
- ask on wimlds gitter <https://gitter.im/scikit-learn/wimlds> with a
reference to this PR
cc: @reshamas <https://github.com/reshamas>
@blooraspberry

I am working on this PR.
Reference Issues/PRs
closes #6766 and closes #9369
What does this implement/fix? Explain your changes.
This adds more information to the TfidfVectorizer documentation, which now describes CountVectorizer and TfidfTransformer.
Any other comments?