
DOC Update preprocessor in CountVectorizer #17413

Merged
thomasjpfan merged 1 commit into scikit-learn:master from yagi-3:doc_CountVectorizer_preprocess
Jun 1, 2020

Contributor

@yagi-3 yagi-3 commented Jun 1, 2020

Reference Issues/PRs

Fixes #17348. @venkyyuvy, thank you for your finding.

What does this implement/fix?

One minor change:

  • The doc for the preprocessor argument in CountVectorizer is updated to show which processing steps are overridden.

Any other comments?

  • To enable lowercase and strip_accents when preprocessor is not None, a code modification is needed (maybe in the build_preprocessor function of the _VectorizerMixin class).
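
The behaviour described above can be modelled with a small stdlib-only sketch. It is not scikit-learn's code: the standalone build_preprocessor function, its keyword arguments, and the remove_digits helpers are hypothetical stand-ins that only mirror the documented rule, namely that a custom preprocessor replaces the built-in lowercase and strip_accents steps.

```python
import unicodedata


def strip_accents_unicode(text):
    """Drop combining marks after NFKD normalization (stdlib-only stand-in)."""
    normalized = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in normalized if not unicodedata.combining(ch))


def build_preprocessor(preprocessor=None, lowercase=True, strip_accents=False):
    """Hypothetical model of the documented behaviour, not scikit-learn's code."""
    # A user-supplied callable replaces the whole built-in step, so
    # `lowercase` and `strip_accents` are never consulted.
    if preprocessor is not None:
        return preprocessor

    def default(text):
        if lowercase:
            text = text.lower()
        if strip_accents:
            text = strip_accents_unicode(text)
        return text

    return default


def remove_digits(text):
    """Example custom preprocessor (hypothetical)."""
    return "".join(ch for ch in text if not ch.isdigit())


def remove_digits_lowercased(text):
    """Workaround: do the lowercasing inside the custom preprocessor."""
    return remove_digits(text).lower()


sample = "Caf\u00e9 2020"
default_result = build_preprocessor(lowercase=True, strip_accents=True)(sample)
overridden_result = build_preprocessor(preprocessor=remove_digits, lowercase=True)(sample)
workaround_result = build_preprocessor(preprocessor=remove_digits_lowercased)(sample)
```

In this model, overridden_result keeps its capital letter and accent even though lowercase=True was passed, which is the surprise reported in the linked issue. Without a code change, the practical option is to fold the lowercasing into the custom callable, as remove_digits_lowercased does.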

Member

@rth rth left a comment

LGTM, thanks @yagi-3 !

Member

@thomasjpfan thomasjpfan left a comment

I see this is your first time contributing, welcome @yagi-3 !

LGTM

@thomasjpfan thomasjpfan merged commit 5816817 into scikit-learn:master Jun 1, 2020
Member

jnothman commented Jun 1, 2020

I think we had a pull request a while back to warn when parameters to text vectorisers were being overridden by other parameters. I'm AFK right now and don't recall whether it was merged.
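
As a rough illustration of what such a warning could look like, here is a minimal sketch. The function warn_for_overridden_params, its arguments, and the message text are all hypothetical; this is not the pull request mentioned above, nor part of scikit-learn's API.

```python
import warnings


def warn_for_overridden_params(preprocessor=None, lowercase=True, strip_accents=None):
    """Hypothetical check, not part of scikit-learn's API.

    Returns the names of parameters that a custom preprocessor would override
    and emits one UserWarning per name.
    """
    if preprocessor is None:
        return []

    ignored = []
    # `lowercase` defaults to True here, so this fires even when the caller
    # left it untouched; a real check would need to decide how to handle that.
    if lowercase:
        ignored.append("lowercase")
    if strip_accents is not None:
        ignored.append("strip_accents")

    for name in ignored:
        warnings.warn(
            f"The parameter '{name}' will not be used since "
            "'preprocessor' is not None",
            UserWarning,
        )
    return ignored
```

One design question the sketch leaves open is whether a parameter left at its default should trigger a warning at all, since that would warn every user who passes a custom preprocessor.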

Member

reshamas commented Jun 6, 2020

applied to #DataUmbrella sprint

viclafargue pushed a commit to viclafargue/scikit-learn that referenced this pull request Jun 26, 2020
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
Development

Successfully merging this pull request may close these issues.

In CountVectorizer lowercase is ignored when preprocessor is not None?
