[MRG] Add verbose option for Pipeline and Feature Union.#8568
[MRG] Add verbose option for Pipeline and Feature Union.#8568kdexd wants to merge 4 commits intoscikit-learn:masterfrom kdexd:pipeline-verbose
Conversation
sklearn/pipeline.py
Outdated
| return last_step.fit(Xt, y, **fit_params).transform(Xt) | ||
| Xt = last_step.fit(Xt, y, **fit_params).transform(Xt) | ||
| if self.verbose: | ||
| self._print_final_step(final_step_start_time, time_elapsed_so_far) |
There was a problem hiding this comment.
I would have included if self.verbose condition in the method itself, to avoid repeated checks at every calls, but this way it felt semantically more appropriate - "if verbose then print final step".
Else simply the call would state "print final step" while it internally wouldn't have printed if not verbose. I hope that is fine ?
|
|
|
This is a neat work... Thanks! |
| Pipeline(..., | ||
| steps=[('reduce_dim', PCA(copy=True,...)), | ||
| ('clf', SVC(C=1.0,...))]) | ||
| ('clf', SVC(C=1.0,...))], verbose=False) |
There was a problem hiding this comment.
Since this is an irrelevant detail, maybe use ellipsis? (...)
There was a problem hiding this comment.
Well then i'll wait for others' suggestion meanwhile. I'm not sure which should be preferred either.
There was a problem hiding this comment.
If we set an analogy, transformer_weights are shown here in same documentation. They're not having any value in this particular example.
>>> estimators = [('linear_pca', PCA()), ('kernel_pca', KernelPCA())]
>>> combined = FeatureUnion(estimators)
>>> combined # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS
FeatureUnion(n_jobs=1,
transformer_list=[('linear_pca', PCA(copy=True,...)),
('kernel_pca', KernelPCA(alpha=1.0,...))],
transformer_weights=None, verbose=False)We can use this as a tie breaker 😆
|
|
||
| - Added optional parameter ``verbose`` in :class:`pipeline.Pipeline` and | ||
| :class:`pipeline.FeatureUnion` for showing progress and timing of each | ||
| step. :issue:`8568` by :user:`Karan Desai <karandesai-96>`. |
There was a problem hiding this comment.
Please add your name and link to the bottom of the page :)
|
/ping: @jnothman Resolved conflicts with master which moved forward after I opened my PR. |
|
@karandesai-96 You should not merge master into your branch but rebase your branch to master. |
|
@glemaitre I used the web interface for a very small conflict resolving, tried it for the first time. Agreed with you, not impressive. Reverting, rebasing. |
|
Rebased. |
- Each line printed by Pipeline and FeatureUnion, when their verbosity mode is on, will be 70 characters long.
|
i often merge master into my branch. usually works just fine. I'll try look
at this soon
…On 10 Apr 2017 3:10 pm, "Karan Desai" ***@***.***> wrote:
Rebased.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8568 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz67jDPLMIYHq-FvNnJaRYmbQzNPP6ks5rubmogaJpZM4MY6aU>
.
|
|
I cannot say that this is a good practice or the right way to do it. I
just got it while contributing in scikit-image in which I did merge and
have been forced to reset and rebase:
http://scikit-image.org/docs/0.11.x/contribute.html#divergence-between-upstream-master-and-your-feature-branch
It seems that you have the same in the guide in scikit-learn:
http://scikit-learn.org/stable/developers/contributing.html#how-to-contribute
From the top of the head, I was thinking that the reasons were: merging
make the history difficult to read but rebasing rewrite the history.
--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
|
|
I think Github deals with merging in master better than it used to. I no
longer recommend rebases: that requires resolving conflicts on every commit
rather than just between the heads (and squash-merging the final PR can
handle whatever you did along the way as it'll just diff the final heads as
long as they are merge-safe). It also messes up historical commit
references and requires force pushes and other Dangerous Ideas.
…On 12 April 2017 at 21:29, Guillaume Lemaitre ***@***.***> wrote:
I cannot say that this is a good practice or the right way to do it. I
just got it while contributing in scikit-image in which I did merge and
have been forced to reset and rebase:
http://scikit-image.org/docs/0.11.x/contribute.html#
divergence-between-upstream-master-and-your-feature-branch
It seems that you have the same in the guide in scikit-learn:
http://scikit-learn.org/stable/developers/contributing.html#how-to-
contribute
From the top of the head, I was thinking that the reasons were: merging
make the history difficult to read but rebasing rewrite the history.
--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8568 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6wjMpKHgN5NOr0o9iujLHWDvRSa6ks5rvLW0gaJpZM4MY6aU>
.
|
|
I completely agree with the issue linked to resolving every conflicts,
especially when resurrecting old PRs.
In fact, I personally do not have a strong opinion regarding the policy
to follow. Merge has only 1 con regarding the readability which is
partially solved when we are actually squashing/merge at the end.
Do you think that we should change the contributing guide in this regard
if there is consensus behind a only merging policy?
…--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
|
|
I'd be in favour of updating the docs, yes
…On 12 Apr 2017 10:31 pm, "Guillaume Lemaitre" ***@***.***> wrote:
I completely agree with the issue linked to resolving every conflicts,
especially when resurrecting old PRs.
In fact, I personally do not have a strong opinion regarding the policy
to follow. Merge has only 1 con regarding the readability which is
partially solved when we are actually squashing/merge at the end.
Do you think that we should change the contributing guide in this regard
if there is consensus behind a only merging policy?
--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8568 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6wiC4PMDDhZmw4c8nRaBj9sEilLdks5rvMQWgaJpZM4MY6aU>
.
|
|
|
||
| Parameters | ||
| ---------- | ||
| step_info : str |
There was a problem hiding this comment.
Why is this preferred over description, elapsed or something else that further ensures consistency?
I would like to see logger.short_format_time(elapsed) used as in grid search, and this code refactored into utils with that.
There was a problem hiding this comment.
What's the code in grid search you're talking about? The only logging is within joblib, right?
|
could you please resolve the conflicts? |
amueller
left a comment
There was a problem hiding this comment.
I think this would be an awesome feature to have, but it should really live in a logger or something. Right now it's cluttering up the already complicated pipeline code.
|
Hey @amueller, sorry I was away for a while I couldn't respond. I agree with your review. I will resolve the conflicts meanwhile I'll look into a more sane way for logging as you say. |
|
The verbosity from grid search lives in model_selection/_validate.py I
think.
…On 22 Jul 2017 2:00 pm, "Karan Desai" ***@***.***> wrote:
Hey @amueller <https://github.com/amueller>, sorry I was away for a while
I couldn't respond. I agree with your review. I will resolve the conflicts
meanwhile I'll look into a more sane way for logging as you say.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8568 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz61rnEjFevRB91GKuw2gr5qQcQuAYks5sQXPXgaJpZM4MY6aU>
.
|
|
Indeed, and it uses joblib.logger |
|
I'd like to see this progress... Are you continuing to work on it @karandesai-96 |
|
Verbosity in pipeline very helpful feature, especially for debugging. Any plans to merge this PR ? |
|
@EBazarov as soon as there is a reasonable implementation. |
Reference Issue
Fixes #5298, #5321
What does this implement/fix? Explain your changes.
This PR adds a new argument
verbosetoPipelineandFeatureUnion, which is False by default. Setting thisTruewill print useful information during execution. Better to be explained by a code snippet:Common code snippet (imports and data loading):
Pipeline Snippet:
Expected Output:
FeatureUnion Snippet:
Expected Output:
Any other comments?
I had open PRs else where, so it took me a while to get back to this. Thanks @jnothman for mentioning me on issue thread.