Skip to content

[WIP] Add verbose option for Pipeline and Feature Union [rebased 8568]#9668

Closed
petrushev wants to merge 12 commits intoscikit-learn:masterfrom
petrushev:pipeline-verbose-rebased
Closed

[WIP] Add verbose option for Pipeline and Feature Union [rebased 8568]#9668
petrushev wants to merge 12 commits intoscikit-learn:masterfrom
petrushev:pipeline-verbose-rebased

Conversation

@petrushev
Copy link
Copy Markdown

Reference Issue

Rebases #8568
Fixes #5298, #5321

What does this implement/fix? Explain your changes.

Rebases #8568 which has the following description:

This PR adds a new argument verbose to Pipeline and FeatureUnion, which is False by default. Setting this True will print useful information during execution.

Also, it extends the make_pipeline helper to support the verbose kwarg.

The snippets in the original PR produce the expected output.

Also checked with the example feature_selection/plot_feature_selection_pipeline.py which produces a nice output:

[Pipeline] (step 1 of 2) selectkbest ........................ 0.00057s
[Pipeline] (step 2 of 2) svc ................................ 0.00067s
[Pipeline] Total time elapsed: .............................. 0.00123s

@petrushev
Copy link
Copy Markdown
Author

The whats new was not rebased correctly, will fix it.

print('%s%s%s' % (name, '.' * (70 - len(name + elapsed)), elapsed))


class _BasePipeline(six.with_metaclass(ABCMeta, _BaseComposition)):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that it has been badly merged. This class has been removed in favor of _BaseComposition.

# License: BSD

from collections import defaultdict
from abc import ABCMeta, abstractmethod
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed any more with _BaseComposition



class FeatureUnion(_BaseComposition, TransformerMixin):
class FeatureUnion(_BasePipeline, TransformerMixin):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should also be subclassed from BaseComposition


def __init__(self, transformer_list, n_jobs=1, transformer_weights=None,
verbose=False):
self.transformer_list = tosequence(transformer_list)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be list as previously stated

def _update_transformer_list(self, transformers):
transformers = iter(transformers)
self.transformer_list[:] = [
self.transformer_list = tosequence([
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check master

@petrushev petrushev force-pushed the pipeline-verbose-rebased branch from 916663e to 3938adc Compare September 1, 2017 12:56
@glemaitre
Copy link
Copy Markdown
Member

@petrushev Could you solve the PEP8 issue:
https://travis-ci.org/scikit-learn/scikit-learn/jobs/270816940

Copy link
Copy Markdown
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jnothman this PR has been resurrect and it seems to have been extensively reviewed previously.

Having a quick look, I was thinking that a decorator could be used in multiple places instead of repeating the same line around the processing. WDYT?


.. _Neeraj Gangwar: http://neerajgangwar.in
.. _Arthur Mensch: https://amensch.fr

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this blank line.

- Added optional parameter ``verbose`` in functions `pipeline.make_pipeline`
and `pipeline.make_union` to extend the same functionality as the
corresponding classes. :issue:`9668` by
:user:`Baze Petrushev <petrushev>`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you could also add @karandesai-96 in the authorship since that it did a good part of the job as well :)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @glemaitre and @petrushev
Current What's New file on this branch has these two lines:

Miscellaneous

- Added optional parameter ``verbose`` in :class:`pipeline.Pipeline` and
  :class:`pipeline.FeatureUnion` for showing progress and timing of each
     step. :issue:`8568` by :user:`Karan Desai <karandesai-96>`.

- Added optional parameter ``verbose`` in functions `pipeline.make_pipeline`
  and `pipeline.make_union` to extend the same functionality as the
  corresponding classes. :issue:`9668` by
  :user:`Baze Petrushev <petrushev>`.

I think they should be unified outlining the feature, issue, PR and contributors.

from .externals.joblib import Parallel, delayed
from .externals import six
from .utils.metaestimators import if_delegate_has_method
from .utils.metaestimators import if_delegate_has_method, _BaseComposition
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert to the version of master with line 20 and line 24. It is equivalent to master in bring 2 diffs.


def _fit_one_transformer(transformer, X, y):
return transformer.fit(X, y)
def _fit_one_transformer(trans, X, y, verbose=False, idx=None,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace trans by transformer in all this function

total_steps=None, name=None):
# idx, total_steps and name are not required when verbosity is disabled
step_start_time = time.time()
trans = trans.fit(X, y)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trans -> transformer

return res, transformer
return res * weight, transformer
return res, trans
return res * weight, trans
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trans -> transformer

This estimator
"""
self._validate_transformers()
all_transformers = [(name, trans, weight) for name, trans, weight in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diff seems weird to me. It is not in master. I would be checking that this is not from the previous version before _BaseComposition

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this was done, along with the change in _fit_one_transform, for the benefit of the verbosity, so not in master.
Here 's the commit in the original PR.

sum of n_components (output dimension) over transformers.
"""
self._validate_transformers()
all_transformers = [(name, trans, weight) for name, trans, weight in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as the previous statement


class NoInvTransf(NoTrans):
def transform(self, X):
def transform(self, X, y=None):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove y=None. This is from the previous transformer version


class Transf(NoInvTransf):
def transform(self, X):
def transform(self, X, y=None):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove y=None

@kdexd
Copy link
Copy Markdown

kdexd commented Sep 3, 2017

Hi @petrushev, thanks for taking forward my PR.
@jnothman: I was amidst a lot of segfaults and failing tests of my codebase since past few weeks due to GSoC. I planned to work the remaining part of my PR after final evaluation, but it was hard to pull myself out. Nevertheless, I'm glad that this unmerged feature is approaching completion!

@petrushev petrushev force-pushed the pipeline-verbose-rebased branch from 28740b6 to efb4aac Compare September 3, 2017 21:06
@petrushev petrushev changed the title Add verbose option for Pipeline and Feature Union [rebased 8568] [WIP] Add verbose option for Pipeline and Feature Union [rebased 8568] Sep 3, 2017
@jnothman
Copy link
Copy Markdown
Member

jnothman commented Sep 4, 2017

Thanks for taking this on, @petrushev.

As stated in the previous PR, I had hoped we could share the logging logic with sklearn/model_selection/_validation.py's _fit_and_score by putting them in a sklearn/utils/ helper function. In particular, I thought we could share the formatting features, including the use of logger.short_format_time. Formatting times as %0.5f seems particularly weird, because I don't think 10 microseconds is a meaningful precision to report. It's this kind of discrepancy that makes me uncomfortable about reinventing the wheel, preferring to share the code.

@petrushev
Copy link
Copy Markdown
Author

@jnothman Sure, that seems like the way to go. I'll see what I can do about it.

@petrushev petrushev force-pushed the pipeline-verbose-rebased branch from 39a0736 to 2e5b6cc Compare September 5, 2017 22:30
@petrushev
Copy link
Copy Markdown
Author

@jnothman I wonder if this is something like you had in mind? The _fit_and_score is refactored in both _validation and cross_validation.

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Sep 5, 2017 via email

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Sep 5, 2017 via email

@petrushev
Copy link
Copy Markdown
Author

of course, i just wanted to make sure i'm on the right track

@amueller
Copy link
Copy Markdown
Member

are you still working on this?

@petrushev
Copy link
Copy Markdown
Author

@amueller unfortunately i can't really promise i'll commit myself to it any time soon :/

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Oct 25, 2017 via email

@jnothman jnothman added Easy Well-defined and straightforward way to resolve help wanted Stalled labels Oct 25, 2017
@amueller
Copy link
Copy Markdown
Member

@petrushev now worries and thanks for an honest assessment :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Easy Well-defined and straightforward way to resolve help wanted Stalled

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Verbose option on Pipeline

5 participants