[MRG+1] ENH: dataset-fetching with use figshare and checksum#9240
[MRG+1] ENH: dataset-fetching with use figshare and checksum#9240ogrisel merged 69 commits intoscikit-learn:masterfrom
Conversation
|
Thanks for this. Much appreciated. |
|
do we want this for the release? |
|
it is labeled as such
…On Wed, Jun 28, 2017 at 5:13 PM Andreas Mueller ***@***.***> wrote:
do we want this for the release?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#9240 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AGt-46pAF2h0SdTHX2wImITWh28__u0lks5sIm2lgaJpZM4OIGv_>
.
|
|
I would like it if it's close enough: it avoids people complaining when less reliable servers break. |
|
Before we merge, one of the problem with logging is that it does not have the same behaviour with python 2.7 and python 3 (I was not aware of that I have to say or maybe I forgot): import logging
logger = logging.getLogger('logger')
logger.warn('warning')Python 2.7: Python 3: For this PR in particular that means that on Python 2.7 no messages will be printed about the datasets being downloaded. Here is what I propose (after a quick chat with @ogrisel):
Note logging is only used in |
sklearn/__init__.py
Outdated
| logger = logging.getLogger(__name__) | ||
| if six.PY2: | ||
| logger.addHandler(logging.StreamHandler()) | ||
| logger.level = logging.INFO |
There was a problem hiding this comment.
Maybe add a small comment to state that this ensures that we get a informative message on stdout by default when downloading a dataset on Python 2 (so as to replicate the default behavior of the logging module in Python 3).
0d0664d to
2037f34
Compare
|
is there a way to make the handler only apply to logging emitted by
scikit-learn?
…On 2 Aug 2017 6:01 pm, "Olivier Grisel" ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In sklearn/__init__.py
<#9240 (comment)>
:
> @@ -17,6 +17,14 @@
import warnings
import os
from contextlib import contextmanager as _contextmanager
+import logging
+
+import six
+
+logger = logging.getLogger(__name__)
+if six.PY2:
+ logger.addHandler(logging.StreamHandler())
+logger.level = logging.INFO
Maybe add a small comment to state that this ensures that we get a
informative message on stdout by default when downloading a dataset on
Python 2 (so as to replicate the default behavior of the logging module in
Python 3).
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9240 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6--gjteY3_qG3NxYN9I8eVSds_nRks5sUCzugaJpZM4OIGv_>
.
|
This is already the case: logger = logging.getLogger(__name__)is only used by sklearn components ( |
I think this is going to have this behaviour as long as we use |
2037f34 to
91b7c3b
Compare
| from contextlib import contextmanager as _contextmanager | ||
| import logging | ||
|
|
||
| logger = logging.getLogger(__name__) |
There was a problem hiding this comment.
why not "sklearn" instead of __name__?
There was a problem hiding this comment.
I guess this is the just the general convention right?
I found this from the Python doc
A good convention to use when naming loggers is to use a module-level logger, in each module which uses logging, named as follows:
logger = logging.getLogger(__name__)
and this from the Hitchhiker's guide to Python:
Best practice when instantiating loggers in a library is to only create them using the
__name__global variable: the logging module creates a hierarchy of loggers using dot notation, so using__name__ensures no name collisions.
Move logger.warning to logger.info and logger.info to logger.debug [doc build]
91b7c3b to
6daa256
Compare
|
Not sure whether it is entirely necessary but 5d040fc may need to be backported in 0.19.X (Add download_if_missing argument to fetch_20newsgroups_vectorized) |
|
Happy to merge now and then I'll go through all the necessary backports. |
|
Merged, will do the backport to 0.19.X. |
|
Cool, though I think it'd be good to do a final pass to see if everything is backported that we want. Though an issue in scikit-optimize kinda points toward re-thinking the decision of making |
|
+1 for just release but I will have to have a look at the pickling error of scikit-optimize/scikit-optimize#461. |
|
I think one of us should compile a PR with everything from master that is
safe to go into 0.19, to be reviewed, merged and released. I'm happy to
have a go.
…On 4 Aug 2017 9:30 pm, "Olivier Grisel" ***@***.***> wrote:
+1 for just release but I will have to have a look at the pickling error
of scikit-optimize/scikit-optimize#461
<scikit-optimize/scikit-optimize#461>.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9240 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6-B4KwUOHyr2SZDkvOMsTzk43ekQks5sUwDrgaJpZM4OIGv_>
.
|
Reference Issue
Fixes #7425. Fixes #8089
What does this implement/fix? Explain your changes.
fetch_...) to figshare and changed their URLs accordingly.Any other comments?
This PR takes over (Fixes #7429)
cc: @nelson-liu, @jnothman