Description
This issue might become either a doc-improvement one, or possibly lead to some non-backwards-compatible change in defaults.
Summary, TL;DR
Dask distributed (and specifically client-side) logging should behave roughly according to Configuring Logging for a Library, avoiding application-level logging-configuration decisions. The default/absent logging configuration does not do this and configures a custom (flat, non-hierarchical) logging structure with specific Handlers. There is an option to use the newer logging dictConfig schema, but this brings other considerations.
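For reference, the "Configuring Logging for a Library" guidance boils down to a small sketch like the following ("mylib" is a hypothetical library name, not anything from distributed):

```python
import logging

# A "good library logging citizen" creates loggers under its own
# namespace and attaches only a NullHandler to its top-level logger;
# handlers, levels and formatters are left entirely to the application.
lib_logger = logging.getLogger("mylib")  # hypothetical library name
lib_logger.addHandler(logging.NullHandler())

# Messages then propagate up to whatever handlers the *application*
# installed on the root logger; the library configures nothing else.
lib_logger.warning("library warning")
```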
Detail
I have an application which loads a number of libraries, including "foo" and the Dask distributed Client. The expectation is that I can easily configure application logging externally, including logging from foo, (Dask) distributed, and any other library. Provided they are all good library-logging citizens, this should be easy and not require special knowledge. In actuality I need to treat the Dask distributed client library specially, and have special knowledge of its logging configuration, so it cannot be treated globally as just another library.
Imagine we are running in an application's config.py, or configuring logging in a notebook via ipython_kernel_config.py. The following could also be transposed into the logging module's YAML schema. Testing in daskdev/dask:latest (docker run -it --rm daskdev/dask python):
>>> import logging
>>> from logging import StreamHandler, Formatter
We want all loggers to have a standard format, so we can change the formatter on the root logger's handler:
>>> root_handler = logging.StreamHandler()
>>> root_handler.setFormatter(Formatter("[%(levelname)1.1s %(asctime)s.%(msecs).03d %(name)s %(module)s:%(lineno)d] %(message)s"))
>>> logging.root.handlers = [root_handler]
Test what the foo logger output will look like:
>>> foo_logger = logging.getLogger("foo")
>>> foo_logger.warning("foo warning")
[W 2019-05-05 22:53:43,628.628 foo <stdin>:1] foo warning
All good so far. Check Dask distributed logging:
>>> import distributed
>>> distributed_logger = logging.getLogger("distributed")
>>> distributed_logger.warning("distributed warning")
distributed - WARNING - distributed warning
Not the format we are after. At this point we check distributed log config/init code and also check:
>>> distributed_logger.handlers
[<StreamHandler <stderr> (NOTSET)>]
>>> distributed_logger.propagate
False
>>> distributed_logger.handlers[0] == logging.root.handlers[0]
False
So the distributed logger is a pseudo-root logger, and its handlers and formatters cannot be configured globally via the root logger. Clearing its handlers and re-enabling propagation fixes that:
>>> distributed_logger.handlers = []
>>> distributed_logger.propagate = True
>>> distributed_logger.warning("distributed warning")
[W 2019-05-05 23:02:09,853.853 distributed <stdin>:1] distributed warning
Additional analysis
Going further, you notice that with the old/simple/default logging-config mechanism you don't actually have a hierarchy of loggers at all: it has been flattened by setting propagate = False and attaching local Handlers to every pre-configured or default logger:
>>> distributed_client_logger = logging.getLogger("distributed.client")
>>> distributed_client_logger.warning("distributed.client warning")
distributed.client - WARNING - distributed.client warning
... still has the old format.
>>> distributed_client_logger.handlers = []
>>> distributed_client_logger.propagate = True
>>> distributed_client_logger.warning("distributed.client warning")
[W 2019-05-05 23:11:18,901.901 distributed.client <stdin>:1] distributed.client warning
... to fix.
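The per-logger fix above could be generalized. A hedged sketch, which walks every already-created logger under the distributed namespace and restores propagation (here a "distributed.client" logger is set up by hand to simulate distributed's default flattened config, so the sketch is self-contained):

```python
import logging

# Simulate what distributed's default config does: a flattened logger
# with a local handler and propagation disabled.
flat = logging.getLogger("distributed.client")  # normally set up by distributed itself
flat.addHandler(logging.StreamHandler())
flat.propagate = False

# Workaround: strip local handlers and re-enable propagation on the
# "distributed" logger and every already-created child under it, so
# records flow up to the root logger's handlers again.
for name, obj in logging.root.manager.loggerDict.items():
    if name == "distributed" or name.startswith("distributed."):
        if isinstance(obj, logging.Logger):  # skip PlaceHolder entries
            obj.handlers = []
            obj.propagate = True
```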
Things become more complicated when running a non-standard logging setup. A real-world example is running IPython notebooks in a Docker container. The logging here is complex: the kernel process's root logger actually outputs to the running notebook (via Tornado into the browser). If I want certain libraries to do Docker-style logging to stderr, I need to do some work on the root logger and its handlers. This works for all libraries, but then requires additional work for the Dask distributed client, since it assumes that its default StreamHandler should go to stderr.
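One way that per-library routing can be done (a sketch only; "foo" is the placeholder library from above, and the format string is illustrative):

```python
import logging
import sys

# Send one library's records to stderr, Docker-style, while the root
# logger keeps whatever handlers the notebook kernel installed.
docker_handler = logging.StreamHandler(sys.stderr)
docker_handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)

lib_logger = logging.getLogger("foo")  # placeholder library name
lib_logger.addHandler(docker_handler)
lib_logger.propagate = False  # keep these records out of the notebook's root handlers
```

This works for any well-behaved library; for the distributed logger you must first clear its own StreamHandler as shown earlier.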
What about schema-based logging configuration?
It is possible to switch to the newer schema-based logging configuration by adding a version = 1 key to the Dask logging config. However:
- This forces me to configure Dask client logging specifically, rather than just being able to configure logging globally.
- I have not tested this yet, but I suspect that a disable_existing_loggers = False key will be required in order to prevent the Dask client logging configuration from obliterating any/all other application logging configuration that preceded it.
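For context, a hedged sketch of what a dictConfig/schema-based configuration looks like (the handler and formatter names here are illustrative, not Dask's):

```python
import logging
import logging.config

# A previously configured logger, to show it is not silenced below.
pre_existing = logging.getLogger("pre_existing")

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,  # don't silence already-created loggers
    "formatters": {
        "app": {"format": "[%(levelname)1.1s %(asctime)s %(name)s] %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "app"},
    },
    "root": {"handlers": ["console"], "level": "INFO"},
})
```

With disable_existing_loggers left at its default of True, any logger created before the dictConfig call would be disabled.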
Possible changes
One or more of:
- Entirely deprecate and later remove the old/simple/default logging-configuration method.
- Remove the recommendation for the simple logging system, replace with schema-based.
- Schema-based logging could, by default if not specified, act as if disable_existing_loggers = False.
- Document the above issues, considerations and workarounds.
Related issues/PRs
- #618 distributed hijacks the logger (08c6c67 is the change where the loggers were flattened and given individual Handlers)
- #1381 logging configuration turns off existing loggers