Old/simple/default logging config is opaque and hard to externally-configure #2659

@javabrett

Description

This issue might end up as a documentation improvement, or it might lead to a non-backwards-compatible change in defaults.

Summary, TL;DR

Dask distributed (and specifically client-side) logging should behave roughly according to Configuring Logging for a Library, avoiding application-level logging-configuration decisions. The default/absent logging configuration does not do this and configures a custom (flat, non-hierarchical) logging structure with specific Handlers. There is an option to use the newer logging dictConfig schema, but this brings other considerations.
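
For reference, the "good library citizen" pattern from Configuring Logging for a Library is roughly the following sketch, shown here for the hypothetical library "foo" used below: the library attaches only a NullHandler and leaves handlers, formatters and propagation alone, so the embedding application controls output via the root logger.

import logging

# Library module of the hypothetical package "foo": attach a NullHandler only,
# and do not touch propagation or formatters. Output and formatting are decided
# by the application that embeds the library.
logger = logging.getLogger("foo")
logger.addHandler(logging.NullHandler())

def do_work():
    # Library code just logs. With no application configuration this produces
    # no output; once the application configures the root logger, records
    # propagate up to it and use the root handler's formatter.
    logger.warning("foo warning")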

Detail

I have an application which loads a number of libraries, including "foo", and the Dask distributed Client. The expectation is that I can easily configure application logging externally, including logging from foo, (Dask) distributed, and any other library. Provided they are all good library-logging citizens, this should be easy and not require special knowledge. In actuality I need to treat the Dask distributed client library specially, and have special knowledge of its logging configuration, so it can't be treated globally as just another library.

Imagine we are running in an application's config.py - or we might be configuring logging in a notebook via ipython_kernel_config.py. The following could also be transposed into the logging module YAML schema. Testing in daskdev/dask:latest (docker run -it --rm daskdev/dask python):

>>> import logging
>>> from logging import StreamHandler, Formatter

We want all loggers to have a standard format, so we can change the formatter on the root logger's handler:

>>> root_handler = logging.StreamHandler()
>>> root_handler.setFormatter(Formatter("[%(levelname)1.1s %(asctime)s.%(msecs).03d %(name)s %(module)s:%(lineno)d] %(message)s"))
>>> logging.root.handlers = [root_handler]

Test what the foo logger output will look like:

>>> foo_logger = logging.getLogger("foo")
>>> foo_logger.warning("foo warning")
[W 2019-05-05 22:53:43,628.628 foo <stdin>:1] foo warning

All good so far. Check Dask distributed logging:

>>> import distributed
>>> distributed_logger = logging.getLogger("distributed")
>>> distributed_logger.warning("distributed warning")
distributed - WARNING - distributed warning

Not the format we are after. At this point we check distributed log config/init code and also check:

>>> distributed_logger.handlers
[<StreamHandler <stderr> (NOTSET)>]
>>> distributed_logger.propagate
False
>>> distributed_logger.handlers[0] == logging.root.handlers[0]
False

So the distributed logger acts as a pseudo-root logger, and its handlers and formatters cannot be configured globally via the root logger. Clearing its handlers and re-enabling propagation restores the expected behaviour:

>>> distributed_logger.handlers = []
>>> distributed_logger.propagate = True
>>> distributed_logger.warning("distributed warning")
[W 2019-05-05 23:02:09,853.853 distributed <stdin>:1] distributed warning

Additional analysis

Going further, you notice that with the old/simple/default logging-config mechanism there isn't actually a hierarchy of loggers at all: it has been flattened by setting propagate = False and attaching local Handlers to any pre-configured or default loggers:

>>> distributed_client_logger = logging.getLogger("distributed.client")
>>> distributed_client_logger.warning("distributed.client warning")
distributed.client - WARNING - distributed.client warning

... still has the old format.

>>> distributed_client_logger.handlers = []
>>> distributed_client_logger.propagate = True
>>> distributed_client_logger.warning("distributed.client warning")
[W 2019-05-05 23:11:18,901.901 distributed.client <stdin>:1] distributed.client warning

... to fix.

Things become more complicated with non-standard logging setups. A real-world example is running IPython notebooks in a Docker container. The logging here is complex: the kernel process root logger is actually outputting to the running notebook (so via Tornado into the browser). If I want certain libraries to do Docker-style logging to stderr, I need to do some work on the root logger and its handlers. This works for all libraries, but then requires additional work for the Dask distributed client, since it assumes that its default StreamHandler should go to stderr.
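
A hedged sketch of the kind of work involved (the format string and the loop are illustrative, not anything distributed provides): attach the Docker-style stderr handler and application-wide format to the root logger, then walk any loggers distributed has already configured and push them back into the propagation hierarchy so they use the root handler like every other library.

import logging
import sys

import distributed  # importing triggers distributed's default logging setup

# Docker-style handler on the root logger, using the application-wide format.
stderr_handler = logging.StreamHandler(sys.stderr)
stderr_handler.setFormatter(logging.Formatter(
    "[%(levelname)1.1s %(asctime)s %(name)s %(module)s:%(lineno)d] %(message)s"))
logging.root.addHandler(stderr_handler)

# Undo the flattening: drop distributed's own handlers and re-enable
# propagation so its records flow to the root handler like any other library.
for name, logger in logging.Logger.manager.loggerDict.items():
    if isinstance(logger, logging.Logger) and (
            name == "distributed" or name.startswith("distributed.")):
        logger.handlers = []
        logger.propagate = True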

What about logging schema-based logging?

It is possible to switch to the newer schema-based (dictConfig) logging by adding a version = 1 key to the Dask logging config. However:

  • This forces me to configure Dask client logging specifically, rather than just being able to configure logging globally
  • I've not tested this yet, but I suspect that a disable_existing_loggers = False key will be required in order to prevent the Dask client logging configuration from obliterating any/all other application logging configuration that preceded it (see the sketch after this list).
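
Untested, but for illustration this is roughly the shape of configuration involved (the handler/formatter names and the level are my own choices, not distributed defaults). The same keys, minus the dictConfig call, would go under the logging section of the Dask YAML config, with version: 1 switching distributed onto the schema-based path:

import logging.config

LOGGING = {
    "version": 1,
    # Suspected to be needed so that loggers already configured by the
    # application are not silenced when this config is applied.
    "disable_existing_loggers": False,
    "formatters": {
        "app": {
            "format": "[%(levelname)1.1s %(asctime)s %(name)s"
                      " %(module)s:%(lineno)d] %(message)s",
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "app",
        },
    },
    "loggers": {
        "distributed": {"handlers": ["console"], "level": "INFO"},
    },
}

logging.config.dictConfig(LOGGING)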

Possible changes

One or more of:

  • Entirely deprecate and later remove the old/simple/default logging-configuration method.
  • Remove the recommendation for the simple logging system, replace with schema-based.
  • Schema-based logging could default to disable_existing_loggers = False when not explicitly specified.
  • Document the above issues, considerations and workarounds.

Related issues/PRs
