Description
This issue might become either a doc-improvement one, or possibly lead to some non-backwards-compatible change in defaults.
Summary, TL;DR
Dask distributed (and specifically client-side) logging should behave roughly according to Configuring Logging for a Library, avoiding application-level logging-configuration decisions. The default/absent logging configuration does not do this and configures a custom (flat, non-hierarchical) logging structure with specific Handlers. There is an option to use the newer logging dictConfig schema, but this brings other considerations.
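For reference, the "Configuring Logging for a Library" guidance boils down to a small sketch like the following ("mylib" is a hypothetical library name, not anything from distributed):

```python
import logging

# A "good library logging citizen" creates loggers under its own
# namespace and attaches only a NullHandler to its top-level logger;
# handlers, levels and formatters are left entirely to the application.
lib_logger = logging.getLogger("mylib")  # hypothetical library name
lib_logger.addHandler(logging.NullHandler())

# Messages then propagate up to whatever handlers the *application*
# installed on the root logger; the library configures nothing else.
lib_logger.warning("library warning")
```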
Detail
I have an application which loads a number of libraries, including "foo" and the Dask distributed Client. The expectation is that I can easily configure application logging externally, including logging from foo, (Dask) distributed, and any other library. Provided they are all good library-logging citizens, this should be easy and not require special knowledge. In actuality I need to treat the Dask distributed client library specially, and have special knowledge of its logging configuration, so it cannot be treated globally as just another library.
Imagine we are running in an application's config.py, or configuring logging in a notebook via ipython_kernel_config.py. The following could also be transposed into the logging module's YAML schema. Testing in daskdev/dask:latest (docker run -it --rm daskdev/dask python):
>>> import logging
>>> from logging import StreamHandler, Formatter
We want all loggers to have a standard format, so we can change the formatter on the root logger's handler:
>>> root_handler = logging.StreamHandler()
>>> root_handler.setFormatter(Formatter("[%(levelname)1.1s %(asctime)s.%(msecs).03d %(name)s %(module)s:%(lineno)d] %(message)s"))
>>> logging.root.handlers = [root_handler]
Test what the foo logger output will look like:
>>> foo_logger = logging.getLogger("foo")
>>> foo_logger.warning("foo warning")
[W 2019-05-05 22:53:43,628.628 foo <stdin>:1] foo warning
All good so far. Check Dask distributed logging:
>>> import distributed
>>> distributed_logger = logging.getLogger("distributed")
>>> distributed_logger.warning("distributed warning")
distributed - WARNING - distributed warning
Not the format we are after. At this point we check distributed log config/init code and also check:
>>> distributed_logger.handlers
[<StreamHandler <stderr> (NOTSET)>]
>>> distributed_logger.propagate
False
>>> distributed_logger.handlers[0] == logging.root.handlers[0]
False
So the distributed logger is a pseudo-root logger, and its handlers and formatters cannot be configured globally via the root logger. Clearing its handlers and re-enabling propagation fixes that:
>>> distributed_logger.handlers = []
>>> distributed_logger.propagate = True
>>> distributed_logger.warning("distributed warning")
[W 2019-05-05 23:02:09,853.853 distributed <stdin>:1] distributed warning
Additional analysis
Going further, you notice that with the old/simple/default logging-config mechanism you don't actually have a hierarchy of loggers at all: it has been flattened by setting propagate = False and attaching local Handlers to every pre-configured or default logger:
>>> distributed_client_logger = logging.getLogger("distributed.client")
>>> distributed_client_logger.warning("distributed.client warning")
distributed.client - WARNING - distributed.client warning
... still has the old format.
>>> distributed_client_logger.handlers = []
>>> distributed_client_logger.propagate = True
>>> distributed_client_logger.warning("distributed.client warning")
[W 2019-05-05 23:11:18,901.901 distributed.client <stdin>:1] distributed.client warning
... to fix.
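The per-logger fix above could be generalized. A hedged sketch, which walks every already-created logger under the distributed namespace and restores propagation (here a "distributed.client" logger is set up by hand to simulate distributed's default flattened config, so the sketch is self-contained):

```python
import logging

# Simulate what distributed's default config does: a flattened logger
# with a local handler and propagation disabled.
flat = logging.getLogger("distributed.client")  # normally set up by distributed itself
flat.addHandler(logging.StreamHandler())
flat.propagate = False

# Workaround: strip local handlers and re-enable propagation on the
# "distributed" logger and every already-created child under it, so
# records flow up to the root logger's handlers again.
for name, obj in logging.root.manager.loggerDict.items():
    if name == "distributed" or name.startswith("distributed."):
        if isinstance(obj, logging.Logger):  # skip PlaceHolder entries
            obj.handlers = []
            obj.propagate = True
```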
Things become more complicated when running a non-standard logging setup. A real-world example is running IPython notebooks in a Docker container. The logging here is complex: the kernel process's root logger actually outputs to the running notebook (via Tornado into the browser). If I want certain libraries to do Docker-style logging to stderr, I need to do some work on the root logger and its handlers. This works for all libraries, but then requires additional work for the Dask distributed client, since it assumes that its default StreamHandler should go to stderr.
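One way that per-library routing can be done (a sketch only; "foo" is the placeholder library from above, and the format string is illustrative):

```python
import logging
import sys

# Send one library's records to stderr, Docker-style, while the root
# logger keeps whatever handlers the notebook kernel installed.
docker_handler = logging.StreamHandler(sys.stderr)
docker_handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)

lib_logger = logging.getLogger("foo")  # placeholder library name
lib_logger.addHandler(docker_handler)
lib_logger.propagate = False  # keep these records out of the notebook's root handlers
```

This works for any well-behaved library; for the distributed logger you must first clear its own StreamHandler as shown earlier.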
What about schema-based logging configuration?
It is possible to switch to the newer schema-based logging configuration by adding a version = 1 key to the Dask logging config. However:
- This forces me to configure Dask client logging specifically, rather than just being able to configure logging globally.
- I have not tested this yet, but I suspect that a disable_existing_loggers = False key will be required in order to prevent the Dask client logging configuration from obliterating any/all other application logging configuration that preceded it.
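For context, a hedged sketch of what a dictConfig/schema-based configuration looks like (the handler and formatter names here are illustrative, not Dask's):

```python
import logging
import logging.config

# A previously configured logger, to show it is not silenced below.
pre_existing = logging.getLogger("pre_existing")

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,  # don't silence already-created loggers
    "formatters": {
        "app": {"format": "[%(levelname)1.1s %(asctime)s %(name)s] %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "app"},
    },
    "root": {"handlers": ["console"], "level": "INFO"},
})
```

With disable_existing_loggers left at its default of True, any logger created before the dictConfig call would be disabled.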
Possible changes
One or more of:
- Entirely deprecate and later remove the old/simple/default logging-configuration method.
- Remove the recommendation for the simple logging system, replace with schema-based.
- Schema-based logging could, by default if not specified, act as if disable_existing_loggers = False.
- Document the above issues, considerations and workarounds.
Related issues/PRs
- #618 distributed hijacks the logger (08c6c67 is the change where the loggers were flattened and given individual Handlers)
- #1381 logging configuration turns off existing loggers