Logging is often a crucial instrument for debugging, and we currently use several different mechanisms for it:
- Python stdlib `logging` for human readable messages without contextual information
- Implementation specific log calls which store logs in a `deque` with some context information, e.g.
  - `Scheduler.log_event` / `Scheduler.events`: logs external stimuli from workers and clients as an event in a dictionary, keyed by source
  - `Scheduler.transition_log`: exclusively used to log transitions in a semi-structured format `(key, start, finish, recommendations, timestamp)`
  - `Worker.log`: unstructured; part events, part transitions, sometimes with timestamps
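As an illustration of the two styles (the call sites and values below are made up for this sketch, not distributed's actual code), the difference looks roughly like this:

```python
import logging
from collections import deque
from time import time

logger = logging.getLogger("distributed.scheduler")

# stdlib logging: human readable text, no structured context attached
logger.info("Retiring worker %s", "tcp://127.0.0.1:38251")

# deque based logging: semi-structured tuples with a bounded length,
# mimicking the shape of Scheduler.transition_log entries
transition_log = deque(maxlen=100_000)
transition_log.append(("task-123", "processing", "memory", {}, time()))
```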
The problems I see with this approach are manifold:
- The internal `deque` logging has frequently been the cause of memory related trouble, since the deques accumulate entries over time and users are often not aware of this. We artificially need to limit the number of logs to keep with options like `transition-log-length`, `events-log-length`, `events-cleanup-delay`, etc.
- Our internal logging is not in a standardized format. Mostly, tuples are logged whose order and length differ depending on what kind of event was logged (e.g. work stealing is logged differently from transitions, and external stimuli log entirely different information)
- Neither the stdlib logging nor the implementation specific logic currently logs enough context information (that's very subjective, of course). For instance, we know the module which created the log event but not which worker or which thread issued it, let alone in what context. Context could be as simple as the worker name, IP, or thread ID, but also application specific things like the computation ID (Add Computation model to Scheduler #4613) of a transition (see also Capability to pull logs for one submitted dask job #4037 and Distributed request tracing #4718)
- The split into internal and stdlib logging means that to get all logs we usually need to consolidate multiple sources. For instance, we need to collect stdout/stderr (or however your stdlib logger is configured) and scrape all workers and the scheduler, all in different formats
- Machine readability is often not great. For a simple filter like "give me all events belonging to a key" we have specialized functions like `story`, but we need to write a specialized function for every possible query (distributed/distributed/worker.py, lines 1946 to 1958 in b577ece):

  ```python
  def story(self, *keys):
      keys = [key.key if isinstance(key, TaskState) else key for key in keys]
      return [
          msg
          for msg in self.log
          if any(key in msg for key in keys)
          or any(
              key in c
              for key in keys
              for c in msg
              if isinstance(c, (tuple, list, set))
          )
      ]
  ```

- Our internal logging is ephemeral by design, and this is not optional or configurable
- The internal logging cannot be filtered by level.
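To make the query problem concrete, here is a minimal sketch (hypothetical helper names and log entries, not distributed's code) of how every new question about the tuple-based log needs its own bespoke function:

```python
from collections import deque

log = deque(maxlen=1000)
log.append(("task-1", "released", "waiting", {}, 1.0))
log.append(("task-2", "waiting", "processing", {}, 2.0))

def story(log, *keys):
    # mirrors the shape of Worker.story: scan every tuple for a key match
    return [msg for msg in log if any(key in msg for key in keys)]

def events_between(log, start, stop):
    # a second query already needs another function that hard-codes the
    # knowledge that the timestamp lives in the last tuple position
    return [msg for msg in log if start <= msg[-1] <= stop]

story(log, "task-1")           # [("task-1", "released", "waiting", {}, 1.0)]
events_between(log, 1.5, 2.5)  # [("task-2", "waiting", "processing", {}, 2.0)]
```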
Most, if not all, of the issues described above could be addressed with custom solutions.
For instance:
- our deque loggers could be implemented as stdlib logging handlers, giving us one highly configurable source (https://docs.python.org/3/library/logging.handlers.html#logging.handlers.QueueHandler)
- structured logging could be implemented for better machine readability (https://docs.python.org/3/howto/logging-cookbook.html#implementing-structured-logging)
- logging via adapters could add more context information (https://docs.python.org/3/howto/logging-cookbook.html#adding-contextual-information-to-your-logging-output)
- etc.
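A rough stdlib-only sketch combining these ideas (all class and field names here are hypothetical, chosen for illustration): a handler that keeps records in a bounded deque, a formatter that renders records as JSON, and a `LoggerAdapter` that injects worker context into every record.

```python
import json
import logging
from collections import deque

class DequeHandler(logging.Handler):
    """Keep only the last `maxlen` records in memory, like today's deque logs."""
    def __init__(self, maxlen=10_000):
        super().__init__()
        self.records = deque(maxlen=maxlen)

    def emit(self, record):
        self.records.append(record)

class StructuredFormatter(logging.Formatter):
    """Render records as JSON for machine readability."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "msg": record.getMessage(),
            "worker": getattr(record, "worker", None),
        })

logger = logging.getLogger("distributed.example")
logger.setLevel(logging.INFO)
handler = DequeHandler(maxlen=100)
handler.setFormatter(StructuredFormatter())
logger.addHandler(handler)

# The adapter attaches contextual information to every log call
adapter = logging.LoggerAdapter(logger, {"worker": "tcp://127.0.0.1:33851"})
adapter.info("task transitioned")

print(handler.format(handler.records[-1]))
# prints {"level": "INFO", "msg": "task transitioned", "worker": "tcp://127.0.0.1:33851"}
```

Because everything funnels through one handler, levels, formatting, and retention all become configurable through the standard `logging` machinery instead of bespoke options.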
Instead of doing all of this ourselves, we could also resort to libraries which do a great job of encapsulating this in easy to use APIs. One library I am decently familiar with, and which is quite popular, is structlog. I was wondering whether this is something we would be interested in.