feat: add monitoring plugin interface#2830
Conversation
|
Hi @johanneskoester making this a full PR so you can review. |
Yes, good idea! Should be a separate method then, instead of debug_dag I think. |
Great, I agree. What do you think about the monitoring interface? If this is the approach we take, I'm happy to start publishing the interface and plugins to pypi and write some tests for snakemake. |
johanneskoester
left a comment
There was a problem hiding this comment.
Awesome, only a few things.
| "Snakemake will notify the service on errors and completed execution." | ||
| "Currently slack and workflow management system (wms) are supported.", | ||
| "--monitoring-provider", | ||
| help="Specify a custom monitoring provider, available via an monitoring plugin: snakemake_monitoring_<name>", |
There was a problem hiding this comment.
| help="Specify a custom monitoring provider, available via an monitoring plugin: snakemake_monitoring_<name>", | |
| help="Specify a custom monitoring provider, available via an monitoring plugin: snakemake-monitoring-plugin-<name>", |
There was a problem hiding this comment.
Thinking about it, why not just snakemake-monitor-plugin-... (without the ing)?
There was a problem hiding this comment.
I'm not sure tbh - I think b/c I had "monitoring-provider" stuck in my head. I like snakemake-monitor-plugin better, I'll make that change.
|
nice work @cademirch glad to see this going. If theres a new logging call it might be useful to consider specific logs that might provide more/structured info regarding mapping to be ingested by the plugin |
Thanks. Yes, I agree. I think working out the suggestion in #2827 to break out the |
|
One further thought: couldn't we instead call the interface snakemake-interface-logger-plugins, and unify monitoring and logging? Isn't it in the end just the question where the log messages are directed to? For the cli, I would then imagine that people can specify any number of loggers based on what plugins are installed, e.g. |
|
that sounds about right, this is directly related to the logging itself. It might help to think about how the logs might be used. I for one, was first interested in structuring logs to be posted directly to the Google Cloud Logging API and use that as a "database" where I can build a monitor that queries logs using filters on the fields that the login plugin defines.
Taking a step back it becomes clear that this is resembles more of a pub/sub approach which opens the door for using an actual pub/sub architecture. This might be complicating things more than it needs to be, but it also allows for the remote jobs that are being executed to post their logs (instead of having to i.e do some |
|
@johanneskoester I think unifying monitoring and logging is a great idea. Just to confirm though, the interface design would stay the same? Just the name would change? Now would be a good time to change the design. If we go this direction, perhaps we should compile any changes to logging.py we would want to see and implement them here. I'm currently traveling so I won't be able to make any code progress, but am happy to keep discussing! |
|
@johanneskoester would you be open to changes to the logger class in this PR? To be honest, if we want to do logging plugins, I think we should consider a pretty big refactor of how logging is handled. Currently the use of the log_handlers list and class wrapping the logger is pretty non standard and would limit potential logging plugins imo. If we were to use the logging addHandler method, we could design logging plugins to be subclasses of logging.Handler. Also it would be nice to use the built in logging.Filter to handle filtering of logs at the handler level, that way plugins could have access to all logs. Overall, the Logger class is doing too many things and could benefit from a refactor to make it more modular, especially if we want to move forward with log/monitor plugins. Here is an example of how we could use logging.dictConfig to specify different handlers, filters, for matters, etc. As an example, currently passing a failed jobs logs to a logging plugin would be difficult because show_logs is too deeply nested as #2827 pointed out. Ideally it should be easy for future plugins to have access to all logs. Finally, we could consider using a different logging package like structlog. While this would take a bit of work and add a dependency, I think it is worth thinking about as it is a bit more modern than pythons built in logging and would possibly make refactoring a bit easier. I'm happy to flesh out this refactoring some more, but want to make sure that you're open to it @johanneskoester. |
|
|
@cademirch yes absolutely. The logging code of Snakemake has never seen a proper design and has only been extended ad hoc when needed. Structlog looks great and I am completely open for a rewrite of all these parts. Looking forward to your changes! Once we have logging plugins, people can also experiment with different ways of presenting the info on the terminal (e.g. in a more dynamic way for example, also including progress bars and so on). |
|
Sounds good. I'll leave this PR where its at for now to revisit down the line and open a new one for logging refactoring once. One question I do have: how much would you like to retain the current stdout logging formatting? Would you be opposed to pretty-printed json logs to stdout? This can be decided later, and I can work up some demos eventually, just curious what you might have in mind. And I agree re presenting info on the terminal... I have some ideas already :) |
|
Not sure, I think the current logging is better than json because it is more readable. If anything it should rather be yaml as a default. However, with plugins and options of those plugins the possibilities are endless and people could use profiles to customize to whatever defaults they like most. |
|
That makes sense. I had made some progress in the logging refactoring last month, but had to put it down due to some deadlines. Those are wrapping up this week so I plan to start working on this again more next week. For now I'm planning on reimplementing the current stdout/file log formatting as closely as possible, then adding different output options (progress bars, JSON, yaml, etc) as plugins and or built in options. |
|
closed in favor of #3107 |
Followup to #2830. This PR attempts to make Snakemake's logging more modular to make room for a potential logging plugin interface, or other logging/monitoring additions, as well as make logging in Snakemake more standardized. Currently, most of logging logic is deep within `logging.Logger`, mainly in `text_handler()` with direct calls to `logger.info/warn/etc`. In this PR, this logic has been refactored to more comply with the standard `logging` practices. To do so, three new classes were added: 1. `ColorizingTextHandler` manages adding color to log records and emitting to stream. 2. `DefaultFormatter` formats log records to be printed. Much of the logic from `text_handler()` is here now. 3. `DefaultFilter` filters log records from being emitted based on level and quietness settings. Together, these classes should output identical logs to the current logging implementation. I've tested this briefly, but further testing would be good. With these changes, it is much easier to add new `logHandlers` via `logger.addHandler`. New handlers can decide how to format and filter logs themselves, an important characteristic missing in the current implementation that motivated this refactoring (#2830 (comment)). This would be a breaking change for `--log-handler-script`, as user provided log handlers will need to subclass `logging.logHandler`. Furthermore, logic for adding the user handler (and formatter/filter) will need to be added. As for the logging plug interface (#2830 (comment)), I'm not sure that is the route to take, but would appreciate thoughts on this. First, this functionality is already largely provided by `--log-handler-script` and adding a logging plugin would be redundant. Second, I think that development of such plugins would likely be minimal, and coordinated effort in this repo to maintain a good set of logging options would be preferable. For example: - A standard HTTP log handler send emit log records to a user supplied address on which a 3rd party monitoring app is listening. - A handler/formatter that uses [rich](https://github.com/Textualize/rich) to make a prettier/interactive console on which logs are written to. I think both of these additions would benefit being standardized to Snakemake, as well as default options to users. Additionally I think it would be worthwhile to unify logging in Snakemake. Currently, the use of the wrapper `logging.Logger` class and `setup_logger()` function being called in multiple places makes it difficult to work with the logger and is likely the source of some issues (#3044, #2974, #2882, #2797, #2727). Unifying the logger and its configuration as recommended by the [logging docs](https://docs.python.org/3/howto/logging-cookbook.html#a-cli-application-starter-template) would make it easier to work the the logger. Another option would be to use something like [loguru](https://loguru.readthedocs.io/en/stable/index.html). Whichever option, I think simplifying and unifying the logger should be the next step of this PR. One main question I have is how much we would like to preserve the current stderr output formatting. Most of logging logic is in formatting rule info and such. Moving forward we can now try different formatting. Personally, I find the current detailed rule output helpful for sanity checks when developing and doing a dry run. However, when actually running the workflow (especially a large one) the terminal is flooded and errors can quickly be lost which requires searching the log file. Of course this can be alleviated with --quiet. Another question is regarding the two calls of setup_logger() brought up in #2797. I still don't understand why these are necessary and understanding this would be helpful in unifying this. I'm happy to keep pushing this forward but would appreciate your thoughts before continuing @johanneskoester! I <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced a new CLI option (--logger) for configuring custom loggers. - Enhanced structured logging delivers clearer runtime and resource insights, including rule graph logging. - Added a new method for logging rule graph information. - Added a new enumeration value `REASON` to the `Quietness` class. - **Refactor** - Overhauled the logging architecture across execution, scheduling, and workflow components for improved traceability and error handling. - **Chores** - Updated dependencies and setup configurations to include new logger plugins. - Adjusted test configurations to capture all output, ensuring more comprehensive test diagnostics. - **Bug Fixes** - Resolved minor command and formatting inconsistencies for smoother operations. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de> Co-authored-by: Johannes Köster <johannes.koester@uni-due.de> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Followup to snakemake#2830. This PR attempts to make Snakemake's logging more modular to make room for a potential logging plugin interface, or other logging/monitoring additions, as well as make logging in Snakemake more standardized. Currently, most of logging logic is deep within `logging.Logger`, mainly in `text_handler()` with direct calls to `logger.info/warn/etc`. In this PR, this logic has been refactored to more comply with the standard `logging` practices. To do so, three new classes were added: 1. `ColorizingTextHandler` manages adding color to log records and emitting to stream. 2. `DefaultFormatter` formats log records to be printed. Much of the logic from `text_handler()` is here now. 3. `DefaultFilter` filters log records from being emitted based on level and quietness settings. Together, these classes should output identical logs to the current logging implementation. I've tested this briefly, but further testing would be good. With these changes, it is much easier to add new `logHandlers` via `logger.addHandler`. New handlers can decide how to format and filter logs themselves, an important characteristic missing in the current implementation that motivated this refactoring (snakemake#2830 (comment)). This would be a breaking change for `--log-handler-script`, as user provided log handlers will need to subclass `logging.logHandler`. Furthermore, logic for adding the user handler (and formatter/filter) will need to be added. As for the logging plug interface (snakemake#2830 (comment)), I'm not sure that is the route to take, but would appreciate thoughts on this. First, this functionality is already largely provided by `--log-handler-script` and adding a logging plugin would be redundant. Second, I think that development of such plugins would likely be minimal, and coordinated effort in this repo to maintain a good set of logging options would be preferable. For example: - A standard HTTP log handler send emit log records to a user supplied address on which a 3rd party monitoring app is listening. - A handler/formatter that uses [rich](https://github.com/Textualize/rich) to make a prettier/interactive console on which logs are written to. I think both of these additions would benefit being standardized to Snakemake, as well as default options to users. Additionally I think it would be worthwhile to unify logging in Snakemake. Currently, the use of the wrapper `logging.Logger` class and `setup_logger()` function being called in multiple places makes it difficult to work with the logger and is likely the source of some issues (snakemake#3044, snakemake#2974, snakemake#2882, snakemake#2797, snakemake#2727). Unifying the logger and its configuration as recommended by the [logging docs](https://docs.python.org/3/howto/logging-cookbook.html#a-cli-application-starter-template) would make it easier to work the the logger. Another option would be to use something like [loguru](https://loguru.readthedocs.io/en/stable/index.html). Whichever option, I think simplifying and unifying the logger should be the next step of this PR. One main question I have is how much we would like to preserve the current stderr output formatting. Most of logging logic is in formatting rule info and such. Moving forward we can now try different formatting. Personally, I find the current detailed rule output helpful for sanity checks when developing and doing a dry run. However, when actually running the workflow (especially a large one) the terminal is flooded and errors can quickly be lost which requires searching the log file. Of course this can be alleviated with --quiet. Another question is regarding the two calls of setup_logger() brought up in snakemake#2797. I still don't understand why these are necessary and understanding this would be helpful in unifying this. I'm happy to keep pushing this forward but would appreciate your thoughts before continuing @johanneskoester! I <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced a new CLI option (--logger) for configuring custom loggers. - Enhanced structured logging delivers clearer runtime and resource insights, including rule graph logging. - Added a new method for logging rule graph information. - Added a new enumeration value `REASON` to the `Quietness` class. - **Refactor** - Overhauled the logging architecture across execution, scheduling, and workflow components for improved traceability and error handling. - **Chores** - Updated dependencies and setup configurations to include new logger plugins. - Adjusted test configurations to capture all output, ensuring more comprehensive test diagnostics. - **Bug Fixes** - Resolved minor command and formatting inconsistencies for smoother operations. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de> Co-authored-by: Johannes Köster <johannes.koester@uni-due.de> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>




Description
@johanneskoester, just wanted to start the discussion on adding the monitoring plugin interface.
I've implemented a monitoring plugin interface in this repo. In this PR I've replaced the
--log-serviceand--wmsCLI options with--monitoring-provider, as well as wired up the use of the monitoring provider.I designed the
MonitoringProviderbase class around using thelog_handler, as this is whatlogging.WMSLoggerandlogging.SlackLoggerused. New monitoring plugins will just need to define alog_handlerand will be good to go.In order to test this, I reimplemented the
WMSLoggerinto a monitoring plugin in this repo.Off the top of my head, things left to do are:
Shouldn't take long to do all of the above, just want to make sure this is the way to go first.. I'll keep this as a draft until we decide on the way forward.
Extra:
It would be nice to pass the rulegraph to the monitor, so that it can visualize the workflow's DAG. Perhaps something like this:
Doing this in
_build_dagwould allow the monitor to update its DAG visualization if the workflow has checkpoints. Of course this assumes we use the log_handler for the monitoring plugin interface. I can't think of any issues logging the rulegraph in_build_dagcould cause, but feedback here would be helpful.