feat: add monitoring plugin interface by cademirch · Pull Request #2830 · snakemake/snakemake

cademirch · 2024-04-20T04:27:21Z

Description

@johanneskoester, just wanted to start the discussion on adding the monitoring plugin interface.

I've implemented a monitoring plugin interface in this repo. In this PR I've replaced the --log-service and --wms CLI options with --monitoring-provider, as well as wired up the use of the monitoring provider.

I designed the MonitoringProvider base class around using the log_handler, as this is what logging.WMSLogger and logging.SlackLogger used. New monitoring plugins will just need to define a log_handler and will be good to go.

In order to test this, I reimplemented the WMSLogger into a monitoring plugin in this repo.

Off the top of my head, things left to do are:

Write tests and docs
Publish the interface to PyPi
Implement SlackLogger as a plugin and publish
Publish WMSLogger plugin
Remove WMSLogger and SlackLogger from logging.py

Shouldn't take long to do all of the above, just want to make sure this is the way to go first.. I'll keep this as a draft until we decide on the way forward.

Extra:
It would be nice to pass the rulegraph to the monitor, so that it can visualize the workflow's DAG. Perhaps something like this:

# workflow.py
def _build_dag(self):
        logger.info("Building DAG of jobs...")
        async_run(self.dag.init())
        async_run(self.dag.update_checkpoint_dependencies())
+      logger.debug_dag(self.dag.rule_dot()) # Could also define new func for this instead of debug_dag

Doing this in _build_dag would allow the monitor to update its DAG visualization if the workflow has checkpoints. Of course this assumes we use the log_handler for the monitoring plugin interface. I can't think of any issues logging the rulegraph in _build_dag could cause, but feedback here would be helpful.

cademirch · 2024-04-25T21:14:24Z

Hi @johanneskoester making this a full PR so you can review.

johanneskoester · 2024-04-28T19:03:53Z

It would be nice to pass the rulegraph to the monitor, so that it can visualize the workflow's DAG. Perhaps something like this:

Yes, good idea! Should be a separate method then, instead of debug_dag I think.

cademirch · 2024-04-28T19:06:30Z

It would be nice to pass the rulegraph to the monitor, so that it can visualize the workflow's DAG. Perhaps something like this:

Yes, good idea! Should be a separate method then, instead of debug_dag I think.

Great, I agree. What do you think about the monitoring interface? If this is the approach we take, I'm happy to start publishing the interface and plugins to pypi and write some tests for snakemake.

johanneskoester

Awesome, only a few things.

johanneskoester · 2024-04-28T19:05:14Z

snakemake/cli.py

-        "Snakemake will notify the service on errors and completed execution."
-        "Currently slack and workflow management system (wms) are supported.",
+        "--monitoring-provider",
+        help="Specify a custom monitoring provider, available via an monitoring plugin: snakemake_monitoring_<name>",


Suggested change

help="Specify a custom monitoring provider, available via an monitoring plugin: snakemake_monitoring_<name>",

help="Specify a custom monitoring provider, available via an monitoring plugin: snakemake-monitoring-plugin-<name>",

Thinking about it, why not just snakemake-monitor-plugin-... (without the ing)?

I'm not sure tbh - I think b/c I had "monitoring-provider" stuck in my head. I like snakemake-monitor-plugin better, I'll make that change.

jjjermiah · 2024-05-02T14:46:12Z

nice work @cademirch glad to see this going. If theres a new logging call it might be useful to consider specific logs that might provide more/structured info regarding mapping to be ingested by the plugin

cademirch · 2024-05-02T14:49:24Z

nice work @cademirch glad to see this going. If theres a new logging call it might be useful to consider specific logs that might provide more/structured info regarding mapping to be ingested by the plugin

Thanks. Yes, I agree. I think working out the suggestion in #2827 to break out the show_failed_logs into its own method would be a good place to start, as it would be nice to offer failed logs to the monitor plugin

johanneskoester · 2024-05-02T20:36:46Z

One further thought: couldn't we instead call the interface snakemake-interface-logger-plugins, and unify monitoring and logging? Isn't it in the end just the question where the log messages are directed to? For the cli, I would then imagine that people can specify any number of loggers based on what plugins are installed, e.g. snakemake --logger json slack would run snakemake with snakemake-logger-plugin-json and snakemake-logger-plugin-slack, whereas snakemake --logger text wms-monitor would use the term logger (which will stay built into snakemake analogously to the local executor) and snakemake-logger-plugin-wms-monitor. Importantly, loggers that write to the terminal should be mutually exclusive, i.e. Snakemake should give an error if e.g. snakemake --logger text json is executed.

jjjermiah · 2024-05-02T21:04:04Z

that sounds about right, this is directly related to the logging itself. It might help to think about how the logs might be used.

I for one, was first interested in structuring logs to be posted directly to the Google Cloud Logging API and use that as a "database" where I can build a monitor that queries logs using filters on the fields that the login plugin defines.

I've tested this out using the log-handler-script argument to circumvent the issue regarding the google-batch plugin's logging issues.

Taking a step back it becomes clear that this is resembles more of a pub/sub approach which opens the door for using an actual pub/sub architecture.

This might be complicating things more than it needs to be, but it also allows for the remote jobs that are being executed to post their logs (instead of having to i.e do some kubectl logs snakejob-...... as the only other way to get a job's logs

cademirch · 2024-05-03T04:40:35Z

@johanneskoester I think unifying monitoring and logging is a great idea. Just to confirm though, the interface design would stay the same? Just the name would change? Now would be a good time to change the design.

If we go this direction, perhaps we should compile any changes to logging.py we would want to see and implement them here.

I'm currently traveling so I won't be able to make any code progress, but am happy to keep discussing!

cademirch · 2024-05-03T10:28:31Z

@johanneskoester would you be open to changes to the logger class in this PR? To be honest, if we want to do logging plugins, I think we should consider a pretty big refactor of how logging is handled. Currently the use of the log_handlers list and class wrapping the logger is pretty non standard and would limit potential logging plugins imo. If we were to use the logging addHandler method, we could design logging plugins to be subclasses of logging.Handler. Also it would be nice to use the built in logging.Filter to handle filtering of logs at the handler level, that way plugins could have access to all logs. Overall, the Logger class is doing too many things and could benefit from a refactor to make it more modular, especially if we want to move forward with log/monitor plugins. Here is an example of how we could use logging.dictConfig to specify different handlers, filters, for matters, etc.

As an example, currently passing a failed jobs logs to a logging plugin would be difficult because show_logs is too deeply nested as #2827 pointed out. Ideally it should be easy for future plugins to have access to all logs.

Finally, we could consider using a different logging package like structlog. While this would take a bit of work and add a dependency, I think it is worth thinking about as it is a bit more modern than pythons built in logging and would possibly make refactoring a bit easier.

I'm happy to flesh out this refactoring some more, but want to make sure that you're open to it @johanneskoester.

sonarqubecloud · 2024-05-03T17:38:58Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

johanneskoester · 2024-05-04T06:40:19Z

@cademirch yes absolutely. The logging code of Snakemake has never seen a proper design and has only been extended ad hoc when needed. Structlog looks great and I am completely open for a rewrite of all these parts. Looking forward to your changes!

Once we have logging plugins, people can also experiment with different ways of presenting the info on the terminal (e.g. in a more dynamic way for example, also including progress bars and so on).

cademirch · 2024-05-04T12:30:44Z

Sounds good. I'll leave this PR where its at for now to revisit down the line and open a new one for logging refactoring once. One question I do have: how much would you like to retain the current stdout logging formatting? Would you be opposed to pretty-printed json logs to stdout? This can be decided later, and I can work up some demos eventually, just curious what you might have in mind.

And I agree re presenting info on the terminal... I have some ideas already :)

johanneskoester · 2024-06-24T08:14:58Z

Not sure, I think the current logging is better than json because it is more readable. If anything it should rather be yaml as a default. However, with plugins and options of those plugins the possibilities are endless and people could use profiles to customize to whatever defaults they like most.

cademirch · 2024-06-24T09:35:04Z

That makes sense. I had made some progress in the logging refactoring last month, but had to put it down due to some deadlines. Those are wrapping up this week so I plan to start working on this again more next week. For now I'm planning on reimplementing the current stdout/file log formatting as closely as possible, then adding different output options (progress bars, JSON, yaml, etc) as plugins and or built in options.

cademirch · 2024-09-29T05:07:49Z

closed in favor of #3107

@johanneskoester

Followup to #2830. This PR attempts to make Snakemake's logging more modular to make room for a potential logging plugin interface, or other logging/monitoring additions, as well as make logging in Snakemake more standardized. Currently, most of logging logic is deep within `logging.Logger`, mainly in `text_handler()` with direct calls to `logger.info/warn/etc`. In this PR, this logic has been refactored to more comply with the standard `logging` practices. To do so, three new classes were added: 1. `ColorizingTextHandler` manages adding color to log records and emitting to stream. 2. `DefaultFormatter` formats log records to be printed. Much of the logic from `text_handler()` is here now. 3. `DefaultFilter` filters log records from being emitted based on level and quietness settings. Together, these classes should output identical logs to the current logging implementation. I've tested this briefly, but further testing would be good. With these changes, it is much easier to add new `logHandlers` via `logger.addHandler`. New handlers can decide how to format and filter logs themselves, an important characteristic missing in the current implementation that motivated this refactoring (#2830 (comment)). This would be a breaking change for `--log-handler-script`, as user provided log handlers will need to subclass `logging.logHandler`. Furthermore, logic for adding the user handler (and formatter/filter) will need to be added. As for the logging plug interface (#2830 (comment)), I'm not sure that is the route to take, but would appreciate thoughts on this. First, this functionality is already largely provided by `--log-handler-script` and adding a logging plugin would be redundant. Second, I think that development of such plugins would likely be minimal, and coordinated effort in this repo to maintain a good set of logging options would be preferable. For example: - A standard HTTP log handler send emit log records to a user supplied address on which a 3rd party monitoring app is listening. - A handler/formatter that uses [rich](https://github.com/Textualize/rich) to make a prettier/interactive console on which logs are written to. I think both of these additions would benefit being standardized to Snakemake, as well as default options to users. Additionally I think it would be worthwhile to unify logging in Snakemake. Currently, the use of the wrapper `logging.Logger` class and `setup_logger()` function being called in multiple places makes it difficult to work with the logger and is likely the source of some issues (#3044, #2974, #2882, #2797, #2727). Unifying the logger and its configuration as recommended by the [logging docs](https://docs.python.org/3/howto/logging-cookbook.html#a-cli-application-starter-template) would make it easier to work the the logger. Another option would be to use something like [loguru](https://loguru.readthedocs.io/en/stable/index.html). Whichever option, I think simplifying and unifying the logger should be the next step of this PR. One main question I have is how much we would like to preserve the current stderr output formatting. Most of logging logic is in formatting rule info and such. Moving forward we can now try different formatting. Personally, I find the current detailed rule output helpful for sanity checks when developing and doing a dry run. However, when actually running the workflow (especially a large one) the terminal is flooded and errors can quickly be lost which requires searching the log file. Of course this can be alleviated with --quiet. Another question is regarding the two calls of setup_logger() brought up in #2797. I still don't understand why these are necessary and understanding this would be helpful in unifying this. I'm happy to keep pushing this forward but would appreciate your thoughts before continuing @johanneskoester! I  ## Summary by CodeRabbit - **New Features** - Introduced a new CLI option (--logger) for configuring custom loggers. - Enhanced structured logging delivers clearer runtime and resource insights, including rule graph logging. - Added a new method for logging rule graph information. - Added a new enumeration value `REASON` to the `Quietness` class. - **Refactor** - Overhauled the logging architecture across execution, scheduling, and workflow components for improved traceability and error handling. - **Chores** - Updated dependencies and setup configurations to include new logger plugins. - Adjusted test configurations to capture all output, ensuring more comprehensive test diagnostics. - **Bug Fixes** - Resolved minor command and formatting inconsistencies for smoother operations.  --------- Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de> Co-authored-by: Johannes Köster <johannes.koester@uni-due.de> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

@johanneskoester

Followup to snakemake#2830. This PR attempts to make Snakemake's logging more modular to make room for a potential logging plugin interface, or other logging/monitoring additions, as well as make logging in Snakemake more standardized. Currently, most of logging logic is deep within `logging.Logger`, mainly in `text_handler()` with direct calls to `logger.info/warn/etc`. In this PR, this logic has been refactored to more comply with the standard `logging` practices. To do so, three new classes were added: 1. `ColorizingTextHandler` manages adding color to log records and emitting to stream. 2. `DefaultFormatter` formats log records to be printed. Much of the logic from `text_handler()` is here now. 3. `DefaultFilter` filters log records from being emitted based on level and quietness settings. Together, these classes should output identical logs to the current logging implementation. I've tested this briefly, but further testing would be good. With these changes, it is much easier to add new `logHandlers` via `logger.addHandler`. New handlers can decide how to format and filter logs themselves, an important characteristic missing in the current implementation that motivated this refactoring (snakemake#2830 (comment)). This would be a breaking change for `--log-handler-script`, as user provided log handlers will need to subclass `logging.logHandler`. Furthermore, logic for adding the user handler (and formatter/filter) will need to be added. As for the logging plug interface (snakemake#2830 (comment)), I'm not sure that is the route to take, but would appreciate thoughts on this. First, this functionality is already largely provided by `--log-handler-script` and adding a logging plugin would be redundant. Second, I think that development of such plugins would likely be minimal, and coordinated effort in this repo to maintain a good set of logging options would be preferable. For example: - A standard HTTP log handler send emit log records to a user supplied address on which a 3rd party monitoring app is listening. - A handler/formatter that uses [rich](https://github.com/Textualize/rich) to make a prettier/interactive console on which logs are written to. I think both of these additions would benefit being standardized to Snakemake, as well as default options to users. Additionally I think it would be worthwhile to unify logging in Snakemake. Currently, the use of the wrapper `logging.Logger` class and `setup_logger()` function being called in multiple places makes it difficult to work with the logger and is likely the source of some issues (snakemake#3044, snakemake#2974, snakemake#2882, snakemake#2797, snakemake#2727). Unifying the logger and its configuration as recommended by the [logging docs](https://docs.python.org/3/howto/logging-cookbook.html#a-cli-application-starter-template) would make it easier to work the the logger. Another option would be to use something like [loguru](https://loguru.readthedocs.io/en/stable/index.html). Whichever option, I think simplifying and unifying the logger should be the next step of this PR. One main question I have is how much we would like to preserve the current stderr output formatting. Most of logging logic is in formatting rule info and such. Moving forward we can now try different formatting. Personally, I find the current detailed rule output helpful for sanity checks when developing and doing a dry run. However, when actually running the workflow (especially a large one) the terminal is flooded and errors can quickly be lost which requires searching the log file. Of course this can be alleviated with --quiet. Another question is regarding the two calls of setup_logger() brought up in snakemake#2797. I still don't understand why these are necessary and understanding this would be helpful in unifying this. I'm happy to keep pushing this forward but would appreciate your thoughts before continuing @johanneskoester! I  ## Summary by CodeRabbit - **New Features** - Introduced a new CLI option (--logger) for configuring custom loggers. - Enhanced structured logging delivers clearer runtime and resource insights, including rule graph logging. - Added a new method for logging rule graph information. - Added a new enumeration value `REASON` to the `Quietness` class. - **Refactor** - Overhauled the logging architecture across execution, scheduling, and workflow components for improved traceability and error handling. - **Chores** - Updated dependencies and setup configurations to include new logger plugins. - Adjusted test configurations to capture all output, ensuring more comprehensive test diagnostics. - **Bug Fixes** - Resolved minor command and formatting inconsistencies for smoother operations.  --------- Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de> Co-authored-by: Johannes Köster <johannes.koester@uni-due.de> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

cademirch added 3 commits April 15, 2024 13:27

initial commit adding monitoring provider

7bf0c42

Merge branch 'snakemake:main' into feat-add-monitoring-interface

d447f64

remove wms cli options, add metadata to monitor settings

bb5ea94

cademirch changed the title ~~Feat: add monitoring plugin interface~~ feat: add monitoring plugin interface Apr 20, 2024

cademirch marked this pull request as ready for review April 25, 2024 21:14

cademirch requested a review from johanneskoester as a code owner April 25, 2024 21:14

cademirch added 2 commits April 25, 2024 14:19

Merge branch 'main' into feat-add-monitoring-interface

c029296

update docs

cb08b99

johanneskoester mentioned this pull request Apr 28, 2024

fix: Prevent binary log files to crash snakemake execution with show-failed-logs #2827

Merged

2 tasks

johanneskoester reviewed Apr 28, 2024

View reviewed changes

Merge branch 'main' into feat-add-monitoring-interface

3901423

Merge branch 'snakemake:main' into feat-add-monitoring-interface

61e1666

johanneskoester self-assigned this May 4, 2024

cademirch mentioned this pull request Jun 3, 2024

--quiet settings not intuitive #2882

Open

cademirch mentioned this pull request Sep 29, 2024

feat!: Logging refactor & add LoggerPluginInterface #3107

Merged

cademirch closed this Sep 29, 2024

	help="Specify a custom monitoring provider, available via an monitoring plugin: snakemake_monitoring_<name>",
	help="Specify a custom monitoring provider, available via an monitoring plugin: snakemake-monitoring-plugin-<name>",

Conversation

cademirch commented Apr 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Uh oh!

cademirch commented Apr 25, 2024

Uh oh!

johanneskoester commented Apr 28, 2024

Uh oh!

cademirch commented Apr 28, 2024

Uh oh!

johanneskoester left a comment

Choose a reason for hiding this comment

Uh oh!

johanneskoester Apr 28, 2024

Choose a reason for hiding this comment

Uh oh!

johanneskoester Apr 28, 2024

Choose a reason for hiding this comment

Uh oh!

cademirch Apr 28, 2024

Choose a reason for hiding this comment

Uh oh!

jjjermiah commented May 2, 2024

Uh oh!

cademirch commented May 2, 2024

Uh oh!

johanneskoester commented May 2, 2024

Uh oh!

jjjermiah commented May 2, 2024

Uh oh!

cademirch commented May 3, 2024

Uh oh!

cademirch commented May 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sonarqubecloud bot commented May 3, 2024

Quality Gate passed

Uh oh!

johanneskoester commented May 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cademirch commented May 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

johanneskoester commented Jun 24, 2024

Uh oh!

cademirch commented Jun 24, 2024

Uh oh!

cademirch commented Sep 29, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cademirch commented Apr 20, 2024 •

edited

Loading

cademirch commented May 3, 2024 •

edited

Loading

johanneskoester commented May 4, 2024 •

edited

Loading

cademirch commented May 4, 2024 •

edited

Loading