Skip to content

feat: add monitoring plugin interface#2830

Closed
cademirch wants to merge 7 commits intosnakemake:mainfrom
cademirch:feat-add-monitoring-interface
Closed

feat: add monitoring plugin interface#2830
cademirch wants to merge 7 commits intosnakemake:mainfrom
cademirch:feat-add-monitoring-interface

Conversation

@cademirch
Copy link
Copy Markdown
Contributor

@cademirch cademirch commented Apr 20, 2024

Description

@johanneskoester, just wanted to start the discussion on adding the monitoring plugin interface.

I've implemented a monitoring plugin interface in this repo. In this PR I've replaced the --log-service and --wms CLI options with --monitoring-provider, as well as wired up the use of the monitoring provider.

I designed the MonitoringProvider base class around using the log_handler, as this is what logging.WMSLogger and logging.SlackLogger used. New monitoring plugins will just need to define a log_handler and will be good to go.

In order to test this, I reimplemented the WMSLogger into a monitoring plugin in this repo.

Off the top of my head, things left to do are:

  • Write tests and docs
  • Publish the interface to PyPi
  • Implement SlackLogger as a plugin and publish
  • Publish WMSLogger plugin
  • Remove WMSLogger and SlackLogger from logging.py

Shouldn't take long to do all of the above, just want to make sure this is the way to go first.. I'll keep this as a draft until we decide on the way forward.

Extra:
It would be nice to pass the rulegraph to the monitor, so that it can visualize the workflow's DAG. Perhaps something like this:

# workflow.py
def _build_dag(self):
        logger.info("Building DAG of jobs...")
        async_run(self.dag.init())
        async_run(self.dag.update_checkpoint_dependencies())
+      logger.debug_dag(self.dag.rule_dot()) # Could also define new func for this instead of debug_dag

Doing this in _build_dag would allow the monitor to update its DAG visualization if the workflow has checkpoints. Of course this assumes we use the log_handler for the monitoring plugin interface. I can't think of any issues logging the rulegraph in _build_dag could cause, but feedback here would be helpful.

@cademirch cademirch changed the title Feat: add monitoring plugin interface feat: add monitoring plugin interface Apr 20, 2024
@cademirch
Copy link
Copy Markdown
Contributor Author

Hi @johanneskoester making this a full PR so you can review.

@cademirch cademirch marked this pull request as ready for review April 25, 2024 21:14
@johanneskoester
Copy link
Copy Markdown
Contributor

It would be nice to pass the rulegraph to the monitor, so that it can visualize the workflow's DAG. Perhaps something like this:

Yes, good idea! Should be a separate method then, instead of debug_dag I think.

@cademirch
Copy link
Copy Markdown
Contributor Author

It would be nice to pass the rulegraph to the monitor, so that it can visualize the workflow's DAG. Perhaps something like this:

Yes, good idea! Should be a separate method then, instead of debug_dag I think.

Great, I agree. What do you think about the monitoring interface? If this is the approach we take, I'm happy to start publishing the interface and plugins to pypi and write some tests for snakemake.

Copy link
Copy Markdown
Contributor

@johanneskoester johanneskoester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, only a few things.

"Snakemake will notify the service on errors and completed execution."
"Currently slack and workflow management system (wms) are supported.",
"--monitoring-provider",
help="Specify a custom monitoring provider, available via an monitoring plugin: snakemake_monitoring_<name>",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
help="Specify a custom monitoring provider, available via an monitoring plugin: snakemake_monitoring_<name>",
help="Specify a custom monitoring provider, available via an monitoring plugin: snakemake-monitoring-plugin-<name>",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it, why not just snakemake-monitor-plugin-... (without the ing)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure tbh - I think b/c I had "monitoring-provider" stuck in my head. I like snakemake-monitor-plugin better, I'll make that change.

@jjjermiah
Copy link
Copy Markdown
Contributor

nice work @cademirch glad to see this going. If theres a new logging call it might be useful to consider specific logs that might provide more/structured info regarding mapping to be ingested by the plugin

@cademirch
Copy link
Copy Markdown
Contributor Author

nice work @cademirch glad to see this going. If theres a new logging call it might be useful to consider specific logs that might provide more/structured info regarding mapping to be ingested by the plugin

Thanks. Yes, I agree. I think working out the suggestion in #2827 to break out the show_failed_logs into its own method would be a good place to start, as it would be nice to offer failed logs to the monitor plugin

@johanneskoester
Copy link
Copy Markdown
Contributor

One further thought: couldn't we instead call the interface snakemake-interface-logger-plugins, and unify monitoring and logging? Isn't it in the end just the question where the log messages are directed to? For the cli, I would then imagine that people can specify any number of loggers based on what plugins are installed, e.g. snakemake --logger json slack would run snakemake with snakemake-logger-plugin-json and snakemake-logger-plugin-slack, whereas snakemake --logger text wms-monitor would use the term logger (which will stay built into snakemake analogously to the local executor) and snakemake-logger-plugin-wms-monitor. Importantly, loggers that write to the terminal should be mutually exclusive, i.e. Snakemake should give an error if e.g. snakemake --logger text json is executed.

@jjjermiah
Copy link
Copy Markdown
Contributor

that sounds about right, this is directly related to the logging itself. It might help to think about how the logs might be used.

I for one, was first interested in structuring logs to be posted directly to the Google Cloud Logging API and use that as a "database" where I can build a monitor that queries logs using filters on the fields that the login plugin defines.

  • I've tested this out using the log-handler-script argument to circumvent the issue regarding the google-batch plugin's logging issues.

Taking a step back it becomes clear that this is resembles more of a pub/sub approach which opens the door for using an actual pub/sub architecture.

This might be complicating things more than it needs to be, but it also allows for the remote jobs that are being executed to post their logs (instead of having to i.e do some kubectl logs snakejob-...... as the only other way to get a job's logs

@cademirch
Copy link
Copy Markdown
Contributor Author

@johanneskoester I think unifying monitoring and logging is a great idea. Just to confirm though, the interface design would stay the same? Just the name would change? Now would be a good time to change the design.

If we go this direction, perhaps we should compile any changes to logging.py we would want to see and implement them here.

I'm currently traveling so I won't be able to make any code progress, but am happy to keep discussing!

@cademirch
Copy link
Copy Markdown
Contributor Author

cademirch commented May 3, 2024

@johanneskoester would you be open to changes to the logger class in this PR? To be honest, if we want to do logging plugins, I think we should consider a pretty big refactor of how logging is handled. Currently the use of the log_handlers list and class wrapping the logger is pretty non standard and would limit potential logging plugins imo. If we were to use the logging addHandler method, we could design logging plugins to be subclasses of logging.Handler. Also it would be nice to use the built in logging.Filter to handle filtering of logs at the handler level, that way plugins could have access to all logs. Overall, the Logger class is doing too many things and could benefit from a refactor to make it more modular, especially if we want to move forward with log/monitor plugins. Here is an example of how we could use logging.dictConfig to specify different handlers, filters, for matters, etc.

As an example, currently passing a failed jobs logs to a logging plugin would be difficult because show_logs is too deeply nested as #2827 pointed out. Ideally it should be easy for future plugins to have access to all logs.

Finally, we could consider using a different logging package like structlog. While this would take a bit of work and add a dependency, I think it is worth thinking about as it is a bit more modern than pythons built in logging and would possibly make refactoring a bit easier.

I'm happy to flesh out this refactoring some more, but want to make sure that you're open to it @johanneskoester.

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud bot commented May 3, 2024

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@johanneskoester johanneskoester self-assigned this May 4, 2024
@johanneskoester
Copy link
Copy Markdown
Contributor

johanneskoester commented May 4, 2024

@cademirch yes absolutely. The logging code of Snakemake has never seen a proper design and has only been extended ad hoc when needed. Structlog looks great and I am completely open for a rewrite of all these parts. Looking forward to your changes!

Once we have logging plugins, people can also experiment with different ways of presenting the info on the terminal (e.g. in a more dynamic way for example, also including progress bars and so on).

@cademirch
Copy link
Copy Markdown
Contributor Author

cademirch commented May 4, 2024

Sounds good. I'll leave this PR where its at for now to revisit down the line and open a new one for logging refactoring once. One question I do have: how much would you like to retain the current stdout logging formatting? Would you be opposed to pretty-printed json logs to stdout? This can be decided later, and I can work up some demos eventually, just curious what you might have in mind.

And I agree re presenting info on the terminal... I have some ideas already :)

@johanneskoester
Copy link
Copy Markdown
Contributor

Not sure, I think the current logging is better than json because it is more readable. If anything it should rather be yaml as a default. However, with plugins and options of those plugins the possibilities are endless and people could use profiles to customize to whatever defaults they like most.

@cademirch
Copy link
Copy Markdown
Contributor Author

That makes sense. I had made some progress in the logging refactoring last month, but had to put it down due to some deadlines. Those are wrapping up this week so I plan to start working on this again more next week. For now I'm planning on reimplementing the current stdout/file log formatting as closely as possible, then adding different output options (progress bars, JSON, yaml, etc) as plugins and or built in options.

@cademirch
Copy link
Copy Markdown
Contributor Author

closed in favor of #3107

@cademirch cademirch closed this Sep 29, 2024
johanneskoester added a commit that referenced this pull request Mar 12, 2025
Followup to #2830.

This PR attempts to make Snakemake's logging more modular to make room
for a potential logging plugin interface, or other logging/monitoring
additions, as well as make logging in Snakemake more standardized.

Currently, most of logging logic is deep within `logging.Logger`, mainly
in `text_handler()` with direct calls to `logger.info/warn/etc`. In this
PR, this logic has been refactored to more comply with the standard
`logging` practices. To do so, three new classes were added:
1. `ColorizingTextHandler` manages adding color to log records and
emitting to stream.
2. `DefaultFormatter` formats log records to be printed. Much of the
logic from `text_handler()` is here now.
3. `DefaultFilter` filters log records from being emitted based on level
and quietness settings.

Together, these classes should output identical logs to the current
logging implementation. I've tested this briefly, but further testing
would be good.

With these changes, it is much easier to add new `logHandlers` via
`logger.addHandler`. New handlers can decide how to format and filter
logs themselves, an important characteristic missing in the current
implementation that motivated this refactoring
(#2830 (comment)).

This would be a breaking change for `--log-handler-script`, as user
provided log handlers will need to subclass `logging.logHandler`.
Furthermore, logic for adding the user handler (and formatter/filter)
will need to be added.

As for the logging plug interface
(#2830 (comment)),
I'm not sure that is the route to take, but would appreciate thoughts on
this. First, this functionality is already largely provided by
`--log-handler-script` and adding a logging plugin would be redundant.
Second, I think that development of such plugins would likely be
minimal, and coordinated effort in this repo to maintain a good set of
logging options would be preferable. For example:
- A standard HTTP log handler send emit log records to a user supplied
address on which a 3rd party monitoring app is listening.
- A handler/formatter that uses
[rich](https://github.com/Textualize/rich) to make a
prettier/interactive console on which logs are written to.

I think both of these additions would benefit being standardized to
Snakemake, as well as default options to users.

Additionally I think it would be worthwhile to unify logging in
Snakemake. Currently, the use of the wrapper `logging.Logger` class and
`setup_logger()` function being called in multiple places makes it
difficult to work with the logger and is likely the source of some
issues (#3044, #2974, #2882, #2797, #2727). Unifying the logger and its
configuration as recommended by the [logging
docs](https://docs.python.org/3/howto/logging-cookbook.html#a-cli-application-starter-template)
would make it easier to work the the logger. Another option would be to
use something like
[loguru](https://loguru.readthedocs.io/en/stable/index.html). Whichever
option, I think simplifying and unifying the logger should be the next
step of this PR.

One main question I have is how much we would like to preserve the
current stderr output formatting. Most of logging logic is in formatting
rule info and such. Moving forward we can now try different formatting.
Personally, I find the current detailed rule output helpful for sanity
checks when developing and doing a dry run. However, when actually
running the workflow (especially a large one) the terminal is flooded
and errors can quickly be lost which requires searching the log file. Of
course this can be alleviated with --quiet.

Another question is regarding the two calls of setup_logger() brought up
in #2797. I still don't understand why these are necessary and
understanding this would be helpful in unifying this.

I'm happy to keep pushing this forward but would appreciate your
thoughts before continuing @johanneskoester! I


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced a new CLI option (--logger) for configuring custom loggers.
- Enhanced structured logging delivers clearer runtime and resource
insights, including rule graph logging.
	- Added a new method for logging rule graph information.
	- Added a new enumeration value `REASON` to the `Quietness` class.

- **Refactor**
- Overhauled the logging architecture across execution, scheduling, and
workflow components for improved traceability and error handling.

- **Chores**
- Updated dependencies and setup configurations to include new logger
plugins.
- Adjusted test configurations to capture all output, ensuring more
comprehensive test diagnostics.

- **Bug Fixes**
- Resolved minor command and formatting inconsistencies for smoother
operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de>
Co-authored-by: Johannes Köster <johannes.koester@uni-due.de>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
kjohnsen pushed a commit to kjohnsen/snakemake that referenced this pull request Dec 15, 2025
Followup to snakemake#2830.

This PR attempts to make Snakemake's logging more modular to make room
for a potential logging plugin interface, or other logging/monitoring
additions, as well as make logging in Snakemake more standardized.

Currently, most of logging logic is deep within `logging.Logger`, mainly
in `text_handler()` with direct calls to `logger.info/warn/etc`. In this
PR, this logic has been refactored to more comply with the standard
`logging` practices. To do so, three new classes were added:
1. `ColorizingTextHandler` manages adding color to log records and
emitting to stream.
2. `DefaultFormatter` formats log records to be printed. Much of the
logic from `text_handler()` is here now.
3. `DefaultFilter` filters log records from being emitted based on level
and quietness settings.

Together, these classes should output identical logs to the current
logging implementation. I've tested this briefly, but further testing
would be good.

With these changes, it is much easier to add new `logHandlers` via
`logger.addHandler`. New handlers can decide how to format and filter
logs themselves, an important characteristic missing in the current
implementation that motivated this refactoring
(snakemake#2830 (comment)).

This would be a breaking change for `--log-handler-script`, as user
provided log handlers will need to subclass `logging.logHandler`.
Furthermore, logic for adding the user handler (and formatter/filter)
will need to be added.

As for the logging plug interface
(snakemake#2830 (comment)),
I'm not sure that is the route to take, but would appreciate thoughts on
this. First, this functionality is already largely provided by
`--log-handler-script` and adding a logging plugin would be redundant.
Second, I think that development of such plugins would likely be
minimal, and coordinated effort in this repo to maintain a good set of
logging options would be preferable. For example:
- A standard HTTP log handler send emit log records to a user supplied
address on which a 3rd party monitoring app is listening.
- A handler/formatter that uses
[rich](https://github.com/Textualize/rich) to make a
prettier/interactive console on which logs are written to.

I think both of these additions would benefit being standardized to
Snakemake, as well as default options to users.

Additionally I think it would be worthwhile to unify logging in
Snakemake. Currently, the use of the wrapper `logging.Logger` class and
`setup_logger()` function being called in multiple places makes it
difficult to work with the logger and is likely the source of some
issues (snakemake#3044, snakemake#2974, snakemake#2882, snakemake#2797, snakemake#2727). Unifying the logger and its
configuration as recommended by the [logging
docs](https://docs.python.org/3/howto/logging-cookbook.html#a-cli-application-starter-template)
would make it easier to work the the logger. Another option would be to
use something like
[loguru](https://loguru.readthedocs.io/en/stable/index.html). Whichever
option, I think simplifying and unifying the logger should be the next
step of this PR.

One main question I have is how much we would like to preserve the
current stderr output formatting. Most of logging logic is in formatting
rule info and such. Moving forward we can now try different formatting.
Personally, I find the current detailed rule output helpful for sanity
checks when developing and doing a dry run. However, when actually
running the workflow (especially a large one) the terminal is flooded
and errors can quickly be lost which requires searching the log file. Of
course this can be alleviated with --quiet.

Another question is regarding the two calls of setup_logger() brought up
in snakemake#2797. I still don't understand why these are necessary and
understanding this would be helpful in unifying this.

I'm happy to keep pushing this forward but would appreciate your
thoughts before continuing @johanneskoester! I


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced a new CLI option (--logger) for configuring custom loggers.
- Enhanced structured logging delivers clearer runtime and resource
insights, including rule graph logging.
	- Added a new method for logging rule graph information.
	- Added a new enumeration value `REASON` to the `Quietness` class.

- **Refactor**
- Overhauled the logging architecture across execution, scheduling, and
workflow components for improved traceability and error handling.

- **Chores**
- Updated dependencies and setup configurations to include new logger
plugins.
- Adjusted test configurations to capture all output, ensuring more
comprehensive test diagnostics.

- **Bug Fixes**
- Resolved minor command and formatting inconsistencies for smoother
operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de>
Co-authored-by: Johannes Köster <johannes.koester@uni-due.de>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants