Skip to content

fix: LoggerManager.shutdown() error handling, flush all loggers#4196

Merged
cademirch merged 4 commits into
snakemake:mainfrom
jlumpe:flush-logging-handlers
Jun 11, 2026
Merged

fix: LoggerManager.shutdown() error handling, flush all loggers#4196
cademirch merged 4 commits into
snakemake:mainfrom
jlumpe:flush-logging-handlers

Conversation

@jlumpe

@jlumpe jlumpe commented May 19, 2026

Copy link
Copy Markdown
Contributor

Minor addendum to #4137:

  • Also flush all loggers attached to manger.logger, extend tests to cover this case.
  • Wrap all calls to Handler.close() in try/except.
  • For non-plugin loggers (attached to manager.logger directly), acquire lock before flushing/closing. Not necessary for plugin loggers because they are all feed events through a QueueListener and its thread has already been shut down.

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).
  • I, as a human being, have checked each line of code in this pull request

AI-assistance disclosure

I used AI assistance for:

  • Code generation (e.g., when writing an implementation or fixing a bug)
  • Test/benchmark generation
  • Documentation (including examples)
  • Research and understanding

Summary by CodeRabbit

  • Bug Fixes

    • Improved logging shutdown so all log handlers are flushed and closed safely; failures in one handler no longer interrupt complete shutdown and teardown is more thread-safe.
  • Tests

    • Expanded tests to verify that all log handlers are flushed, closed, and receive expected records during shutdown.

@jlumpe jlumpe requested a review from johanneskoester as a code owner May 19, 2026 17:59
@coderabbitai

coderabbitai Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

The PR hardens LoggerManager.stop() to defensively flush and close both QueueListener-managed plugin handlers and global logger handlers with per-operation exception suppression and proper lock handling; tests were expanded to validate both handler paths are flushed, closed, and receive a log record.

Changes

Logging Shutdown Robustness

Layer / File(s) Summary
Defensive handler shutdown in LoggerManager.stop()
src/snakemake/logging.py
Stops the QueueListener if present, flushes and closes each plugin handler inside individual try/except blocks, then iterates global logger handlers acquiring each handler's lock, flushes (exceptions suppressed), removes the handler, closes it (exceptions suppressed), and releases the lock in a finally block.
Expanded regression test for handler cleanup validation
tests/test_logging.py
Test renamed to test_stop_closes_handlers; now creates distinct TrackingHandler instances for the QueueListener (plugin) and global logger (global) paths, routes a record through the QueueListener, and asserts both handlers were flushed, closed, and received exactly one record after LoggerManager.stop().

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description check ✅ Passed The description covers the key technical changes, includes all QC checklist items (checked or unchecked as appropriate), and discloses AI assistance. It provides context by referencing PR #4137 and explains the rationale for handler lock acquisition.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title directly addresses the main changes: improving error handling in LoggerManager.shutdown() and flushing all loggers, which aligns with the actual code modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/test_logging.py (1)

397-465: ⚡ Quick win

Cover the new failure-tolerance path too.

This only proves the happy path. The behavior added in LoggerManager.stop() is that one handler's flush() / close() failure must not abort shutdown, but no handler here raises, so the main hardening in this PR is still untested. Add a plugin or global handler variant that raises from flush() or close() and assert manager.stop() still completes and the other handler is cleaned up.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_logging.py` around lines 397 - 465, The test currently only covers
the success path; add a variant handler that raises from flush() or close() and
verify LoggerManager.stop() still completes and cleans up other handlers: create
a RaisingHandler subclass (override flush() and/or close() to raise an
exception), attach it as either the plugin handler (via manager.queue_listener
and QueueHandler) or as a global handler on test_logger alongside the existing
TrackingHandler, call manager.stop() and assert it does not raise and that the
non-raising handler was flushed/closed and received its record; keep the
existing try/finally to stop manager.queue_listener thread to avoid leaks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/snakemake/logging.py`:
- Around line 762-770: The code currently acquires handler lock before calling
self.logger.removeHandler(handler), which can reverse the logging module's lock
order and deadlock; change the sequence so you first call
self.logger.removeHandler(handler) to detach the handler, then acquire the
handler lock (handler.acquire()), perform handler.flush() and handler.close()
inside that lock (with exception handling as needed), and finally release the
lock; ensure you still handle exceptions from flush/close and avoid holding the
handler lock while calling logging configuration APIs like removeHandler().

---

Nitpick comments:
In `@tests/test_logging.py`:
- Around line 397-465: The test currently only covers the success path; add a
variant handler that raises from flush() or close() and verify
LoggerManager.stop() still completes and cleans up other handlers: create a
RaisingHandler subclass (override flush() and/or close() to raise an exception),
attach it as either the plugin handler (via manager.queue_listener and
QueueHandler) or as a global handler on test_logger alongside the existing
TrackingHandler, call manager.stop() and assert it does not raise and that the
non-raising handler was flushed/closed and received its record; keep the
existing try/finally to stop manager.queue_listener thread to avoid leaks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3628960e-3890-4131-84d3-e844a96827c0

📥 Commits

Reviewing files that changed from the base of the PR and between d853243 and 6ffd67c.

📒 Files selected for processing (2)
  • src/snakemake/logging.py
  • tests/test_logging.py

Comment thread src/snakemake/logging.py
Comment on lines +762 to +770
try:
handler.acquire()

try:
handler.flush()
except Exception:
pass

self.logger.removeHandler(handler)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# Inspect the full context of LoggerManager.stop() method
cat -n src/snakemake/logging.py | head -800 | tail -100

Repository: snakemake/snakemake

Length of output: 4489


🏁 Script executed:

# Get exact line count to understand file structure
wc -l src/snakemake/logging.py

Repository: snakemake/snakemake

Length of output: 92


🏁 Script executed:

# Look for the LoggerManager class and stop method
rg -A 50 "class LoggerManager" src/snakemake/logging.py | head -70

Repository: snakemake/snakemake

Length of output: 1988


🏁 Script executed:

# Find the stop method specifically
rg -B 5 -A 50 "def stop\(self\)" src/snakemake/logging.py

Repository: snakemake/snakemake

Length of output: 1697


🌐 Web query:

Python logging module Handler.acquire() Logger.removeHandler() lock ordering deadlock

💡 Result:

The Python logging deadlock you’re pointing to is the classic lock-order inversion between: 1) the handler-level lock acquired by Handler.acquire (held while Logger calls Handler.emit), and 2) the logging module-level lock acquired by “configuration” / other logging APIs (e.g., dictConfig/fileConfig, Logger.manager-related operations). If one thread is already holding a handler lock and calls into an API path that tries to acquire the module lock, while another thread is running configuration that acquires the module lock first and then attempts to acquire handler locks, the two lock acquisition orders can deadlock. Where the ordering comes from (source-backed): - Each Handler instance has its own lock, and the logging module also uses a module-level lock; dictConfig/fileConfig take the module lock first, then take individual handler locks when clearing/configuring handlers. [1][2] - The logging docs explicitly warn about deadlocks “because that might result in a deadlock” when you call a logging configuration API from within the Handler method “after a handler-level lock is acquired”: config APIs acquire the module-level lock, while your current call path already holds the handler-level lock, yielding an inversion. [2] How Handler.acquire and Logger.handle relate: - Logger.handle conditionally acquires the handler lock, then calls Handler.emit(record) while the handler lock is held. [3][2] How Logger.removeHandler fits in: - removeHandler is a configuration-style operation that removes a handler from a Logger; the docs state that addHandler/removeHandler use locks to ensure thread-safe operation. [2] - Deadlocks are still possible if removeHandler (or any code path it triggers) is called concurrently in a way that participates in the same lock-order inversion with an emit/handler-lock-holding path. In other words, removeHandler itself is not “magic”; it’s the interaction between “handler lock held” vs “module lock acquired first” that matters. The ordering warning in the official docs is the key guidance for avoiding this. [2] Concrete failure mode described by CPython issue reports: - A CPython issue notes deadlocks when custom handlers (or dependencies invoked by a handler) call logging APIs during emit; emit holds the handler lock, and those logging calls may try to acquire the module lock. If another thread is doing configuration that acquires module lock then handler locks, you can deadlock. [1] What to do to avoid it: - Do not call logging configuration APIs (dictConfig/fileConfig, etc.) from inside handler code paths that run under a handler lock (i.e., from within emit/handle, or any code called transitively while the handler lock is held). The docs call this out directly. [2] - If you must reconfigure while logging is active, perform the configuration from a different place/control path where you are not holding handler locks; alternatively stop/serialize logging while reconfiguring. The CPython issue discusses this as a documented workaround strategy. [1] - For user code that removes handlers: perform removeHandler/addHandler under an external synchronization strategy so you don’t race with log emission while handler locks are held (the logging module is thread-safe in general, but lock ordering issues can still arise when you reconfigure from within handler logic). [2][1] Related nuance (also from CPython discussions): - Handler.emit is effectively expected to be called with the handler lock acquired, and calling other logging APIs within that context can create deadlock risk, depending on lock ordering. [4] So, succinct answer to your query terms: - Handler.acquire matters because it holds the handler lock during emit. - Logger.removeHandler matters because it’s a locked/configuration operation that can require the module lock. - The deadlock is produced by inconsistent lock ordering between module-lock-first vs handler-lock-first, exactly as the logging docs warn when configuration APIs are invoked from inside handler methods. [2][1]

Citations:


🏁 Script executed:

# Check Python version constraints in this repo to understand logging behavior
cat setup.py | head -50

Repository: snakemake/snakemake

Length of output: 917


🏁 Script executed:

# Look for python_requires or version constraints
rg -i "python_requires|version" setup.py | head -10

Repository: snakemake/snakemake

Length of output: 207


🏁 Script executed:

# Check if there are any comments in the code about logging thread safety
rg -B 3 -A 3 "lock|thread|deadlock" src/snakemake/logging.py | head -40

Repository: snakemake/snakemake

Length of output: 1204


Remove the handler before taking its lock.

Holding the handler lock across self.logger.removeHandler(handler) reverses the Python logging module's lock order and can deadlock if another thread is reconfiguring logging concurrently. The logging module acquires its configuration lock before handler locks; if you hold a handler lock and then call a configuration API like removeHandler(), while another thread holds the module lock and tries to acquire handler locks, the two threads deadlock. Detach the handler first, then serialize flush() and close() on the handler itself.

Suggested change
 for handler in list(self.logger.handlers):
+    self.logger.removeHandler(handler)
     try:
         handler.acquire()

         try:
             handler.flush()
         except Exception:
             pass

-        self.logger.removeHandler(handler)
-
         try:
             handler.close()
         except Exception:
             pass

     finally:
         handler.release()
🧰 Tools
🪛 Ruff (0.15.13)

[error] 767-768: try-except-pass detected, consider logging the exception

(S110)


[warning] 767-767: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/snakemake/logging.py` around lines 762 - 770, The code currently acquires
handler lock before calling self.logger.removeHandler(handler), which can
reverse the logging module's lock order and deadlock; change the sequence so you
first call self.logger.removeHandler(handler) to detach the handler, then
acquire the handler lock (handler.acquire()), perform handler.flush() and
handler.close() inside that lock (with exception handling as needed), and
finally release the lock; ensure you still handle exceptions from flush/close
and avoid holding the handler lock while calling logging configuration APIs like
removeHandler().

@cademirch cademirch self-requested a review May 24, 2026 20:16

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/snakemake/logging.py (1)

801-818: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Potential deadlock from lock-order inversion in handler teardown.

handler.acquire() is called before self.logger.removeHandler(handler). That sequence can invert logging lock order and deadlock under concurrent logging reconfiguration. Detach the handler first, then lock/flush/close it.

#!/bin/bash
# Verify lock/removal order in LoggerManager.stop()
rg -n -A20 -B5 'def stop\(self\)|handler\.acquire\(\)|removeHandler\(handler\)|handler\.release\(\)' src/snakemake/logging.py
Suggested minimal reorder
 for handler in list(self.logger.handlers):
-    try:
-        handler.acquire()
+    self.logger.removeHandler(handler)
+    try:
+        handler.acquire()

         try:
             handler.flush()
         except Exception:
             pass

-        self.logger.removeHandler(handler)
-
         try:
             handler.close()
         except Exception:
             pass

     finally:
         handler.release()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/snakemake/logging.py` around lines 801 - 818, In LoggerManager.stop (the
loop iterating over self.logger.handlers), avoid acquiring the handler lock
before removing it to prevent lock-order inversion: first call
self.logger.removeHandler(handler) to detach the handler from the logger, then
acquire the handler lock (handler.acquire()), perform flush() and close() inside
the lock with exceptions handled, and finally release the lock in the finally
block; keep the existing try/except around flush/close but reorder operations so
removeHandler happens before handler.acquire()/handler.release().
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/snakemake/logging.py`:
- Around line 801-818: In LoggerManager.stop (the loop iterating over
self.logger.handlers), avoid acquiring the handler lock before removing it to
prevent lock-order inversion: first call self.logger.removeHandler(handler) to
detach the handler from the logger, then acquire the handler lock
(handler.acquire()), perform flush() and close() inside the lock with exceptions
handled, and finally release the lock in the finally block; keep the existing
try/except around flush/close but reorder operations so removeHandler happens
before handler.acquire()/handler.release().

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c8c65852-7abc-4954-bfee-874e62bd2687

📥 Commits

Reviewing files that changed from the base of the PR and between d7de7f7 and a3f9b57.

📒 Files selected for processing (2)
  • src/snakemake/logging.py
  • tests/test_logging.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_logging.py

@cademirch cademirch changed the title Fix: LoggerManager.shutdown() error handling, flush all loggers fix: LoggerManager.shutdown() error handling, flush all loggers Jun 11, 2026
@cademirch cademirch merged commit 5bcdbf3 into snakemake:main Jun 11, 2026
88 of 91 checks passed
@jlumpe jlumpe deleted the flush-logging-handlers branch June 11, 2026 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants