fix: LoggerManager.shutdown() error handling, flush all loggers#4196
Conversation
📝 WalkthroughWalkthroughThe PR hardens LoggerManager.stop() to defensively flush and close both QueueListener-managed plugin handlers and global logger handlers with per-operation exception suppression and proper lock handling; tests were expanded to validate both handler paths are flushed, closed, and receive a log record. ChangesLogging Shutdown Robustness
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/test_logging.py (1)
397-465: ⚡ Quick winCover the new failure-tolerance path too.
This only proves the happy path. The behavior added in
LoggerManager.stop()is that one handler'sflush()/close()failure must not abort shutdown, but no handler here raises, so the main hardening in this PR is still untested. Add a plugin or global handler variant that raises fromflush()orclose()and assertmanager.stop()still completes and the other handler is cleaned up.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_logging.py` around lines 397 - 465, The test currently only covers the success path; add a variant handler that raises from flush() or close() and verify LoggerManager.stop() still completes and cleans up other handlers: create a RaisingHandler subclass (override flush() and/or close() to raise an exception), attach it as either the plugin handler (via manager.queue_listener and QueueHandler) or as a global handler on test_logger alongside the existing TrackingHandler, call manager.stop() and assert it does not raise and that the non-raising handler was flushed/closed and received its record; keep the existing try/finally to stop manager.queue_listener thread to avoid leaks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/snakemake/logging.py`:
- Around line 762-770: The code currently acquires handler lock before calling
self.logger.removeHandler(handler), which can reverse the logging module's lock
order and deadlock; change the sequence so you first call
self.logger.removeHandler(handler) to detach the handler, then acquire the
handler lock (handler.acquire()), perform handler.flush() and handler.close()
inside that lock (with exception handling as needed), and finally release the
lock; ensure you still handle exceptions from flush/close and avoid holding the
handler lock while calling logging configuration APIs like removeHandler().
---
Nitpick comments:
In `@tests/test_logging.py`:
- Around line 397-465: The test currently only covers the success path; add a
variant handler that raises from flush() or close() and verify
LoggerManager.stop() still completes and cleans up other handlers: create a
RaisingHandler subclass (override flush() and/or close() to raise an exception),
attach it as either the plugin handler (via manager.queue_listener and
QueueHandler) or as a global handler on test_logger alongside the existing
TrackingHandler, call manager.stop() and assert it does not raise and that the
non-raising handler was flushed/closed and received its record; keep the
existing try/finally to stop manager.queue_listener thread to avoid leaks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3628960e-3890-4131-84d3-e844a96827c0
📒 Files selected for processing (2)
src/snakemake/logging.pytests/test_logging.py
| try: | ||
| handler.acquire() | ||
|
|
||
| try: | ||
| handler.flush() | ||
| except Exception: | ||
| pass | ||
|
|
||
| self.logger.removeHandler(handler) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Inspect the full context of LoggerManager.stop() method
cat -n src/snakemake/logging.py | head -800 | tail -100Repository: snakemake/snakemake
Length of output: 4489
🏁 Script executed:
# Get exact line count to understand file structure
wc -l src/snakemake/logging.pyRepository: snakemake/snakemake
Length of output: 92
🏁 Script executed:
# Look for the LoggerManager class and stop method
rg -A 50 "class LoggerManager" src/snakemake/logging.py | head -70Repository: snakemake/snakemake
Length of output: 1988
🏁 Script executed:
# Find the stop method specifically
rg -B 5 -A 50 "def stop\(self\)" src/snakemake/logging.pyRepository: snakemake/snakemake
Length of output: 1697
🌐 Web query:
Python logging module Handler.acquire() Logger.removeHandler() lock ordering deadlock
💡 Result:
The Python logging deadlock you’re pointing to is the classic lock-order inversion between: 1) the handler-level lock acquired by Handler.acquire (held while Logger calls Handler.emit), and 2) the logging module-level lock acquired by “configuration” / other logging APIs (e.g., dictConfig/fileConfig, Logger.manager-related operations). If one thread is already holding a handler lock and calls into an API path that tries to acquire the module lock, while another thread is running configuration that acquires the module lock first and then attempts to acquire handler locks, the two lock acquisition orders can deadlock. Where the ordering comes from (source-backed): - Each Handler instance has its own lock, and the logging module also uses a module-level lock; dictConfig/fileConfig take the module lock first, then take individual handler locks when clearing/configuring handlers. [1][2] - The logging docs explicitly warn about deadlocks “because that might result in a deadlock” when you call a logging configuration API from within the Handler method “after a handler-level lock is acquired”: config APIs acquire the module-level lock, while your current call path already holds the handler-level lock, yielding an inversion. [2] How Handler.acquire and Logger.handle relate: - Logger.handle conditionally acquires the handler lock, then calls Handler.emit(record) while the handler lock is held. [3][2] How Logger.removeHandler fits in: - removeHandler is a configuration-style operation that removes a handler from a Logger; the docs state that addHandler/removeHandler use locks to ensure thread-safe operation. [2] - Deadlocks are still possible if removeHandler (or any code path it triggers) is called concurrently in a way that participates in the same lock-order inversion with an emit/handler-lock-holding path. In other words, removeHandler itself is not “magic”; it’s the interaction between “handler lock held” vs “module lock acquired first” that matters. The ordering warning in the official docs is the key guidance for avoiding this. [2] Concrete failure mode described by CPython issue reports: - A CPython issue notes deadlocks when custom handlers (or dependencies invoked by a handler) call logging APIs during emit; emit holds the handler lock, and those logging calls may try to acquire the module lock. If another thread is doing configuration that acquires module lock then handler locks, you can deadlock. [1] What to do to avoid it: - Do not call logging configuration APIs (dictConfig/fileConfig, etc.) from inside handler code paths that run under a handler lock (i.e., from within emit/handle, or any code called transitively while the handler lock is held). The docs call this out directly. [2] - If you must reconfigure while logging is active, perform the configuration from a different place/control path where you are not holding handler locks; alternatively stop/serialize logging while reconfiguring. The CPython issue discusses this as a documented workaround strategy. [1] - For user code that removes handlers: perform removeHandler/addHandler under an external synchronization strategy so you don’t race with log emission while handler locks are held (the logging module is thread-safe in general, but lock ordering issues can still arise when you reconfigure from within handler logic). [2][1] Related nuance (also from CPython discussions): - Handler.emit is effectively expected to be called with the handler lock acquired, and calling other logging APIs within that context can create deadlock risk, depending on lock ordering. [4] So, succinct answer to your query terms: - Handler.acquire matters because it holds the handler lock during emit. - Logger.removeHandler matters because it’s a locked/configuration operation that can require the module lock. - The deadlock is produced by inconsistent lock ordering between module-lock-first vs handler-lock-first, exactly as the logging docs warn when configuration APIs are invoked from inside handler methods. [2][1]
Citations:
- 1: logging deadlock when
dictConfigorfileConfigused concurrently with custom handlers python/cpython#96727 - 2: https://docs.python.org/3/library/logging.html
- 3: https://bugs.python.org/issue25668
- 4: https://bugs.python.org/msg339475
🏁 Script executed:
# Check Python version constraints in this repo to understand logging behavior
cat setup.py | head -50Repository: snakemake/snakemake
Length of output: 917
🏁 Script executed:
# Look for python_requires or version constraints
rg -i "python_requires|version" setup.py | head -10Repository: snakemake/snakemake
Length of output: 207
🏁 Script executed:
# Check if there are any comments in the code about logging thread safety
rg -B 3 -A 3 "lock|thread|deadlock" src/snakemake/logging.py | head -40Repository: snakemake/snakemake
Length of output: 1204
Remove the handler before taking its lock.
Holding the handler lock across self.logger.removeHandler(handler) reverses the Python logging module's lock order and can deadlock if another thread is reconfiguring logging concurrently. The logging module acquires its configuration lock before handler locks; if you hold a handler lock and then call a configuration API like removeHandler(), while another thread holds the module lock and tries to acquire handler locks, the two threads deadlock. Detach the handler first, then serialize flush() and close() on the handler itself.
Suggested change
for handler in list(self.logger.handlers):
+ self.logger.removeHandler(handler)
try:
handler.acquire()
try:
handler.flush()
except Exception:
pass
- self.logger.removeHandler(handler)
-
try:
handler.close()
except Exception:
pass
finally:
handler.release()🧰 Tools
🪛 Ruff (0.15.13)
[error] 767-768: try-except-pass detected, consider logging the exception
(S110)
[warning] 767-767: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/snakemake/logging.py` around lines 762 - 770, The code currently acquires
handler lock before calling self.logger.removeHandler(handler), which can
reverse the logging module's lock order and deadlock; change the sequence so you
first call self.logger.removeHandler(handler) to detach the handler, then
acquire the handler lock (handler.acquire()), perform handler.flush() and
handler.close() inside that lock (with exception handling as needed), and
finally release the lock; ensure you still handle exceptions from flush/close
and avoid holding the handler lock while calling logging configuration APIs like
removeHandler().
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/snakemake/logging.py (1)
801-818:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPotential deadlock from lock-order inversion in handler teardown.
handler.acquire()is called beforeself.logger.removeHandler(handler). That sequence can invert logging lock order and deadlock under concurrent logging reconfiguration. Detach the handler first, then lock/flush/close it.#!/bin/bash # Verify lock/removal order in LoggerManager.stop() rg -n -A20 -B5 'def stop\(self\)|handler\.acquire\(\)|removeHandler\(handler\)|handler\.release\(\)' src/snakemake/logging.pySuggested minimal reorder
for handler in list(self.logger.handlers): - try: - handler.acquire() + self.logger.removeHandler(handler) + try: + handler.acquire() try: handler.flush() except Exception: pass - self.logger.removeHandler(handler) - try: handler.close() except Exception: pass finally: handler.release()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/snakemake/logging.py` around lines 801 - 818, In LoggerManager.stop (the loop iterating over self.logger.handlers), avoid acquiring the handler lock before removing it to prevent lock-order inversion: first call self.logger.removeHandler(handler) to detach the handler from the logger, then acquire the handler lock (handler.acquire()), perform flush() and close() inside the lock with exceptions handled, and finally release the lock in the finally block; keep the existing try/except around flush/close but reorder operations so removeHandler happens before handler.acquire()/handler.release().
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/snakemake/logging.py`:
- Around line 801-818: In LoggerManager.stop (the loop iterating over
self.logger.handlers), avoid acquiring the handler lock before removing it to
prevent lock-order inversion: first call self.logger.removeHandler(handler) to
detach the handler from the logger, then acquire the handler lock
(handler.acquire()), perform flush() and close() inside the lock with exceptions
handled, and finally release the lock in the finally block; keep the existing
try/except around flush/close but reorder operations so removeHandler happens
before handler.acquire()/handler.release().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c8c65852-7abc-4954-bfee-874e62bd2687
📒 Files selected for processing (2)
src/snakemake/logging.pytests/test_logging.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_logging.py
Minor addendum to #4137:
manger.logger, extend tests to cover this case.Handler.close()intry/except.manager.loggerdirectly), acquire lock before flushing/closing. Not necessary for plugin loggers because they are all feed events through aQueueListenerand its thread has already been shut down.QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).AI-assistance disclosure
I used AI assistance for:
Summary by CodeRabbit
Bug Fixes
Tests