Join remaining non-daemon threads before exiting NVDA #16934
…ning non-daemon threads. In a perfect world there should be none, but currently there is one if Braille auto detection is in use. If we don't join here, our process will remain active after the mutex is released until that thread completes, which makes any new instance of NVDA started before the process dies very unstable.
    # Log any remaining background threads
    # In a perfect world there should be none.
    # join on any non-daemon non-dummy thread here,
    # Before releasing our mutex, otherwise this process may continue running after the mutex is released.
    # This would cause issues for rpc / nvdaHelper.
    for thr in threading.enumerate():
        if not thr.daemon and thr is not threading.current_thread():
            log.info(f"Waiting on {thr}...")
            thr.join()
            log.info(f"Thread {thr.name} complete")
Properly handling non-daemon threads before exiting.
The logic for logging and joining non-daemon threads before releasing the mutex ensures that all threads are properly handled, preventing potential issues with lingering threads.
However, consider adding a timeout to the thr.join() call to avoid potential indefinite blocking if a thread does not terminate as expected.
    -        thr.join()
    +        thr.join(timeout=10)
    +        if thr.is_alive():
    +            log.warning(f"Thread {thr.name} did not terminate within the timeout period.")

Committable suggestion
    # Log any remaining background threads
    # In a perfect world there should be none.
    # join on any non-daemon non-dummy thread here,
    # Before releasing our mutex, otherwise this process may continue running after the mutex is released.
    # This would cause issues for rpc / nvdaHelper.
    for thr in threading.enumerate():
        if not thr.daemon and thr is not threading.current_thread():
            log.info(f"Waiting on {thr}...")
            thr.join(timeout=10)
            if thr.is_alive():
                log.warning(f"Thread {thr.name} did not terminate within the timeout period.")
            else:
                log.info(f"Thread {thr.name} complete")
Unfortunately, timing out here would simply allow the original problem to occur. The only acceptable alternative would be to somehow kill off the thread after the timeout with something like TerminateThread, though I feel that this just adds more unpredictability and possible instability. If we time out and leave the thread alive, the NVDA process will still stay around anyway, with the disadvantage that the mutex is released and things become unstable for any new instance started before this process really exits.
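To illustrate the point about timeouts, here is a minimal standalone sketch (not NVDA code). The worker, the `stop_event`, and the helper name `join_with_timeout` are hypothetical stand-ins for a slow background thread such as a device scan. It shows that `join(timeout=...)` only stops the waiting; the non-daemon thread itself keeps running and would still hold the process open.

```python
import threading


def join_with_timeout(thr: threading.Thread, timeout: float) -> bool:
    """Join with a timeout and report whether the thread is still running."""
    thr.join(timeout=timeout)
    return thr.is_alive()


# Hypothetical stand-in for a slow background worker: it blocks until told to stop.
stop_event = threading.Event()
worker = threading.Thread(target=stop_event.wait, name="slowWorker")
worker.start()

# The timed join returns, but the non-daemon thread is still alive,
# so the interpreter could not exit yet even if the main thread finished.
still_running = join_with_timeout(worker, timeout=0.05)

# Only once the worker actually finishes does an untimed join return.
stop_event.set()
worker.join()
finished = not worker.is_alive()
```

In this sketch `still_running` reflects the state after the timed join, which is the situation described above: waiting has ended but the thread has not.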
@michaelDCurran, you make a valid point. Timing out on thread joins would indeed leave the original problem unresolved and could introduce further instability. Ensuring that all non-daemon threads are properly joined before releasing the mutex is the most stable approach. Thank you for the clarification.
Learnings added
Learnt from: michaelDCurran
PR: nvaccess/nvda#16934
File: source/nvda.pyw:0-0
Timestamp: 2024-07-31T08:16:00.130Z
Learning: When handling non-daemon threads before exiting NVDA, it is crucial to join all threads without a timeout to avoid instability and ensure that the process exits cleanly.
…16944)
Improves fix for #16933
Improves upon PR #16934

Summary of the issue:
In PR #16934, it was ensured that NVDA's mutex would not be released until all remaining non-daemon threads were joined and completed. Otherwise, the NVDA process may stay around because of remaining background threads, such as the Braille auto detector worker thread. However, the threads were joined after NVDA's message window was destroyed, which made it impossible for a new instance of NVDA to locate and kill off the old NVDA if it was taking far too long to exit.

Description of user facing changes
An old NVDA instance is more likely to be killed off if it is taking too long to exit when a new copy of NVDA is trying to start.

Description of development approach
Move the joining of the non-daemon threads out of nvda.pyw and into the bottom of core.main. Also ensure that destroying the message window is the very last action taken. The ordering of the end of core.main is now:
1. Terminate all subsystems.
2. Join threads.
3. Destroy the message window.
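The ordering described in this commit message can be sketched as follows. This is an illustration only, not the contents of core.main: the function names and the `events` list are hypothetical, and the sketch also skips the main thread so that it is safe to run from any thread.

```python
import threading

events = []  # records the order of shutdown steps, for illustration only


def terminate_subsystems():
    # Hypothetical placeholder for shutting down each subsystem.
    events.append("terminate subsystems")


def join_non_daemon_threads():
    # Wait for every remaining non-daemon thread other than this one.
    for thr in threading.enumerate():
        if thr.daemon or thr is threading.current_thread() or thr is threading.main_thread():
            continue
        thr.join()
    events.append("join threads")


def destroy_message_window():
    # Hypothetical placeholder; done last so the process stays discoverable
    # for as long as any of its threads are still running.
    events.append("destroy message window")


def shutdown():
    terminate_subsystems()
    join_non_daemon_threads()
    destroy_message_window()


# A short-lived non-daemon worker standing in for a leftover background thread.
worker = threading.Thread(target=lambda: events.append("worker done"))
worker.start()
shutdown()
```

Because the window is destroyed only after the join returns, any leftover worker has finished by the time the last step runs.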
Link to issue number:
Fixes #16933
Summary of the issue:
If any non-daemon threads remain when NVDA restarts, the new NVDA instance can be very unstable, with freezes and missing functionality.
Description of user facing changes
NVDA is no longer unstable after being restarted during a Braille Bluetooth scan.
Description of development approach
Before releasing the mutex in nvda.pyw, log and join all remaining non-daemon threads.
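A minimal sketch of the "join, then release" pattern, under stated assumptions: `instance_mutex` is a plain `threading.Lock` standing in for NVDA's real single-instance mutex, and `join_remaining_threads`, the logger name, and the worker are hypothetical. The sketch also skips the main thread so it can be run from any thread.

```python
import logging
import threading
import time

log = logging.getLogger("shutdownSketch")  # hypothetical logger name


def join_remaining_threads() -> list[str]:
    """Log and join every remaining non-daemon thread; return their names."""
    joined = []
    for thr in threading.enumerate():
        if thr.daemon or thr is threading.current_thread() or thr is threading.main_thread():
            continue
        log.info(f"Waiting on {thr}...")
        thr.join()  # no timeout: the mutex must not be released while this runs
        log.info(f"Thread {thr.name} complete")
        joined.append(thr.name)
    return joined


# Stand-in for the single-instance mutex (an assumption for this sketch).
instance_mutex = threading.Lock()
instance_mutex.acquire()

# Stand-in for a leftover worker such as a device auto detection thread.
worker = threading.Thread(target=time.sleep, args=(0.05,), name="autoDetectWorker")
worker.start()

try:
    joined = join_remaining_threads()
finally:
    # Released only after the join, so no thread of this process outlives the mutex.
    instance_mutex.release()
```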
Testing strategy:
Ran the steps in #16933, ensuring that the new instance of NVDA is not unstable.
Known issues with pull request:
This addresses the minimum problem of non-daemon threads remaining, such as the Braille display auto detection thread. However, issue #16933 also details other known problems that can occur if an old NVDA process remains when a new one starts. This should no longer ever be possible, but it is worth considering those other issues at some point.