
Join remaining non-daemon threads before exiting NVDA #16934

Merged
seanbudd merged 4 commits into master from waitOnBrailleAutoDetect on Aug 1, 2024


michaelDCurran (Member) commented Jul 31, 2024

Link to issue number:

Fixes #16933

Summary of the issue:

If any non-daemon threads remain when NVDA restarts, the new NVDA instance can be very unstable, including freezing and loss of functionality.

Description of user facing changes

NVDA is no longer unstable after being restarted during a Bluetooth scan for braille displays.

Description of development approach

Before releasing the mutex in nvda.pyw, log and join all remaining non-daemon threads.
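A minimal sketch of that approach in plain Python. It assumes a hypothetical `release_mutex` callable standing in for the real mutex handling in nvda.pyw, and uses the standard `logging` module rather than NVDA's own logger; the thread name `hypotheticalWorker` is likewise illustrative.

```python
import logging
import threading
import time

log = logging.getLogger("shutdownSketch")


def join_non_daemon_threads() -> list[str]:
	"""Log and join every non-daemon thread except the calling one; return their names."""
	joined = []
	for thr in threading.enumerate():
		# Daemon threads are skipped, as is the current thread (it cannot join itself).
		if not thr.daemon and thr is not threading.current_thread():
			log.info(f"Waiting on {thr}...")
			thr.join()  # No timeout: wait until the thread has really finished.
			log.info(f"Thread {thr.name} complete")
			joined.append(thr.name)
	return joined


def shutdown(release_mutex) -> list[str]:
	"""Hypothetical shutdown tail: join the threads first, release the mutex last."""
	joined = join_non_daemon_threads()
	# Only now is it safe to let a new instance take over.
	release_mutex()
	return joined


if __name__ == "__main__":
	logging.basicConfig(level=logging.INFO)
	# Stand-in for a background worker such as braille auto detection.
	worker = threading.Thread(target=time.sleep, args=(0.2,), name="hypotheticalWorker")
	worker.start()
	print(shutdown(lambda: print("mutex released")))
```

The key design point is ordering: because the join happens before the release, no thread belonging to the old process can still be running once a new instance is allowed to acquire the mutex.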

Testing strategy:

Ran the steps in #16933, ensuring that the new instance of NVDA is not unstable.

Known issues with pull request:

This addresses the minimum problem of remaining non-daemon threads, such as the braille display auto detection thread. However, issue #16933 also details other known problems that can occur if an old NVDA process remains when a new one starts. This should no longer be possible, but those other issues are worth considering at some point.

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

Summary by CodeRabbit

  • New Features

    • Enhanced threading management for improved application stability and reliability during process termination.
  • Bug Fixes

    • Resolved potential issues with lingering threads that could interfere with application operations.

…ning non-daemon threads. In a perfect world there should be none, but currently there is one if braille auto detection is in use. If we don't join here, our process will remain active after the mutex is released until that thread completes, which causes any new instance of NVDA started before the process dies to be very unstable.
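The behaviour this commit message relies on can be reproduced with plain CPython, which waits for non-daemon threads before the process exits but does not wait for daemon threads. A minimal sketch that times two child processes; the 0.5 second sleep is an arbitrary stand-in for a slow task such as a Bluetooth scan, not a measured NVDA value.

```python
import subprocess
import sys
import time

# Child script: start one background thread, then let the main thread end at once.
CHILD = """
import sys, threading, time
daemon = sys.argv[1] == "daemon"
threading.Thread(target=time.sleep, args=(0.5,), daemon=daemon).start()
# The main thread ends here; a non-daemon thread keeps the process alive.
"""


def child_lifetime(daemon: bool) -> float:
	"""Return how many seconds the child process took to exit."""
	start = time.monotonic()
	subprocess.run(
		[sys.executable, "-c", CHILD, "daemon" if daemon else "non-daemon"],
		check=True,
	)
	return time.monotonic() - start


if __name__ == "__main__":
	print(f"non-daemon child: {child_lifetime(False):.2f}s")
	print(f"daemon child:     {child_lifetime(True):.2f}s")
```

The non-daemon child cannot exit until its background thread finishes, which is exactly the window in which a released mutex would let a new instance start alongside it.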
@michaelDCurran michaelDCurran requested a review from a team as a code owner July 31, 2024 07:54
coderabbitai Bot (Contributor) commented Jul 31, 2024

Walkthrough

The recent changes enhance the threading management in the source/nvda.pyw file, ensuring all non-daemon threads are properly handled during the application's exit process. This prevents potential issues with lingering threads, particularly during Bluetooth scanning for braille displays, thereby improving the stability and reliability of NVDA when restarting.

Changes

File | Change summary
source/nvda.pyw | Added import threading and implemented logging and joining of non-daemon threads during cleanup.

Assessment against linked issues

Objective | Addressed | Explanation
Ensure all threads are handled on exit to prevent instability during restarts (#16933) | |
Prevent lingering threads from blocking new instances during Bluetooth scanning (#16933) | |
Transition background threads to daemon status for improved cleanup on exit (#16933) | | Uncertainty exists whether all threads have been rewritten as daemon threads.


coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

Comment thread source/nvda.pyw Outdated
Comment on lines +537 to +547
	# Log any remaining background threads.
	# In a perfect world there should be none.
	# Join on any non-daemon, non-dummy thread here
	# before releasing our mutex, otherwise this process may continue running after the mutex is released.
	# This would cause issues for rpc / nvdaHelper.
	for thr in threading.enumerate():
		if not thr.daemon and thr is not threading.current_thread():
			log.info(f"Waiting on {thr}...")
			thr.join()
			log.info(f"Thread {thr.name} complete")

coderabbitai Bot (Contributor) commented Jul 31, 2024

Properly handling non-daemon threads before exiting.

The logic for logging and joining non-daemon threads before releasing the mutex ensures that all threads are properly handled, preventing potential issues with lingering threads.

However, consider adding a timeout to the thr.join() call to avoid potential indefinite blocking if a thread does not terminate as expected.

-			thr.join()
+			thr.join(timeout=10)
+			if thr.is_alive():
+				log.warning(f"Thread {thr.name} did not terminate within the timeout period.")
Suggested change
	for thr in threading.enumerate():
		if not thr.daemon and thr is not threading.current_thread():
			log.info(f"Waiting on {thr}...")
			thr.join(timeout=10)
			if thr.is_alive():
				log.warning(f"Thread {thr.name} did not terminate within the timeout period.")
			else:
				log.info(f"Thread {thr.name} complete")

michaelDCurran (Member, Author) replied:

Unfortunately, timing out here would simply allow the original problem to occur. The only acceptable alternative would be to somehow kill off the thread after the timeout with something like TerminateThread, though I feel that this just adds more unpredictability and possible instability. If we time out and leave the thread alive, the NVDA process will still stay around anyway, with the disadvantage that the mutex is released and things become unstable for any new instance started before this process really exits.
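This point can be checked with plain Python: `Thread.join(timeout=...)` only bounds how long the caller waits and does nothing to the thread itself, so a timed-out join leaves a live non-daemon thread behind. A minimal sketch, where a `threading.Event` stands in for a hypothetical scan that has not yet finished:

```python
import threading

# Hypothetical stand-in for a long-running worker such as a Bluetooth scan.
stop_scan = threading.Event()
worker = threading.Thread(target=stop_scan.wait, name="scanWorker")
worker.start()

worker.join(timeout=0.1)
# join() has returned, but the thread is still running: the timeout only limited
# how long we waited. Releasing the mutex at this point would leave this process
# alive next to a newly started instance.
still_alive_after_timeout = worker.is_alive()
print(f"alive after join(timeout=0.1): {still_alive_after_timeout}")

# Let the worker finish so that this script itself can exit.
stop_scan.set()
worker.join()
print(f"alive after unbounded join: {worker.is_alive()}")
```

CPython's `threading` module offers no supported way to forcibly stop a thread, which is why the choice here is between waiting and an OS-level route such as the TerminateThread option mentioned above.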

coderabbitai Bot (Contributor) replied:

@michaelDCurran, you make a valid point. Timing out on thread joins would indeed leave the original problem unresolved and could introduce further instability. Ensuring that all non-daemon threads are properly joined before releasing the mutex is the most stable approach. Thank you for the clarification.


Learnings added
Learnt from: michaelDCurran
PR: nvaccess/nvda#16934
File: source/nvda.pyw:0-0
Timestamp: 2024-07-31T08:16:00.130Z
Learning: When handling non-daemon threads before exiting NVDA, it is crucial to join all threads without a timeout to avoid instability and ensure that the process exits cleanly.


seanbudd merged commit 73f7246 into master on Aug 1, 2024
@seanbudd seanbudd deleted the waitOnBrailleAutoDetect branch August 1, 2024 00:39
seanbudd pushed a commit that referenced this pull request Aug 2, 2024
…16944)

Improves fix for #16933
Improves upon pr #16934

Summary of the issue:
In PR #16934, it was ensured that NVDA's mutex would not be released until all remaining non-daemon threads were joined and completed. Otherwise, the NVDA process may stay around because of remaining background threads, such as the Braille auto detector worker thread.
However, the threads were joined after NVDA's message window was destroyed, making it impossible for a new instance of NVDA to locate and kill off the old NVDA if it was truly taking far too long to exit.

Description of user facing changes
An old NVDA is more likely to be killed off if it takes too long to exit while a new copy of NVDA is trying to start.

Description of development approach
Move the joining of the non-daemon threads out of nvda.pyw and to the bottom of core.main. Also ensure that destroying the message window is the very last action taken. The end of core.main is now ordered as follows:

  1. Terminate all subsystems.
  2. Join threads.
  3. Destroy the message window.
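That ordering can be sketched as follows. All function names are hypothetical placeholders rather than NVDA's actual core.main code; the sketch only shows that the message window still exists while the remaining threads are being joined.

```python
import threading
import time

calls: list[str] = []


def terminate_subsystems() -> None:
	# Hypothetical placeholder for terminating speech, braille, and so on.
	calls.append("terminate subsystems")


def join_non_daemon_threads() -> None:
	# Same rule as the PR: wait for every non-daemon thread except this one.
	for thr in threading.enumerate():
		if not thr.daemon and thr is not threading.current_thread():
			thr.join()
	calls.append("join threads")


def destroy_message_window() -> None:
	# Hypothetical placeholder: until this runs, a newly starting instance
	# can still locate this process and kill it if it hangs.
	calls.append("destroy message window")


def end_of_main() -> None:
	terminate_subsystems()
	join_non_daemon_threads()
	# Deliberately the very last action.
	destroy_message_window()


def slow_worker() -> None:
	# Hypothetical background thread still running when shutdown begins.
	time.sleep(0.1)
	calls.append("worker finished")


if __name__ == "__main__":
	threading.Thread(target=slow_worker).start()
	end_of_main()
	print(calls)
```

Because the worker only finishes while `join_non_daemon_threads` is waiting, its completion is recorded before the message window is destroyed.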


Development

Successfully merging this pull request may close these issues.

If NvDA is restarted during a braille display auto detection, the new NVDA instance may be very unstable
