Skip to content

queueHandler: use SimpleQueue rather than Queue#11373

Merged
michaelDCurran merged 3 commits into
masterfrom
i11369
Jul 23, 2020
Merged

queueHandler: use SimpleQueue rather than Queue#11373
michaelDCurran merged 3 commits into
masterfrom
i11369

Conversation

@michaelDCurran

Copy link
Copy Markdown
Member

Link to issue number:

Closes #11369

Summary of the issue:

It is possible for both NVDA and an app (such as Microsoft Word) to both deadlock when typing characters.
The process is as follows:

  1. The usertypes a character in the app.
  2. NVDA's in-process code detects the typed character in the app's main thread and calls back into NVDA via rpc.
  3. NVDA handles the incoming rpc call by queuing a typedCharacter event via eventHandler.queueEvent.
  4. While NVDA is queuing the event function with Queue.put:
    4.1. Python's garbage collector runs and or Python deletes objects that are no longer in scope.
    4.2. One or more of these objects are comtypes COM objects marshalled from the app's main thread that sent the typed character notification.
    4.3. Python calls Release on one of these COM objects which waits upon the app's main thread to release the object. Deadlock.

Even if this were addressed, a second deadlock is possible:

  1. The usertypes a character in the app.
  2. NVDA's in-process code detects the typed character in the app's main thread and calls back into NVDA via rpc.
  3. NVDA handles the incoming rpc call by queuing a typedCharacter event via eventHandler.queueEvent.
  4. queueEvent acquires a lock and increments counts for the event name etc so that eventHandler.isPendingEvents can tell there are events if it were to be called.
    4.1. During the lock, Python's garbage collector runs and or Python deletes objects that are no longer in scope.
    4.2. One or more of these objects are comtypes COM objects marshalled from the app's main thread that sent the typed character notification.
    4.3. Python calls Release on one of these COM objects which waits upon the app's main thread to release the object. Deadlock.

Description of changes

  • queueHandler now uses a queue.SimpleQueue rather than a queue.Queue. SimpleQueue does not use locking, is unbounded and has no task management. We never needed any of that functionality anyways.
  • nvdaHelper.nvdaControlerInternal_typedCharacterNotify now manually queues a call to executeEvent rather than calling queueEvent.
    Both of these changes ensure that no locks are entered during that particular incoming RPC call.

Testing performed

Tried to make MS Word / NVDA freeze by typing very fast into a document for at least 30 seconds. No freeze was experienced.

Known Issues

I don't believe there any drawbacks to these changes, though they may not address the issue as specifically as they could. The root of the problem still remains that

  1. It seems to be impossible to cancel an IUnknown::release COM call.
  2. The Python garbage collector and or out-of-scope deletion can occur at any time in any thread, and for comtypes, this may not be even the thread the COM object was marshalled to.

Changelog

Bug fixes:

  • NVDA should no longer sometimes freeze when rapidly typing in Microsoft Word documents.

…Queue is much more light-wait and reduces the chance of deadlocks.
…aracter notifications rather than using queueEvent as queueEvent may cause a deadlock with the app who sent the typed character notification if Python's garbage collector runs and releases a COM object from that process during queueEvent's lock.
Comment thread source/NVDAHelper.py
focus=api.getFocusObject()
if focus.windowClassName!="ConsoleWindowClass":
eventHandler.queueEvent("typedCharacter",focus,ch=ch)
# Manually queue a call to executeEvent rather than using queueEvent,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered either one of the following:

  1. Adding an additional kwarg to eventHandler.queueEvent that allows bypassing the pending events system. If that kwarg is semi private (e.g. by prefixing it with an underscore), it won't hurt anyone and it simplifies the API.
  2. Tune the behavior of the garbage collector, i.e. by disabling the automatic collection and running the garbage collector manually at certain intervals, within the core pump, etc. Alternatively, we could temporarily disable collection in situations that could cause deadlocks.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bramd

bramd commented Jul 14, 2020

Copy link
Copy Markdown
Contributor

@michaelDCurran First of all, glad to see something is done about this long standing issue. From the description I gather this must be a hell to debug/pinpoint the core issue.

Definitely out of scope for this PR, but I guess Python's garbage collector and COM can cause way more hard to catch issues. Ever considered moving COM-interaction out of the Python code base and into a low-level language like C/C++?

@feerrenrut

Copy link
Copy Markdown
Contributor

More information about the introduction of SimpleQueue in Python 3.7: https://bugs.python.org/issue14976

Comment thread source/NVDAHelper.py
# As Currently queueEvent uses a lock and there is a small chance it could deadlock
# If garbage collection within the lock
# caused a COM object from the same thread as the typed character to be released.
queueHandler.queueFunction(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bypasses the pendingEvents mechanism. Perhaps it is worth investigating disabling the garbage collection?

@michaelDCurran

Copy link
Copy Markdown
Member Author

I think it is better to explicitly avoid queueEvent here, rather than changing queueEvent's functionality (even if based on a private argument) to bypass the pending events mechanism. Otherwise, queueEvent's functionality becomes harder to understand and predict.
Re garbage collection: I am currently working on another pr which greatly reduces the chance of reference cycles when interacting with Microsoft word, thus allowing Python to delete objects as soon as they go out of scope, rather than later on, possibly in the wrong thread. I agree that In the future, we should also consider managing calls to gc.collect and disabling automatic garbage collection.
I am going to merge this pr as is and deal with garbage collection separately.

@michaelDCurran michaelDCurran merged commit 45d7cc5 into master Jul 23, 2020
@michaelDCurran michaelDCurran deleted the i11369 branch July 23, 2020 08:52
@nvaccessAuto nvaccessAuto added this to the 2020.3 milestone Jul 23, 2020
@feerrenrut

Copy link
Copy Markdown
Contributor

So this means that specifically for typed character events, the pending events mechanism isn't going to be used anymore? Why not do the same of all types of events?

@LeonarddeR

Copy link
Copy Markdown
Collaborator

Pending events is used for Focus, Caret, ValueChange and TextChange, I believe. Especially caret handling will probably break in a major way if we'd remove that mechanism.

@feerrenrut

Copy link
Copy Markdown
Contributor

Prior to this PR it was also used for typed character events. I'd like to know why is this mechanism important for the events you listed, but not so for the typed character event. This now means that isPendingEvents will return False for "typedCharacter" events

@michaelDCurran

michaelDCurran commented Jul 23, 2020 via email

Copy link
Copy Markdown
Member Author

michaelDCurran added a commit that referenced this pull request Jul 23, 2020
…her than queueEvent (#11416)

* Revert "reduce chance of freeze when typing characters in MS Word (#11373)"

This reverts commit 45d7cc5.

* Fix linting issues

* Switch Queue to SimpleQueue

* Add comment
@michaelDCurran

Copy link
Copy Markdown
Member Author

I have reverted the typedCharacterNotify change in pr #11416, but kept the change from Queue to SimpleQueue. We can try and address the possible freeze in queueEvent when in typedCharacterNotify either with better garbage collection management, avoiding doing in-process typed character processing at all in Windows 10, or by somehow making the in-process part of typedCharacterNotify non-blocking.

@michaelDCurran michaelDCurran changed the title reduce chance of freeze when typing characters in MS Word queueHandler: use SimpleQueue rather than Queue Jul 23, 2020
@codeofdusk

Copy link
Copy Markdown
Contributor

avoiding doing in-process typed character processing at all in Windows 10

This might be worth considering (we already do this for terminals). Can you think of any drawbacks?

@michaelDCurran

michaelDCurran commented Jul 26, 2020 via email

Copy link
Copy Markdown
Member Author

@codeofdusk

Copy link
Copy Markdown
Contributor

newer versions of Windows 10

ToUnicodeEx got the new flag on Windows 10 1607. So all Windows versions released in the past four years.

@michaelDCurran

michaelDCurran commented Jul 26, 2020 via email

Copy link
Copy Markdown
Member Author

@codeofdusk

codeofdusk commented Jul 26, 2020

Copy link
Copy Markdown
Contributor

Correct. But not Windows 7, which we still have a lot of users for.

Yes, but are users of an end of life OS going to update their screen reader? Do you have stats on this? (I imagine probably hard to get them as due to security issues lots of Win7 machines are run exclusively offline)

@josephsl

josephsl commented Jul 26, 2020 via email

Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NVDA may sometimes freeze within Queue.put

7 participants