Ensure that all in-process log calls never block an app's threads by michaelDCurran · Pull Request #11431 · nvaccess/nvda

michaelDCurran · 2020-07-28T07:35:56Z

Link to issue number:

Summary of the issue:

While NVDA is running code within other processes (such as for visutlaBuffers, displayModels and other helper functions), it occasionally logs messages back to NvDA for debugging purposes. Log messages are sent to NvDA via rpc. But currently these rpc calls block while the log message is being sent to NVDA, which may mean that an app's main thread could block for a time or in the worst case deadlock with NvDA, if the Python garbage collector in NVDA happens to run within the incoming rpc call and happens to release a COM object for the thread the log call originated from.

Description of how this pull request fixes the issue:

logMessage (in NVDAHelperRemote) will now do the following:

First it will log the message via the Windows debugging API (so that it shows in a debugger if conected). Previously this was done after the log message was sent to NVDA.
If NVDA's inproc manager thread is not yet running, or the current thread is not the inproc manager thread, then the log message is put in a queue for later retrieval in NVDA's inproc manager thread, and an APC function is queued to the inproc manager thread for later fetching and sending to NVDA.
If the current thread is the inproc manager thread, then any previously queued log messages are fetched and sent to NVDA, then the current log message is sent to NVDA. We deliberately process any previously queued messages before sending the current message to ensure that log messages remain in the correct order.

Just before the inproc manager thread finally exits, we also process any queued log messages to catch any queued after the manager thread main loop completed.

Testing performed:

Ran NVDA, interacting with Microsoft Word, Notepad and the Run dialog.
Added some temporary log calls in various places in the in-process code, such as dllMain, and when handing typed characters in the app's main thread, to ensure that message ordering is correct, and that log calls were always happening from the manager thread.

Known issues with pull request:

None.

Change log entry:

Bug fixes:

NVDA should no longer sometimes freeze in Microsoft Word when rapidly arrowing up and down.

… is initialized.

LeonarddeR

If i'm correct, we no longer limit inproc log messages to a level higher than debug, right? Is this intentional?

michaelDCurran · 2020-07-28T10:16:49Z

@LeonarddeR Ah yes, I should have mentioned I removed the limitation around not logging debug level to NVDA. This limitation was originally there as there are so many debug calls and they would majorly slow down an app's main thread (say when debuggind displayModel) as each call would contact NvDA from that thread. Now as it is in a separate thread it shouldn't be so bad. However, note that by default (and therefore in all official builds of NVDA) the nvdaHelperLogLevel is lower than debug, and therefore the LOG_DEBUG macro is a no-op anyway. In other words, even now, to get messages at level debug, you still need to compile NvDA with nvdaHelperLogLevel at debug or higher.

michaelDCurran · 2020-07-28T10:17:16Z

@LeonarddeR I have fixed the indentation now.

michaelDCurran · 2020-07-28T21:43:23Z

@feerrenrut thanks for the useful review. I think the main reason why I had maxCount as an argument in the first place came from the fact I was trying to avoid an extra function and using log_flushQueue also as the apc function.
I have refactored things based on your feedback. Now:

log_flushQueue no longer takes a maxCount argument. It just flushes the number of items available in the queue. Note though it does still only call queue.size once at the beginning, one wouldn't want the possibility that another thread kept adding to the queue while flushQueue was fetching and it just never finished.
There is now a dedicated apc function which just calls log_flushQueue.
log_terminate has been removed and log_flushQueue is used instead.
log_flushQueue is also now called near the top of the manager thread once to fetch any previously queued log messages before the manager thread was started.
I hope this addresses all of your feedback.

feerrenrut · 2020-07-29T07:35:03Z

Yes, these changes look good.

Note though it does still only call queue.size once at the beginning, one wouldn't want the possibility that another thread kept adding to the queue while flushQueue was fetching and it just never finished.

I need to look into how APC works to really make assertions around this, but if these two functions can run concurrently then I'm concerned that there should be concurrency control on on this deque. Check the thread safety header on the following page also: https://en.cppreference.com/w/cpp/container

I'll come back to this.

michaelDCurran · 2020-07-29T08:31:35Z

We can be sure that two APC functions of ours will never run at the same time, as they are all queued to the same thread. But, certainly two threads could push to the queue at the same time, or a thread could push, and and the manager thread could pop at the same time. I always assumed that c++ containers (at least Windows implementations of them) had a lock where necessary, so that (in the case of deque) push_back and pop_front could never run concurrently, but I could be wrong.

feerrenrut · 2020-07-29T11:49:19Z

certainly two threads could push to the queue at the same time

I think we will have to protect this deque from this. See the detailed rules under the "thread safety" header on the following page also: https://en.cppreference.com/w/cpp/container

The gist of it is that reading from the container (eg const member functions) are fine, modifying the container not so.

feerrenrut · 2020-07-29T12:19:28Z

There seems to be two thread safety problems:

Flush can happen at the same time as items being added, these both modify the queue.
Two threads could add items to the queue at the same time.

To prevent this we need to introduce locks. Preventing multiple threads from modifying the queue. The main concern is to watch for deadlocks, though data races should be kept in mind. By ensuring we only hold a lock while making changes to the queue, we can prevent deadlocks. The trick to this is localizing the code that needs a lock.

So, I see two options:

Use two queues (it may no longer make sense to use deque specifically, but that is another concern). Have an active receive queue, and an inactive queue. When flush is called a lock is acquired just while they are swapped. Flush then empties the inactive queue. The same lock is acquired in log_message to ensure the active / inactive queues aren't swapped during the insertion/append/push.
Continue with a single queue, but use a lock to protect during insert, and during the flush. While you could lock for each call to pop in flush, I'm not really comfortable to recommend it. Instead, after carefully considering the thread safety rules for STL containers, it might be ok to iterate through the elements logging them (up to maxCount) then lock and remove them in one call.

Assumptions:

Flush is only executed on the inproc Manager Thread, never concurrently. It would be good if we could ensure this by checking at the start of flush, and if not queuing it with APC and exiting early.

michaelDCurran · 2020-07-29T22:40:44Z

@feerrenrut in log_flushQueue I now lock and swap. And in logMessage I now also lock.

feerrenrut

I think this generally looks good. Just a couple of small questions, minor suggestions.

…ally, just let them all get destroied when the queue is destroied. Co-authored-by: Reef Turner <feerrenrut@users.noreply.github.com>

…r thread, queue the call back to the manager thread with apc. Co-authored-by: Reef Turner <feerrenrut@users.noreply.github.com>

AppVeyorBot · 2020-07-30T09:31:47Z

Build (for testing PR)

See test results for failed build of commit 5809d3a675

* Only try to queue a call to log_flushQueue via APC if the manager thread actually exists yet. * Declare log_flushQueue_apcFunc early so it actualy compiles :) * Remove a redundant blank line.

michaelDCurran · 2020-07-30T09:48:33Z

@feerrenrut I committed your suggestions but made a couple of small changes: We can only queue via APC if the manager thread actually exists yet. And log_flushQueue_apcFunc needed to be declared first. Does this all look okay now?

feerrenrut

Thanks @michaelDCurran

michaelDCurran added 2 commits July 28, 2020 17:08

Ensure that all in-process log calls never block an app's threads.

3a053b9

nvdaHelperRemote logMessage: only queue via APC if the manager thread…

fdebc32

… is initialized.

michaelDCurran requested a review from feerrenrut July 28, 2020 07:53

LeonarddeR previously approved these changes Jul 28, 2020

View reviewed changes

Comment thread nvdaHelper/remote/log.cpp Outdated

Fix indentation

008b745

feerrenrut suggested changes Jul 28, 2020

View reviewed changes

Comment thread nvdaHelper/remote/log.cpp Outdated

Comment thread nvdaHelper/remote/log.cpp Outdated

Comment thread nvdaHelper/remote/log.cpp Outdated

Comment thread nvdaHelper/remote/log.cpp

nvdaHelper logging support: refactor for readability.

4a235b2

nvdaHelper logging: lock the log queue when inserting / fetching.

1e32a90

feerrenrut reviewed Jul 30, 2020

View reviewed changes

Comment thread nvdaHelper/remote/log.cpp

Comment thread nvdaHelper/remote/log.cpp Outdated

michaelDCurran and others added 2 commits July 30, 2020 19:27

nvdaHelper logging: rather than popping items from tempQueue individu…

a93c004

…ally, just let them all get destroied when the queue is destroied. Co-authored-by: Reef Turner <feerrenrut@users.noreply.github.com>

nvdaHelper logging: if log_flushQueue is called outside of the manage…

04a2ed2

…r thread, queue the call back to the manager thread with apc. Co-authored-by: Reef Turner <feerrenrut@users.noreply.github.com>

nvdaHelper logging:

ac282a8

* Only try to queue a call to log_flushQueue via APC if the manager thread actually exists yet. * Declare log_flushQueue_apcFunc early so it actualy compiles :) * Remove a redundant blank line.

feerrenrut previously approved these changes Jul 30, 2020

View reviewed changes

Update What's new

8de91b2

michaelDCurran dismissed stale reviews from feerrenrut and LeonarddeR via 8de91b2 July 30, 2020 21:12

michaelDCurran merged commit 10976a7 into master Jul 30, 2020

michaelDCurran deleted the asyncInprocLogging branch July 30, 2020 21:34

nvaccessAuto added this to the 2020.3 milestone Jul 30, 2020

Adriani90 mentioned this pull request Aug 2, 2020

NVDA exits automatically in MS Word when working with this document #9463

Closed

AAClause mentioned this pull request Sep 13, 2020

Crash with Microsoft Word 365 #11600

Closed

LeonarddeR mentioned this pull request Oct 19, 2020

Convert NVDAHelper LocalWin10 to C++/WinRT #11768

Merged

20 tasks

Uh oh!

Conversation

michaelDCurran commented Jul 28, 2020

Link to issue number:

Summary of the issue:

Description of how this pull request fixes the issue:

Testing performed:

Known issues with pull request:

Change log entry:

Uh oh!

LeonarddeR left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

michaelDCurran commented Jul 28, 2020

Uh oh!

michaelDCurran commented Jul 28, 2020

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

michaelDCurran commented Jul 28, 2020

Uh oh!

feerrenrut commented Jul 29, 2020

Uh oh!

michaelDCurran commented Jul 29, 2020 via email

Uh oh!

feerrenrut commented Jul 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

feerrenrut commented Jul 29, 2020

Uh oh!

michaelDCurran commented Jul 29, 2020

Uh oh!

feerrenrut left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

AppVeyorBot commented Jul 30, 2020

Uh oh!

michaelDCurran commented Jul 30, 2020

Uh oh!

feerrenrut left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

feerrenrut commented Jul 29, 2020 •

edited

Loading