Skip to content

Protect functions from using terminated OneCore instances#13569

Closed
seanbudd wants to merge 11 commits into
betafrom
preventOneCoreConfigCrash
Closed

Protect functions from using terminated OneCore instances#13569
seanbudd wants to merge 11 commits into
betafrom
preventOneCoreConfigCrash

Conversation

@seanbudd

@seanbudd seanbudd commented Apr 1, 2022

Copy link
Copy Markdown
Member

Link to issue number:

N/A

Summary of the issue:

NVDA can crash when resetting the configuration to factory defaults.
This is because NVDA terminates OneCore when resetting the config.
However, a race condition exists with the async function speak, where the python callback uses a terminated OneCore instance.

Steps to reproduce:

  1. Run NVDA
  2. In the Keyboard section of the Preferences dialog, turn "Speak command keys" on.
  3. Reset your configuration to factory defaults
  4. Note that NVDA crashes

Description of how this pull request fixes the issue:

Protects usages of the OneCore instance by confirming the instance has not been terminated.
Wait until speech callbacks are completed before re-initializing.

Testing strategy:

Manual testing

  1. Run NVDA
  2. In the Keyboard section of the Preferences dialog, turn "Speak command keys" on.
  3. Reset your configuration to factory defaults
  4. Note that NVDA does not crash.

Known issues with pull request:

Other, somewhat related bugs, are logged as Errors during this process. See: #13580

Change log entries:

Bug fixes

- NVDA no longer crashes when resetting the NVDA configuration to factory defaults while speak command keys is on. (#13569)

Code Review Checklist:

  • Pull Request description:
    • description is up to date
    • change log entries
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • API is compatible with existing add-ons.
  • Documentation:
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English

@seanbudd seanbudd requested a review from a team as a code owner April 1, 2022 00:24
@seanbudd seanbudd requested review from feerrenrut and michaelDCurran and removed request for a team April 1, 2022 00:24
@seanbudd seanbudd changed the title Prevent one core config crash Protect functions from using terminated OneCore instances Apr 1, 2022
@seanbudd seanbudd marked this pull request as draft April 1, 2022 00:32
@seanbudd seanbudd self-assigned this Apr 1, 2022
@LeonarddeR

Copy link
Copy Markdown
Collaborator

Is it necessary to apply a similar approach to the OCR code?

@seanbudd

seanbudd commented Apr 4, 2022

Copy link
Copy Markdown
Member Author

Is it necessary to apply a similar approach to the OCR code?

Maybe, but for a given handle uwpOcr_initialize and uwpOcr_terminate are always called sequentially. See uwpOcr.UwpOcr.recognize. I don't think you can reproduce a crash with OCR, at least from NVDA core.

Edit: Actually, this could be the cause of #9472 and #13189

@seanbudd seanbudd marked this pull request as ready for review April 4, 2022 04:51
@AppVeyorBot

This comment was marked as off-topic.

@AppVeyorBot

This comment was marked as off-topic.

@seanbudd seanbudd removed the request for review from michaelDCurran April 5, 2022 03:53
@seanbudd seanbudd modified the milestones: 2022.2, 2022.1 Apr 5, 2022
@AppVeyorBot

This comment was marked as off-topic.

@AppVeyorBot

This comment was marked as off-topic.

@AppVeyorBot

This comment was marked as off-topic.

@seanbudd seanbudd marked this pull request as draft April 8, 2022 03:22
@AppVeyorBot

This comment was marked as off-topic.

@seanbudd seanbudd marked this pull request as ready for review April 8, 2022 05:35
@AppVeyorBot

Copy link
Copy Markdown
  • Build execution time has reached the maximum allowed time for your plan (60 minutes).

See test results for failed build of commit 4b1193ab51

using namespace winrt::Windows::Foundation::Collections;
using winrt::Windows::Foundation::Metadata::ApiInformation;

std::recursive_mutex InstanceManager::_instanceStateMutex;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it is only waitUntilReadyForInitialization / initializeNewInstance that requires this to be recursive?

Comment on lines +48 to +54
/*
When initialized, increases the count of active speech threads
waiting on a callback.
When exiting the scope of initialization, decreases the count
of active speech threads and deletes the OneCore instance if it has
been terminated and no speech callbacks are waiting.
*/

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ docs (like python doc strings) are typically written on the line before the declaration, rather than within the definition.

It is also helpful to have reasoning for the implementation within the definition.

Comment on lines +57 to +59
class InstanceManager {
private:
static std::recursive_mutex _instanceStateMutex;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and SpeakThreadGuard don't need to be in this header. They aren't used outside of the oneCoreSpeech.cpp compilation unit, and don't need to have their compilation tied to oneCoreSpeech. We'll likely need them separate for OCR.

Waits until an instance has been fully terminated,
and then (once speech callbacks have finished) deleted.
*/
std::lock_guard g(_instanceStateMutex);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can cause a deadlock. This is acquiring the lock potentially while there is an active SpeakThreadGuard due to OcSpeech::speak which can't terminate until it acquires the lock to delete the instance and signal _readyForInitialization

Instead, I'd make waitUntilReadyForInitialization transfer ownership of the lock back to the caller after it is acquired safely.

@seanbudd

Copy link
Copy Markdown
Member Author

Closing as a new approach is required

@seanbudd seanbudd closed this Apr 19, 2022
feerrenrut added a commit that referenced this pull request Apr 26, 2022
Supersedes PR #13569

Summary of the issue:
NVDA was crashing when resetting the configuration to factory defaults.
NVDA terminates the ocSpeech native module when resetting the config, however there is a pending callback to a now deleted python function.

Steps to reproduce:
- Run NVDA
- In the Keyboard section of the Preferences dialog, turn "Speak command keys" on.
- Reset your configuration to factory defaults
- Note that NVDA crashes

Description of fix:
Changes the design of the ocSpeech module.
Both initialization and termination are blocked if a callback is about to happen.
The "tokens" for prior initialization of ocSpeech are collected by the ocSpeech system.
When an async task reaches completion, the origin token is checked for validity, the callback is not called for a now invalid token.
Initialization now requires the callback to be specified.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants