Skip to content

Prevent repetitive crashes by only allowing NVDA to restart from a crash three times in two minute.#19175

Merged
seanbudd merged 23 commits into
nvaccess:masterfrom
derekriemer:bootloop-remove
Nov 12, 2025
Merged

Prevent repetitive crashes by only allowing NVDA to restart from a crash three times in two minute.#19175
seanbudd merged 23 commits into
nvaccess:masterfrom
derekriemer:bootloop-remove

Conversation

@derekriemer

Copy link
Copy Markdown
Collaborator

Link to issue number:

Closes #19133

Summary of the issue:

If NVDA crashes somehow on startup, the watchdog restarts NVDA. This issue limits how many times the watchdog will atttempt to recover from crashes by preventing the crash handler from booting if the application has been in a boot loop of 3 or more crashes in one minute.

Description of user facing changes:

When NVDA gets itself into a crash loop, which is very rare, it can crash many times very rapidly. So rapidly that a user's keyboard is effectively locked from use. This issue prevents NVDA from crashing on repeat. Since NVDA often boots at startup, failure to gracefully bail can lock the users computer up every time the user logs in.

Description of developer facing changes:

The watchdog prevents crash handling code from booting on the fourth crash, by simply recording the timestamp of the last 3 crashes in a text file along side the crash dmp file, and then using the timestamps in a rolling window to decide if NVDA should start the crash handler, or gracefully allow shutdown.

Description of development approach:

Testing strategy:

Tested with a portable copy locally. I don't see any tests for the watchdog.

Known issues with pull request:

Code Review Checklist:

  • [ x ] Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • [ x ] Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • [ x ] UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • [ x ] API is compatible with existing add-ons.
  • [ x ] Security precautions taken.

@derekriemer derekriemer requested a review from a team as a code owner November 6, 2025 00:22
@derekriemer derekriemer requested a review from seanbudd November 6, 2025 00:22
Comment thread source/watchdog.py Outdated
Comment thread source/watchdog.py Outdated
Comment thread source/watchdog.py Outdated
Comment thread source/watchdog.py Outdated
derek riemer added 6 commits November 6, 2025 06:41
…closes nvaccess#19133)

Check a file for the last known crash time, and number of crashes within that time. if that time is less than 1 minute with 3 prior crashes, don't start the crash subsystem.

This fixes nvaccess#19133 by ensuring that the crash subsystem cannot  restore NVDA after another such crash, ensuring that issues that could cause NVDA to crash at boot do not cause it to boot in circles forever.
* Just record everything in if statement, no need for variable
* clean up order of statements so that the code reads more correctly.
@derekriemer derekriemer requested a review from seanbudd November 6, 2025 14:57
@derekriemer

Copy link
Copy Markdown
Collaborator Author

Trying to figure out if tests are flakey, because output is foreign to me and they won't locally work due to needing something on localhost.

pre-commit.ci run

@derekriemer

Copy link
Copy Markdown
Collaborator Author

@seanbudd I haven't worked with NVDA's system tests yet. I can successfully run NVDA and build a launcher, but can't figure out how to rerun this test suite. I'm still new to the github interfaces presubmit checks, my previous job used very custom workflows. Am I missing something obvious in this tests failure or is this simply a flake?

@derekriemer

Copy link
Copy Markdown
Collaborator Author

pre-commit.ci run

@CyrilleB79

Copy link
Copy Markdown
Contributor

@derekriemer you've got the following error in the during installation:

CRITICAL - __main__ (15:26:00.263) - MainThread (5376):
core failure
Traceback (most recent call last):
  File "nvda.pyw", line 309, in <module>
  File "core.pyc", line 1028, in main
  File "watchdog.pyc", line 422, in initialize
  File "watchdog.pyc", line 163, in _loadRecentCrashTimestamps
TypeError: _path_exists: path should be string, bytes, os.PathLike or integer, not property

@derekriemer

Copy link
Copy Markdown
Collaborator Author

@CyrilleB79 how do I find logs for the tests? I'm probably missing something obvious. I can now repro locally, should have run some manual tests, but I'm surprised I can't find that traceback when I click the startup/shutdown test.

derek riemer added 3 commits November 6, 2025 15:46
… rest of the file. I've never used dataclasses to collect consts like this so I originally forgot to instanciate, then started to refactor it so that each record returned a CrashStats to compare with consts but then realized the reviewer meant use a dataclass as a collection to wrap the crash stats consts, for dependency injection and ease of use.

@seanbudd seanbudd left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @derekriemer - just minor suggestions

Comment thread source/watchdog.py Outdated
Comment thread source/watchdog.py Outdated
Comment thread source/watchdog.py Outdated
Comment thread source/watchdog.py Outdated
@seanbudd seanbudd marked this pull request as draft November 7, 2025 02:38
derekriemer and others added 3 commits November 6, 2025 23:23
Co-authored-by: Sean Budd <seanbudd123@gmail.com>
Co-authored-by: Sean Budd <seanbudd123@gmail.com>
@derekriemer derekriemer changed the title Prevent obsessive crashes by only allowing NVDA to restart from a crash three times in one minute. Prevent obsessive crashes by only allowing NVDA to restart from a crash three times in two minute. Nov 7, 2025
…ash handling logic there, call it from watchdog.

This is step 1 of a refactor, which I'll complete after I merge the PR, and can write a test of the watchdog's testable code.
* The watchdog would be split into a watchdog directory, exported members in __init__.py
* the cancelable thread stuff would be part of a Cancelable file.
* the crash stats code would live in crash_stats.py
* the crash handler would be in crash_handler.py
* the remainder would get split as necessary, such as init, terminate, etc being in watchdog.py and reexported
@seanbudd seanbudd changed the title Prevent obsessive crashes by only allowing NVDA to restart from a crash three times in two minute. Prevent repetitive crashes by only allowing NVDA to restart from a crash three times in two minute. Nov 11, 2025
Comment thread source/watchdog.py Outdated
Comment thread source/watchdog.py Outdated
@derekriemer

Copy link
Copy Markdown
Collaborator Author

@seanbudd I've addressed review comments. Not sure how to mark review ready for another pass in gh.

@seanbudd seanbudd marked this pull request as ready for review November 12, 2025 00:21
Copilot AI review requested due to automatic review settings November 12, 2025 00:21

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements crash loop detection to prevent NVDA from repeatedly restarting when it crashes on startup. The mechanism tracks crash timestamps and disables automatic crash recovery when 3 or more crashes occur within a 120-second window.

Key changes:

  • Created new crashHandler.py module with crash tracking logic and moved crash handling code from watchdog.py
  • Modified watchdog.py to check for crash loops before installing the crash handler
  • Updated changelog to document the new feature

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
user_docs/en/changes.md Added changelog entry documenting crash loop detection feature
source/watchdog.py Refactored to use new crash handler module and added crash loop detection logic
source/crashHandler.py New module containing crash handling logic, crash tracking, and crash event persistence

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread user_docs/en/changes.md Outdated
Comment thread source/utils/_crashHandler.py
Comment thread source/crashHandler.py
Comment thread source/watchdog.py Outdated
Comment thread source/crashHandler.py Outdated
Comment thread source/crashHandler.py Outdated
Comment thread source/utils/_crashHandler.py
Comment thread source/watchdog.py Outdated
Comment thread source/crashHandler.py
Comment thread source/crashHandler.py Outdated
derekriemer and others added 3 commits November 11, 2025 18:32
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Sean Budd <seanbudd123@gmail.com>
@derekriemer

Copy link
Copy Markdown
Collaborator Author

how in the world do I view the reason for the pyrite failure? It runs successfully on my machine, and when I clicked through that mess, the pyrite process seemed to exit with no output, even when I view the raw logs. Is there some magic github panel I don't know about?

…asn't saved, but wasn't marking it clearly and I had to manually tell vscode to revert it so it looked like the utils.crashHandler was correct.
@seanbudd

Copy link
Copy Markdown
Member

@derekriemer - you can view here: https://github.com/nvaccess/nvda/actions/runs/19284281460/attempts/1#summary-55142044998

or via running runlint.bat locally

@derekriemer

Copy link
Copy Markdown
Collaborator Author

H101
ARIA details role :: Test aria details roles being announced on di... | FAIL |
102
Speech did not finish before timeout
103

104

Can you rerun this to see if it's a flake? I don't have permission.I'll try running locally but I can't figure out a way to run tests that doesn't take over my NVDA locally, would be nice if I can figure out how to dockerize the test harness.

@CyrilleB79

Copy link
Copy Markdown
Contributor

In case I suspect a flake, I push an empty commit to re-run the tests:

git commit --allow-empty -m "Bump CI"
git push

@seanbudd

Copy link
Copy Markdown
Member

Thanks @derekriemer

@seanbudd seanbudd merged commit 1101f8a into nvaccess:master Nov 12, 2025
29 checks passed
@github-actions github-actions Bot added this to the 2026.1 milestone Nov 12, 2025
@derekriemer derekriemer deleted the bootloop-remove branch November 14, 2025 07:37
SaschaCowley added a commit that referenced this pull request Nov 19, 2025
…hen running in secure mode (#19238)

Fixes #19216
Follow-up #19175 

### Summary of the issue:

NVDA was failing to start on secure screens, unless the serviceDebug
global parameter was set.

### Description of user facing changes:

NVDA works on secure desktops again.

### Description of developer facing changes:

None

### Description of development approach:

The issue was caused by `utils._crashHandler.CrashStats.crashStatsPath`
assuming that `globalVars.appArgs.logFileName` would always represent a
path. Notwithstanding this, it also wrote to disk, even when
`NVDAState.shouldWriteToDisk` returned `False` (i.e. in scure mode or
running from the launcher).

1. Added several defensive measures to the new `utils._crashHandler`
module.
2. Changed `watchdog.initialize` to no longer add
`utils._crashHandler.crashHandler` as an unhandled exception filter when
running in secure mode.

### Testing strategy:

Ran from source, and executed the following in the python console to
ensure automatic restarts on crash still work as expected:

```py
import ctypes;ctypes.windll.kernel32.DebugBreak()
```

Modified NVDA to allow the scratchpad in secure mode, built a
self-signed launcher, and installed it.

Hit `alt+control+delete`, and observed that NVDA worked as expected.

Added the following global plugin to the scratchpad of the user and
system config:

```py
from globalPluginHandler import GlobalPlugin
import ctypes
from tones import beep
from core import postNvdaStartup

class GlobalPlugin(GlobalPlugin):
	def __init__(self):
		super().__init__()
		postNvdaStartup.register(self.crash)

	def crash(self):
		beep(500,100)
		ctypes.windll.kernel32.DebugBreak()
```

Restarted NVDA, and observed that it started, a tone was heard, and it
crashed, and this only happened 4 times.

Hit `alt+control+delete`, and observed that NVDA started, a tone was
heard, NVDA crashed, and did not restart.

### Known issues with pull request:

When NVDA crashes on the secure desktop, no feedback is given to the
user. However, this is not new.
seanbudd pushed a commit that referenced this pull request Dec 9, 2025
Closes #19333

Fix-up of #19175.
Summary of the issue:

NVDA was crashing at startup when using --no-logging command line flag.
Description of user facing changes:

NVDA can start again when using the --nologging flag.
Description of developer facing changes:

N/A
Description of development approach:

When loading the crash list file, also test if its path is None since this is the case when using --no-logging flag.
SaschaCowley added a commit that referenced this pull request Dec 12, 2025
Follow-up #19175

### Summary of the issue:

If NVDA crashes when running from source, git picks up
`nvda_crash_stats.txt` as an untracked file.

### Description of user facing changes:

None

### Description of developer facing changes:

Git will ignore `nvda_crash_stats.txt`, meaning it's much less likely to
be accidentally committed and doesn't clutter our git output.

### Description of development approach:

Added `source/nvda_crash_stats.txt` to `.gitignore`.

### Testing strategy:

Ran `git status`. Verified that `nvda_crash_stats` was no longer showing
as an untracked file.

### Known issues with pull request:

None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

If NVDA crashes more than 3 times in a row, do not attempt to restart

4 participants