QuickNav Text paragraph navigation#16031
Conversation
|
Also looking for feedback from the international community on how does this PR do with your languages. I'll leave this PR as a draft for a few days to collect and address this feedback. Specifically, tell us how does this PR work with your language and also would be great to hear punctuation rules in your language if different from Latin. Here are major languages that I'm interested in:
|
|
cc: @cary-rowen, @nishimotz, @dpy013, @josephsl for chinese, japanese and corean at least. |
|
@mltony P and SHIFT+P conflict with the go to next/previous dialog commands in your BrowserNav add-on. Any chance of also updating BrowserNav? |
|
@amirsol81, Right, I will soon remove P command from BrowserNav to avoid keystroke collisions. You should use built-in QuickNav O instead of BrowserNav P which does the same thing. |
|
The Windows Uniscribe API is used for multilingual processing of word and sentence segmentation. |
CyrilleB79
left a comment
There was a problem hiding this comment.
I had pending comments from a week ago; sorry, I release it only now.
Regarding my general impression on the feature, I will add it in a separate message.
| # We have a positive lookBehind \w that resolves to a text character in any language, | ||
| # coupled with negative lookBehind \d that excludes digits. | ||
| lookBehind=r'(?<=\w)(?<!\d)', | ||
| # In some cases quote or closing parenthesis might appear right before or right after text punctuation. |
There was a problem hiding this comment.
Aren't single quotes sometimes used in English? ( ' or ’ )
Not being a native English writer/reader, I have not the answer.
There was a problem hiding this comment.
Neither am I. In my understanding single quotes are typically used as apostrophe in words that are glued together, like I'm or don't and also in what's left of Genetive case in English, like Martin's - I think that's called posessive form. I can come up with a sentence that would end with apostrophe followed by period, like:
Yesterday caucus was republicans'.
But not sure if that's even correct grammar, and if it is, then this type of sentences would be exceedingly rare, so probably not worth including into regex.
|
|
||
| DEFAULT_TEXT_PARAGRAPH_REGEX = ( | ||
| r"({lookBehind}{optQuote}{punc}{optQuote}{optWiki}{lookAhead}|{cjk})".format( | ||
| # Look behind clause ensures that we have a text character before text punctuation mark. |
There was a problem hiding this comment.
This assumption is not always correct. For example in French:
The correct usage is to use no-break space before question mark (?), exclaim (!), semi-colon (;) or colon (:). A very common usage is also to replace this no-break space by a normal space, or even with no space at all due to the influence of English writing.
E.g:
Comment ça va ?
or
Comment ça va ?
or
Comment ça va?
There was a problem hiding this comment.
Am I right that in French you don't put a space before period, only before question mark and exclamation mark? If so then I adjusted regex by relaxing condition for exclamation mark and question mark. Hope that should work better.
| wx.OK | wx.ICON_ERROR, | ||
| self, | ||
| ) | ||
| self.textParagraphRegexEdit.SetFocus() |
There was a problem hiding this comment.
According to #15897, focusing the invalid field causes an issue in case you have changed panel when you validate the settings dialog.
Thus, this instruction should be removed.
|
|
||
| # Translators: This is the label for a textfield in the | ||
| # browse mode settings panel. | ||
| textParagraphRegexLabelText = _("Regular expression for text paragraph navigation") |
There was a problem hiding this comment.
I wonder if this parameter should be here or in advanced settings. Who is likely to change this value?
There was a problem hiding this comment.
Fair enough - moved this to advanced settings. Not sure who would want to change this, but it seems to me that we'd rather keep this configurable.
| wxCtrlClass=wx.TextCtrl, | ||
| size=(self.scaleSize(300), -1), | ||
| ) | ||
| self.textParagraphRegexEdit.SetValue(config.conf["virtualBuffers"]["textParagraphRegex"]) |
There was a problem hiding this comment.
Could the value of the config be checked for validity before being set to this field? If not valid, an error could be logged and the value reset to default.
Of course, a wrong value would land in this field only if the user has manually edited the config file or the config through a command in the Python Console, which are not allowed scenarios.
But this would make this dialog more robust for any tester.
There was a problem hiding this comment.
If a user edits NVDA config file manually - they're on their own risk. I already have a validator in isValid() function.
| qn( | ||
| "textParagraph", key="p", | ||
| # Translators: Input help message for a quick navigation command in browse mode. | ||
| nextDoc=_("moves to the next text paragraph"), |
There was a problem hiding this comment.
Even if I have no better suggestion for now, I have the feeling that "Text paragraph" is too vague (here as well as in the GUI, the Change log and the User Guide).
There should be at least a place, maybe only in the User Guide, describing what a "Text paragraph" is really regarding this feature. If I understand correctly, as part of this functionality, a "Text paragraph" is a paragraph containing at least one sentence.
And, if possible, a more precise description of this option would be nice. @XLTechie, any idea?
"Move to the next text paragraph containing at least one sentence" may be more precise, but maybe something better should be thought.
Also in the user guide, an indication that it's an algorithm trying to cover the majority of situations, but which may fail in some rare cases would be interesting.
There was a problem hiding this comment.
I added a new section in the documentation explaining use cases and definition of text paragraph and also added a sentence regarding not being 100% accurate.
|
Regarding the general experience of this feature and after testing with the add-on, I am not very comfortable using this feature. In many cases, I fear to miss some useful information. And even many times, I miss it, e.g.:
Though, I acknowledge the fact that this feature is needed. And the command to jump to a non-link block of text (N key) is probably less useful today than in the web as it was 20 years ago. To summarize, N key catches too much text, and the current feature does not catch enough. I wonder if we could still enhance the algo to find interesting text. But I have no better suggestion, hence this comment so that any suggestion is made by other people if they have. If no better suggestion is found, I am still in favor of merging this PR, since it seems to be an improvement for many people. |
|
By the way, if the regexp for a sentence separator is validated in this PR, we should consider it as also valid for #8518 if it is implemented one day. |
|
Writing also here: Regarding key assignation: Maybe we should merge this PR and #16050 with unassigned gestures and test them during alpha stage. Then at the end of alpha stage, we can figure if one or both of these feature need to have an assigned gesture and add them. |
Co-authored-by: Sean Budd <seanbudd123@gmail.com>
Co-authored-by: Sean Budd <seanbudd123@gmail.com>
Co-authored-by: Sean Budd <seanbudd123@gmail.com>
See test results for failed build of commit b594079fd2 |
|
I would have detailed comments to do on this PR, but first, there are more general questions to answer. Question 1: Naming / documenting In this PR, we want to navigate to paragraphs containing at least one sentence. The sentences are detected by the mean of a regexp detecting their end, mainly the punctuation. This regexp is made configurable in advanced options. Question 2: In NVDA there is already an algorithm using the detection of sentence end in Question 3: Does NV Access (cc @seanbudd) agree with the configurability of this regex in the advanced options? Question 4: Could this feature be generalized and be available not only in browse mode but also in any text? E.g. Word (browse mode off), WordPad. If it's too difficult though, it may not be worth spending time on this point since the main use case remains from far the navigation on the web. |
|
With regard to "text paragraphs", in the words of the great Jack Nicholson: is
there any other kind?
Should we really be creating a new class of paragraph, from the general way
people already understand paragraphs, to "text paragraph", which is this new
thing with a slightly more subtle meaning?
I too am not fond of this terminology.
What the function seems to be doing, is looking for the next "probable
paragraph". That is: a thing that sort of looks like a paragraph, even if it's
not defined as such structurally.
It's still likely to be a paragraph, in the classical definition of a paragraph.
I'd much rather call it a "probable paragraph", or something similar that
implies the standard definition of paragraph, and our attempt to find one.
Of course, in web/Twitter/etc. comments, it's common to misuse, not use, or
otherwise pervert the rules of, punctuation. So this is never going to be
entirely accurate; there will always be the potential to miss something.
I can imagine this feature morphing into an AI recognition mechanism one day
that will be much more accurate, but that isn't happening today.
Even including at least one sentence is not required to be a paragraph (take an
excerpted quotation, for example), but the line has to be drawn somewhere, and
requiring a sentence is still better than the N command.
@CyrilleB79 I suspect you wouldn't need this for Word. That is, if the P command
was implemented to jump to next paragraph in Word, I think it wouldn't need to
be nearly so fuzzy as this. I have no idea how you would detect one at the code
level for Word, but Word already has a good idea what a paragraph is, and there
is likely a way to ask it. Wordpad doesn't use a virtual buffer, so doing it
there would be different in several ways.
So while the feature might be useful in other contexts, I would bet the
implementation would have to be radically different for each, and beyond the
scope of this PR.
|
That's a valid concern, however, the way I view this feature is that it allows you to quickly find the beginning of the article where you would want to start reading. Then you can use
For actual sentence navigation there will be a few more tweaks needed, but yes, we can keep single regexp for both use cases.
Style navigation is a much more specialized command for advanced users. That's according to TextNav usage vs style navigation in BrowserNav. That's why I prefer text paragraph navigation to have a default key binding.
I am open to this as long as we can find a better name for it.
I added a sentence in the doc emphasizing the difference.
Let me prepare a PR for sentence navigation - it will come with the fanciest regex to cover most edge cases, then we can consider converging on that one.
This feature should work in word in browse mode. WordPad doesn't appear to have browse mode, so maybe that's a separate question we can consider - why not bring browse mode to wordPad? Although given its limited usage it's probably not a very important issue. |
Interesting that you mention that, because I had some thoughts along the same lines. A prototype can be implemented even today - we can ask ChatGPT to skip all the clutter - we'd have to define clutter for it, and then ask it to identify first readable paragraph in some way we can parse. Or alternatively ask ChatGPT to rewrite the page removing all the clutter. I wish OpenAI could create a free account for NVAccess - this way we'd be able to bring these features to all users without pushing them to set up their developer accounts, which would be too complicated for the majority. Or alternatively we can consider running LLM locally - but I haven't looked much into this, I suspect it's going to be too slow for a typical consumer laptop without GPU acceleration. |
Qchristensen
left a comment
There was a problem hiding this comment.
Looks interesting and useful, thanks!
|
@mltony wrote:
I wish OpenAI could create a free account for NVAccess - this way we'd be able to bring these features to all users without pushing them to set up their developer accounts
Yes. They've done something like it for Be My Eyes, so I don't imagine it's
impossible if somebody at NV Access wanted to pursue it.
running LLM locally - but I haven't looked much into this, I suspect it's going to be too slow for a typical consumer laptop without GPU acceleration.
Maybe, but maybe not. There are downloadable open source LLMs out there you can
find on GitHub. I don't know that the problem is going to be the CPU/GPU speed
so much, but also the download size is a big consideration. You need the
training data. The last one I saw was 4 gig.
However, with the new Open AI store concept, maybe there is some way to create a
customized LLM with limited training data on what web pages look like. This is
not my specialty for certain.
Anyway, people don't like comments on closed PRs, so I don't want to continue
this further, but I thought you deserved an answer.
|
|
Again, I have forgotten to publish pending comments on this PR... Here are they:
It's clearer to read the code
This factorization has not been done in this PR.
|
|
@CyrilleB79, |
Closes nvaccess#15998 Summary of the issue: Add QuickNav text paragraph navigation Description of user facing changes Added QuickNav gesture p to jump to next/previous text paragraph in browse mode. Description of development approach In BrowseModeTreeInterceptor added function _iterSimilarParagraph that finds next/previous paragraph that satisfies condition defined by a lambda function. I will reuse this function in later PRs to implement vertical navigation. Added a new clause in BrowseModeTreeInterceptor._quickNavScript() that handles text paragraph case and calls function defined in the previous bullet. For text criteria I ended up implementing a user configurable regex. Its initial value is defined in DEFAULT_TEXT_PARAGRAPH_REGEX variable in configSpec.py:11. It is user-configurable in Browse Mode page in NVDA options. The rationale for using a regex instead of plain-text search is to reduce the number of false positives. For example, for period character we check that it follows a word character \w and is followed by space character or end of string. There are a few more rules for edge cases in the code. I don't include comma, colon and semicolon punctuation marks in the regex due to high number of false positives. The regex also accounts for CJK languages by checking for full-width punctuation marks.
Fix-up of #16031 Fix-up of other older PRs. Summary of the issue: Various little issues in advanced settings panel: in advanced settings, option regex for text paragraph navigation feature: restoring default advanced settings do not restore this field a wrong regex syntax in this field do not trigger the error message as expected the code to restore defaults in advanced settings panel is working by chance for some combo boxes, just because the default option is the first one. Description of user facing changes restoring advanced settings will now restore the Regex for Text paragraph navigation option a wrong regex syntax for the Regex for Text paragraph navigation will now trigger an error message Also while at it, fixed: a translator's comment a call to self.Layout that probably needs to be called after all the controls are created
Closes nvaccess#15998 Summary of the issue: Add QuickNav text paragraph navigation Description of user facing changes Added QuickNav gesture p to jump to next/previous text paragraph in browse mode. Description of development approach In BrowseModeTreeInterceptor added function _iterSimilarParagraph that finds next/previous paragraph that satisfies condition defined by a lambda function. I will reuse this function in later PRs to implement vertical navigation. Added a new clause in BrowseModeTreeInterceptor._quickNavScript() that handles text paragraph case and calls function defined in the previous bullet. For text criteria I ended up implementing a user configurable regex. Its initial value is defined in DEFAULT_TEXT_PARAGRAPH_REGEX variable in configSpec.py:11. It is user-configurable in Browse Mode page in NVDA options. The rationale for using a regex instead of plain-text search is to reduce the number of false positives. For example, for period character we check that it follows a word character \w and is followed by space character or end of string. There are a few more rules for edge cases in the code. I don't include comma, colon and semicolon punctuation marks in the regex due to high number of false positives. The regex also accounts for CJK languages by checking for full-width punctuation marks.
Fix-up of nvaccess#16031 Fix-up of other older PRs. Summary of the issue: Various little issues in advanced settings panel: in advanced settings, option regex for text paragraph navigation feature: restoring default advanced settings do not restore this field a wrong regex syntax in this field do not trigger the error message as expected the code to restore defaults in advanced settings panel is working by chance for some combo boxes, just because the default option is the first one. Description of user facing changes restoring advanced settings will now restore the Regex for Text paragraph navigation option a wrong regex syntax for the Regex for Text paragraph navigation will now trigger an error message Also while at it, fixed: a translator's comment a call to self.Layout that probably needs to be called after all the controls are created
Link to issue number:
Closes #15998
Summary of the issue:
Add QuickNav text paragraph navigation
Description of user facing changes
pto jump to next/previous text paragraph in browse mode.Description of development approach
BrowseModeTreeInterceptoradded function_iterSimilarParagraphthat finds next/previous paragraph that satisfies condition defined by a lambda function. I will reuse this function in later PRs to implement vertical navigation.BrowseModeTreeInterceptor._quickNavScript()that handles text paragraph case and calls function defined in the previous bullet.DEFAULT_TEXT_PARAGRAPH_REGEXvariable inconfigSpec.py:11.\wand is followed by space character or end of string. There are a few more rules for edge cases in the code.Testing strategy:
Known issues with pull request:
N/A
Code Review Checklist: