Add symbols from Orca by sthibaul · Pull Request #11110 · nvaccess/nvda

sthibaul · 2020-05-05T00:45:17Z

Link to issue number:

fixes #11105

Summary of the issue:

Some symbols that have proven useful in Orca are missing from NVDA.

Description of how this pull request fixes the issue:

This adds them to source/locale/en/symbols.dic

Testing performed:

The resulting file processes fine.

Known issues with pull request:

This adds a series of symbols to translate. Orca translations can be used as a source if translators wish. I will attach a tarball containing them.

Change log entry:

New features
Adds more symbols. (#11105)

sthibaul · 2020-05-05T00:45:55Z

Here are the Orca translations, in case translators wish to reuse them:
orca.zip

codeofdusk · 2020-05-05T02:16:44Z

NVDA seems to be able to read these at least when reading by character...

Adriani90 · 2020-05-05T07:20:31Z

If they are not in NVDA yet, these symbols also need to be added to the punctuation/symbol pronunciation dialog. There are actually a lot of symbols.
Adding new symbols to symbols.dic makes them translatable and defines the default value.
I don't know whether symbols are added to the dialog automatically once they are in symbols.dic, but unfortunately I think this might need to be done manually.

josephsl · 2020-05-05T07:30:22Z

Hi, the punctuation/symbol pronunciation dialog will pick up symbols from the language-specific symbols.dic, which ultimately depends on the willingness of translators to translate new symbols once they show up in the translation workflow. Thanks.

CyrilleB79 · 2020-05-05T08:38:57Z

Just a question: is there some guideline or common usage for deciding which symbols are added to symbols.dic and which are not? Is there a big performance penalty to add many symbols?
I have raised these questions in #11105 with some more explanations and an example. Please let's answer and comment there.

k-kolev1985 · 2020-05-05T17:46:23Z

Here are the Orca translations, in case translators wish to reuse them:
orca.zip

May I ask why the engb (English) one is so small/short? There are only a few entries in it. The Russian one is also not complete. I want those two (at least the English one) for reference while discussing them with my colleagues for the Bulgarian translation. Thanks in advance!

sthibaul · 2020-05-05T18:01:50Z

May I ask why the engb (English) one is so small/short?

It's just because my script ignores the strings that are identical to the "en" version.
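The filtering described here can be pictured with a minimal sketch. The actual script is not shown in this thread, so the function name and the sample entries below are hypothetical; only the idea (drop every string that matches the "en" version) comes from the comment above.

```python
from typing import Dict


def drop_identical_to_base(base: Dict[str, str], variant: Dict[str, str]) -> Dict[str, str]:
    """Keep only the variant entries whose text differs from the base locale."""
    return {
        symbol: text
        for symbol, text in variant.items()
        if base.get(symbol) != text
    }


# Hypothetical sample data, not taken from the Orca translations.
en = {"½": "one half", "#": "number"}
en_gb = {"½": "one half", "#": "hash"}

# Only the entry that differs from "en" survives, which is why a locale
# close to the base ends up with a very short file.
print(drop_identical_to_base(en, en_gb))
```

A locale whose strings mostly match the base would therefore produce only a handful of entries, which matches the short engb file.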

josephsl · 2020-05-05T18:04:40Z

Hi, in other words, what translators would have seen in symbols changes folder for a given revision minus diff signs. Thanks.

Adriani90 · 2020-05-05T18:39:32Z

@josephsl wrote:

Hi, the punctuation/symbol pronunciation dialog will pick up symbols from the language-specific symbols.dic, which ultimately depends on the willingness of translators to translate new symbols once they show up in the translation workflow. Thanks.

But in the punctuation and symbol pronunciation dialog, there are more than 3,000 symbols. Where are they fetched from? symbols.dic has only about 200 lines. If they are fetched from Windows itself, I think most of them should already be translated (e.g. emojis). What exactly is the purpose of symbols.dic then?

josephsl · 2020-05-05T18:42:09Z

Hi, some of them come from symbols.dic, others are emoji characters coming from CLDR database. Thanks.

feerrenrut · 2020-05-07T07:18:39Z

+¼	one fourth	none
+½	one half	none
+¾	three fourths	none


I believe "one quarter" and "three quarters" is much more popular than the also acceptable "fourth" version. I propose this be changed.
https://english.stackexchange.com/questions/103188/three-quarters-vs-three-fourths
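As a side note on the wording, the official Unicode character names for these code points also use "quarter". This can be checked with Python's standard `unicodedata` module; the snippet is only an illustration and is not part of the pull request.

```python
import unicodedata

# Print the code point, the official Unicode name and the numeric value
# of each of the three vulgar fractions discussed above.
for ch in "¼½¾":
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)} = {unicodedata.numeric(ch)}")
```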

CyrilleB79 · 2020-05-07T07:41:24Z

I am copy/pasting my comment from #11105 since it seems that the discussion is taking place here.

Just a question: is there some guideline or common usage for deciding which symbols are added to symbols.dic and which are not? Is there a big performance penalty to add many symbols?
I raise these questions because anyone may want to add more symbols, e.g. see #11015.

IMO, when someone asks for a new symbol to be included, he/she should give a justification. For example:
* ¼ (one fourth): seems already supported by synthesizers. So give a synthesizer example where this symbol is badly or not announced

* all subscript and superscript: give an example of real life where they are used
  Maybe also the level should be justified.

@feerrenrut could you please comment on this from NV Access's point of view? Comments from other people are also welcome.

feerrenrut · 2020-05-08T14:38:31Z

I'm not aware of performance issues with this. I would prefer to address performance problems in the implementation rather than reduce our supported symbols. Even if most synths supply these symbols, they may not do so adequately across all languages, or we or a user might not like how they announce the symbol. This gives users the option to use the NVDA-supplied symbol name or to rely on the synth. It also lets us adjust symbol names as appropriate; a good example is the Mac command symbol / looped square ⌘, which I believe was added not so long ago.

I hope this answers your question!

CyrilleB79 · 2020-05-08T21:10:00Z

Yes, thanks for your answer.

AAClause · 2020-05-09T06:22:58Z

Is there a big performance penalty to add many symbols?

According to my tests, the number of symbols is not significant. I tried with a string of 1048543 symbols in several locales. However, I don't know if my script is really reliable. There are most likely better ways to measure this.

My script

import globalVars, sys
# Make modules placed in the user configuration's "lib" folder importable
# before the imports below run.
path = globalVars.appArgs.configPath + "/lib"
if path not in sys.path:
	sys.path.append(path)

import timeit
import characterProcessing

# One string containing every code point from U+0020 up to (not including) U+FFFFF.
text = "".join(chr(c) for c in range(0x20, 0xfffff))

def test(lang):
	# Process the whole string with the symbols of the given locale.
	characterProcessing.processSpeechSymbols(lang, text, characterProcessing.SYMLVL_SOME)

number = 20  # timed runs per locale
langs = ["ar", "bg", "cs", "da", "fr", "hr", "it", "ja", "ko", "nl", "pl", "ru", "tr", "zh", "zh_TW"]
for lang in langs:
	try:
		res = timeit.timeit(stmt=f"test('{lang}')", number=number, globals=globals())
		# Count the symbols loaded for this locale.
		b, u = characterProcessing._getSpeechSymbolsForLocale(lang)
		symbolsNumber = len(b.symbols) + len(u.symbols)
		print(f"{lang} -> {symbolsNumber} symbols, total time: {res}, average time: {res/number}")
	except Exception as err:
		print(f"error with {lang} locale: {err}")

Here's one of my outputs:

ar -> 3376 symbols, total time: 221.0103557, average time: 11.050517785
bg -> 3445 symbols, total time: 214.06203370000003, average time: 10.703101685000002
cs -> 3396 symbols, total time: 218.30669819999991, average time: 10.915334909999995
da -> 3445 symbols, total time: 217.11858540000003, average time: 10.85592927
fr -> 3845 symbols, total time: 238.7677773999999, average time: 11.938388869999994
hr -> 3446 symbols, total time: 217.14729369999986, average time: 10.857364684999993
it -> 3446 symbols, total time: 215.80777239999998, average time: 10.790388619999998
ja -> 3968 symbols, total time: 230.4640202999999, average time: 11.523201014999994
ko -> 31709 symbols, total time: 213.701642, average time: 10.685082099999999
nl -> 3441 symbols, total time: 217.8810311999996, average time: 10.89405155999998
pl -> 3448 symbols, total time: 213.60206610000023, average time: 10.680103305000012
ru -> 3493 symbols, total time: 212.6139294999998, average time: 10.63069647499999
tr -> 3672 symbols, total time: 214.91899090000015, average time: 10.745949545000007
zh -> 152 symbols, total time: 214.42888690000018, average time: 10.721444345000009
zh_TW -> 4246 symbols, total time: 211.3798428, average time: 10.56899214

all subscript and superscript: give an example of real life where they are used

In this case, we can remove many symbols :)
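One way to read the output above is to compare the locales at the extremes. The sketch below only re-uses the figures already printed (average times rounded to three decimals); it does not measure anything itself, and the variable names are illustrative.

```python
# (locale, symbol count, average seconds per run), rounded from the output above.
ROWS = [
    ("ar", 3376, 11.051), ("bg", 3445, 10.703), ("cs", 3396, 10.915),
    ("da", 3445, 10.856), ("fr", 3845, 11.938), ("hr", 3446, 10.857),
    ("it", 3446, 10.790), ("ja", 3968, 11.523), ("ko", 31709, 10.685),
    ("nl", 3441, 10.894), ("pl", 3448, 10.680), ("ru", 3493, 10.631),
    ("tr", 3672, 10.746), ("zh", 152, 10.721), ("zh_TW", 4246, 10.569),
]

# Locales with the fewest and the most symbols.
fewest = min(ROWS, key=lambda row: row[1])
most = max(ROWS, key=lambda row: row[1])
# Locales with the lowest and the highest average time.
fastest = min(ROWS, key=lambda row: row[2])
slowest = max(ROWS, key=lambda row: row[2])

print(f"fewest symbols: {fewest[0]} ({fewest[1]}), {fewest[2]} s per run")
print(f"most symbols:   {most[0]} ({most[1]}), {most[2]} s per run")
print(f"fastest locale: {fastest[0]}, slowest locale: {slowest[0]}")

# Relative gap between the slowest and the fastest locale.
spread = (slowest[2] - fastest[2]) / fastest[2]
print(f"spread between fastest and slowest: {spread:.1%}")
```

In these figures ko has roughly 200 times as many symbols as zh, yet both average about 10.7 seconds per run, which is consistent with the observation that the symbol count is not significant.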

Adriani90 · 2020-05-09T06:30:42Z

Such symbols are often used, for example, when writing equations in LaTeX. Nearly every blind person who uses a laptop or computer at school or university works with LaTeX in mathematics, physics or chemistry.


lukaszgo1 · 2020-05-09T09:07:46Z

@Adriani90 wrote:

Such symbols are often used, for example, when writing equations in LaTeX. Nearly every blind person who uses a laptop or computer at school or university works with LaTeX in mathematics, physics or chemistry.

Subscript symbols as such are not used in LaTeX. When writing in LaTeX, a specific notation is used for them, and LaTeX-generated PDFs containing math are not accessible enough to be readable.

lukaszgo1 · 2020-05-09T09:15:40Z

¼ (one fourth): seems already supported by synthesizers. So give a synthesizer example where this symbol is badly or not announced

With some Polish SAPI 5 synthesizers, only the numerator of these fractions is read, so having them in NVDA would be beneficial.

Adriani90 · 2020-05-09T09:19:02Z

While this probably applies to PDF, there are also other formats you can export LaTeX code to, so I think it is good to have the symbols reported.

gregjozk · 2020-05-09T11:31:22Z

If I can summarise, most of us agree that it could be beneficial to users if Orca's and other such symbols were included in NVDA? Personally, I vote for including Orca's and other symbols that are used often but are not included in CLDR. BTW, what's the relationship between NVDA's symbols and the included CLDR?

DrSooom · 2020-05-10T04:37:07Z

@sthibaul: If you have additional time, feel free to use my uploaded files regarding issue #6341 as well.

feerrenrut · 2020-05-13T09:19:17Z

I'd like to accept this PR, though I am just waiting on the following line comment to be addressed:

I believe "one quarter" and "three quarters" is much more popular than the also acceptable "fourth" version. I propose this be changed.
https://english.stackexchange.com/questions/103188/three-quarters-vs-three-fourths

sthibaul · 2020-05-13T09:30:50Z

I'd like to accept this PR, though I am just waiting on the following line comment to be addressed:

OK, fixed.

Fixes nvaccess#11105

feerrenrut

Thanks @sthibaul

LeonarddeR

I should have looked at this PR before it was merged, as I consider some of these symbol pronunciations too verbose and inconsistent with others. Examples below.

LeonarddeR · 2020-05-19T12:20:10Z

 ¥	Yen	all	norep
 ₹	Rupee	some	norep
+ƒ	florin	all	norep
+¤	currency sign	all	norep


I'd say just currency is enough
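For readers unfamiliar with the file, the entries quoted in these diffs appear to be tab-separated lines: the symbol, its spoken replacement, a level, and sometimes a fourth value such as `norep`. The sketch below splits one such line into fields. The field names and the `parse_entry` helper are assumptions made for illustration, not NVDA's own parser.

```python
from typing import NamedTuple, Optional


class SymbolEntry(NamedTuple):
    # Field names are illustrative guesses based on the diff lines above.
    identifier: str
    replacement: str
    level: Optional[str]
    preserve: Optional[str]


def parse_entry(line: str) -> SymbolEntry:
    """Split one tab-separated entry into its fields (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError(f"expected at least a symbol and a replacement: {line!r}")
    level = fields[2] if len(fields) > 2 else None
    preserve = fields[3] if len(fields) > 3 else None
    return SymbolEntry(fields[0], fields[1], level, preserve)


# Two entries copied from the diffs in this thread (without the leading "+").
print(parse_entry("ƒ\tflorin\tall\tnorep"))
print(parse_entry("¦\tbroken bar\tmost"))
```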

LeonarddeR · 2020-05-19T12:21:19Z

+¦	broken bar	most
 ~	tilda	most
+¡	inverted exclamation point	some
+¿	inverted question mark	some


Inconsistent with question mark, which we pronounce as question

LeonarddeR · 2020-05-19T12:21:54Z

+‡	double dagger	some
+‣	triangular bullet	none
+✗	x-shaped bullet	none
+	object replacement character	none


Too verbose, see #11177

DrSooom · 2020-05-19T13:20:29Z

@LeonarddeR: Wouldn't it be better to add a secondary name field for Unicode characters that is only spoken in letter navigation? I already planned to create such an issue. This would save us tons of hours of discussion, as a Unicode character would then be spoken with its short name in normal navigation and with its detailed name only in letter navigation.
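The proposal can be sketched as a small data structure with two names per symbol. Everything here is hypothetical: NVDA has no such field in this pull request, and the short names are only examples inspired by the review comments above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolNames:
    short: str     # spoken during normal navigation
    detailed: str  # spoken only when navigating character by character


# Hypothetical entries; the detailed names come from the diffs in this thread.
NAMES = {
    "¤": SymbolNames(short="currency", detailed="currency sign"),
    "¿": SymbolNames(short="inverted question", detailed="inverted question mark"),
}


def describe(ch: str, by_character: bool) -> str:
    """Return the name to speak for ch, or ch itself if it has no entry."""
    names = NAMES.get(ch)
    if names is None:
        return ch
    return names.detailed if by_character else names.short


print(describe("¤", by_character=False))
print(describe("¤", by_character=True))
```

With such a split, the verbosity debate would apply only to the short name, while the detailed name could stay close to the full character description.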

feerrenrut suggested changes May 7, 2020

sthibaul force-pushed the orca branch from 76117a9 to 5d643d1 Compare May 13, 2020 09:30

feerrenrut reviewed May 13, 2020

Comment thread source/locale/en/symbols.dic Outdated

Add symbols from Orca

97806a8

Fixes nvaccess#11105

sthibaul force-pushed the orca branch from 5d643d1 to 97806a8 Compare May 13, 2020 14:23

feerrenrut approved these changes May 15, 2020

feerrenrut added 2 commits May 15, 2020 09:55

Update changes file for PR nvaccess#11110

33f0473

Merge remote-tracking branch 'origin/master' into HEAD

609dd37

feerrenrut merged commit 1c4519a into nvaccess:master May 15, 2020

nvaccessAuto added this to the 2020.2 milestone May 15, 2020

sthibaul deleted the orca branch May 15, 2020 08:05

LeonarddeR mentioned this pull request May 19, 2020

NVDA announces document replacement character for Firefox documents #11177

Closed

LeonarddeR reviewed May 19, 2020

lukaszgo1 mentioned this pull request Aug 17, 2020

Characters are not being interpreted correctly #11502

Closed
