Add symbols from Orca #11110

Merged
feerrenrut merged 3 commits into nvaccess:master from sthibaul:orca
May 15, 2020

Conversation

@sthibaul

@sthibaul sthibaul commented May 5, 2020

Contributor

Link to issue number:

fixes #11105

Summary of the issue:

Some symbols that have proven useful in Orca are missing from NVDA.

Description of how this pull request fixes the issue:

This adds them to source/locale/en/symbols.dic

Testing performed:

The resulting file processes fine.

Known issues with pull request:

This adds a series of symbols to translate. Orca translations can be used as a source if translators wish. I will attach a tarball containing them.

Change log entry:

New features
Adds more symbols. (#11105)

@sthibaul

sthibaul commented May 5, 2020

Contributor Author

Here are the Orca translations, in case translators wish to reuse them:
orca.zip

@codeofdusk

Contributor

NVDA seems to be able to read these at least when reading by character...

@Adriani90

Collaborator

If they are not in NVDA yet, these symbols also need to be added to the punctuation/symbol pronunciation dialog. Actually, there are a lot of symbols.
Adding new symbols to symbols.dic makes them translatable and defines the default value.
I don't know exactly whether symbols are added to the dialog automatically once they are in symbols.dic, but I think this might unfortunately need to be done manually.

@josephsl

josephsl commented May 5, 2020 via email

Contributor

@CyrilleB79

Contributor

Just a question: is there some guideline or some common usage to decide which symbol is added in symbols.dic and which should not? Is there a big performance penalty to add many symbols?
I have raised these questions in #11105 with some more explanations and examples. Let's please answer and comment there.

@k-kolev1985

Contributor

Here are the Orca translations, in case translators wish to reuse them:
orca.zip

May I ask why the engb (English) one is so small/short? There are only a few entries in it. The Russian one is also incomplete. I want those two (at least the English one) for reference while discussing them with my colleagues for the Bulgarian translation. Thanks in advance!

@sthibaul

sthibaul commented May 5, 2020

Contributor Author

May I ask why the engb (English) one is so small/short?

It's just because my script ignores strings that are identical to the "en" version.
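The export script itself is not shown in this thread, so the following is only a minimal sketch of the filtering described above, under the assumption that each locale's strings are held in a dictionary keyed by symbol. The function name `only_differing` and the sample entries are made up for illustration.

```python
def only_differing(base: dict, variant: dict) -> dict:
    """Keep only the variant entries whose text differs from the base locale."""
    return {key: text for key, text in variant.items() if base.get(key) != text}


# Made-up sample data, not taken from the Orca translations.
en = {"¼": "one fourth", "½": "one half", "¾": "three fourths"}
en_gb = {"¼": "one quarter", "½": "one half", "¾": "three quarters"}

# Only the entries that differ from "en" survive, which is why a close
# variant such as en_GB ends up with a short file.
en_gb_export = only_differing(en, en_gb)
print(en_gb_export)
```

With a filter like this, an entry that is missing from the export simply means "same as en", not "untranslated".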

@josephsl

josephsl commented May 5, 2020 via email

Contributor

@Adriani90

Collaborator

@josephsl wrote:

Hi, punctuation/symbol pronunciation dialog will pick up symbols from language-specific symbols.dic, which ultimately depends on willingness from translators to translate new symbols once they show up on translations workflow. Thanks.

But in the punctuation and symbol pronunciation dialog, there are more than 3,000 symbols. Where are they fetched from? symbols.dic has only about 200 lines. If they are fetched from Windows itself, I think most of them should already be translated (e.g. emojis). What exactly is the purpose of symbols.dic, then?

@josephsl

josephsl commented May 5, 2020 via email

Contributor

Comment thread source/locale/en/symbols.dic Outdated
Comment on lines +233 to +235
¼ one fourth none
½ one half none
¾ three fourths none

Contributor

I believe "one quarter" and "three quarters" is much more popular than the also acceptable "fourth" version. I propose this be changed.
https://english.stackexchange.com/questions/103188/three-quarters-vs-three-fourths
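For readers unfamiliar with the entries under review, here is a minimal sketch of how one such line could be split into its fields. It assumes the fields are tab-separated in the real file (the page rendering above shows them with spaces) and that the order is symbol, replacement, level, then an optional fourth field such as `norep`. `SymbolEntry` and `parse_symbol_line` are hypothetical names, not part of NVDA.

```python
from typing import NamedTuple, Optional


class SymbolEntry(NamedTuple):
    symbol: str
    replacement: str
    level: Optional[str]     # e.g. "none", "some", "most", "all"
    preserve: Optional[str]  # e.g. "norep"; None when the field is omitted


def parse_symbol_line(line: str) -> SymbolEntry:
    """Split one tab-separated entry into its fields (illustrative helper)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 2:
        raise ValueError(f"expected at least a symbol and a replacement: {line!r}")
    level = fields[2] if len(fields) > 2 else None
    preserve = fields[3] if len(fields) > 3 else None
    return SymbolEntry(fields[0], fields[1], level, preserve)


# Assumed tab-separated forms of entries discussed in this pull request.
quarter = parse_symbol_line("¼\tone quarter\tnone")
yen = parse_symbol_line("¥\tYen\tall\tnorep")
print(quarter)
print(yen)
```

Splitting on tabs rather than on whitespace matters here because replacements such as "one quarter" contain spaces themselves.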

@CyrilleB79

Contributor

I am copy/pasting my comment from #11105 since it seems that the discussion is taking place here.

Just a question: is there some guideline or some common usage to decide which symbol is added in symbols.dic and which should not? Is there a big performance penalty to add many symbols?
I raise these questions because anyone may want to add more symbols, e.g. see #11015.

IMO, anyone requesting that a new symbol be included should give a justification. For example:

* ¼ (one fourth): seems already supported by synthesizers. So give a synthesizer example where this symbol is badly or not announced

* all subscript and superscript: give an example of real life where they are used
  Maybe the level should also be justified.

@feerrenrut could you please comment on this from NV Access's point of view? Comments from other people are also welcome.

@feerrenrut

Contributor

I'm not aware of performance issues with this. I would prefer to address performance problems with the implementation rather than reduce our supported symbols. Even if most synths supply these symbols, they may not do so adequately across all languages, or we or a user might not like how they announce the symbol. This gives users the option to use the NVDA-supplied symbol name or to rely on the synth. It also lets us adjust symbol names as appropriate; a good example is the Mac command symbol / looped square ⌘, which I believe was added not so long ago.

I hope this answers your question!

@CyrilleB79

Contributor

Yes, thanks for your answer.

@AAClause

AAClause commented May 9, 2020

Contributor

Is there a big performance penalty to add many symbols?

According to my tests, the number of symbols has no significant effect. I tried a string of 1048543 characters in several locales. However, I don't know whether my script is really reliable. There are most likely better ways to measure this.

My script
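import globalVars
import sys

# Add the lib folder of the NVDA configuration directory to the import path
# before the remaining imports.
path = globalVars.appArgs.configPath + "/lib"
if path not in sys.path:
	sys.path.append(path)

import timeit
import characterProcessing

# One long string containing every code point from 0x20 up to, but not including, 0xfffff.
text = ''.join(chr(c) for c in range(0x20, 0xfffff))

def test(lang):
	# Only the elapsed time matters, so the processed text is discarded.
	characterProcessing.processSpeechSymbols(lang, text, characterProcessing.SYMLVL_SOME)

number = 20  # runs per locale
langs = ["ar", "bg", "cs", "da", "fr", "hr", "it", "ja", "ko", "nl", "pl", "ru", "tr", "zh", "zh_TW"]
for lang in langs:
	try:
		res = timeit.timeit(stmt=f"test('{lang}')", number=number, globals=globals())
		# Total number of symbols across both symbol sets returned for this locale.
		b, u = characterProcessing._getSpeechSymbolsForLocale(lang)
		symbolsNumber = len(b.symbols) + len(u.symbols)
		print(f"{lang} -> {symbolsNumber} symbols, total time: {res}, average time: {res/number}")
	except Exception as err:
		# Report the failure and continue with the next locale.
		print(f"error with {lang} locale: {err}")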

Here's one of my outputs:

ar -> 3376 symbols, total time: 221.0103557, average time: 11.050517785
bg -> 3445 symbols, total time: 214.06203370000003, average time: 10.703101685000002
cs -> 3396 symbols, total time: 218.30669819999991, average time: 10.915334909999995
da -> 3445 symbols, total time: 217.11858540000003, average time: 10.85592927
fr -> 3845 symbols, total time: 238.7677773999999, average time: 11.938388869999994
hr -> 3446 symbols, total time: 217.14729369999986, average time: 10.857364684999993
it -> 3446 symbols, total time: 215.80777239999998, average time: 10.790388619999998
ja -> 3968 symbols, total time: 230.4640202999999, average time: 11.523201014999994
ko -> 31709 symbols, total time: 213.701642, average time: 10.685082099999999
nl -> 3441 symbols, total time: 217.8810311999996, average time: 10.89405155999998
pl -> 3448 symbols, total time: 213.60206610000023, average time: 10.680103305000012
ru -> 3493 symbols, total time: 212.6139294999998, average time: 10.63069647499999
tr -> 3672 symbols, total time: 214.91899090000015, average time: 10.745949545000007
zh -> 152 symbols, total time: 214.42888690000018, average time: 10.721444345000009
zh_TW -> 4246 symbols, total time: 211.3798428, average time: 10.56899214
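As a rough way to read the figures above, the sketch below compares the locales with the most and fewest symbols and computes the spread between the slowest and fastest average. The numbers are copied from the output above, with averages rounded to three decimals; the variable names are arbitrary. It only summarises this one run and says nothing about why the timings differ.

```python
# (locale, symbol count, average seconds per run), copied from the output above.
results = [
    ("ar", 3376, 11.051), ("bg", 3445, 10.703), ("cs", 3396, 10.915),
    ("da", 3445, 10.856), ("fr", 3845, 11.938), ("hr", 3446, 10.857),
    ("it", 3446, 10.790), ("ja", 3968, 11.523), ("ko", 31709, 10.685),
    ("nl", 3441, 10.894), ("pl", 3448, 10.680), ("ru", 3493, 10.631),
    ("tr", 3672, 10.746), ("zh", 152, 10.721), ("zh_TW", 4246, 10.569),
]

slowest = max(results, key=lambda r: r[2])
fastest = min(results, key=lambda r: r[2])
most_symbols = max(results, key=lambda r: r[1])
fewest_symbols = min(results, key=lambda r: r[1])

# Relative gap between the slowest and the fastest average time.
spread = (slowest[2] - fastest[2]) / fastest[2]

print(f"slowest: {slowest[0]}, fastest: {fastest[0]}, spread: {spread:.1%}")
print(f"most symbols: {most_symbols[0]} ({most_symbols[1]}) at {most_symbols[2]} s")
print(f"fewest symbols: {fewest_symbols[0]} ({fewest_symbols[1]}) at {fewest_symbols[2]} s")
```

In this run, the locale with by far the most symbols (ko) is not slower than the one with the fewest (zh), which is consistent with the conclusion that the symbol count is not the dominant factor.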

all subscript and superscript: give an example of real life where they are used

In this case, we can remove many symbols :)

@Adriani90

Adriani90 commented May 9, 2020 via email

Collaborator

@lukaszgo1

Contributor

@Adriani90 wrote:

Such symbols are, for example, often used when writing equations in LaTeX. Nearly every blind person who uses a laptop or a computer at school or university works with LaTeX in Mathematics, Physics or Chemistry.

Subscript symbols as such are not used in LaTeX. When writing LaTeX, its own notation for subscripts is used, and LaTeX-generated PDFs containing math are not accessible enough to be readable.

@lukaszgo1

Contributor
  • ¼ (one fourth): seems already supported by synthesizers. So give a synthesizer example where this symbol is badly or not announced

With some Polish SAPI 5 synthesizers, only the numerator of these fractions is read, so having them in NVDA would be beneficial.

@Adriani90

Adriani90 commented May 9, 2020 via email

Collaborator

@gregjozk

gregjozk commented May 9, 2020 via email

Contributor

@DrSooom

DrSooom commented May 10, 2020

@sthibaul: If you have additional time, feel free to use the files I uploaded to issue #6341 as well.

@feerrenrut

Contributor

I'd like to accept this PR, though I am just waiting on the following line comment to be addressed:

I believe "one quarter" and "three quarters" is much more popular than the also acceptable "fourth" version. I propose this be changed.
https://english.stackexchange.com/questions/103188/three-quarters-vs-three-fourths

@sthibaul

Contributor Author

I'd like to accept this PR, though I am just waiting on the following line comment to be addressed:

OK, fixed.

Comment thread source/locale/en/symbols.dic Outdated

@feerrenrut feerrenrut left a comment

Contributor

Thanks @sthibaul

@feerrenrut feerrenrut merged commit 1c4519a into nvaccess:master May 15, 2020
@nvaccessAuto nvaccessAuto added this to the 2020.2 milestone May 15, 2020
@sthibaul sthibaul deleted the orca branch May 15, 2020 08:05

@LeonarddeR LeonarddeR left a comment

Collaborator

I should have looked at this PR before it was merged, as I consider some of these symbol pronunciations too verbose and inconsistent with others. Examples below:

¥ Yen all norep
₹ Rupee some norep
ƒ florin all norep
¤ currency sign all norep

Collaborator

I'd say just currency is enough

¦ broken bar most
~ tilda most
¡ inverted exclamation point some
¿ inverted question mark some

Collaborator

Inconsistent with question mark, which we pronounce as question

‡ double dagger some
‣ triangular bullet none
✗ x-shaped bullet none
 object replacement character none

Collaborator

Too verbose, see #11177

@DrSooom

DrSooom commented May 19, 2020

@LeonarddeR: Wouldn't it be better to add a secondary name field for Unicode characters that is only spoken in letter navigation? I had already planned to create such an issue. This would save us many hours of discussion, because a Unicode character would then be spoken with its short name in normal navigation and with its detailed name only in letter navigation.
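To make the proposal above concrete, here is a purely hypothetical sketch of a symbol carrying two names. Nothing here reflects NVDA's actual data model; `SymbolNames`, `name_for` and the `by_character` flag are invented for illustration, and the example reuses the "currency" versus "currency sign" wording from the review comments above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolNames:
    short_name: str     # spoken during normal navigation
    detailed_name: str  # spoken only when navigating letter by letter


def name_for(names: SymbolNames, by_character: bool) -> str:
    """Pick the name to speak depending on the navigation mode."""
    return names.detailed_name if by_character else names.short_name


currency = SymbolNames(short_name="currency", detailed_name="currency sign")

print(name_for(currency, by_character=False))
print(name_for(currency, by_character=True))
```

Under such a scheme, verbosity would only need to be debated for the short name, since the detailed name is heard only when a user explicitly inspects a character.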

Development

Successfully merging this pull request may close these issues.

Integrating Orca's symbol translations?