Add Arabic and Chinese punctuation symbols #13661
…nctuations Fixes nvaccess#12097 Fixes nvaccess#12086
|
I have tested with Arabic and it works. The only issue is that the Arabic question mark does not trigger the pitch change in eSpeak and other synths; only the normal question mark does. Is there some way to make them both behave the same? Thanks. |
|
Actually, do you have any idea why these characters are part of CLDR, and why characters such as English punctuation (dot, comma, question mark, etc.) are not? I do not understand this Western-centric implementation. An alternative implementation in NVDA may be considered: disabling CLDR processing for characters that are Unicode punctuation. Have you investigated this path? |
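As a rough illustration of the alternative raised above (skipping CLDR processing for Unicode punctuation), here is a minimal Python sketch using the standard `unicodedata` module. The helper names `is_unicode_punctuation` and `filter_cldr_entries` and the sample mapping are hypothetical and are not NVDA APIs; how NVDA actually loads CLDR data is not shown here.

```python
import unicodedata


def is_unicode_punctuation(char):
    """Return True if the character's Unicode general category is a punctuation class (P*)."""
    return unicodedata.category(char).startswith("P")


def filter_cldr_entries(entries):
    """Hypothetical helper: drop single-character punctuation keys from a CLDR-style mapping."""
    return {
        symbol: name
        for symbol, name in entries.items()
        if not (len(symbol) == 1 and is_unicode_punctuation(symbol))
    }


# Illustrative sample only; these names are not copied from cldr.dic.
sample = {
    "؟": "arabic question mark",
    "。": "ideographic full stop",
    "😀": "grinning face",
}
filtered = filter_cldr_entries(sample)
```

With such a filter, the Arabic and ideographic marks would fall through to whatever the regular symbol files define, while emoji would still be described by CLDR.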
Actually, I don't know why either. Is there currently a better solution than CLDR? @Mazen428 said:
Does the pitch issue continue after disabling the “Include Unicode Consortium data (including emoji) when processing characters and symbols” option? |
CyrilleB79
left a comment
Hi
If the strategy of modifying the symbol file is validated, there are many things to change anyway.
| # identifier regexp | ||
| # Sentence endings. | ||
| . sentence ending (?<=[^\s.])\.(?=[\"'”’)\s]|$) | ||
| 。 sentence ending (?<=[^\s.])\.(?=[\"'”’)\s]|$) |
Does the concept of a sentence-ending period really exist for the ideographic period, i.e. is the ideographic period used for anything other than ending a sentence in the languages that use it? And if so, does this regexp really match its usage in these languages, i.e. should it be followed by a space, and are single/double quotes used as in Latin writing? I doubt it.
In any case, even if all of these assumptions were true, the regexp does not contain the ideographic period, so it is not correct.
If there is no concept of sentence ending period in languages using ideographs, you should remove the regexp instead and set the 'preserve' parameter to 'always' to avoid pitch issues of the synth in the sentence prosody.
Finally, the same regexp is used twice, for the normal period and for the ideographic period. I do not know which rule NVDA will use in this case; either way, it does not make sense.
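The point that the regexp does not contain the ideographic period can be checked with a small Python sketch. The pattern is copied from the diff above; the function name `ends_sentence` and the sample strings are only illustrative assumptions.

```python
import re

# Pattern copied from the proposed "。 sentence ending" line in the diff.
SENTENCE_END = re.compile(r"""(?<=[^\s.])\.(?=[\"'”’)\s]|$)""")


def ends_sentence(text):
    """Return True if the pattern finds a sentence-ending period in the text."""
    return SENTENCE_END.search(text) is not None


latin_hit = ends_sentence("This is a test.")
# The text below ends with U+3002 (ideographic full stop), not with ".".
ideographic_hit = ends_sentence("这是一个测试。")
```

Because the pattern only looks for the literal `.`, `latin_hit` is true and `ideographic_hit` is false, so the rule registered for 。 can never apply to text that actually uses 。.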
There was a problem hiding this comment.
Let's decide what to do after users test it.
There was a problem hiding this comment.
I do not know what users can test if the ideographic period and the English period are mapped to the same regexp. Please clarify, because it makes no sense to me.
There was a problem hiding this comment.
I think the ideographic period should be treated the same as the Latin period, because the ideographic period is only used in certain languages. However, the Latin period is also used in these languages.
There was a problem hiding this comment.
I think the ideographic period should be treated the same as the Latin period, because the ideographic period is only used in certain languages. However, the Latin period is also used in these languages.
This is not the point. But it seems we do not understand each other.
What I am saying is that it does not make sense to use the same regexp for two complex symbols in the complexSymbols: section.
You have associated the same regexp with both the Latin and the ideographic sentence-ending period.
When the text is parsed and the corresponding regexp matches, NVDA will report it either as the Latin sentence-ending period or as the ideographic sentence-ending period, but it will choose only one way to report it. The other way will never be used.
To be more concrete, try associating a dummy pronunciation with the Latin sentence-ending period and another one with the ideographic sentence-ending period. Then try to write a text in which each case is reported. You will never be able to create such a text, since both regexps are the same.
I hope I have clarified my point now.
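The experiment described above can be sketched in Python. This is not NVDA's symbol processing code: the `rules` list, `match_spans`, and the first-match-wins `first_matching_rule` are assumptions made only to illustrate why two identifiers bound to the same regexp cannot be told apart.

```python
import re

# Same pattern for both identifiers, as in the reviewed change.
PATTERN = r"""(?<=[^\s.])\.(?=[\"'”’)\s]|$)"""

rules = [
    (". sentence ending", re.compile(PATTERN)),
    ("。 sentence ending", re.compile(PATTERN)),
]


def match_spans(text):
    """Map each identifier to the (start, end) spans matched by its regexp."""
    return {name: [m.span() for m in rx.finditer(text)] for name, rx in rules}


def first_matching_rule(text):
    """Assumed first-match-wins resolution; NVDA's real choice may differ."""
    for name, rx in rules:
        if rx.search(text):
            return name
    return None


spans = match_spans("Hello. World.")
```

For any input, both identifiers yield exactly the same spans, so whichever resolution order is used, one of the two pronunciations is unreachable.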
There was a problem hiding this comment.
I invited you as a collaborator. Please help me.
| 。 sentence ending (?<=[^\s.])\.(?=[\"'”’)\s]|$) | ||
| ! sentence ending (?<=[^\s!])\!(?=[\"'”’)\s]|$) | ||
| ? sentence ending (?<=[^\s?])\?(?=[\"'”’)\s]|$) | ||
| ؟ sentence ending (?<=[^\s?])\?(?=[\"'”’)\s]|$) |
Same comments as above for this regexp.
| ؟ sentence ending (?<=[^\s?])\?(?=[\"'”’)\s]|$) | ||
| # Phrase endings. | ||
| ; phrase ending (?<=[^\s;]);(?=\s|$) | ||
| ؛ phrase ending (?<=[^\s;]);(?=\s|$) |
|
|
||
| # Complex symbols | ||
| . sentence ending dot all always | ||
| 。 sentence ending dot all always |
In English, it should have a name that indicates the character name and distinguishes it from the English one.
| 。 sentence ending dot all always | |
| 。 sentence ending ideographic period all always |
| 。 sentence ending dot all always | ||
| ! sentence ending bang all always | ||
| ? sentence ending question all always | ||
| ؟ sentence ending question all always |
| ? sentence ending question all always | ||
| ؟ sentence ending question all always | ||
| ; phrase ending semi most always | ||
| ؛ phrase ending semi most always |
| ⌘ mac Command key none | ||
|
|
||
| #Locale specific punctuations | ||
| 。 all |
Missing replacement.
| 。 all | |
| 。 ideographic period all |
| ، comma all always | ||
| ؟ question all | ||
| ؛ semi most | ||
| 、 comma all always |
The same applies to all of these: use names distinct from the English ones:
See test results for failed build of commit 716b687f6e |
|
@Mazen428, please test again. |
|
@OzancanKaratas, you write:
For now, I do not need to suggest some code.
I also have a new general question in mind: Side note: regarding collaboration, I do not need to be invited to your NVDA repo as a collaborator; I can suggest code in this PR if needed. |
|
Actually it makes more sense to remove the problematic marks from the CLDR dictionary. However, I don't know exactly how to do this. Because we need to be able to understand what these marks mean when shown in other languages. For example, although the Latin apostrophe is found in both the CLDR and NVDA dictionary, NVDA primarily uses its own dictionary. This is why I changed the NVDA symbols dictionary. |
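The precedence described above (NVDA's own dictionary consulted before CLDR) can be pictured with a small Python sketch. This is only a model of the idea using `collections.ChainMap`; the dictionary names and the entries are illustrative assumptions and do not reflect how NVDA actually stores or merges its symbol data.

```python
from collections import ChainMap

# Illustrative entries only; not copied from symbols.dic or cldr.dic.
nvda_symbols = {"'": "tick"}
cldr_symbols = {"'": "apostrophe", "؟": "Arabic question mark"}

# Earlier mappings win: the NVDA dictionary shadows CLDR for shared keys.
lookup = ChainMap(nvda_symbols, cldr_symbols)

apostrophe_name = lookup["'"]
before = lookup["؟"]

# Adding the symbol to the higher-priority dictionary overrides the CLDR name.
nvda_symbols["؟"] = "question"
after = lookup["؟"]
```

Under this model, a symbol missing from the higher-priority dictionary falls back to its CLDR description, which is why adding it to a symbol file changes what is spoken.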
|
Actually, I just realized that all the punctuation symbols are in the cldr.dic file: the ideographic and Arabic ones, but also the Latin ones. This changes my questions a bit. You write:
No, I think that cldr.dic is auto-generated and mirrors what is determined by the Unicode Consortium. Thus this source should remain as is. So the question that still stands is: which symbol files need to be modified?
I assume that Latin punctuation symbols are commonly used in Arabic or Chinese, whereas the reverse is not true: Arabic and Chinese punctuation symbols are not used in Latin-writing languages. If this assumption is confirmed, the second solution should be implemented, probably by the translators of these languages in their symbol files. Once the target symbol file is identified, we will be able to discuss how to modify its content. |
|
I saw the email you sent to the nvda-translations group. I think for now we should wait for responses and keep this pull request as a draft. |
|
What's the status here? |
|
Problematic symbols should be translated by translators who speak the relevant language. There are no changes to be made in the main symbol file. But I keep this pull request open in case it needs to be changed. |
|
This PR should be closed if the approach is invalid. Feel free to reopen it or open a new PR if that changes. |
|
Can you remove the regex matches? It would be good to add the symbols to /en/symbols.dic, but not the semantic matching. I would suggest following Cyrille's review comments. |
|
Thanks @OzancanKaratas |
Link to issue number:
Fixes #12097
Fixes #12086
Summary of the issue:
Some punctuation marks in Chinese and Arabic are treated as emoji characters by NVDA.
Description of how this pull request fixes the issue:
This pull request adds those symbols into NVDA's symbols dictionary.
Testing strategy:
Manual test: Chinese- and Arabic-speaking users should download the AppVeyor build and test it.
Known issues with pull request:
Waiting for test results.
Change log entries:
Bug fixes
Code Review Checklist: