Inherit symbol level for languages other than English by CyrilleB79 · Pull Request #4 · nvaccess/nvda-cldr

CyrilleB79 · 2022-12-05T16:36:04Z

Link to issue number:

Summary of the issue:

Hindi has no symbols defined in its symbol file, only the copyright header; it seems that the file was prepared for translation but no symbols were actually translated.
However, there is a Hindi CLDR file. Thus the symbol level for symbols such as common punctuation (dot, question mark, etc.) is the one from CLDR, i.e. none.

This is not appropriate; it would be better to take advantage of the symbol levels that are defined in the English symbol file.

Description of user facing changes

In NVDA, the locale CLDR dic file now inherits symbol levels from the files coming after it, i.e. the English symbol file and the English CLDR file. When the locale symbol file does not define a character's level, this allows NVDA to:

- Use the level from the English symbol file if the symbol is defined there.
- Use "none" (coming from the English CLDR dic file) if the character is not defined in the English symbol file but is defined in CLDR.

Description of development approach

For all languages except English ("en"), generate the cldr.dic file with "-" in the level field, meaning that the level is inherited from previous files.
For the English cldr.dic file, use "none" for the symbol level, as was already the case before this PR.
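
The rule above can be sketched in a few lines of Python. This is only an illustration, not the code in build.py: the function names `level_field` and `format_symbol_line` are hypothetical, and the tab-separated layout (symbol, description, level) is an assumption based on the usual NVDA symbol dictionary format.

```python
ENGLISH_LOCALE = "en"
INHERIT_LEVEL = "-"  # "-" in the level field means "inherit the level"


def level_field(locale: str) -> str:
    """Return the level written to cldr.dic for the given locale.

    English keeps the explicit "none" level; every other locale
    writes "-" so that the level is inherited.
    """
    return "none" if locale == ENGLISH_LOCALE else INHERIT_LEVEL


def format_symbol_line(symbol: str, description: str, locale: str) -> str:
    """Build one tab-separated entry (assumed layout: symbol, text, level)."""
    return "\t".join((symbol, description, level_field(locale)))
```

With this sketch, an entry generated for "hi" ends with "-" while the same entry generated for "en" ends with "none".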

Testing strategy:

Manual tests:

- Generated the cldr.dic files with: py -3.7-32 build.py
- Copied all these files into the NVDA sources
- Removed all the French symbol file content except the commented header lines (it is easier for me to test in French than in Hindi)
- Checked that symbols are reported in French at higher punctuation levels
- Also checked that I do not hear English punctuation when reading the Hindi example from nvda#14417 ("NVDA reads punctuation symbols in Indian languages even if punctuation level is set to none")
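
As a complement to the manual checks, the generated files could also be inspected with a small script. The following is a minimal sketch, not part of this PR: the helper name `levels_in_dic` is hypothetical, and it assumes that cldr.dic uses a "symbols:" section with tab-separated entries whose third field is the level, and that comment lines start with "#".

```python
from typing import Set


def levels_in_dic(text: str) -> Set[str]:
    """Collect the distinct level fields found in the symbols section."""
    levels: Set[str] = set()
    in_symbols = False
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines (assumed to start with "#").
        if not stripped or line.startswith("#"):
            continue
        # A line without tabs that ends with ":" is treated as a section header.
        if "\t" not in line and stripped.endswith(":"):
            in_symbols = stripped == "symbols:"
            continue
        if not in_symbols:
            continue
        fields = line.split("\t")
        if len(fields) >= 3:
            levels.add(fields[2])
    return levels
```

Under these assumptions, a non-English cldr.dic generated by this PR would be expected to yield only "-", and the English one only "none".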

Known issues with pull request:

The Hindi CLDR file seems to define the punctuation names as they are said in English, but written in the Hindi alphabet. I do not know whether this is common when reading Hindi. In any case, the content of the Hindi CLDR file or of the Hindi symbol file is out of scope for this PR.
This PR fixes the issue on the nvda-cldr side. Once this PR is merged, I will still have to open another PR against nvda to use the new commit of this repo.
There are other issues when the locale symbol file is not merely empty but missing altogether. This is out of scope for this PR and should be fixed on the nvda side. I also plan to do this.

Change log entries:

N/A in this repo.

Code Review Checklist:

This checklist applies more to nvda's repo, but I am keeping it here in case these checks are found useful.

Pull Request description:
- description is up to date
- change log entries
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
API is compatible with existing add-ons.
Documentation:
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
Security precautions taken.


CyrilleB79 · 2022-12-05T22:14:37Z

@seanbudd, @feerrenrut please consider this request and let me know if the procedure is OK for this repo. Thanks.

seanbudd

Thanks @CyrilleB79 for fixing this. LGTM

Commit message: Use symbolLevel=none only for English; for other languages, inherit it from the level below. (#4) Fixes nvaccess/nvda#14417

…evel is set to none. Fixes nvaccess#14417 Update nvda-cldr repository to get the changes implemented in nvaccess/nvda-cldr#4

… punctuation level Fixes nvaccess#14417 Update nvda-cldr repository to get the changes implemented in nvaccess/nvda-cldr#4

CyrilleB79 · 2022-12-19T14:47:05Z

Thanks for the merge.

Note: due to the GitHub magic word added in the initial description, nvaccess/nvda#14417 was closed a bit too early. Instead, it should be closed when nvaccess/nvda#14459 is merged.

@seanbudd I think in the future one should not use GitHub magic words to target PRs that are not in the same repository.

… punctuation level (#14459)

Fixes #14417

Summary of the issue:

Hindi has no symbols defined in its symbol file, only the copyright header; it seems that the file was prepared for translation but no symbols were actually translated. However, there is a Hindi CLDR file. Thus the symbol level for symbols such as common punctuation (dot, question mark, etc.) is the one from CLDR, i.e. none. This is not appropriate; it would be better to take advantage of the symbol levels that are defined in the English symbol file.

Description of user facing changes

CLDR data will be available for languages which had no symbol file (am, et, kk, ne, th, ur) or an empty symbol file (hi). For these languages, since there is no locale symbol file definition, the level defined in the English symbol file will be honoured.

Description of development approach

Update nvda-cldr repository to get the changes implemented in nvaccess/nvda-cldr#4.

@OzancanKaratas

… punctuation level (2nd attempt) (#14558)

A first PR (#14459) was merged to fix #14417. Unfortunately an issue was found (see #14473), so it was reverted in #14477.
This PR is a second attempt to fix #14417 without causing #14473. It will remain a draft until I can get more information on #14473 from @OzancanKaratas, as requested in #14473 (comment), or from anyone else able to reproduce.

Link to issue number:

Fixes #14417

Summary of the issue:

Preliminary note for review

Keep in mind the following: in NVDA with CLDR enabled and with no custom user symbol defined, the symbol level for symbol X is determined as follows:

- Look at the locale symbol file: if X is defined in this file and a symbol level is defined for X, then this level applies to X. Otherwise, look at the next file.
- Look at the locale CLDR file: if X is defined in this file and a symbol level is defined for X, then this level applies to X. Otherwise, look at the next file.
- Look at the English symbol file: if X is defined in this file and a symbol level is defined for X, then this level applies to X. Otherwise, look at the next file.
- Look at the English CLDR file: if X is defined in this file and a symbol level is defined for X, then this level applies to X. Otherwise, use the default symbol level (don't remember if it is None or All).

Description of the issue

Hindi has no symbols defined in its symbol file, only the copyright header; it seems that the file was prepared for translation but no symbols were actually translated. However, there is a Hindi CLDR file. Currently, CLDR files are generated with level "None" for all symbols. Usually, in locales with a CLDR file and a normal symbol file, less common characters that are only in CLDR are reported at level None, i.e. whatever the user's punctuation level setting. But common punctuation symbols (dot, question mark, etc.) are added by translators to the locale symbol file, which allows these symbols to be reported at a higher punctuation level.

For Hindi (or any language with no symbols currently translated), all the characters present in the CLDR file are reported at "None" level and above (i.e. at any level), because the level is not redefined in the locale (Hindi) symbol file. In such a situation, using the level of the locale CLDR file (None) is not a good strategy. It would be better to take advantage of the levels defined for the symbols in the English symbol file.

Description of user facing changes

CLDR data will be available for languages which had no symbol file (am, et, kk, ne, th, ur) or an empty symbol file (hi). For these languages, since there is no locale symbol file definition, the level defined in the English symbol file will be honoured.

Description of development approach

Update nvda-cldr repository to get the changes implemented in nvaccess/nvda-cldr#4.
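
The lookup order from the preliminary note can be sketched as a simple fallback chain. This is a hedged illustration, not NVDA's actual implementation: `resolve_level` is a hypothetical name, each file is modelled as a plain mapping from symbol to level, "-" stands for "no level defined here", and the final default is passed in as a parameter because the note above does not state which value NVDA uses.

```python
from typing import Mapping, Sequence

INHERIT_LEVEL = "-"  # level field meaning "not defined here, look further"


def resolve_level(
    symbol: str,
    sources: Sequence[Mapping[str, str]],
    default: str,
) -> str:
    """Return the first explicit level found for symbol.

    sources is ordered as in the note: locale symbol file, locale CLDR
    file, English symbol file, English CLDR file.
    """
    for source in sources:
        level = source.get(symbol)
        if level is not None and level != INHERIT_LEVEL:
            return level
    # No file defines a level: fall back to the caller-supplied default.
    return default


# Illustrative data only; the levels below are not taken from real files.
hindi_symbols: Mapping[str, str] = {}       # empty symbol file
hindi_cldr = {".": "-"}                     # after this PR: inherit
english_symbols = {".": "some"}
english_cldr = {".": "none"}

chain = [hindi_symbols, hindi_cldr, english_symbols, english_cldr]
dot_level = resolve_level(".", chain, default="all")
```

In this sketch the dot resolves to the level of the English symbol file, because the Hindi CLDR entry carries "-" instead of an explicit "none".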

Use symbolLevel=none only for English; for other languages, inherit it from the level below.

bcd919e

CyrilleB79 marked this pull request as ready for review December 5, 2022 22:13

seanbudd self-assigned this Dec 6, 2022

seanbudd approved these changes Dec 18, 2022

View reviewed changes

seanbudd merged commit 8547a1b into nvaccess:main Dec 18, 2022

CyrilleB79 mentioned this pull request Dec 19, 2022

In Hindi, NVDA will not read anymore punctuation symbols whatever the punctuation level nvaccess/nvda#14459

Merged

6 tasks

CyrilleB79 deleted the inheritLevel branch December 19, 2022 14:51

CyrilleB79 mentioned this pull request Jan 17, 2023

In Hindi, NVDA will not read anymore punctuation symbols whatever the punctuation level (2nd attempt) nvaccess/nvda#14558

Merged

6 tasks

seanbudd merged 1 commit into nvaccess:main from CyrilleB79:inheritLevel
