
Inherit symbol level for languages other than English #4

Merged
seanbudd merged 1 commit into nvaccess:main from CyrilleB79:inheritLevel on Dec 18, 2022

@CyrilleB79 CyrilleB79 commented Dec 5, 2022

Contributor

Link to issue number:

Fixes nvaccess/nvda#14417

Summary of the issue:

Hindi has no symbols defined in its symbol file, only a copyright header; it seems that the file was prepared for translation but no symbols were actually translated.
But there is a Hindi CLDR file. Thus the symbol level for symbols such as common punctuation (dot, question mark, etc.) is the one from CLDR, i.e. none.

This is not appropriate; it would be better to take advantage of the symbol levels that are defined in the English symbol file.

Description of user facing changes

In NVDA, the locale CLDR dic file now inherits symbol levels from the files that come after it in the lookup order, i.e. the English symbol file and the English CLDR file. When the locale symbol file does not define a character's level, this allows NVDA to:

  1. Use the level defined for this symbol in the English symbol file, if there is one;
  2. Use "none" (coming from the English CLDR dic file) if the character is not defined in the English symbol file but is defined in CLDR.

Description of development approach

  • For all languages except English ("en"), generate the cldr.dic file with "-" in the level field, meaning that the level is inherited from the lower-priority files (English symbols, then English CLDR).
  • For the English cldr.dic file, use "none" as the symbol level, as was already the case before this PR.
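As a rough illustration of the two rules above, the sketch below shows how a generator could choose the level field per locale. It is a hypothetical sketch, not the actual nvda-cldr script: the names `level_field` and `cldr_dic_line` are made up, and the tab-separated `symbol, replacement, level` line layout is an assumption.

```python
# Hypothetical sketch; not taken from the nvda-cldr repository.
INHERIT = "-"  # level field value meaning "inherit from lower-priority files"


def level_field(locale: str) -> str:
    """Return the level field to write for every symbol of a locale's cldr.dic."""
    # English CLDR is the last file consulted, so it needs a concrete level;
    # every other locale defers to the English symbol and CLDR files.
    return "none" if locale == "en" else INHERIT


def cldr_dic_line(locale: str, symbol: str, replacement: str) -> str:
    """Format one entry, assuming a symbol<TAB>replacement<TAB>level layout."""
    return "\t".join((symbol, replacement, level_field(locale)))


if __name__ == "__main__":
    print(repr(cldr_dic_line("en", "?", "question mark")))  # '?\tquestion mark\tnone'
    print(repr(cldr_dic_line("fr", "?", "point d'interrogation")))
```

Keeping the decision in a single function means the English special case lives in one place; every other locale gets "-" without needing a per-language list.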

Testing strategy:

Manual tests:

Known issues with pull request:

  • The Hindi CLDR file seems to define the punctuation names as they are said in English, but written in Hindi script. I do not know whether this is common when reading Hindi. Anyway, the content of the Hindi CLDR file or of the Hindi symbol file is out of scope for this PR.
  • This PR fixes the issue on the nvda-cldr side. Once it is merged, I will still have to open another PR against nvda to use the new commit of this repo.
  • There are other issues when the locale symbol file is not just empty but entirely missing. This is out of scope for this PR and should be fixed on the nvda side, which I also plan to do.

Change log entries:

N/A in this repo.

Code Review Checklist:

This checklist applies more to the nvda repo, but I keep it here in case these checks are found useful.

  • Pull Request description:
    • description is up to date
    • change log entries
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • API is compatible with existing add-ons.
  • Documentation:
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • Security precautions taken.

@CyrilleB79 CyrilleB79 marked this pull request as ready for review December 5, 2022 22:13
@CyrilleB79

Contributor Author

@seanbudd, @feerrenrut please review this request, and let me know if the procedure is OK for this repo. Thanks.

@seanbudd seanbudd self-assigned this Dec 6, 2022

@seanbudd seanbudd left a comment

Member

Thanks @CyrilleB79 for fixing this. LGTM

@seanbudd seanbudd merged commit 8547a1b into nvaccess:main Dec 18, 2022
github-actions Bot pushed a commit that referenced this pull request Dec 18, 2022
Commit message:
Use symbolLevel=none only for English; for other languages, inherit it from the level below. (#4)
Fixes nvaccess/nvda#14417
CyrilleB79 added a commit to CyrilleB79/nvda that referenced this pull request Dec 19, 2022
…evel is set to none.

Fixes nvaccess#14417

Update nvda-cldr repository to get the changes implemented in nvaccess/nvda-cldr#4
CyrilleB79 added a commit to CyrilleB79/nvda that referenced this pull request Dec 19, 2022
… punctuation level

Fixes nvaccess#14417

Update nvda-cldr repository to get the changes implemented in nvaccess/nvda-cldr#4
@CyrilleB79

Contributor Author

Thanks for the merge.

Note: due to the GitHub magic word added in the initial description, nvaccess/nvda#14417 was closed a bit too early. It should instead be closed when nvaccess/nvda#14459 is merged.

@seanbudd I think that in the future we should not use GitHub magic words in PRs to reference issues that are in a different repository.

@CyrilleB79 CyrilleB79 deleted the inheritLevel branch December 19, 2022 14:51
seanbudd pushed a commit to nvaccess/nvda that referenced this pull request Dec 21, 2022
… punctuation level (#14459)

Fixes #14417

Summary of the issue:
Hindi has no symbols defined in its symbol file, only a copyright header; it seems that the file was prepared for translation but no symbols were actually translated.
But there is a Hindi CLDR file. Thus the symbol level for symbols such as common punctuation (dot, question mark, etc.) is the one from CLDR, i.e. none.
This is not appropriate; it would be better to take advantage of the symbol levels that are defined in the English symbol file.

Description of user facing changes
CLDR data will be available for languages which had no symbol file (am, et, kk, ne, th, ur) or an empty symbol file (hi). For these languages, since there is no locale symbol file definition, the level defined in the English symbol file will be honoured.

Description of development approach
Update nvda-cldr repository to get the changes implemented in nvaccess/nvda-cldr#4.
seanbudd pushed a commit to nvaccess/nvda that referenced this pull request Mar 23, 2023
… punctuation level (2nd attempt) (#14558)

A first PR (#14459) had been merged to fix #14417. Unfortunately an issue was found (see #14473), so it was reverted in #14477.

This PR is a second attempt to fix #14417 without causing #14473. It will remain a draft until I get more information on #14473 from @OzancanKaratas, as requested in #14473 (comment), or from anyone else able to reproduce it.

Link to issue number:
Fixes #14417

Summary of the issue:
Preliminary note for review
Keep in mind the following: in NVDA, with CLDR enabled and no custom user symbol defined, the symbol level for a symbol X is determined as follows:

  1. Look at the locale symbol file: if X is defined in this file and a symbol level is defined for X, then this level applies for X. Else, look at the next file.
  2. Look at the locale CLDR file: if X is defined in this file and a symbol level is defined for X, then this level applies for X. Else, look at the next file.
  3. Look at the English symbol file: if X is defined in this file and a symbol level is defined for X, then this level applies for X. Else, look at the next file.
  4. Look at the English CLDR file: if X is defined in this file and a symbol level is defined for X, then this level applies for X. Else, use the default symbol level (I don't remember whether it is None or All).
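The lookup order above can be sketched as a simple first-match walk over the four files. This is a minimal model, not NVDA's actual implementation: each file is reduced to a hypothetical symbol-to-level mapping, "-" is treated as "no level defined here", and the levels in the Hindi scenario are illustrative rather than the real file contents. The default level is deliberately left to the caller, since the text above does not state it.

```python
from typing import Mapping, Optional, Sequence

INHERIT = "-"  # level field meaning "look at the next file"


def resolve_level(symbol: str, files: Sequence[Mapping[str, str]]) -> Optional[str]:
    """Walk the files from highest to lowest priority; return the first concrete level.

    A symbol that is absent from a file, or whose level there is "-", falls through
    to the next file. Returns None when no file defines a level, in which case the
    caller would apply NVDA's default level.
    """
    for levels in files:
        level = levels.get(symbol)
        if level is not None and level != INHERIT:
            return level
    return None


# Illustrative Hindi scenario (hypothetical levels, not the real files).
hi_symbols: dict = {}                 # only a copyright header, no symbols
en_symbols = {".": "some"}
en_cldr = {".": "none", "😀": "none"}

before = [hi_symbols, {".": "none", "😀": "none"}, en_symbols, en_cldr]  # old hi cldr.dic
after = [hi_symbols, {".": "-", "😀": "-"}, en_symbols, en_cldr]        # new hi cldr.dic

print(resolve_level(".", before))  # none
print(resolve_level(".", after))   # some
print(resolve_level("😀", after))  # none (from the English CLDR file)
```

A missing locale symbol file (as for am, et, kk, ne, th, ur) behaves like the empty `hi_symbols` mapping here: lookup simply continues with the next file.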
Description of the issue
Hindi has no symbols defined in its symbol file, only a copyright header; it seems that the file was prepared for translation but no symbols were actually translated. But there is a Hindi CLDR file.

Currently, CLDR files are generated with level "None" for all symbols.

Usually, in locales with both a CLDR file and a normal symbol file, less common characters that are only in CLDR are reported at level None, i.e. whatever the user's punctuation level setting. But common punctuation symbols (dot, question mark, etc.) are added by translators to the locale symbol file, which allows these symbols to be given a higher punctuation level.

For Hindi (or any language with no symbols translated yet), all the characters present in the CLDR file are reported at level "None" and above (i.e. at any level), because their level is not redefined in the locale (Hindi) symbol file.

In such a situation, using the level from the locale CLDR file (None) is not a good strategy. It would be better to take advantage of the levels defined for these symbols in the English symbol file.
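Why a level of "None" means "always reported" can be shown with a small sketch. It assumes that a symbol is spoken when its level is at or below the user's punctuation level, with the ordering none < some < most < all (the character level is left out); `is_spoken` is a hypothetical helper, not an NVDA function.

```python
# Punctuation/symbol levels from least to most verbose (character level omitted).
LEVELS = ("none", "some", "most", "all")


def is_spoken(symbol_level: str, user_level: str) -> bool:
    """A symbol is reported when its level does not exceed the user's setting."""
    return LEVELS.index(symbol_level) <= LEVELS.index(user_level)


# A symbol at level "none" is reported whatever the user's punctuation level...
print([is_spoken("none", user) for user in LEVELS])  # [True, True, True, True]
# ...whereas a symbol at level "some" is silent when the user chooses "none".
print(is_spoken("some", "none"))  # False
```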

Description of user facing changes
CLDR data will be available for languages which had no symbol file (am, et, kk, ne, th, ur) or an empty symbol file (hi). For these languages, since there is no locale symbol file definition, the level defined in the English symbol file will be honoured.

Description of development approach
Update nvda-cldr repository to get the changes implemented in nvaccess/nvda-cldr#4.

Development

Successfully merging this pull request may close these issues.

NVDA reads punctuation symbols in Indian languages even if punctuation level is set to none

2 participants