Skip to content

symbols: support group references in replacements#11116

Merged
feerrenrut merged 13 commits into
nvaccess:masterfrom
sthibaul:groups
Sep 11, 2020
Merged

symbols: support group references in replacements#11116
feerrenrut merged 13 commits into
nvaccess:masterfrom
sthibaul:groups

Conversation

@sthibaul

@sthibaul sthibaul commented May 5, 2020

Copy link
Copy Markdown
Contributor

Link to issue number:

fixes #11107

Summary of the issue:

Rules such as dates . are quite complex to write: ((?<=\b\d\d)\.(?=\d\d.(\d{2}|\d{4})\b))|((?<=\b\d\d.\d\d)\.(?=(\d{2}|\d{4})\b)) because one wants to change two pieces of text here, and so we have to match them twice. Other rules are even more complex to write (e.g. 1,20€ -> 1 euro 20 needs two complex symbols to perform the two differing ,->euro ->`` replacements.

Description of how this pull request fixes the issue:

This adds support for regexp group references.

Testing performed:

I tested the use of it in the French translation, included in the commit:

dates .        \b(\d\d)\.(\d\d)\.(\d{2}|\d{4})\b
dates /        \b(\d\d)/(\d\d)/(\d{2}|\d{4})\b
dates .        \1 point \2 point \3        all     always  # point de date
dates /        \1 barre oblique \2 barre oblique \3        all     always  # barre oblique de date

The change uses Python's standard support for group, already used for the complex group parsing.

Known issues with pull request:

It requires checking that no symbols file includes a backslash in the replacement string. I did this, and found two (commented) occurrences in the italian symbols file, fixed by the commit. (and incidently found a typo in the hebrew file)

Change log entry:

New features
Symbol pronunciation: Support grouping in complex symbol definition and group references in replacement rule, making them more powerful and simpler. (#11107)

@josephsl

josephsl commented May 5, 2020 via email

Copy link
Copy Markdown
Contributor

@sthibaul

sthibaul commented May 5, 2020

Copy link
Copy Markdown
Contributor Author

Hi, this is more towards changes for developers,

Do you mean that I shouldn't have submitted the pull request here, but rather to another repository?

@josephsl

josephsl commented May 5, 2020 via email

Copy link
Copy Markdown
Contributor

@sthibaul

sthibaul commented May 5, 2020

Copy link
Copy Markdown
Contributor Author

I added some words in the ./devDocs/developerGuide.t2t

@sthibaul

sthibaul commented May 5, 2020

Copy link
Copy Markdown
Contributor Author

Hi, no, my comment relates to changes entry.

Ah, ok, I reworded it

@sthibaul

sthibaul commented May 5, 2020

Copy link
Copy Markdown
Contributor Author

Hi, no, my comment relates to changes entry.

Maybe https://github.com/nvaccess/nvda/wiki/Github-pull-request-template-explanation-and-examples#change-log-entry should say it explicitly that the text should be meant for users?

@AAClause

AAClause commented May 6, 2020

Copy link
Copy Markdown
Contributor

Hi Samuel :)
It seems to me that modifications made in 'locale/{fr,he,it}/symbols.dic' should be done via Assembla exclusively.
Would you like to join the team of translators?

@sthibaul

sthibaul commented May 6, 2020

Copy link
Copy Markdown
Contributor Author

Heya André :)
I can drop these from the PR, none of them actually have any effect, it was more for documentation.
I don't really plan to join the translators team, I just meant to pass on at least the date example so people get the idea and can then take benefit from it.

@feerrenrut

Copy link
Copy Markdown
Contributor

@josephsl can you comment on whether it will cause problems for the current translation system if changes are accepted to languages other than English?

@josephsl

josephsl commented May 7, 2020 via email

Copy link
Copy Markdown
Contributor

@sthibaul

sthibaul commented May 7, 2020

Copy link
Copy Markdown
Contributor Author

Here there are no english changes, just fixes in a couple of translations, and an improvement in another one.

@sthibaul

sthibaul commented May 7, 2020

Copy link
Copy Markdown
Contributor Author

To make things simpler, I have now dropped the locale/ changes, and we can commit the locale/ fixes separately

@Adriani90

Copy link
Copy Markdown
Collaborator

Will this also fix issue #2676?

@sthibaul

Copy link
Copy Markdown
Contributor Author

Will this also fix issue #2676?

No, it's unrelated, the grouping mentioned here is not of the same kind as in #2676

@sthibaul

Copy link
Copy Markdown
Contributor Author

So, any comment on including this?

I would like to include the same support in speech-dispatcher which uses the same dict files, but I would rather avoid making the file format diverge...

@sthibaul

Copy link
Copy Markdown
Contributor Author

Hello, any comment on this?

@feerrenrut feerrenrut left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @sthibaul and apologies for the delay to review the work.

Comment thread devDocs/developerGuide.t2t Outdated
Comment thread devDocs/developerGuide.t2t Outdated
Comment thread source/characterProcessing.py Outdated
Comment thread source/characterProcessing.py Outdated
@AppVeyorBot

Copy link
Copy Markdown

See test results for failed build of commit c28540f6ab

@AppVeyorBot

Copy link
Copy Markdown

See test results for failed build of commit d39809229c

@AppVeyorBot

Copy link
Copy Markdown

See test results for failed build of commit 36cd65d3f6

@sthibaul sthibaul force-pushed the groups branch 2 times, most recently from 9b6c2e3 to 0b833e6 Compare June 11, 2020 18:32
@AppVeyorBot

Copy link
Copy Markdown

See test results for failed build of commit 5fdd52145c

…merge

Also drop the test for now, will be commited after mergin the translator
update.
@sthibaul

Copy link
Copy Markdown
Contributor Author

Ok, we have sorted out the french translation question. Michel has commited reverting to not using groups, and using norep instead of always.

In this pull request I have dropped the fr_FR test, so that it can be commited without waiting for the french translation update.

When this pull request is merged, Michel will be able to update the french translation to use the group support, and then I'll submit another pull request to add a test for it.

@AppVeyorBot

Copy link
Copy Markdown

See test results for failed build of commit ddc8590040

@CyrilleB79

Copy link
Copy Markdown
Contributor

I did not review the code. However I have some remarks:

  • At first glance, the provided example does not seem realistic: why would a user replace the dot by space when the punctuation level is set to all? I would have expected something to be announced instead at this high level. The replacement rule could be for example (in French):
    dates . \1 point \2 point \3 all always # point de date
    Maybe I have missed something, so please add an indication to let the reader know the context, i.e. why this rule was added in the French symbol file. @MichelSuch?
  • Also the semantic of this rule has changed:
    • Before the rule was matching a dot in the date and replacing it:
      dates . point all norep # point de date
    • Now it matches the whole date and replace it with group substitution:
      dates . \1 \2 \3 all norep # point de date
      Thus I suggest renaming the rule from "dates ." to "date with ." in the example as well as in the translation repo French file when this file will have to be modified (after this PR is merged). Also the associated comment should specify "date avec point" (i.e. "date with dot") instead of "point de date" (i.e. "date dot")
  • Regarding change log entry something more precise would help such as:
    Symbol pronunciation: Support grouping in complex symbol definition and group references in replacement rule

@sthibaul

sthibaul commented Aug 21, 2020

Copy link
Copy Markdown
Contributor Author
* why would a user replace the dot by space when the punctuation level is set to all?

Please read the latest version of the pull request, not the initial version. The initial version was using just what was actually in fr/symbols.dic, which happened to be doing exactly that, that's why I had initially just used it as such. That's one of the clarification points we had with Michel. The latest version of the pull request takes as example to replace the dots with "point".

I would have expected something to be announced instead at this high level. The replacement rule could be for example (in French):
dates . \1 point \2 point \3 all always # point de date

That's exactly what is currently in the pull request (but with norep, not always, which would introduce odd effects, that was also part of the clarification points with Michel)

* Also the semantic of this rule has changed:
  
  * Before the rule was matching a dot in the date and replacing it:
    `dates .	 point 	all	norep	# point de date`

No it was not, there was no "point" word there. Against that's part of what Michel fixed.

  * Now it matches the whole date and replace it with group substitution:
    `dates .	\1 \2 \3 	all	norep	# point de date`

Yes, and this is the whole point of this feature: avoid having to use a complex double-case rule, and instead have just one simple rule, with exactly the same effect as before (and the "expected effect" in the french translation was unclear, Michel now cleared up, to appear in git).

    Thus I suggest renaming the rule from "dates ." to "date with ." in the example as well as in the translation repo French file when this file will have to be modified (after this PR is merged).

Also the associated comment should specify "date avec point" (i.e. "date with dot") instead of "point de date" (i.e. "date dot")

As you wish, these parts are up to translator to see what they prefer, I just took what was already there.

@sthibaul

sthibaul commented Aug 21, 2020

Copy link
Copy Markdown
Contributor Author
Thus I suggest renaming the rule from "dates ." to "date with ." in the example as well as in the translation repo French file when this file will have to be modified (after this PR is merged).

Also the associated comment should specify "date avec point" (i.e. "date with dot") instead of "point de date" (i.e. "date dot")

As you wish, these parts are up to translator to see what they prefer, I just took what was already there.

Ah, sorry, rereading it I see what you mean: since the effect of the rule is on the whole date and not on the dot, we should rename the name and comment of the rule accordingly. I have now done so.

Comment thread devDocs/developerGuide.t2t Outdated
@CyrilleB79

Copy link
Copy Markdown
Contributor

Also two other thoughts:

  • In the symbol and punctuation dialog, the user may edit rules of complex symbols with or without matching groups. However there is nothing to inform the user if the replacement pattern should contain references and how many. The only place where I imagine this information could be written is in the comment of the rule. What do you think?
  • Maybe the user guide should also be updated in section 12.2.2. The following info may be added:
    • in the replacement field, the backslash should be escaped (doubled)
    • in the replacement field, \1, \2, etc. should be used to reference 1st, 2nd, etc. matching group
    • indicate that the comment should give an indication on the number of the matching groups and on what is expected for the placeholders if required (if this is the accepted solution)

@sthibaul

Copy link
Copy Markdown
Contributor Author
* In the symbol and punctuation dialog, the user may edit rules of complex symbols with or without matching groups. However there is nothing to inform the user if the replacement pattern should contain references and how many.

I'm not sure to understand: that dialog box contains the Pattern and the Replacement next to each other, is there really a need to make people document how the group replacement works? If they were written separately, I would agree that the interface needs to be documented somewhere, but from what I understand Pattern and Replacement are written together?

* Maybe the user guide should also be updated in section 12.2.2.

Right, I have added some text.

@CyrilleB79

Copy link
Copy Markdown
Contributor
* In the symbol and punctuation dialog, the user may edit rules of complex symbols with or without matching groups. However there is nothing to inform the user if the replacement pattern should contain references and how many.

I'm not sure to understand: that dialog box contains the Pattern and the Replacement next to each other, is there really a need to make people document how the group replacement works? If they were written separately, I would agree that the interface needs to be documented somewhere, but from what I understand Pattern and Replacement are written together?

The symbol dialog box contains only the rules. For simple symbols, it actually contains the pattern and the replacement next to each other on each line. But for complex symbols, you only have the name of the complex symbol (e.g. . sentence ending) and the replacement (e.g. dot). You do not have the pattern defining the complex symbol. The complex symbol definition can only be defined or modified in the symbols.dic file, not in the dialog box.

* Maybe the user guide should also be updated in section 12.2.2.

Right, I have added some text.

Seems you made the addition in the wrong paragraph, i.e. the paragraph about dictionaries instead of the paragraph about symbol dialog.

@sthibaul

Copy link
Copy Markdown
Contributor Author

Right, I have added some text.

Seems you made the addition in the wrong paragraph, i.e. the paragraph about dictionaries instead of the paragraph about symbol dialog.

Ah, that's why I was confused in how it would work. The user would indeed need to be notified somehow which group corresponds to what. I have added an initial paragraph, but rule comments would have to specify it indeed

@sthibaul

sthibaul commented Sep 6, 2020

Copy link
Copy Markdown
Contributor Author

Any news on this? @feerrenrut perhaps? The french translation question has been sorted out, we will update it after this gets commited.

@feerrenrut

Copy link
Copy Markdown
Contributor

Ok, thanks @sthibaul so I can take it that this is no longer blocked by the French translation and is ready for review and merging?

feerrenrut
feerrenrut previously approved these changes Sep 10, 2020
@sthibaul

Copy link
Copy Markdown
Contributor Author

Ok, thanks @sthibaul so I can take it that this is no longer blocked by the French translation and is ready for review and merging?

That's it, yes!

@feerrenrut feerrenrut merged commit d25e3de into nvaccess:master Sep 11, 2020
@nvaccessAuto nvaccessAuto added this to the 2020.4 milestone Sep 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Adding support for group matches in complexSymbols?

10 participants