Hyphenate Braille using pyphen by LeonarddeR · Pull Request #19916 · nvaccess/nvda

LeonarddeR · 2026-04-07T19:59:12Z

Link to issue number:

Summary of the issue:

Word wrap is sometimes pretty aggressive, especially on shorter braille displays.

Description of user facing changes:

The boolean "word wrap" option in the braille settings has been replaced with a four-valued Text wrap option, giving finer-grained control over how words are broken when they don't fit on the display. The four choices are:

Off — Wrap at the raw edge of the display, cutting words in the middle if necessary. No visual indication that a word was cut.
Show mark when words are cut — Wrap at the raw edge, but whenever a word is cut mid-way, replace the last cell of the row with a continuation mark (braille dots 7-8) so the reader knows the word continues on the next row.
At word boundaries — Prefer breaking at spaces. If no space fits on the row, fall back to cutting the word and showing the continuation mark.
At word or syllable boundaries — As above, but when a word is too long to fit, try to split it at a syllable boundary (using hyphenation dictionaries from the pyphen library) so less of the word spills onto the next row. NVDA marks the split with braille dots 7-8, not a printed hyphen, because braille conventions use word division rather than print-style hyphenation.

Whenever a word is cut mid-way across rows — regardless of which mode is selected — the cut is now marked with the continuation symbol. This makes it easy to tell at a glance whether a row ends cleanly at a space or carries over into the next row.
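The first three choices can be illustrated on plain text. The following is only a toy sketch, not NVDA's implementation: it works on characters rather than braille cells, the names `wrap_rows` and `is_mid_word` and the mode strings are made up for this example, and the syllable mode is left out. It assumes the mark takes the last cell of a row, as described above.

```python
CONTINUATION_MARK = chr(0x2800 + 0xC0)  # Unicode braille pattern for dots 7-8


def is_mid_word(text: str, pos: int) -> bool:
    """Return True if starting a new row at index pos would split a word."""
    return 0 < pos < len(text) and not text[pos - 1].isspace() and not text[pos].isspace()


def wrap_rows(text: str, num_cols: int, mode: str) -> list[str]:
    """Split text into rows for mode "off", "mark" or "word" (hypothetical names)."""
    if num_cols < 2:
        raise ValueError("need at least two columns to leave room for the mark")
    rows = []
    start = 0
    while start < len(text):
        end = min(start + num_cols, len(text))
        mark = False
        if mode == "word" and is_mid_word(text, end):
            # Prefer the last space that fits on this row.
            space = text.rfind(" ", start, end)
            if space != -1:
                end = space + 1
        if mode != "off" and is_mid_word(text, end):
            # Still cutting a word: give up the last cell for the mark.
            end -= 1
            mark = is_mid_word(text, end)
        rows.append(text[start:end] + (CONTINUATION_MARK if mark else ""))
        start = end
    return rows


for mode in ("off", "mark", "word"):
    print(mode, wrap_rows("hello wonderful world", 8, mode))
```

On an 8-column row, "word" mode ends the first row cleanly after "hello " and only marks the second row, where "wonderful" cannot fit without being cut.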

Existing user profiles with the old wordWrap = True / wordWrap = False setting are automatically upgraded: True becomes "At word boundaries" and False becomes "Off".

Description of developer facing changes:

The deprecated braille.wordWrap boolean is bridged to the new braille.textWrap feature flag in both directions via _linkDeprecatedValues, so add-ons that still read or write the old key keep working (with a deprecation warning).
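A rough sketch of what a two-way bridge of this kind could look like. This is not the code behind `_linkDeprecatedValues`: the `BrailleSection` class, the stored strings, and the rule for deriving the old boolean from the new value are all assumptions made for illustration.

```python
import warnings

# Hypothetical stored values; the PR names the enum members, not their serialised form.
_WORD_WRAP_TO_TEXT_WRAP = {True: "AT_WORD_BOUNDARIES", False: "NONE"}
# Assumption: the old boolean reads as True for any mode that prefers word boundaries.
_WORD_BOUNDARY_MODES = {"AT_WORD_BOUNDARIES", "AT_WORD_OR_SYLLABLE_BOUNDARIES"}


class BrailleSection:
    """Toy stand-in for a config section that bridges a deprecated key."""

    def __init__(self, textWrap: str = "AT_WORD_BOUNDARIES"):
        self._values = {"textWrap": textWrap}

    @staticmethod
    def _warn() -> None:
        warnings.warn(
            "braille.wordWrap is deprecated; use braille.textWrap",
            DeprecationWarning,
            stacklevel=3,
        )

    def __getitem__(self, key: str):
        if key == "wordWrap":
            # Reads of the old key are answered from the new one.
            self._warn()
            return self._values["textWrap"] in _WORD_BOUNDARY_MODES
        return self._values[key]

    def __setitem__(self, key: str, value) -> None:
        if key == "wordWrap":
            # Writes to the old key are translated and stored under the new one.
            self._warn()
            key, value = "textWrap", _WORD_WRAP_TO_TEXT_WRAP[bool(value)]
        self._values[key] = value
```

An add-on that sets `section["wordWrap"] = False` would then see `section["textWrap"] == "NONE"`, with a `DeprecationWarning` emitted on each access to the old key.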

Description of development approach:

Feature flag. Added BrailleTextWrapFlag with members DEFAULT, NONE, MARK_WORD_CUTS, AT_WORD_BOUNDARIES, AT_WORD_OR_SYLLABLE_BOUNDARIES. The default behaviour is AT_WORD_OR_SYLLABLE_BOUNDARIES.
Unified continuation marker. The continuation mark consistently means "a word was cut here" across all modes.
Hyphenation module. New textUtils.hyphenation module wraps the pyphen library. getHyphenPositions(text, locale) returns an empty tuple for locales without a pyphen dictionary (logging once at debug level per locale), so the wrap logic falls back cleanly to word-boundary behaviour without raising.
Region language tracking. Region._languageIndexes records language changes within a braille region so hyphenation can be performed in the correct language when regions contain multilingual content.
Frozen builds. A py2exe hook (_hook_pyphen in source/setup.py) bundles pyphen's *.dic files into dist/pyphenDictionaries/ and rewrites pyphen's dictionary lookup path at freeze time. Only the .dic files are included — README files are skipped.
Profile upgrade. upgradeConfigFrom_22_to_23 maps the old wordWrap boolean to the new textWrap string enum.
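As an illustration of the feature flag and the profile upgrade, here is a minimal sketch. The member names and the True/False mapping come from the description above; the plain `Enum` base, the `resolve` and `upgrade_word_wrap` helpers, storing the member name as the config string, and keeping the old key in place are assumptions, and NVDA's real `upgradeConfigFrom_22_to_23` may differ.

```python
from enum import Enum, auto


class BrailleTextWrapFlag(Enum):
    """Member names as listed in the PR; the base class is a simplification."""

    DEFAULT = auto()
    NONE = auto()
    MARK_WORD_CUTS = auto()
    AT_WORD_BOUNDARIES = auto()
    AT_WORD_OR_SYLLABLE_BOUNDARIES = auto()


BEHAVIOR_OF_DEFAULT = BrailleTextWrapFlag.AT_WORD_OR_SYLLABLE_BOUNDARIES


def resolve(flag: BrailleTextWrapFlag) -> BrailleTextWrapFlag:
    """Map DEFAULT to the behaviour it currently stands for."""
    return BEHAVIOR_OF_DEFAULT if flag is BrailleTextWrapFlag.DEFAULT else flag


def upgrade_word_wrap(profile: dict) -> None:
    """Hypothetical upgrade step: derive braille.textWrap from braille.wordWrap."""
    braille = profile.get("braille")
    if not braille or "wordWrap" not in braille:
        return  # nothing to migrate
    old = braille["wordWrap"]
    if isinstance(old, str):
        # Values read from an ini file may arrive as text.
        old = old.strip().lower() == "true"
    new = BrailleTextWrapFlag.AT_WORD_BOUNDARIES if old else BrailleTextWrapFlag.NONE
    braille["textWrap"] = new.name  # assumption: the member name is what gets stored
```

Resolving `DEFAULT` at read time, rather than storing the concrete mode, is what lets the default behaviour change later without touching user profiles.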

Testing strategy:

Automated unit tests cover:

All four wrap modes in _calculateWindowRowBufferOffsets, including the case where no whitespace fits on the row, the syllable-boundary success path, the fallback when no syllable boundary fits before the display edge, and the unknown-language case.
Continuation-marker rendering in _get_windowBrailleCells.
Region language-index bookkeeping: default language lookup, _addFieldText inserting switch/restore entries when a field is in a different language, _addTextWithFields handling a formatChange command with a language attribute, and TextInfoRegion.update resetting the language index across updates.
textUtils.hyphenation.getHyphenPositions for both a known language (en_US) and an unknown one (returns () without raising).

Manual testing: loaded a pre-upgrade profile with wordWrap = True/False and confirmed the profile upgrade writes the expected textWrap value and that the braille settings panel shows the matching label; confirmed scons.bat dist produces dist/pyphenDictionaries/ containing only hyph_*.dic files.

Known issues with pull request:

Unit tests were written by AI and are a bit difficult to parse, though the behavior has been manually tested too and the unit tests ensure that the behavior stays stable.

Code Review Checklist:

cary-rowen · 2026-04-14T09:59:41Z

Once the Chinese word segmentation PR is merged, will it be possible to use its rules to handle Chinese line breaks within this new text wrap framework?

LeonarddeR · 2026-04-14T10:01:50Z

I have no idea honestly. That is something we'd need to find out after that is merged.

Replace BrailleTextWrap IntEnum with BrailleTextWrapFlag feature flag stored via featureFlag config spec, mirroring reviewRoutingMovesSystemCaret. Rename members to NONE, MARK_WORD_CUTS, AT_WORD_BOUNDARIES, AT_WORD_OR_SYLLABLE_BOUNDARIES for clarity (braille uses word division, not print hyphenation). Unify continuation-marker semantics under rule A: the marker now fires on any mid-word row end regardless of mode, including the no-whitespace fallback in AT_WORD_BOUNDARIES/AT_WORD_OR_SYLLABLE_BOUNDARIES. Handle unknown languages gracefully in getHyphenPositions by returning an empty tuple and logging once per locale. Update profile upgrade, deprecation bridge for wordWrap, settings dialog (FeatureFlagCombo), and user guide.

… region language, and hyphenation

Update test_calculateWindowRowBufferOffsets for the renamed BrailleTextWrapFlag feature flag and add tests #1-#8 covering NONE, MARK_WORD_CUTS, AT_WORD_BOUNDARIES (including the rule-A marker fix for the no-whitespace fallback), and AT_WORD_OR_SYLLABLE_BOUNDARIES (success, empty positions, past-edge position, unknown language). Add test_windowBrailleCells for CONTINUATION_SHAPE rendering (#9-#10). Add test_regionLanguageIndexes for Region._languageIndexes defaults, _addFieldText switch/restore entries, _addTextWithFields formatChange language handling, and TextInfoRegion.update reset (#11-#14). Add test_hyphenation for getHyphenPositions with known and unknown locales (#15-#16).

LeonarddeR · 2026-04-20T19:38:31Z

I guess that when the Chinese work is merged, we can fall back to that in the hyphenation module.

Patch auto-properties (rawToBraillePos, brailleToRawPos) on the buffer instance instead of the class — they are non-data Getter descriptors via AutoPropertyObject, so instance attributes shadow them directly. Add comments explaining the mocking strategy for syllable-boundary isolation and the side_effect=RuntimeError pattern for halting update() mid-method. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
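The descriptor behaviour this commit relies on is standard Python and can be shown in isolation. In the sketch below, `Getter`, `Buffer` and `PropertyBuffer` are simplified stand-ins written for this example, not NVDA's AutoPropertyObject machinery.

```python
class Getter:
    """Minimal non-data descriptor: defines __get__ only, no __set__ or __delete__."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self.fget(instance)


class Buffer:
    @Getter
    def rawToBraillePos(self):
        return [0, 1, 2]


class PropertyBuffer:
    @property  # a data descriptor: it defines __set__, so it always wins
    def rawToBraillePos(self):
        return [0, 1, 2]


buf = Buffer()
buf.rawToBraillePos = [5, 6]  # stored in the instance __dict__, shadows the descriptor
shadowed = buf.rawToBraillePos
del buf.rawToBraillePos  # cleanup, as a tearDown would do
restored = buf.rawToBraillePos

try:
    PropertyBuffer().rawToBraillePos = [5, 6]
    property_assignable = True
except AttributeError:
    property_assignable = False
```

Because the override lives on one instance, other instances and the class itself are untouched, which is why no class-level patching is needed in the tests.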

Copilot

Pull request overview

This PR introduces a new braille “Text wrap” setting with optional continuation markers and syllable-aware wrapping (via pyphen), replacing the old boolean wordWrap while preserving add-on compatibility through deprecated-key bridging.

Changes:

Add BrailleTextWrapFlag feature-flag setting and update braille wrapping logic to support 4 wrap modes plus a continuation indicator.
Add locale-aware hyphenation support via new textUtils.hyphenation wrapper around pyphen, including py2exe bundling for frozen builds.
Update GUI, documentation, config schema upgrade, and add unit tests for wrap behavior, continuation rendering, hyphenation, and language tracking.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
`uv.lock`	Adds `pyphen` dependency to the locked environment.
`pyproject.toml`	Declares `pyphen` dependency for builds.
`source/textUtils/hyphenation.py`	New hyphenation utility module wrapping `pyphen` with locale fallback/logging.
`source/braille.py`	Implements new wrap modes, continuation marker rendering, and region language tracking for correct hyphenation locale.
`source/config/featureFlagEnums.py`	Adds `BrailleTextWrapFlag` enum with display strings for the GUI.
`source/config/configSpec.py`	Bumps schema version and adds `braille.textWrap` featureFlag (keeps deprecated `wordWrap`).
`source/config/profileUpgradeSteps.py`	Adds upgrade step mapping old `wordWrap` to new `textWrap`.
`source/config/__init__.py`	Enables and implements deprecated config key bridging between `wordWrap` and `textWrap`.
`source/gui/settingsDialogs.py`	Replaces old checkbox with a `FeatureFlagCombo` for Text wrap.
`source/louisHelper.py`	Adds helper to get braille table language for default region language.
`source/setup.py`	Adds py2exe hook to bundle `pyphen` dictionaries and rewrite lookup path in frozen builds.
`user_docs/en/userGuide.md`	Documents the new Text wrap setting and its behaviors.
`user_docs/en/changes.md`	Adds changelog and deprecations notes for text wrap changes.
`tests/unit/test_hyphenation.py`	Tests hyphenation positions for known/unknown locales.
`tests/unit/test_braille/test_calculateWindowRowBufferOffsets.py`	Expands tests to cover all wrap modes and syllable-boundary behavior.
`tests/unit/test_braille/test_windowBrailleCells.py`	Tests continuation marker rendering in window cells.
`tests/unit/test_braille/test_regionLanguageIndexes.py`	Tests language index tracking for multilingual regions.

- braille.py: guard MARK_WORD_CUTS continuation mark behind `end < bufferEnd` to prevent a phantom mark on the final row when the buffer ends exactly at the display edge
- config/__init__.py: fix wordWrap→textWrap bridge writing a raw string into _cache; now validates through the spec so the cache holds a proper FeatureFlag object, matching what __setitem__ normally stores
- userGuide.md: rephrase "Off" description: text is cut at the display edge (not "not wrapped"), just without a continuation mark
- test_calculateWindowRowBufferOffsets.py: fix two tests that expected end positions without room for the continuation marker; _get_windowBrailleCells only appends the marker when remaining > 0, so end must be numCols - 1 to leave space

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Introduce cacheVal alongside val so the wordWrap→textWrap bridge can store a string in the profile and a validated FeatureFlag in the cache without duplicating the _getUpdateSection/_cache calls or returning early. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

seanbudd · 2026-05-15T02:35:22Z

This is a very large PR. Could you please consider splitting it up into 2-3 pieces? The simplest split might just be TDD: opening a PR with just the tests, with class-level skips on them.

LeonarddeR · 2026-05-16T07:51:36Z

Closing in favour of three focused PRs split at @seanbudd's request:

Add pyphen-based hyphenation abstraction layer #20145 — pyphen abstraction layer (textUtils.hyphenation) + py2exe hook. No NVDA coupling; can merge independently.
Add braille text wrap modes with continuation marks #20146 — braille text wrap refactor: replaces the wordWrap boolean with a BrailleTextWrapFlag feature flag (Off / Show mark when words are cut / At word boundaries), continuation mark rendering, profile upgrade v22→v23, GUI combo box. Can merge independently.
A third PR — AT_WORD_OR_SYLLABLE_BOUNDARIES mode + per-region language tracking — will be opened once Add pyphen-based hyphenation abstraction layer #20145 and Add braille text wrap modes with continuation marks #20146 are merged and rebased.

Part of #17010. Split out from #19916 at reviewer request; this is the first of three PRs. The braille text-wrap refactor and syllable-aware wrap mode follow in separate PRs.

Summary of the issue:

NVDA lacks a locale-aware hyphenation API. One is needed to implement syllable-boundary braille text wrapping, so that long words can be broken at linguistically correct positions rather than always at the raw display edge.

Description of user facing changes:

None. This PR adds internal infrastructure only; no behaviour changes for users.

Description of developer facing changes:

New public function in textUtils.hyphenation:

def getHyphenPositions(text: str, locale: str) -> tuple[int, ...]

Returns the character offsets within text at which a hyphen may be inserted for the given locale. Returns an empty tuple for locales without a pyphen dictionary, logging a debug message once per locale per map lifetime, so callers can fall back cleanly without raising.

A py2exe hook in source/setup.py bundles pyphen's hyph_*.dic files into dist/pyphenDictionaries/ and rewrites pyphen's dictionary lookup path at freeze time so the dictionaries are accessible in frozen builds.

Description of development approach:

LocaleDataMap (already used in NVDA for locale-aware character processing) handles locale fallback and caching. The _pyphenFactory function deliberately rejects region-subtag fallbacks (for example, it will not silently serve en dictionaries for an en_US lookup), delegating that fallback logic to LocaleDataMap so region matching stays consistent with the rest of NVDA's locale handling.

Part of #17010. Split out from #19916 at reviewer request; this is the second of three PRs. The pyphen abstraction layer was shipped in #20145; the syllable-aware wrap mode follows in a separate PR.

Summary of the issue:

The braille word-wrap setting is a single boolean that gives users no control over how words are broken at the display edge. When a word is cut mid-way there is also no visual indication that the word continues on the next row.

Description of user facing changes:

The boolean "word wrap" checkbox in the braille settings has been replaced with a Text wrap combo box with three choices:

Off: Wrap at the raw edge of the display, cutting words in the middle if necessary. No visual indication that a word was cut.
Show mark when words are cut: Wrap at the raw edge, but whenever a word is cut mid-way, replace the last cell of the row with a continuation mark (braille dots 7-8) so the reader knows the word continues on the next row.
At word boundaries: Prefer breaking at spaces. If no space fits on the row, fall back to cutting the word and showing the continuation mark.

Existing config profiles with the old setting are automatically upgraded.

Description of developer facing changes:

BrailleTextWrapFlag feature flag enum added to config.featureFlagEnums with members DEFAULT, NONE, MARK_WORD_CUTS, AT_WORD_BOUNDARIES.
Config schema bumped v22 → v23; old wordWrap boolean is deprecated and bridged bidirectionally to textWrap via _linkDeprecatedValues, so add-ons reading or writing the old key keep working (with a deprecation warning).
CONTINUATION_SHAPE = 0xC0 (dots 7-8) constant added to braille.
_WindowRowPositions frozen dataclass added to braille to hold the start/end buffer positions and continuation-mark flag for each row of the braille window, replacing the previous anonymous tuple.

Description of development approach:

The continuation mark is unified: it consistently means "a word was cut here" regardless of mode, so readers get a predictable signal.

BrailleBuffer._calculateWindowRowBufferOffsets is extended to implement all three modes. Each entry in _windowRowBufferOffsets is a _WindowRowPositions instance whose showContinuationMark field records whether that row needs a continuation mark. BrailleBuffer._get_windowBrailleCells reads that flag to insert the mark. BrailleBuffer._set_windowEndPos short-circuits space-seeking for NONE and MARK_WORD_CUTS modes (backwards scroll alignment).
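To make the row record and the mark rendering concrete, here is a small sketch. The constant value and the `showContinuationMark` field name come from the description above; the `start`/`end` field names, the `window_cells` helper and the zero padding of short rows are assumptions, and this is not the body of `_get_windowBrailleCells`.

```python
from dataclasses import dataclass

CONTINUATION_SHAPE = 0xC0  # dot 7 (0x40) plus dot 8 (0x80)


@dataclass(frozen=True)
class WindowRowPositions:
    """Simplified stand-in for the per-row record described above."""

    start: int
    end: int
    showContinuationMark: bool = False


def window_cells(buffer_cells: list[int], rows: list[WindowRowPositions], num_cols: int) -> list[int]:
    """Lay out each row, appending the mark only when a cell is left for it."""
    out: list[int] = []
    for row in rows:
        cells = list(buffer_cells[row.start : row.end])
        remaining = num_cols - len(cells)
        if row.showContinuationMark and remaining > 0:
            cells.append(CONTINUATION_SHAPE)
            remaining -= 1
        cells.extend([0] * remaining)  # pad short rows with blank cells
        out.extend(cells)
    return out


rows = [WindowRowPositions(0, 3, True), WindowRowPositions(3, 7)]
cells = window_cells([1, 2, 3, 4, 5, 6, 7, 8], rows, num_cols=4)
```

A frozen dataclass gives the named fields of a tuple replacement while still preventing accidental mutation of cached row positions.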

Closes #17010. Follow-up for #20146 and #20145. This is the last of three PRs replacing #19916.

Summary of the issue:

Word wrap is sometimes pretty aggressive, especially on shorter braille displays. The previous two PRs added the text wrap infrastructure and continuation marks; this PR adds the final mode that splits long words at syllable boundaries using hyphenation dictionaries.

Description of user facing changes:

A fourth option, At word or syllable boundaries, is added to the Text wrap combo box in braille settings. Like "At word boundaries", it avoids splitting words mid-way, but when a word is too long to fit on the display it additionally tries to split at a syllable boundary (using hyphenation dictionaries from the pyphen library) so less of the word spills onto the next row. NVDA marks the split with the continuation mark (braille dots 7-8). For locales without a pyphen dictionary, the mode falls back cleanly to word-boundary behaviour without any error.

Description of developer facing changes:

BrailleTextWrapFlag.AT_WORD_OR_SYLLABLE_BOUNDARIES member added to config.featureFlagEnums.
Region._languageIndexes (dict[int, str]) tracks language-span boundaries within a braille region. Populated during _addFieldText and _addTextWithFields when format fields carry a language attribute or when field text is in a different language than the surrounding content.
Region._getLanguageAtPos(pos) looks up the language at a raw-text offset using a bisect on the (always-ascending) keys of _languageIndexes.
BrailleBuffer._getLanguageAtBufferPos(pos) delegates to the region that owns that braille cell.
louisHelper.getTableLanguage(table) queries louis.getTableInfo for the "language" key and normalises the result, providing the default language for a region when no format-field language is known.
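The bisect lookup over language-span boundaries can be sketched as follows. The `language_at_pos` function and its explicit `default` argument are assumptions made for this example; the real method lives on Region and may obtain its default differently.

```python
import bisect


def language_at_pos(language_indexes: dict[int, str], pos: int, default: str) -> str:
    """Return the language of the span containing pos.

    language_indexes maps the starting offset of each span to its language;
    its keys are assumed to be in ascending order, as stated above.
    """
    starts = list(language_indexes)
    # Index of the last span start that is <= pos.
    i = bisect.bisect_right(starts, pos) - 1
    if i < 0:
        return default  # pos lies before the first recorded span
    return language_indexes[starts[i]]


spans = {0: "en", 10: "fr", 15: "en"}  # an inline French phrase at offsets 10-14
```

With this table, offset 12 resolves to "fr" and offset 15 switches back to "en", so a word inside the inline phrase would be hyphenated with the French dictionary.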
Description of development approach:

When AT_WORD_OR_SYLLABLE_BOUNDARIES is selected and a word straddles a row boundary, _calculateWindowRowBufferOffsets already finds the last space before the display edge. This PR adds a second pass: it looks up the full word (from that space to the next space), retrieves the language at the word's braille position, and calls textUtils.hyphenation.getHyphenPositions (introduced in #20145) to obtain candidate hyphen offsets. It then iterates the candidates from the end (closest to the display edge) and picks the first that falls within the current row, updating end accordingly and setting showContinuationMark.

Language tracking in Region ensures that the correct pyphen dictionary is selected even when a braille region contains multilingual content (e.g. a paragraph with inline foreign phrases).
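The candidate selection in that second pass can be sketched like this. It is a simplification under stated assumptions: positions are plain character offsets (the raw-to-braille position mapping is ignored), one cell is reserved at the end of the row for the continuation mark, and `pick_syllable_break` and the example offsets are hypothetical.

```python
from typing import Optional, Sequence


def pick_syllable_break(
    word_start: int,
    hyphen_positions: Sequence[int],
    row_start: int,
    num_cols: int,
) -> Optional[int]:
    """Return the row end for the best syllable break, or None if none fits.

    hyphen_positions are ascending offsets relative to the start of the word.
    """
    limit = row_start + num_cols - 1  # assumption: last cell is kept for the mark
    # Walk from the candidate nearest the display edge back towards the word start.
    for offset in reversed(hyphen_positions):
        candidate = word_start + offset
        if word_start < candidate <= limit:
            return candidate
    return None


# "a wonderful" on an 8-cell row; the word starts at offset 2.
# (3, 6) stands for won-der-ful and is an illustrative input, not pyphen output.
end = pick_syllable_break(word_start=2, hyphen_positions=(3, 6), row_start=0, num_cols=8)
```

When the function returns None, the caller would keep the word-boundary result it already computed, which matches the fallback described for unknown languages.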

LeonarddeR and others added 9 commits April 3, 2026 16:45

Add pyphen

90ac1ca

Add a hyphenation module

792938b

Add text wrap

e9df289

Config update

0e79ac7

Add language annotations to regions

bd889fa

Fix continuation stuff

3cd2347

userGuide: replace word wrap section with text wrap combo box documen…

098627d

…tation Agent-Logs-Url: https://github.com/LeonarddeR/nvda/sessions/3c0c92ab-c024-44d3-bd6a-c7d6c3a92364 Co-authored-by: LeonarddeR <3049216+LeonarddeR@users.noreply.github.com>

Update user guide

ccd60a1

Pre-commit auto-fix

3f1029e

seanbudd added the conceptApproved label Apr 10, 2026

seanbudd requested a review from SaschaCowley April 10, 2026 00:50

LeonarddeR added 2 commits April 14, 2026 08:05

No longer exclude bisect

903959c

Merge branch 'pyphen' of https://github.com/leonardder/nvda into pyphen

44685d5

LeonarddeR added 7 commits April 20, 2026 18:27

Merge branch 'master' into pyphen

3cda5ba

Fixup config spec

f818f62

Add a hook to setup.py

c0b2b97

Update tests

e2c9e4c

changes: document braille text wrap and hyphenation (nvaccess#17010)

e0d5810

LeonarddeR changed the title ~~Proof of concept: Hyphenate Braille using pyphen~~ Hyphenate Braille using pyphen Apr 20, 2026

LeonarddeR and others added 6 commits April 25, 2026 14:41

Merge remote-tracking branch 'origin/master' into pyphen

4020b2f

Use direct assignment for auto-property overrides instead of patch.ob…

d99919e

…ject rawToBraillePos/brailleToRawPos are non-data Getter descriptors, so instance assignment shadows them directly. Cleanup in tearDown. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Merge remote-tracking branch 'origin/master' into pyphen

124ea0c

Fix window start pos

82cdf54

docs: replace "word wrap" with "text wrap" in comments and test names

745536f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

LeonarddeR requested a review from a team as a code owner May 2, 2026 14:50

LeonarddeR requested review from Qchristensen and Copilot May 2, 2026 14:50

Copilot started reviewing on behalf of LeonarddeR May 2, 2026 14:50

Copilot AI reviewed May 2, 2026

LeonarddeR and others added 3 commits May 2, 2026 18:26

Fix typo: hyphenationMap

6432ea9

Potential fix for pull request finding

929a436

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Pre-commit auto-fix

1079808

LeonarddeR marked this pull request as draft May 2, 2026 16:33

LeonarddeR and others added 5 commits May 2, 2026 18:34

SMall fixups

8331868

Merge branch 'pyphen' of https://github.com/leonardder/nvda into pyphen

951d01a

Merge remote-tracking branch 'origin/master' into pyphen

312d0e2

LeonarddeR marked this pull request as ready for review May 5, 2026 21:19

seanbudd added this to the 2026.3 milestone May 11, 2026

seanbudd added the merge-early label May 11, 2026

LeonarddeR added 2 commits May 13, 2026 19:59

Merge remote-tracking branch 'origin/master' into pyphen

3d164c5

Update changes

79ac204

seanbudd marked this pull request as draft May 15, 2026 02:40

Merge remote-tracking branch 'origin/master' into pyphen

ad108c9

This was referenced May 16, 2026

Add pyphen-based hyphenation abstraction layer #20145

Merged

Add braille text wrap modes with continuation marks #20146

Merged

LeonarddeR closed this May 16, 2026

LeonarddeR mentioned this pull request May 20, 2026

Add syllable-boundary braille text wrap using pyphen hyphenation #20186

Merged

5 tasks

Hyphenate Braille using pyphen #19916
LeonarddeR wants to merge 35 commits into nvaccess:master from LeonarddeR:pyphen
