Introduce cppjieba as a submodule for Chinese word segmentation by CrazySteve0605 · Pull Request #18548 · nvaccess/nvda

CrazySteve0605 · 2025-07-24T02:27:21Z

Introduce cppjieba, an NLP-based Chinese tokenizer, for implementing Chinese word navigation and braille output.

cppjieba is the C++ version of the popular Python library jieba, using the same dictionary-plus-HMM approach with native C++ runtime performance.

A detailed integration analysis

1. Current State of Chinese Segmentation Tools

Chinese segmentation techniques trade accuracy for runtime and memory in predictable ways:

Dictionary-based (fast, lower OOV coverage). Libraries such as Jieba rely on prefix trees (tries) and greedy algorithms (maximum matching / DAG). These methods are CPU-efficient and lightweight but commonly miss new or domain-specific tokens (OOV words).
Statistical / sequence-labeling (better context, moderate cost). Approaches using HMMs, CRFs or perceptrons (adopted by tools like THULAC, LTP, pkuseg) label characters with B/I/E/S tags to model local context and transitions. They substantially improve OOV handling and achieve high F1 on news/text benchmarks, at the cost of higher computation and memory compared with pure dictionary methods.
Deep learning (highest accuracy, highest cost). Transformer/BERT-based models offer superior contextual understanding and segmentation accuracy, but their inference latency and memory footprint (often hundreds of megabytes to multiple gigabytes) make them unsuitable for real-time, low-latency assistive software like NVDA.
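To make the dictionary-based trade-off concrete, here is a minimal sketch of forward maximum matching in Python. It is not how jieba or cppjieba is implemented (they work over a much larger dictionary and a DAG of candidate words); the vocabulary `VOCAB` and the function name `forward_max_match` are hypothetical and chosen only for illustration.

```python
# Toy vocabulary, invented for this example only.
VOCAB = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}


def forward_max_match(text: str, vocab: set[str]) -> list[str]:
    """Greedily take the longest vocabulary word starting at each position."""
    max_len = max(map(len, vocab), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted, so unknown (OOV)
            # characters fall out one at a time.
            if size == 1 or candidate in vocab:
                words.append(candidate)
                i += size
                break
    return words


print(forward_max_match("南京市长江大桥", VOCAB))  # ['南京市', '长江大桥']
print(forward_max_match("南京市很大", VOCAB))      # ['南京市', '很', '大']
```

The second call shows the OOV weakness: characters that are not covered by the dictionary are emitted individually, whether or not they form a word.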

2. NVDA-Specific Requirements

NVDA requires low latency, a small memory footprint, and reliable offline operation, which rules out heavyweight deep-learning models. At the same time, very lightweight pure dictionary- or rule-based segmenters are also unsuitable: their lower accuracy and poor OOV handling lead to frequent mis-segmentation, inaccurate speech output, and increased user confusion in real-world documents. cppjieba therefore represents a pragmatic middle ground. It combines native C++ performance and modest resource usage with a hybrid strategy (dictionary lookup plus HMM-based sequence labeling) that delivers appreciably better accuracy and OOV mitigation than pure dictionary methods, while remaining far lighter and faster than Transformer/BERT approaches. This makes it a sensible choice for improving Chinese text handling in NVDA without imposing large runtime or memory penalties.
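As a rough illustration of the sequence-labeling half of such a hybrid, the sketch below turns per-character B/I/E/S tags back into words. The tagger itself (for example Viterbi decoding over an HMM) is omitted, and the function name `tags_to_words` is hypothetical rather than anything exposed by cppjieba.

```python
def tags_to_words(text: str, tags: str) -> list[str]:
    """Rebuild words from one B/I/E/S tag per character.

    B = begins a word, I = inside a word, E = ends a word, S = single-character word.
    """
    if len(text) != len(tags):
        raise ValueError("need exactly one tag per character")
    words = []
    start = 0
    for i, tag in enumerate(tags):
        if tag in "BS" and start < i:
            # A new word begins although the previous one was never closed
            # (malformed tag sequence); flush what has been collected so far.
            words.append(text[start:i])
            start = i
        if tag in "ES":
            words.append(text[start:i + 1])
            start = i + 1
    if start < len(text):
        # Trailing characters without a closing E/S tag.
        words.append(text[start:])
    return words


print(tags_to_words("我爱北京", "SSBE"))  # ['我', '爱', '北京']
```

Because the tags are predicted from context rather than looked up, a word such as a new name can still be grouped even when it is absent from the dictionary.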

Link to issue number:

Related to #4075 and a part of OSPP 2025 of NVDA.

Summary of the issue:

NVDA’s current word navigation mechanism relies on Unicode boundary rules through the Uniscribe API, which do not work well for languages such as Chinese due to the absence of explicit word delimiters.

Description of user facing changes:

None

Description of developer facing changes:

A tool to implement word navigation and braille output within Chinese content.

Description of development approach:

Added cppjieba as a submodule.
Added a wrapper for it and a build script.

Testing strategy:

Confirm whether it compiles successfully and whether its segmentation function can be called via ctypes.

Known issues with pull request:

Code Review Checklist:

Documentation:
- Change log entry
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
API is compatible with existing add-ons.
Security precautions taken.

@coderabbitai summary

- Add `cppjieba` as a Git submodule under `third_party/cppjieba/` to provide robust Chinese word segmentation capabilities.
- Update `.gitmodules` to point to the official `cppjieba` repository and configure it to track the `master` branch.
- Update `sconscript` to include the paths of `cppjieba` and its dependency `limonp`.
- Modify `copying.txt` to include the `cppjieba` license (MIT) alongside the project’s existing license, ensuring proper attribution and compliance.
- Update documents.

wmhn1872265132 · 2025-07-24T09:16:44Z

I downloaded the launcher built by GitHub Actions for testing and it doesn't seem to work.

cary-rowen · 2025-07-24T09:23:24Z

Imo, I suggest that initial development and debugging be carried out locally, especially during the stage when functionalities are not yet operational. If extensive testing by community early users is needed, at least new features should be functional.

CrazySteve0605 · 2025-07-24T09:28:34Z

I downloaded the launcher built by GitHub Actions for testing and it doesn't seem to work.

Apologies for the insufficient explanation. This PR is intended as an initial step of the overall work. It does not implement any segmentation functionality yet, but merely introduces the ‘cppjieba’ module. Therefore, it has no impact on end users.

CrazySteve0605 · 2025-07-24T09:39:42Z

Imo, I suggest that initial development and debugging be carried out locally, especially during the stage when functionalities are not yet operational. If extensive testing by community early users is needed, at least new features should be functional.

As many parts of the code need changes, splitting the work into smaller tasks might be more efficient, as the community can review them at the same time.

seanbudd · 2025-07-31T05:27:42Z

@CrazySteve0605 - can you please provide more information on why this library was selected over others? what were alternative options?

Co-authored-by: Sean Budd <seanbudd123@gmail.com>

- Introduce `JiebaSingleton` class in `cppjieba.hpp`/`cppjieba.cpp`, with a def file, under `nvdaHelper/cppjieba/`
  - Inherits from `cppjieba::Jieba` and exposes a thread-safe `getOffsets()` method
  - Implements Meyers’ singleton via `getInstance()` with a private constructor
  - Deletes copy constructor, copy assignment, move constructor, and move assignment to enforce single instance
- Add C-style API in the same module:
  - `int initJieba()` to force singleton initialization
  - `int segmentOffsets(const char* text, int** charOffsets, int* outLen)` to perform segmentation and return character offsets
  - `void freeOffsets(int* ptr)` to release allocated offset buffer
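A possible way to call this C-style API from Python with ctypes is sketched below. Only the three function signatures come from the commit message; the helper names `bind_jieba_api` and `segment_offsets`, the UTF-8 encoding of the input, and the convention that a return value of 0 means success are assumptions made for illustration.

```python
import ctypes


def bind_jieba_api(lib):
    """Declare the prototypes listed in the commit message on a loaded library."""
    lib.initJieba.restype = ctypes.c_int
    lib.initJieba.argtypes = []
    lib.segmentOffsets.restype = ctypes.c_int
    lib.segmentOffsets.argtypes = [
        ctypes.c_char_p,                                # const char* text
        ctypes.POINTER(ctypes.POINTER(ctypes.c_int)),   # int** charOffsets
        ctypes.POINTER(ctypes.c_int),                   # int* outLen
    ]
    lib.freeOffsets.restype = None
    lib.freeOffsets.argtypes = [ctypes.POINTER(ctypes.c_int)]
    return lib


def segment_offsets(lib, text: str) -> list[int]:
    """Return the offsets reported by segmentOffsets as a Python list."""
    buf = ctypes.POINTER(ctypes.c_int)()
    count = ctypes.c_int(0)
    # Assumption: the library expects UTF-8 bytes.
    status = lib.segmentOffsets(
        text.encode("utf-8"), ctypes.pointer(buf), ctypes.pointer(count)
    )
    if status != 0:  # Assumption: 0 signals success.
        raise RuntimeError(f"segmentOffsets failed with status {status}")
    try:
        # Copy the values out before the native buffer is released.
        return [buf[i] for i in range(count.value)]
    finally:
        lib.freeOffsets(buf)


# Usage (the DLL path is a placeholder, not the real build output):
# lib = bind_jieba_api(ctypes.CDLL("path/to/cppjieba.dll"))
# lib.initJieba()
# offsets = segment_offsets(lib, "南京市长江大桥")
```

Copying the offsets into a Python list before calling `freeOffsets` keeps ownership of the native buffer in one place, so callers never hold a dangling pointer.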

- Change 'submodules' in 'jobs - buildNVDA - Build NVDA - Checkout NVDA' from 'true' to 'recursive' to ensure cppjieba's submodule is fetched.
  - This will cause the submodule of sonic to be fetched as well, which currently seems unused.

Co-authored-by: Sean Budd <seanbudd123@gmail.com>

seanbudd · 2025-08-15T06:43:43Z

@CrazySteve0605 - could you please update the PR description with what was requested in #18548 (comment), some more information on the module chosen?

Please also make sure to mark the PR as ready for review if it is at this stage.

We are hoping to see implementation of segmentation with an example TextInfo such as WordDocumentTextInfo or Gecko_ia2_TextInfo

CrazySteve0605 · 2025-08-15T07:36:43Z

@seanbudd OK I see. I'm adding more details to make the main comment more informative. Instead of extending 'TextInfos', I'm creating a separate 'WordSegmentationStrategy' and 'WordSegmenter' that follow the strategy pattern, selecting an appropriate segmenter based on the Unicode fields present in the given text. I think this approach will be more extensible and easier to reuse in other potential components, such as braille output. I also plan to submit it as my next PR.
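The strategy-pattern idea described in this comment could look roughly like the sketch below. The names `WordSegmentationStrategy` and `WordSegmenter` are taken from the comment; the method names, the two stand-in strategies, and the Han-range check are hypothetical and do not reflect NVDA's actual code. In particular, `PerCharacterStrategy` is only a placeholder for a real Chinese segmenter.

```python
import re
from abc import ABC, abstractmethod

Span = tuple[int, int]  # (start, end) offsets, end exclusive


class WordSegmentationStrategy(ABC):
    """Common interface so callers need not know which segmenter is used."""

    @abstractmethod
    def segment(self, text: str) -> list[Span]:
        ...


class WhitespaceStrategy(WordSegmentationStrategy):
    """Stand-in default: words are runs of non-whitespace characters."""

    def segment(self, text: str) -> list[Span]:
        return [m.span() for m in re.finditer(r"\S+", text)]


class PerCharacterStrategy(WordSegmentationStrategy):
    """Placeholder for a Chinese segmenter: every character is its own word."""

    def segment(self, text: str) -> list[Span]:
        return [(i, i + 1) for i, ch in enumerate(text) if not ch.isspace()]


def contains_han(text: str) -> bool:
    # Only the basic CJK Unified Ideographs block is checked in this sketch.
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


class WordSegmenter:
    """Chooses a strategy from the text content, then delegates to it."""

    def __init__(self, text: str):
        self.text = text
        self.strategy = PerCharacterStrategy() if contains_han(text) else WhitespaceStrategy()

    def segment(self) -> list[Span]:
        return self.strategy.segment(self.text)


print(WordSegmenter("hello world").segment())  # [(0, 5), (6, 11)]
print(WordSegmenter("你好").segment())          # [(0, 1), (1, 2)]
```

Keeping the selection logic in one class means another consumer, such as braille output, could reuse the same segmenter without depending on any particular TextInfo.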

michaelDCurran · 2025-09-17T01:37:41Z

This PR is pretty much good. Just two small things in my last review.
We have decided that this and any other related PRs for this OSPP project will be merged into a special try-wordSegmentation-staging branch rather than master, so that all PRs can be reviewed and merged into one working branch, which will then, if successful, be merged to master all at the same time. Therefore I have switched the base branch of this PR to try-wordSegmentation-staging accordingly.

This reverts commit 06070c1.

Co-authored-by: Sean Budd <seanbudd123@gmail.com>

…ieba

michaelDCurran · 2025-09-29T02:13:14Z

Please note that I had incorrectly merged this as a squash commit, which caused problems for dependent PRs such as #18735.
I have now reverted that squash commit and merged this PR's branch unsquashed as commit 093a825.

…eText

### Link to issue number:

blocked by #18548, closes #4075, related #16237 and [a part of OSPP 2025](https://summer-ospp.ac.cn/org/prodetail/25d3e0488?list=org&navpage=org) of NVDA.

### Summary of the issue:

### Description of user facing changes:

Chinese text can be navigated by word via system caret or review cursor.

### Description of developer facing changes:

### Description of development approach:

- [ ] update `textUtils`
- [x] add `WordSegment` module
- [x] create `WordSegmentationStrategy` as an abstract base class to select segmentation strategy based on text content, following Strategy Pattern
- [x] implement `ChineseWordSegmentationStrategy` (for Chinese text)
- [x] implement `UniscribeWordSegmentationStrategy` (for other languages as default strategy)
- [x] update `textUtils/__init__.py`
- [x] add `WordSegmenter` class for word segmentation, integrating segmentation strategies
- [x] update `textInfos/offsets.py`
- [x] replace `useUniscribe` with `useUniscribeForCharOffset` & `useWordSegmenterForWordOffset` for segmentation extensions
- [x] integrate `WordSegmenter` for calculating word offsets
- [ ] update document

### Testing strategy:

### Known issues with pull request:

Word segmentation functionality was integrated in `OffsetsTextInfo`. In `OffsetsTextInfo`, word segmentation is based on Uniscribe by default and Unicode as a fall-back.

Subclasses of OffsetsTextInfo

#### Supported

1. `NVDAObjectTextInfo`
2. `InputCompositionTextInfo`
3. `JABTextInfo`
4. `SimpleResultTextInfo`
5. `VirtualBufferTextInfo`: uses its own function to calculate offsets and invokes its superclass' `_getWordOffsets`

#### Unsupported

1. `DisplayModelTextInfo`: disabled outright
2. `EditTextInfo`: uses Uniscribe directly
3. `ScintillaTextInfo`: uses an entirely self-defined 'word', for Scintilla-based editors such as Notepad++ (source/NVDAObjects/window/scintilla.py)
4. `VsTextEditPaneTextInfo`: uses a special COM automation API, for Microsoft Visual Studio and Microsoft SQL Server Management Studio (source/appModules/devenv.py)
5. `TextFrameTextInfo`: uses a self-defined 'word', based on PowerPoint's COM object, for PowerPoint's text frames (source/appModules/powerpnt.py)
6. `LwrTextInfo`: based on words pre-computed during the related process, for structured text using 'LineWordResult' (e.g. Windows OCR) (source/contentRecog/__init__.py)

#### Partially Supported

Some architectures wholly or preferentially use a native definition of a 'word', so software that depends on them may not be able to use the new functionality.

1. `IA2TextTextInfo`: override and fall back (source/NVDAObjects/IAccessible/\_\_init\_\_.py)

### Code Review Checklist:

- [ ] Documentation:
  - Change log entry
  - User Documentation
  - Developer / Technical Documentation
  - Context sensitive help for GUI changes
- [ ] Testing:
  - Unit tests
  - System (end to end) tests
  - Manual testing
- [ ] UX of all users considered:
  - Speech
  - Braille
  - Low Vision
  - Different web browsers
  - Localization in other languages / culture than English
- [ ] API is compatible with existing add-ons.
- [ ] Security precautions taken.
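One way to picture "calculating word offsets" from a segmentation result is a binary search over word start positions, as sketched below. This is an assumption-laden illustration, not the logic in `textInfos/offsets.py`: the function name `word_span` is hypothetical, and the sketch assumes the segmenter yields sorted word start offsets in the same unit as the caret offset.

```python
from bisect import bisect_right


def word_span(starts: list[int], text_len: int, offset: int) -> tuple[int, int]:
    """Return (start, end) of the word containing `offset`, end exclusive.

    `starts` must be sorted and begin at 0 so every offset falls inside a word.
    """
    if not starts or starts[0] != 0:
        raise ValueError("starts must be non-empty and begin at 0")
    if not 0 <= offset < text_len:
        raise ValueError("offset is outside the text")
    # Index of the last word start that is <= offset.
    i = bisect_right(starts, offset) - 1
    # The word ends where the next one begins, or at the end of the text.
    end = starts[i + 1] if i + 1 < len(starts) else text_len
    return starts[i], end


# "南京市长江大桥" segmented as 南京市 | 长江大桥 gives starts [0, 3].
print(word_span([0, 3], 7, 4))  # (3, 7)
print(word_span([0, 3], 7, 2))  # (0, 3)
```

Moving to the next or previous word then amounts to calling the function again with `end` or `start - 1` as the new offset.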

@LeonarddeR

commit e25f593
Author: wencong <manchen_0528@outlook.com>
Date: Wed May 20 11:45:38 2026 +0800

Address review feedback for Chinese word segmentation (nvaccess#20178)

This PR addresses selected review suggestions from Sean on PR nvaccess#19166 and syncs the branch with the latest master.

Completed:
- [x] Move the braille offset converter helper out of the nested function.
- [x] Move the word segmentation braille converter import out of the local scope.
- [x] Move the word segmentation initialization debug log into initialize().
- [x] Remove unsupported RuntimeError-specific handling around wordSeg.initialize().
- [x] Log word segmentation initializer resolution and execution failures.
- [x] Add missing return type annotations for wordSegFlag properties.
- [x] Remove redundant int annotations from IntFlag members.
- [x] Move test helper objects out of nested scopes and add type hints.
- [x] Move braille test configuration setup/cleanup into setUp and tearDown.
- [x] Update the help ID casing for the word segmentation setting.
- [x] Add a user guide section for the word segmentation setting.
- [x] Move the Chinese word segmentation change log entries to 2026.3.
- [x] Sync with the latest nvaccess/master.

Remaining:
- [ ] Confirm whether the new config option should be exposed in the UI.

---------
Co-authored-by: Michael Curran <mick@nvaccess.org>
Co-authored-by: Leonard de Ruijter <3049216+LeonarddeR@users.noreply.github.com>
Co-authored-by: Sean Budd <sean@nvaccess.org>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

commit 887ad98
Merge: 9a6bc21 102a54e
Author: Sean Budd <sean@nvaccess.org>
Date: Mon May 18 16:23:14 2026 +1000

Merge branch 'master' into try-chineseWordSegmentation-staging

commit 9a6bc21
Author: Wang Chong <306289287@qq.com>
Date: Mon May 18 14:22:13 2026 +0800

Refactor Dicts Installation Logic in cppjieba's sconscript (nvaccess#20162)

commit edf3429
Author: wencong <manchen_0528@outlook.com>
Date: Tue May 12 08:26:06 2026 +0800

Remove reverted settings dialog debounce changes from Chinese word segmentation branch (nvaccess#20106)

This removes settings dialog category-change debounce code that was accidentally reintroduced while resolving conflicts. That code was previously reverted by nvaccess#19846, so this branch should follow current master here and only keep the Document Navigation changes needed for Chinese word segmentation. This also removes an unrelated copyright-only change in browseMode.py.

commit fab3060
Author: Wang Chong <306289287@qq.com>
Date: Mon May 11 12:21:12 2026 +0800

Fixup merge master into Chinese Word Segmentation (nvaccess#20055)

---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

commit 31c9585
Merge: 9edb45f d42d90c
Author: Sean Budd <sean@nvaccess.org>
Date: Mon May 4 09:25:02 2026 +1000

Merge master into chinese word segmentation (nvaccess#20041)

---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: GitHub Actions <github-actions@github.com>
Co-authored-by: Sascha Cowley <16543535+SaschaCowley@users.noreply.github.com>
Co-authored-by: Peng-An Chen <andy72039@gmail.com>
Co-authored-by: Sean Budd <sean@nvaccess.org>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Michael Curran <mick@nvaccess.org>
Co-authored-by: GitHub Actions <actions@github.com>
Co-authored-by: Bill Dengler <codeofdusk@gmail.com>
Co-authored-by: Cyrille Bougot <cyrille.bougot2@laposte.net>
Co-authored-by: Boumtchack <147637328+Boumtchack@users.noreply.github.com>
Co-authored-by: Kefas Lungu <84945041+kefaslungu@users.noreply.github.com>
Co-authored-by: Jani Kinnunen <janikinnunen340@gmail.com>
Co-authored-by: Akash Kakkar <6381747+akash07k@users.noreply.github.com>
Co-authored-by: makhlwf <78276231+makhlwf@users.noreply.github.com>
Co-authored-by: Quin Gillespie <trypsynth@gmail.com>
Co-authored-by: Bram Duvigneau <bram@bramd.nl>
Co-authored-by: Luke Davis <8139760+XLTechie@users.noreply.github.com>
Co-authored-by: audioses <67341000+audioses@users.noreply.github.com>
Co-authored-by: gexgd0419 <55008943+gexgd0419@users.noreply.github.com>
Co-authored-by: Cyrille Bougot <cyrille.bougot@laposte.net>
Co-authored-by: Joseph Lee <joseph.lee22590@gmail.com>
Co-authored-by: Christopher Proß <cpross@mailbox.org>
Co-authored-by: Ryan McCleary <remccleary@gmail.com>
Co-authored-by: wencong <manchen_0528@outlook.com>
Co-authored-by: Emil-18 <135248352+Emil-18@users.noreply.github.com>
Co-authored-by: Danil <50794055+Danstiv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Noelia Ruiz Martínez <nrm1977@gmail.com>
Co-authored-by: Sean Budd <seanbudd123@gmail.com>
Co-authored-by: James Teh <jamie@jantrid.net>
Co-authored-by: Thiago Seus <thiago.seus@yahoo.com.br>
Co-authored-by: WangFeng Huang <1398969445@qq.com>
Co-authored-by: ethanl-11 <124083447+ethanl-11@users.noreply.github.com>

commit d42d90c
Merge: 07d9b13 7f2d68f
Author: cary-rowen <manchen_0528@outlook.com>
Date: Fri May 1 22:21:05 2026 +0800

Merge latest master into chinese word segmentation branch

commit 07d9b13
Merge: 9edb45f f7dc081
Author: cary-rowen <manchen_0528@outlook.com>
Date: Mon Apr 20 12:03:50 2026 +0800

Merge latest master into try-chineseWordSegmentation-staging

commit 9edb45f
Merge: 096e985 4c9d616
Author: Sean Budd <sean@nvaccess.org>
Date: Mon Apr 20 11:01:52 2026 +1000

Merge master into try-chineseWordSegmentation-staging (nvaccess#19747)

commit 4c9d616
Author: cary-rowen <manchen_0528@outlook.com>
Date: Thu Apr 9 22:19:56 2026 +0800

Fix textInfo word expansion test expectation

commit 072b405
Author: cary-rowen <manchen_0528@outlook.com>
Date: Thu Apr 9 22:00:30 2026 +0800

Fix word expansion without flowsTo

commit 006277b
Merge: 096e985 d8bf309
Author: Wang Chong <306289287@qq.com>
Date: Fri Mar 6 00:35:58 2026 +0800

Merge branch 'master'

commit 096e985
Author: Wang Chong <306289287@qq.com>
Date: Mon Mar 2 12:02:27 2026 +0800

Fixup for Chinese Word Segmentation and Braille Output (nvaccess#19324)

Summary of the issue:
Some punctuations have extra separators (spaces) before or after them.

Description of user facing changes:
Braille output will be more accurate.

commit 29d9f5a
Merge: b50a0d5 9cafffb
Author: Michael Curran <mick@nvaccess.org>
Date: Thu Oct 30 12:12:25 2025 +1100

Merge pull request nvaccess#18865 from CrazySteve0605/brailleOutputForChinese

blocked by nvaccess#18735, related nvaccess#1890 (comment), and [a part of OSPP 2025](https://summer-ospp.ac.cn/org/prodetail/25d3e0488?list=org&navpage=org) of NVDA.

Braille output for Chinese become easier to read.

The displayed braille and text are not aligned with each other in Braille Viewer, which shouldn't have an effect on reading from a braille displayer since only braille is transferred and the carets of raw text and output braille are aligned.

commit 9cafffb
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Tue Oct 28 22:26:13 2025 +0000

Pre-commit auto-fix

commit 0d27c91
Merge: d2714a3 b50a0d5
Author: Michael Curran <mick@nvaccess.org>
Date: Wed Oct 29 08:23:08 2025 +1000

Merge branch 'try-chineseWordSegmentation-staging' into brailleOutputForChinese

commit b50a0d5
Merge: c86b760 db90fff
Author: Michael Curran <mick@nvaccess.org>
Date: Wed Oct 29 09:20:19 2025 +1100

Merge pull request nvaccess#18735 from CrazySteve0605/wordNavigationForChineseText

blocked by nvaccess#18548, closes nvaccess#4075, related nvaccess#16237 and [a part of OSPP 2025](https://summer-ospp.ac.cn/org/prodetail/25d3e0488?list=org&navpage=org) of NVDA.
Chinese text can be navigated by word via system caret or review cursor.

commit db90fff
Author: Wang Chong <306289287@qq.com>
Date: Tue Oct 28 12:57:06 2025 +0800

remove duplicate importing lines

commit d2714a3
Author: Wang Chong <306289287@qq.com>
Date: Mon Oct 27 11:28:27 2025 +0800

fixup

commit 042b778
Merge: b8ace76 c86b760
Author: Michael Curran <mick@nvaccess.org>
Date: Mon Oct 27 11:47:18 2025 +1000

Merge branch 'try-chineseWordSegmentation-staging' into wordNavigationForChineseText

commit c86b760
Merge: 4a4b1af 9e37e57
Author: Michael Curran <mick@nvaccess.org>
Date: Mon Oct 27 11:45:35 2025 +1000

Merge branch 'master' into try-chineseWordSegmentation-staging

commit b8ace76
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Sat Oct 25 01:09:03 2025 +0000

Pre-commit auto-fix

commit d32549f
Author: Wang Chong <306289287@qq.com>
Date: Sat Oct 25 09:03:22 2025 +0800

make word segmentation module reinitialized after settings are saved

commit 2083095
Author: Wang Chong <306289287@qq.com>
Date: Sat Oct 25 08:50:00 2025 +0800

fixup

commit d55d077
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Thu Oct 9 21:19:50 2025 +0000

Pre-commit auto-fix

commit 085ba2f
Merge: 80b0472 4a4b1af
Author: Michael Curran <mick@nvaccess.org>
Date: Fri Oct 10 07:17:56 2025 +1000

Merge branch 'try-chineseWordSegmentation-staging' into wordNavigationForChineseText

commit 4a4b1af
Merge: 093a825 42dfcfe
Author: Michael Curran <mick@nvaccess.org>
Date: Fri Oct 10 07:13:22 2025 +1000

Merge branch 'master' into try-chineseWordSegmentation-staging

commit 80b0472
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Tue Sep 30 05:12:16 2025 +0000

Pre-commit auto-fix

commit f98b1b1
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Tue Sep 30 05:10:50 2025 +0000

Pre-commit auto-fix

commit d1373b2
Author: Wang Chong <306289287@qq.com>
Date: Tue Sep 30 13:06:54 2025 +0800

fixup

commit 9ab3dba
Merge: 251811e 0940a73
Author: Wang Chong <306289287@qq.com>
Date: Tue Sep 30 13:06:22 2025 +0800

Merge branch 'wordNavigationForChineseText' into brailleOutputForChinese

commit 0940a73
Author: Wang Chong <306289287@qq.com>
Date: Tue Sep 30 12:59:46 2025 +0800

fixup

commit c3a8562
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Tue Sep 30 03:25:34 2025 +0000

Pre-commit auto-fix

commit 5e0e3fd
Author: Wang Chong <306289287@qq.com>
Date: Tue Sep 30 11:24:03 2025 +0800

simplify the logic for 'Auto' option in Word Segmentation Standard settings

commit 552b42b
Author: Wang Chong <306289287@qq.com>
Date: Tue Sep 30 10:33:24 2025 +0800

fixup unittests

commit 653e808
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Mon Sep 29 22:58:20 2025 +0000

Pre-commit auto-fix

commit b40d709
Author: Wang Chong <306289287@qq.com>
Date: Tue Sep 30 06:56:48 2025 +0800

revert `Initialize Word Segmenters for Unused Languages:` checkbox and fixup the initialization logic

commit face4bd
Merge: 6f586fd 093a825
Author: Michael Curran <mick@nvaccess.org>
Date: Mon Sep 29 12:09:28 2025 +1000

Merge branch 'try-chineseWordSegmentation-staging' into wordNavigationForChineseText

commit 093a825
Merge: 4bc116d b327e23
Author: Michael Curran <mick@nvaccess.org>
Date: Mon Sep 29 12:07:17 2025 +1000

Merge branch 'integrateCppJieba' into try-chineseWordSegmentation-staging

commit b327e23
Merge: 30e855f b236fe6
Author: Michael Curran <mick@nvaccess.org>
Date: Mon Sep 29 08:38:17 2025 +1000

Merge branch 'try-chineseWordSegmentation-staging' into integrateCPPJieba

commit 251811e
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 28 08:04:06 2025 +0800

update changelog

commit 3b7bf5f
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 28 08:02:59 2025 +0800

correct and simplify the offset calculations

commit 9304a39
Merge: 2d7c596 6f586fd
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 28 08:02:28 2025 +0800

Merge branch 'wordNavigationForChineseText' into brailleOutputForChinese

commit 6f586fd
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 28 07:58:07 2025 +0800

update changelog

commit b69d466
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 28 07:55:26 2025 +0800

fix up

commit 9834b68
Author: Wang Chong <306289287@qq.com>
Date: Sat Sep 27 09:16:37 2025 +0800

extract punctuation from `wordSegStrategy.py` to `wordSegUtils.py`

commit 9479029
Author: Wang Chong <306289287@qq.com>
Date: Sat Sep 27 09:15:51 2025 +0800

fixup

commit f769457
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Sat Sep 27 00:46:03 2025 +0000

Pre-commit auto-fix

commit 2eec029
Author: Wang Chong <306289287@qq.com>
Date: Sat Sep 27 08:40:57 2025 +0800

add unittest cases for `WordSegmenter`

commit 43bfe03
Author: Wang Chong <306289287@qq.com>
Date: Sat Sep 27 08:39:14 2025 +0800

make initialization of word segmenters conditional on language

commit 111a24d
Author: Wang Chong <306289287@qq.com>
Date: Fri Sep 26 20:50:44 2025 +0800

update `wordSegSegmenter.py` to handle offsets at the end of the string

commit 69617c4
Merge: 53b3870 fec70a9
Author: Wang Chong <306289287@qq.com>
Date: Fri Sep 26 20:50:03 2025 +0800

Merge branch 'integrateCPPJieba' into wordNavigationForChineseText

commit fec70a9
Merge: 30e855f b236fe6
Author: Wang Chong <306289287@qq.com>
Date: Fri Sep 26 18:11:03 2025 +0800

Merge branch 'master' into integrateCPPJieba

commit 53b3870
Author: Wang Chong <306289287@qq.com>
Date: Thu Sep 25 23:09:55 2025 +0800

make `cppjieba` only available when NVDA's language is set to Chinese

commit 250e700
Author: Wang Chong <306289287@qq.com>
Date: Thu Sep 25 22:32:03 2025 +0800

update UI text for Uniscribe

commit ccf07f9
Author: Wang Chong <306289287@qq.com>
Date: Thu Sep 25 22:31:47 2025 +0800

correct method naming

commit 38ec7ff
Merge: dc23346 5562e70
Author: Wang Chong <306289287@qq.com>
Date: Thu Sep 25 22:31:06 2025 +0800

Merge branch 'integrateCPPJieba' into wordNavigationForChineseText

commit 5562e70
Merge: 30e855f 339af3e
Author: Wang Chong <306289287@qq.com>
Date: Thu Sep 25 22:29:07 2025 +0800

Merge branch 'master' into integrateCPPJieba

commit 30e855f
Merge: 2e730d6 dc22697
Author: Michael Curran <mick@nvaccess.org>
Date: Mon Sep 22 18:20:36 2025 +1000

Merge branch 'try-chineseWordSegmentation-staging' into integrateCPPJieba

commit dc23346
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 21 22:58:31 2025 +0800

Update source/core.py

Co-authored-by: Cyrille Bougot <cyrille.bougot2@laposte.net>

commit 9537999
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 21 22:52:10 2025 +0800

revert copyright header of `configSpec.py`

commit 90660ba
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 21 22:24:18 2025 +0800

update `wordSegStrategy.py`

commit 7ee08d0
Merge: a8955a3 2e730d6
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 21 22:22:54 2025 +0800

Merge branch 'integrateCPPJieba' into wordNavigationForChineseText

commit a8955a3
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 21 22:19:19 2025 +0800

Revert "update module importing order and type annotations"

This reverts commit 3bfbe59.
commit 2e730d6
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 21 00:20:19 2025 +0800

Update .gitattributes

Co-authored-by: Sean Budd <seanbudd123@gmail.com>

commit 194a69e
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 21 00:10:40 2025 +0800

avoid using compilation time path

commit c2cbb24
Author: Wang Chong <306289287@qq.com>
Date: Sat Sep 20 23:53:56 2025 +0800

Revert "Update projectDocs/dev/createDevEnvironment.md"

This reverts commit 06070c1.

commit 0f507d5
Merge: b3e08ee 5a557d0
Author: Wang Chong <306289287@qq.com>
Date: Sat Sep 20 23:53:19 2025 +0800

Merge branch 'master' into integrateCPPJieba

commit bac3210
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Sat Sep 13 01:33:24 2025 +0000

Pre-commit auto-fix

commit 97eb6dd
Author: Wang Chong <306289287@qq.com>
Date: Sat Sep 13 09:19:40 2025 +0800

handle punctuation spacing

commit 2b1d4b3
Merge: cf3e115 984b6eb
Author: Wang Chong <306289287@qq.com>
Date: Fri Sep 12 19:12:00 2025 +0800

Merge branch 'integrateCPPJieba' into wordNavigationForChineseText

commit 984b6eb
Merge: b3e08ee 3d74061
Author: Wang Chong <306289287@qq.com>
Date: Fri Sep 12 19:09:24 2025 +0800

Merge branch 'master' into integrateCPPJieba

commit cf3e115
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Tue Sep 9 16:49:47 2025 +0000

Pre-commit auto-fix

commit 3a0badc
Author: Wang Chong <306289287@qq.com>
Date: Wed Sep 10 00:37:33 2025 +0800

update `wordSegStrategy.py`

* add LRU caching

commit f5087cc
Merge: 3bfbe59 b3e08ee
Author: Wang Chong <306289287@qq.com>
Date: Wed Sep 10 00:24:14 2025 +0800

Merge branch 'integrateCPPJieba' into wordNavigationForChineseText

commit b3e08ee
Author: Wang Chong <306289287@qq.com>
Date: Wed Sep 10 00:23:09 2025 +0800

update helper of `coojieba`

commit bec5dc5
Merge: 09b1890 a210f97
Author: Wang Chong <306289287@qq.com>
Date: Wed Sep 10 00:19:54 2025 +0800

Merge branch 'master' into integrateCPPJieba

commit 09b1890
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 7 19:04:29 2025 +0800

fix building script

commit 00796fe
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 7 18:39:59 2025 +0800

revert installing script

commit 30120f8
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 7 18:38:36 2025 +0800

update building script

commit 53158b6
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 7 18:17:50 2025 +0800

update changelog

commit 3bfbe59
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 7 17:04:33 2025 +0800

update module importing order and type annotations

commit b848e1b
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 7 16:28:01 2025 +0800

update `wordSegStrategy.py`

commit abeb147
Merge: a1113d8 2955ca8
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 7 12:29:00 2025 +0800

Merge branch 'integrateCPPJieba' into wordNavigationForChineseText

commit 2955ca8
Author: Wang Chong <306289287@qq.com>
Date: Sun Sep 7 12:28:14 2025 +0800

simplify helper of `cppjieba`

* turn to build-in `Word` structure
* remove some items we don't use

commit a9281f6
Author: Wang Chong <306289287@qq.com>
Date: Sat Sep 6 18:36:20 2025 +0800

update .gitattributes for .hpp header files

commit 49cc1fe
Author: Wang Chong <306289287@qq.com>
Date: Sat Sep 6 18:31:05 2025 +0800

update cppjieba to the latest commit

commit fe118ee
Merge: 1869ed0 e6557a7
Author: Wang Chong <306289287@qq.com>
Date: Sat Sep 6 18:27:32 2025 +0800

Merge branch 'master' into integrateCPPJieba

commit 2d7c596
Author: Wang Chong <306289287@qq.com>
Date: Fri Sep 5 00:09:16 2025 +0800

add word separator to optimize braille output for Chinese text

* add a subclass of `OffsetConverter` to handle the offset mapping for raw text and separated one
* add logic to invoke it when Chinese translation tables are used

commit a1113d8
Author: Wang Chong <306289287@qq.com>
Date: Thu Sep 4 14:07:24 2025 +0800

add `segmentedText` method

commit c1fb4b8
Merge: 9e6a2e1 1869ed0
Author: Wang Chong <306289287@qq.com>
Date: Thu Sep 4 12:50:22 2025 +0800

Merge branch 'integrateCPPJieba' into wordNavigationForChineseText

commit 1869ed0
Author: Wang Chong <306289287@qq.com>
Date: Thu Sep 4 12:41:29 2025 +0800

simplify the initialization of cppjieba

drop off initialization of the keyword extractor which we don't need

commit 11827fb
Merge: 3e495d2 952db62
Author: Wang Chong <306289287@qq.com>
Date: Thu Sep 4 09:51:52 2025 +0800

Merge branch 'master' into integrateCPPJieba

commit 9e6a2e1
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Thu Aug 28 11:17:31 2025 +0000

Pre-commit auto-fix

commit 3b2d835
Author: Wang Chong <306289287@qq.com>
Date: Thu Aug 28 19:11:05 2025 +0800

resolve deprecation

commit a4edc9e
Merge: 97b6db7 3e495d2
Author: Wang Chong <306289287@qq.com>
Date: Thu Aug 28 19:05:26 2025 +0800

Merge branch 'integrateCPPJieba' into wordNavigationForChineseText

commit 3e495d2
Author: Wang Chong <306289287@qq.com>
Date: Thu Aug 28 18:13:11 2025 +0800

add wrappers for user dict management

commit a643391
Merge: c853b64 c30a787
Author: Wang Chong <306289287@qq.com>
Date: Thu Aug 28 18:10:09 2025 +0800

Merge branch 'master' into integrateCPPJieba

commit 97b6db7
Author: Wang Chong <306289287@qq.com>
Date: Sun Aug 24 23:47:59 2025 +0800

update for pyright checks

commit 4a680ea
Author: Wang Chong <306289287@qq.com>
Date: Sun Aug 24 23:45:48 2025 +0800

make "Auto" the default option for word navigation

commit 356c11c
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Sat Aug 23 15:35:45 2025 +0000

Pre-commit auto-fix

commit 3ba56f0
Author: Wang Chong <306289287@qq.com>
Date: Sat Aug 23 23:25:47 2025 +0800

add configuration for word navigation

users can find it in NVDA settings -> Document Navigation -> Word Segmentation Method

commit eeb96aa
Author: Wang Chong <306289287@qq.com>
Date: Sat Aug 23 20:33:54 2025 +0800

use multithreading for cppjieba's initialization

commit 38b4bea
Author: pre-commit-ci[bot]
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Thu Aug 21 07:00:36 2025 +0000 Pre-commit auto-fix commit 3f54d62 Author: Wang Chong <306289287@qq.com> Date: Thu Aug 21 14:59:22 2025 +0800 add initialization logic to wordSeg module - add an decorator to easily add initializers - extract `cppjieba`'s initializer to fit the decorator commit 8244a76 Author: Wang Chong <306289287@qq.com> Date: Thu Aug 21 11:04:44 2025 +0800 make wordSegment module to make file structure clearer - create `wordSeg` package - migrate wordSegment module into wordSeg package and rename to wordSeg commit d69e8b7 Author: Wang Chong <306289287@qq.com> Date: Thu Aug 21 10:44:37 2025 +0800 add trailing commas in multi-line constructs commit 3c65868 Author: Wang Chong <306289287@qq.com> Date: Thu Aug 21 10:36:16 2025 +0800 update log commit ddd48e8 Author: Wang Chong <306289287@qq.com> Date: Thu Aug 21 10:33:28 2025 +0800 add type annotations commit 676fc42 Author: Wang Chong <306289287@qq.com> Date: Thu Aug 21 09:51:24 2025 +0800 add copyright header commit 0d40f0a Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed Aug 20 13:31:57 2025 +0000 Pre-commit auto-fix commit 407d4b2 Merge: 4adac07 adc22fb Author: Wang Chong <306289287@qq.com> Date: Wed Aug 20 21:27:13 2025 +0800 Merge branch 'integrateCPPJieba' into wordNavigationForChineseText commit 4adac07 Author: Wang Chong <306289287@qq.com> Date: Wed Aug 20 21:22:52 2025 +0800 update the word segmentation structure - redesign 2 properties of 'OffsetTextInfo' as enums to make them more configurable, inspired by @LeonarddeR - override them in some subclasses to simulate specific behaviors commit adc22fb Author: Wang Chong <306289287@qq.com> Date: Wed Aug 20 19:22:28 2025 +0800 add wrapper for word manager commit 19cad8a Merge: eba63ab 92f345e Author: Wang Chong <306289287@qq.com> Date: Wed Aug 20 17:35:03 2025 +0800 Merge branch 'master' into integrateCPPJieba commit f72d348 Author: Wang Chong 
<306289287@qq.com> Date: Mon Aug 18 15:06:19 2025 +0800 update type annotations commit 557f404 Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon Aug 18 04:09:20 2025 +0000 Pre-commit auto-fix commit da64cd8 Author: Wang Chong <306289287@qq.com> Date: Mon Aug 18 12:02:49 2025 +0800 update `displayModel.py` make `DisplayModelTextInfo`'s flag aligned with its superclass commit 81f2040 Author: Wang Chong <306289287@qq.com> Date: Mon Aug 18 11:55:36 2025 +0800 update `textInfos/offsets.py` - replace `useUniscribe` with `useUniscribeForCharOffset` & `useWordSegmenterForWordOffset` for segmentation extensions - integrate `WordSegmenter` for calculating word offsets commit 9f62f04 Author: Wang Chong <306289287@qq.com> Date: Mon Aug 18 11:54:41 2025 +0800 update `textUtils/__init__.py` add `WordSegmenter` class for word segmentation, integrating segmentation strategies commit b0ac081 Author: Wang Chong <306289287@qq.com> Date: Mon Aug 18 11:38:48 2025 +0800 add `WordSegment` module - create `WordSegmentationStrategy' as an abstract base class to select segmentation strategy based on text content, following Strategy Pattern - implement `ChineseWordSegmentationStrategy` (for Chinese text) - implement `UniscribeWordSegmentationStrategy` (for other languages as default strategy) commit eba63ab Merge: 53dd3bb b9f19e6 Author: Wang Chong <306289287@qq.com> Date: Fri Aug 15 14:16:16 2025 +0800 Merge branch 'master' into integrateCPPJieba merge for Python updated commit 53dd3bb Merge: c853b64 b0241da Author: Wang Chong <306289287@qq.com> Date: Sat Aug 9 20:43:34 2025 +0800 Merge branch 'master' into integrateCPPJieba commit c853b64 Author: Wang Chong <306289287@qq.com> Date: Sat Aug 9 20:33:12 2025 +0800 Update include/readme.md Co-authored-by: Sean Budd <seanbudd123@gmail.com> commit c60c2da Author: Wang Chong <306289287@qq.com> Date: Sat Aug 9 20:30:58 2025 +0800 update copyright headers based on @seanbudd's suggestions commit 38a12dc 
Author: Wang Chong <306289287@qq.com> Date: Wed Aug 6 21:29:22 2025 +0800 Update building and setup script for cppjieba's dicts installation commit da662be Author: Wang Chong <306289287@qq.com> Date: Wed Aug 6 21:27:44 2025 +0800 Update .gitignore for cppjieba commit 0d92c08 Author: Wang Chong <306289287@qq.com> Date: Wed Aug 6 12:05:54 2025 +0800 Update GitHub action workflow to fetch cppjieba's submodule - Change 'submodules' in 'jobs - buildNVDA - Build NVDA - Checkout NVDA' from 'true' to 'recursive' to ensure cppjieba's submodule is fetched. - This will cause the submodule of sonic to be fetched as well, which seems currently unused. commit d4c3a92 Merge: f4cab8a 432364c Author: Wang Chong <306289287@qq.com> Date: Wed Aug 6 11:49:16 2025 +0800 Merge branch 'master' into integrateCPPJieba commit f4cab8a Merge: 7de7464 4ba948a Author: Wang Chong <306289287@qq.com> Date: Tue Aug 5 17:38:55 2025 +0800 Merge branch 'master' into integrateCPPJieba commit 7de7464 Author: Wang Chong <306289287@qq.com> Date: Tue Aug 5 12:52:39 2025 +0800 add JiebaSingleton wrapper and C API for NVDA segmentation - Introduce `JiebaSingleton` class in `cppjieba.hpp`/`cppjieba.cpp` with def file under nvdaHelper/cppjieba/' - Inherits from `cppjieba::Jieba` and exposes a thread-safe `getOffsets()` method - Implements Meyers’ singleton via `getInstance()` with a private constructor - Deletes copy constructor, copy assignment, move constructor, and move assignment to enforce single instance - Add C-style API in the same module: - `int initJieba()` to force singleton initialization - `int segmentOffsets(const char* text, int** charOffsets, int* outLen)` to perform segmentation and return character offsets - `void freeOffsets(int* ptr)` to release allocated offset buffer commit 1fbf05f Author: Wang Chong <306289287@qq.com> Date: Mon Aug 4 09:59:03 2025 +0800 add building script for cppjieba commit 3d4d9f1 Author: Wang Chong <306289287@qq.com> Date: Fri Aug 1 13:59:53 2025 +0800 Remove changes 
in sconscript for localLIb commit 2273a60 Author: Wang Chong <306289287@qq.com> Date: Fri Aug 1 13:06:08 2025 +0800 Update include/readme.md Co-authored-by: Sean Budd <seanbudd123@gmail.com> commit 06070c1 Author: Wang Chong <306289287@qq.com> Date: Fri Aug 1 13:04:22 2025 +0800 Update projectDocs/dev/createDevEnvironment.md Co-authored-by: Sean Budd <seanbudd123@gmail.com> commit ae58e9b Author: Wang Chong <306289287@qq.com> Date: Thu Jul 24 16:59:26 2025 +0800 Add comments for building script of cppjieba and its dependency commit fb4efef Author: Wang Chong <306289287@qq.com> Date: Thu Jul 24 16:51:49 2025 +0800 Update what's new commit 5cb5189 Author: Wang Chong <306289287@qq.com> Date: Wed Jul 23 17:30:56 2025 +0800 Introduce cppjieba as a submodule for Chinese word segmentation - Add `cppjieba` as a Git submodule under `third_party/cppjieba/` to provide robust Chinese word segmentation capabilities. - Update `.gitmodules` to point to the official `cppjieba` repository and configure it to track the `master` branch. - Update 'sconscript' to include the paths of 'cppjieba' and its dependency 'limonp' - Modify `copying.txt` to include the `cppjieba` license (MIT) alongside the project’s existing license, ensuring proper attribution and compliance. - Update documents
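
Several commits above describe a segmenter that returns character offsets, a `WordSegmenter` that turns them into word offsets, and an LRU cache in `wordSegStrategy.py`. The following is a minimal Python sketch of how those pieces could fit together; it is not the PR's code. It assumes the returned offsets are the exclusive end index of each word, which the commit messages do not state. `toy_word_ends`, `cached_word_ends` and `word_span` are hypothetical names, and the toy segmenter merely stands in for the native cppjieba call.

```python
from bisect import bisect_right
from functools import lru_cache


def toy_word_ends(text: str) -> tuple[int, ...]:
    """Hypothetical stand-in for the native segmenter.

    Cuts the text into two-character words and returns the exclusive end
    offset of each word. A real segmenter would pick linguistically
    meaningful boundaries instead.
    """
    if not text:
        return ()
    ends = list(range(2, len(text), 2))
    ends.append(len(text))  # the last word always ends at the end of the text
    return tuple(ends)


@lru_cache(maxsize=128)
def cached_word_ends(text: str) -> tuple[int, ...]:
    """Cache results per text so repeated navigation in one line is cheap."""
    return toy_word_ends(text)


def word_span(text: str, offset: int) -> tuple[int, int]:
    """Return (start, end) of the word containing offset; end is exclusive."""
    if not 0 <= offset < len(text):
        raise IndexError("offset is outside the text")
    ends = cached_word_ends(text)
    i = bisect_right(ends, offset)  # index of the first end greater than offset
    start = ends[i - 1] if i > 0 else 0
    return start, ends[i]


if __name__ == "__main__":
    line = "我们喜欢阅读"
    print(word_span(line, 3))
```

Returning a tuple from the segmenter keeps the cached value immutable, so one cached result can be shared safely between callers.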

seanbudd requested a review from michaelDCurran July 24, 2025 02:29

CrazySteve0605 added 2 commits July 24, 2025 16:51

Update what's new

fb4efef

Add comments for building script of cppjieba and its dependency

ae58e9b

CrazySteve0605 marked this pull request as ready for review July 24, 2025 09:05

CrazySteve0605 requested a review from a team as a code owner July 24, 2025 09:05

seanbudd reviewed Jul 31, 2025

Comment thread nvdaHelper/local/sconscript Outdated

Comment thread include/readme.md

Comment thread projectDocs/dev/createDevEnvironment.md Outdated

CrazySteve0605 and others added 3 commits August 1, 2025 13:04

Update projectDocs/dev/createDevEnvironment.md

06070c1

Co-authored-by: Sean Budd <seanbudd123@gmail.com>

Update include/readme.md

2273a60

Co-authored-by: Sean Budd <seanbudd123@gmail.com>

Remove changes in sconscript for localLIb

3d4d9f1

CrazySteve0605 marked this pull request as draft August 4, 2025 00:32

CrazySteve0605 added 7 commits August 4, 2025 09:59

add building script for cppjieba

1fbf05f

Merge branch 'master' into integrateCPPJieba

f4cab8a

Merge branch 'master' into integrateCPPJieba

d4c3a92

Update .gitignore for cppjieba

da662be

Update building and setup script for cppjieba's dicts installation

38a12dc

gerald-hartig reviewed Aug 8, 2025

Comment thread nvdaHelper/cppjieba/cppjieba.cpp Outdated

gerald-hartig reviewed Aug 8, 2025

Comment thread nvdaHelper/cppjieba/cppjieba.hpp Outdated

seanbudd reviewed Aug 8, 2025

Comment thread include/readme.md Outdated

Comment thread nvdaHelper/cppjieba/cppjieba.cpp Outdated

Comment thread nvdaHelper/cppjieba/cppjieba.hpp

Comment thread nvdaHelper/cppjieba/sconscript

Comment thread user_docs/en/changes.md

CrazySteve0605 and others added 2 commits August 9, 2025 20:30

update copyright headers based on @seanbudd's suggestions

c60c2da

Update include/readme.md

c853b64

Co-authored-by: Sean Budd <seanbudd123@gmail.com>

CrazySteve0605 force-pushed the integrateCPPJieba branch from d80388d to b3e08ee Compare September 9, 2025 16:45

CrazySteve0605 marked this pull request as ready for review September 12, 2025 11:10

michaelDCurran requested changes Sep 17, 2025

Comment thread projectDocs/dev/createDevEnvironment.md

Comment thread nvdaHelper/cppjieba/cppjieba.hpp Outdated

michaelDCurran changed the base branch from master to try-chineseWordSegmentation-staging September 17, 2025 01:33

seanbudd reviewed Sep 17, 2025

Comment thread .gitattributes Outdated

Comment thread projectDocs/dev/createDevEnvironment.md

CrazySteve0605 mentioned this pull request Sep 18, 2025

Add Word Separaters in Braille Output for Chinese text #18865

Merged

5 tasks
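
The braille change referenced here builds on commit 2d7c596, which inserts word separators and maps offsets between the raw text and the separated text. The following is a rough Python sketch of that kind of mapping, not NVDA's `OffsetConverter` subclass. `insert_separators` and `sep_to_raw` are hypothetical names, and the word ends are assumed to be exclusive end offsets that cover the whole text.

```python
from bisect import bisect_left


def insert_separators(
    text: str, word_ends: list[int], sep: str = " "
) -> tuple[str, list[int]]:
    """Join the words of text with sep and record where each character lands.

    Returns (separated_text, raw_to_sep), where raw_to_sep[i] is the index
    of text[i] inside separated_text.
    """
    pieces: list[str] = []
    raw_to_sep: list[int] = []
    start = 0
    out_len = 0
    for end in word_ends:
        if pieces:
            out_len += len(sep)  # a separator precedes every word but the first
        for _ in range(start, end):
            raw_to_sep.append(out_len)
            out_len += 1
        pieces.append(text[start:end])
        start = end
    return sep.join(pieces), raw_to_sep


def sep_to_raw(raw_to_sep: list[int], sep_offset: int) -> int:
    """Map an offset in the separated text back to the raw text.

    An offset that falls on a separator maps to the raw character that
    follows it; an offset past the end maps to len(raw_to_sep).
    """
    return bisect_left(raw_to_sep, sep_offset)


if __name__ == "__main__":
    separated, mapping = insert_separators("我们喜欢阅读", [2, 4, 6])
    print(separated, mapping)
```

Keeping the forward mapping as a sorted list lets the reverse lookup reuse it with a binary search instead of storing a second table.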

CrazySteve0605 added 3 commits September 20, 2025 23:53

Merge branch 'master' into integrateCPPJieba

0f507d5

Revert "Update projectDocs/dev/createDevEnvironment.md"

c2cbb24

This reverts commit 06070c1.

avoid using compilation time path

194a69e

CrazySteve0605 force-pushed the integrateCPPJieba branch from 984b6eb to 194a69e Compare September 20, 2025 16:18

CrazySteve0605 requested a review from a team as a code owner September 20, 2025 16:18

CrazySteve0605 requested review from Qchristensen and removed request for a team September 20, 2025 16:18

CrazySteve0605 and others added 2 commits September 21, 2025 00:20

Update .gitattributes

2e730d6

Co-authored-by: Sean Budd <seanbudd123@gmail.com>

Merge branch 'try-chineseWordSegmentation-staging' into integrateCPPJieba

30e855f

CrazySteve0605 requested a review from michaelDCurran September 24, 2025 09:10

CrazySteve0605 force-pushed the integrateCPPJieba branch 2 times, most recently from 5562e70 to fec70a9 Compare September 26, 2025 12:49

Merge branch 'try-chineseWordSegmentation-staging' into integrateCPPJieba

b327e23

michaelDCurran force-pushed the integrateCPPJieba branch from fec70a9 to b327e23 Compare September 28, 2025 22:38

michaelDCurran approved these changes Sep 29, 2025

michaelDCurran merged commit a3a1815 into nvaccess:try-chineseWordSegmentation-staging Sep 29, 2025
38 checks passed

github-actions Bot added this to the 2026.1 milestone Sep 29, 2025

seanbudd mentioned this pull request Nov 5, 2025

[WIP] Merge Chinese Word Segmentation work #19166

Closed

CrazySteve0605 deleted the integrateCPPJieba branch May 18, 2026 02:59

seanbudd mentioned this pull request May 20, 2026

Add Chinese Word Segmentation #20183

Open

Conversation

CrazySteve0605 commented Jul 24, 2025 • edited

A detailed integration analysis

1. Current State of Chinese Segmentation Tools

2. NVDA-Specific Requirements

Link to issue number:

Summary of the issue:

Description of user facing changes:

Description of developer facing changes:

Description of development approach:

Testing strategy:

Known issues with pull request:

Code Review Checklist:

wmhn1872265132 commented Jul 24, 2025

cary-rowen commented Jul 24, 2025

CrazySteve0605 commented Jul 24, 2025 • edited

CrazySteve0605 commented Jul 24, 2025

seanbudd commented Jul 31, 2025

seanbudd commented Aug 15, 2025

CrazySteve0605 commented Aug 15, 2025

michaelDCurran commented Sep 17, 2025

michaelDCurran commented Sep 29, 2025

7 participants

CrazySteve0605 commented Jul 24, 2025 • edited