Merged
Conversation
Contributor
There was a problem hiding this comment.
Hey - 我发现了 2 个问题
给 AI Agent 的提示
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="tools/i18n/sync_ocr_expected.py" line_range="423-432" />
<code_context>
+ unresolved_texts: List[str] = []
+
+ # 第一轮:唯一命中
+ for text, candidates in candidates_by_text:
+ if len(candidates) == 1:
+ lang_id = next(iter(candidates))
+ if lang_id not in resolved_set:
+ resolved_in_order.append(lang_id)
+ resolved_set.add(lang_id)
+ elif len(candidates) == 0:
+ unresolved_texts.append(text)
+
+ # 第二轮:如果歧义候选与已解析 ID 有交集,用交集兜底
+ for text, candidates in candidates_by_text:
+ if len(candidates) > 1:
+ intersection = [lang_id for lang_id in resolved_in_order if lang_id in candidates]
+ if len(intersection) == 1:
+ lang_id = intersection[0]
+ if lang_id not in resolved_set:
</code_context>
<issue_to_address>
**issue (bug_risk):** 由于 `resolved_set` 的检查,当前歧义文本的解析逻辑永远无法从交集中分配 ID。
在第二轮中,`intersection` 来源于 `resolved_in_order`,而 `resolved_set` 也是基于同一个列表构建的。这意味着对 `intersection` 中任意 `lang_id` 来说,`if lang_id not in resolved_set` 这个条件始终为假,因此分支体永远不会执行,歧义文本也就不会通过这个启发式规则被解析。如果你希望歧义文本能复用已有的 ID,大概率需要 (a) 去掉这个条件判断,并避免重新加入 `resolved_in_order`/`resolved_set`,或者 (b) 允许 `resolved_in_order` 中出现重复项。按当前写法,第二轮的逻辑实际上是死代码。
</issue_to_address>
### Comment 2
<location path="tools/i18n/sync_ocr_expected.py" line_range="472-481" />
<code_context>
+ unresolved_all: List[Tuple[str, str, List[str]]] = []
+
+ for file_path in pipeline_files:
+ try:
+ new_text, changes, unresolved_nodes, ocr_nodes = process_pipeline_file(
+ file_path, tables, reverse_index
+ )
+ except Exception as exc:
+ safe_print(f"[ERROR] {file_path}: {exc}")
+ continue
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** 在处理文件时吞掉所有异常会隐藏损坏的 JSON 或逻辑错误,同时仍然返回退出码 0。
由于循环在捕获到 `Exception` 后只记录日志并继续执行,即使有部分文件解析或处理失败,CI 仍然会显示为成功,导致更新结果悄无声息地变成“只更新了一部分”。建议对失败进行统计(例如使用 `failed_files` 计数或列表),并在 `main()` 中如果存在失败就以非零状态码退出;或者对于像 JSON/解析错误这类关键异常选择性地重新抛出,让任务显式失败。
建议实现如下:
```python
unresolved_all: List[Tuple[str, str, List[str]]] = []
failed_files: List[Tuple[str, Exception]] = []
for file_path in pipeline_files:
try:
new_text, changes, unresolved_nodes, ocr_nodes = process_pipeline_file(
file_path, tables, reverse_index
)
except Exception as exc:
safe_print(f"[ERROR] {file_path}: {exc}")
failed_files.append((file_path, exc))
continue
if failed_files:
safe_print(
f"[ERROR] 共有 {len(failed_files)} 个文件处理失败,退出码为 1:" # summary in Chinese to match existing comments
)
for path, exc in failed_files:
safe_print(f" - {path}: {exc}")
# 非零退出码以便 CI 正确失败
sys.exit(1)
```
1. 确保在 `tools/i18n/sync_ocr_expected.py` 文件顶部已经导入了 `sys`:
- 如果还没有,请和其他 import 放在一起添加 `import sys`。
2. 以上代码片段很可能位于 `main()` 或其他顶层函数中。如果这个循环不在 `main()` 里,请确保最终是由 `main()` 在该函数返回失败状态后调用 `sys.exit(1)`(或等价的非零退出),而不是从调用栈深处直接退出进程。如有需要,请相应地调整控制流。
</issue_to_address>帮我变得更有用!请在每条评论上点 👍 或 👎,我会根据你的反馈改进后续的代码评审。
Original comment in English
Hey - I've found 2 issues
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="tools/i18n/sync_ocr_expected.py" line_range="423-432" />
<code_context>
+ unresolved_texts: List[str] = []
+
+ # 第一轮:唯一命中
+ for text, candidates in candidates_by_text:
+ if len(candidates) == 1:
+ lang_id = next(iter(candidates))
+ if lang_id not in resolved_set:
+ resolved_in_order.append(lang_id)
+ resolved_set.add(lang_id)
+ elif len(candidates) == 0:
+ unresolved_texts.append(text)
+
+ # 第二轮:如果歧义候选与已解析 ID 有交集,用交集兜底
+ for text, candidates in candidates_by_text:
+ if len(candidates) > 1:
+ intersection = [lang_id for lang_id in resolved_in_order if lang_id in candidates]
+ if len(intersection) == 1:
+ lang_id = intersection[0]
+ if lang_id not in resolved_set:
</code_context>
<issue_to_address>
**issue (bug_risk):** Ambiguous text resolution logic can never assign an ID from the intersection because of the `resolved_set` check.
In the second pass, `intersection` is derived from `resolved_in_order`, and `resolved_set` is built from the same list. That makes the `if lang_id not in resolved_set` guard always false for any `lang_id` in `intersection`, so the body never runs and ambiguous texts are never resolved via this heuristic. If you want ambiguous texts to reuse an existing ID, you likely need to (a) drop this guard and avoid re-adding to `resolved_in_order`/`resolved_set`, or (b) allow duplicate entries in `resolved_in_order`. As written, the second-pass logic is effectively dead code.
</issue_to_address>
### Comment 2
<location path="tools/i18n/sync_ocr_expected.py" line_range="472-481" />
<code_context>
+ unresolved_all: List[Tuple[str, str, List[str]]] = []
+
+ for file_path in pipeline_files:
+ try:
+ new_text, changes, unresolved_nodes, ocr_nodes = process_pipeline_file(
+ file_path, tables, reverse_index
+ )
+ except Exception as exc:
+ safe_print(f"[ERROR] {file_path}: {exc}")
+ continue
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Swallowing all exceptions when processing files can hide broken JSON or logic errors while still returning exit code 0.
Because the loop only logs and continues on `Exception`, CI will show success even when some files fail to parse or process, leading to silently partial updates. Consider tracking failures (e.g., a `failed_files` counter) and exiting non‑zero from `main()` if any occur, or selectively re‑raising for critical exceptions like JSON/parse errors so the job fails visibly.
Suggested implementation:
```python
unresolved_all: List[Tuple[str, str, List[str]]] = []
failed_files: List[Tuple[str, Exception]] = []
for file_path in pipeline_files:
try:
new_text, changes, unresolved_nodes, ocr_nodes = process_pipeline_file(
file_path, tables, reverse_index
)
except Exception as exc:
safe_print(f"[ERROR] {file_path}: {exc}")
failed_files.append((file_path, exc))
continue
if failed_files:
safe_print(
f"[ERROR] 共有 {len(failed_files)} 个文件处理失败,退出码为 1:" # summary in Chinese to match existing comments
)
for path, exc in failed_files:
safe_print(f" - {path}: {exc}")
# 非零退出码以便 CI 正确失败
sys.exit(1)
```
1. Ensure `sys` is imported at the top of `tools/i18n/sync_ocr_expected.py`:
- Add `import sys` alongside the other imports if it's not already present.
2. The shown snippet is likely inside `main()` or another top‑level function. If this loop is not in `main()`, make sure that the `sys.exit(1)` (or equivalent non‑zero exit) is ultimately called from `main()` after this function returns a status indicating failure, instead of exiting directly from deep inside the call stack. Adjust the control flow accordingly if needed.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR introduces an automated i18n sync mechanism to standardize OCR node expected fields across pipelines by expanding them into CN/TC/EN/JP text, and adds automation support via a translation submodule and a GitHub Action.
Changes:
- Added a Python tool to scan pipeline JSON/JSONC and rewrite OCR
expectedarrays using i18n tables (+ hotfix merge support). - Added a translation referrer submodule and a workflow to run the sync in write mode.
- Updated many pipeline
expectedarrays to include multi-language variants and reorder/append unresolved originals.
Reviewed changes
Copilot reviewed 54 out of 54 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tools/i18n/sync_ocr_expected.py | New sync tool to expand OCR expected using i18n reverse lookup + hotfix merge. |
| tools/i18n/EndFieldTranslationReferrer | Adds translation tables provider as a submodule gitlink. |
| tools/i18n/.gitignore | Ignores local i18n table JSON files under tools/i18n. |
| assets/resource_fast/pipeline/SellProduct/SellCore.json | Updates OCR expected entries to multi-language set / reorders unresolved items. |
| assets/resource_fast/pipeline/SellProduct/ChangeGoods.json | Updates OCR expected entries and removes inline comments. |
| assets/resource_fast/pipeline/DeliveryJobs/Wuling.json | Reorders/expands OCR expected list. |
| assets/resource_fast/pipeline/DeliveryJobs/ValleyIV.json | Reorders/expands OCR expected lists in multiple nodes. |
| assets/resource_fast/pipeline/DeliveryJobs/TransferJob.json | Reorders/expands OCR expected list. |
| assets/resource_fast/pipeline/DeliveryJobs/PackCargo.json | Expands OCR expected list, preserving original shorter strings at end. |
| assets/resource_fast/pipeline/Common/Text.json | Reorders/expands OCR expected list. |
| assets/resource_adb/pipeline/SeizeEntrustTask.json | Expands OCR expected into CN/TC/EN/JP. |
| assets/resource_adb/pipeline/BAKER.json | Expands OCR expected into CN/TC/EN/JP. |
| assets/resource/pipeline/Weapon.json | Expands OCR expected into CN/TC/EN/JP. |
| assets/resource/pipeline/SimpleProductionBatch.json | Expands/reorders OCR expected lists for multiple nodes. |
| assets/resource/pipeline/SeizeEntrustTask.json | Expands/reorders OCR expected lists; appends unresolved originals. |
| assets/resource/pipeline/SceneManager/SceneValleyIV.json | Expands/reorders OCR expected list. |
| assets/resource/pipeline/SceneManager/SceneMenu.json | Expands/reorders OCR expected lists (including repeated “ENDFIELD”). |
| assets/resource/pipeline/SceneManager/SceneMap.json | Expands OCR expected list with additional translations. |
| assets/resource/pipeline/SceneManager/SceneLogin.json | Reorders/expands OCR expected list (note: introduces a trailing space). |
| assets/resource/pipeline/Resell/ResellROI.json | Expands/reorders OCR expected list. |
| assets/resource/pipeline/ReadWiki.json | Expands OCR expected list with TC/JP. |
| assets/resource/pipeline/ProtocolSpace/Prepare.json | Expands OCR expected list with TC/JP. |
| assets/resource/pipeline/ProtocolSpace/OperationalManual.json | Expands OCR expected list (adds JP string). |
| assets/resource/pipeline/ItemTransfer.json | Expands/reorders OCR expected lists. |
| assets/resource/pipeline/ImportBluePrints.json | Expands OCR expected list with TC/EN/JP. |
| assets/resource/pipeline/GiftOperator.json | Expands OCR expected list with TC/EN/JP. |
| assets/resource/pipeline/GearAssembly.json | Expands OCR expected lists with TC/EN/JP. |
| assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/RainbowFin.json | Cleans inline comments and reorders OCR expected entries. |
| assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/IndoorCrops.json | Cleans inline comments and reorders OCR expected entries. |
| assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/EternalSunset.json | Cleans inline comments and reorders OCR expected entries. |
| assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/CollapsedTianshiPillar.json | Cleans inline comments and reorders OCR expected entries. |
| assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/CleansingJade.json | Cleans inline comments and reorders OCR expected entries. |
| assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/CisternOriginiumSlugs.json | Cleans inline comments and reorders OCR expected entries. |
| assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/BeaconDamagedInBlightTide.json | Cleans inline comments and reorders OCR expected entries. |
| assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/AncientTree.json | Cleans inline comments and reorders OCR expected entries. |
| assets/resource/pipeline/DijiangRewards/Template/TextTemplate.json | Expands OCR expected entries to multi-language set. |
| assets/resource/pipeline/DijiangRewards/Template/Status.json | Expands/reorders OCR expected entries; adds time-format variants. |
| assets/resource/pipeline/DijiangRewards/Template/Location.json | Expands OCR expected entries to multi-language set. |
| assets/resource/pipeline/DijiangRewards/RecoveryEmotion.json | Expands/reorders OCR expected entries; appends unresolved originals. |
| assets/resource/pipeline/DijiangRewards/ReceptionRoom.json | Expands OCR expected entries (adds JP). |
| assets/resource/pipeline/DijiangRewards/MainFlow.json | Expands/reorders OCR expected entries; appends unresolved original. |
| assets/resource/pipeline/DijiangRewards/GrowthChamber.json | Expands/reorders OCR expected entries; adds multiple English variants. |
| assets/resource/pipeline/DailyRewards/Tasks.json | Expands OCR expected entries to multi-language set. |
| assets/resource/pipeline/DailyRewards/ProtocolPass.json | Expands OCR expected entries to multi-language set. |
| assets/resource/pipeline/DailyRewards/Emails.json | Expands OCR expected entries; adds multiple mailbox synonyms. |
| assets/resource/pipeline/DailyRewards/ClaimDeliveryJobs.json | Adds EN/JP variants alongside existing expected items. |
| assets/resource/pipeline/CreditShopping/Reflash.json | Expands OCR expected entries to multi-language set. |
| assets/resource/pipeline/CreditShopping/ClaimCredit.json | Expands OCR expected entries to multi-language set. |
| assets/resource/pipeline/Crafting.json | Expands OCR expected entries to multi-language set. |
| assets/resource/pipeline/BatchAddFriends.json | Expands OCR expected entries to multi-language set. |
| assets/resource/pipeline/BAKER.json | Expands/reorders OCR expected entries; appends unresolved original. |
| assets/resource/pipeline/AutoSimulationCollect.json | Expands/reorders OCR expected entries; appends unresolved original. |
| .gitmodules | Registers the new translation referrer submodule. |
| .github/workflows/i18n-sync.yml | Adds a workflow to update submodule and run OCR expected sync in --write mode. |
Comments suppressed due to low confidence (3)
tools/i18n/sync_ocr_expected.py:1
- If a block comment
/* ... */is unterminated, the loop exits wheni + 1 >= self.n, but the code still doesi += 2, which can advance past EOF and cause downstream out-of-range indexing. Consider explicitly detecting the unterminated case and raisingValueError(\"Unterminated block comment\")before incrementing past the end.
assets/resource/pipeline/SceneManager/SceneLogin.json:1 - This string contains a trailing space, which can cause OCR matching to fail or reduce confidence because the expected token no longer matches the UI text exactly. Please remove the trailing whitespace so the expected value is stable.
tools/i18n/sync_ocr_expected.py:1 - This function signature exceeds common Python style line-length limits, which hurts readability and makes future diffs noisier. Consider wrapping parameters onto multiple lines (PEP 8 style) for consistency with the rest of the file.
…e and improve error handling in sync script - Introduced an optional input to the i18n-sync workflow for updating the translation submodule to the latest commit. - Added a step to display the current commit of the translation submodule. - Enhanced error handling in the sync_ocr_expected.py script to report failed file processing.
MistEO
reviewed
Mar 8, 2026
MistEO
reviewed
Mar 8, 2026
MistEO
reviewed
Mar 8, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary by Sourcery
在 CI 中接入按需执行的自动化流程:将流水线 JSON 中 OCR 期望文本与集中式 i18n 翻译表进行自动同步。
Enhancements:
tools/i18n下引入翻译子模块及相关忽略规则,将其作为本地化文本 ID 的权威来源。CI:
i18n-syncGitHub Actions 工作流,用于更新翻译子模块、以写入模式运行 OCR 期望文本同步脚本,并报告变更的文件。Original summary in English
Summary by Sourcery
Automate synchronization of OCR expected texts in pipeline JSONs with centralized i18n translation tables and wire it into CI for on-demand execution.
Enhancements:
CI: