Skip to content

chore:自动i18n#1123

Merged
MistEO merged 7 commits intomainfrom
chore/i18n
Mar 8, 2026
Merged

chore:自动i18n#1123
MistEO merged 7 commits intomainfrom
chore/i18n

Conversation

@overflow65537
Copy link
Member

@overflow65537 overflow65537 commented Mar 8, 2026

Summary by Sourcery

在 CI 中接入按需执行的自动化流程:将流水线 JSON 中 OCR 期望文本与集中式 i18n 翻译表进行自动同步。

Enhancements:

  • 新增一个 Python 工具,用于基于 CN/TC/EN/JP 的 i18n 表和可选的热修复覆盖,对流水线定义中的 OCR 期望字段进行规范化处理。
  • tools/i18n 下引入翻译子模块及相关忽略规则,将其作为本地化文本 ID 的权威来源。
  • 批量更新流水线 JSON 资源,使 OCR 节点统一使用标准化的、多语言期望文本格式。

CI:

  • 新增一个 i18n-sync GitHub Actions 工作流,用于更新翻译子模块、以写入模式运行 OCR 期望文本同步脚本,并报告变更的文件。
Original summary in English

Summary by Sourcery

Automate synchronization of OCR expected texts in pipeline JSONs with centralized i18n translation tables and wire it into CI for on-demand execution.

Enhancements:

  • Add a Python utility to normalize OCR expected fields in pipeline definitions using CN/TC/EN/JP i18n tables and optional hotfix overrides.
  • Introduce a translation submodule and related ignore rules under tools/i18n to serve as the authoritative source for localized text IDs.
  • Bulk-update pipeline JSON resources so OCR nodes use the standardized, multi-language expected text format.

CI:

  • Add an i18n-sync GitHub Actions workflow that updates the translation submodule, runs the OCR expected sync script in write mode, and reports changed files.

Copilot AI review requested due to automatic review settings March 8, 2026 14:43
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - 我发现了 2 个问题

给 AI Agent 的提示
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="tools/i18n/sync_ocr_expected.py" line_range="423-432" />
<code_context>
+    unresolved_texts: List[str] = []
+
+    # 第一轮:唯一命中
+    for text, candidates in candidates_by_text:
+        if len(candidates) == 1:
+            lang_id = next(iter(candidates))
+            if lang_id not in resolved_set:
+                resolved_in_order.append(lang_id)
+                resolved_set.add(lang_id)
+        elif len(candidates) == 0:
+            unresolved_texts.append(text)
+
+    # 第二轮:如果歧义候选与已解析 ID 有交集,用交集兜底
+    for text, candidates in candidates_by_text:
+        if len(candidates) > 1:
+            intersection = [lang_id for lang_id in resolved_in_order if lang_id in candidates]
+            if len(intersection) == 1:
+                lang_id = intersection[0]
+                if lang_id not in resolved_set:
</code_context>
<issue_to_address>
**issue (bug_risk):** 由于 `resolved_set` 的检查,当前歧义文本的解析逻辑永远无法从交集中分配 ID。

在第二轮中,`intersection` 来源于 `resolved_in_order`,而 `resolved_set` 也是基于同一个列表构建的。这意味着对 `intersection` 中任意 `lang_id` 来说,`if lang_id not in resolved_set` 这个条件始终为假,因此分支体永远不会执行,歧义文本也就不会通过这个启发式规则被解析。如果你希望歧义文本能复用已有的 ID,大概率需要 (a) 去掉这个条件判断,并避免重新加入 `resolved_in_order`/`resolved_set`,或者 (b) 允许 `resolved_in_order` 中出现重复项。按当前写法,第二轮的逻辑实际上是死代码。
</issue_to_address>

### Comment 2
<location path="tools/i18n/sync_ocr_expected.py" line_range="472-481" />
<code_context>
+    unresolved_all: List[Tuple[str, str, List[str]]] = []
+
+    for file_path in pipeline_files:
+        try:
+            new_text, changes, unresolved_nodes, ocr_nodes = process_pipeline_file(
+                file_path, tables, reverse_index
+            )
+        except Exception as exc:
+            safe_print(f"[ERROR] {file_path}: {exc}")
+            continue
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** 在处理文件时吞掉所有异常会隐藏损坏的 JSON 或逻辑错误,同时仍然返回退出码 0。

由于循环在捕获到 `Exception` 后只记录日志并继续执行,即使有部分文件解析或处理失败,CI 仍然会显示为成功,导致更新结果悄无声息地变成“只更新了一部分”。建议对失败进行统计(例如使用 `failed_files` 计数或列表),并在 `main()` 中如果存在失败就以非零状态码退出;或者对于像 JSON/解析错误这类关键异常选择性地重新抛出,让任务显式失败。

建议实现如下:

```python
    unresolved_all: List[Tuple[str, str, List[str]]] = []
    failed_files: List[Tuple[str, Exception]] = []

    for file_path in pipeline_files:
        try:
            new_text, changes, unresolved_nodes, ocr_nodes = process_pipeline_file(
                file_path, tables, reverse_index
            )
        except Exception as exc:
            safe_print(f"[ERROR] {file_path}: {exc}")
            failed_files.append((file_path, exc))
            continue

    if failed_files:
        safe_print(
            f"[ERROR] 共有 {len(failed_files)} 个文件处理失败,退出码为 1:"  # summary in Chinese to match existing comments
        )
        for path, exc in failed_files:
            safe_print(f"  - {path}: {exc}")
        # 非零退出码以便 CI 正确失败
        sys.exit(1)

```

1. 确保在 `tools/i18n/sync_ocr_expected.py` 文件顶部已经导入了 `sys`- 如果还没有,请和其他 import 放在一起添加 `import sys`2. 以上代码片段很可能位于 `main()` 或其他顶层函数中。如果这个循环不在 `main()` 里,请确保最终是由 `main()` 在该函数返回失败状态后调用 `sys.exit(1)`(或等价的非零退出),而不是从调用栈深处直接退出进程。如有需要,请相应地调整控制流。
</issue_to_address>

Sourcery 对开源项目是免费的——如果你觉得我们的评审有帮助,欢迎分享 ✨
帮我变得更有用!请在每条评论上点 👍 或 👎,我会根据你的反馈改进后续的代码评审。
Original comment in English

Hey - I've found 2 issues

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="tools/i18n/sync_ocr_expected.py" line_range="423-432" />
<code_context>
+    unresolved_texts: List[str] = []
+
+    # 第一轮:唯一命中
+    for text, candidates in candidates_by_text:
+        if len(candidates) == 1:
+            lang_id = next(iter(candidates))
+            if lang_id not in resolved_set:
+                resolved_in_order.append(lang_id)
+                resolved_set.add(lang_id)
+        elif len(candidates) == 0:
+            unresolved_texts.append(text)
+
+    # 第二轮:如果歧义候选与已解析 ID 有交集,用交集兜底
+    for text, candidates in candidates_by_text:
+        if len(candidates) > 1:
+            intersection = [lang_id for lang_id in resolved_in_order if lang_id in candidates]
+            if len(intersection) == 1:
+                lang_id = intersection[0]
+                if lang_id not in resolved_set:
</code_context>
<issue_to_address>
**issue (bug_risk):** Ambiguous text resolution logic can never assign an ID from the intersection because of the `resolved_set` check.

In the second pass, `intersection` is derived from `resolved_in_order`, and `resolved_set` is built from the same list. That makes the `if lang_id not in resolved_set` guard always false for any `lang_id` in `intersection`, so the body never runs and ambiguous texts are never resolved via this heuristic. If you want ambiguous texts to reuse an existing ID, you likely need to (a) drop this guard and avoid re-adding to `resolved_in_order`/`resolved_set`, or (b) allow duplicate entries in `resolved_in_order`. As written, the second-pass logic is effectively dead code.
</issue_to_address>

### Comment 2
<location path="tools/i18n/sync_ocr_expected.py" line_range="472-481" />
<code_context>
+    unresolved_all: List[Tuple[str, str, List[str]]] = []
+
+    for file_path in pipeline_files:
+        try:
+            new_text, changes, unresolved_nodes, ocr_nodes = process_pipeline_file(
+                file_path, tables, reverse_index
+            )
+        except Exception as exc:
+            safe_print(f"[ERROR] {file_path}: {exc}")
+            continue
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Swallowing all exceptions when processing files can hide broken JSON or logic errors while still returning exit code 0.

Because the loop only logs and continues on `Exception`, CI will show success even when some files fail to parse or process, leading to silently partial updates. Consider tracking failures (e.g., a `failed_files` counter) and exiting non‑zero from `main()` if any occur, or selectively re‑raising for critical exceptions like JSON/parse errors so the job fails visibly.

Suggested implementation:

```python
    unresolved_all: List[Tuple[str, str, List[str]]] = []
    failed_files: List[Tuple[str, Exception]] = []

    for file_path in pipeline_files:
        try:
            new_text, changes, unresolved_nodes, ocr_nodes = process_pipeline_file(
                file_path, tables, reverse_index
            )
        except Exception as exc:
            safe_print(f"[ERROR] {file_path}: {exc}")
            failed_files.append((file_path, exc))
            continue

    if failed_files:
        safe_print(
            f"[ERROR] 共有 {len(failed_files)} 个文件处理失败,退出码为 1:"  # summary in Chinese to match existing comments
        )
        for path, exc in failed_files:
            safe_print(f"  - {path}: {exc}")
        # 非零退出码以便 CI 正确失败
        sys.exit(1)

```

1. Ensure `sys` is imported at the top of `tools/i18n/sync_ocr_expected.py`:

   - Add `import sys` alongside the other imports if it's not already present.

2. The shown snippet is likely inside `main()` or another top‑level function. If this loop is not in `main()`, make sure that the `sys.exit(1)` (or equivalent non‑zero exit) is ultimately called from `main()` after this function returns a status indicating failure, instead of exiting directly from deep inside the call stack. Adjust the control flow accordingly if needed.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an automated i18n sync mechanism to standardize OCR node expected fields across pipelines by expanding them into CN/TC/EN/JP text, and adds automation support via a translation submodule and a GitHub Action.

Changes:

  • Added a Python tool to scan pipeline JSON/JSONC and rewrite OCR expected arrays using i18n tables (+ hotfix merge support).
  • Added a translation referrer submodule and a workflow to run the sync in write mode.
  • Updated many pipeline expected arrays to include multi-language variants and reorder/append unresolved originals.

Reviewed changes

Copilot reviewed 54 out of 54 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tools/i18n/sync_ocr_expected.py New sync tool to expand OCR expected using i18n reverse lookup + hotfix merge.
tools/i18n/EndFieldTranslationReferrer Adds translation tables provider as a submodule gitlink.
tools/i18n/.gitignore Ignores local i18n table JSON files under tools/i18n.
assets/resource_fast/pipeline/SellProduct/SellCore.json Updates OCR expected entries to multi-language set / reorders unresolved items.
assets/resource_fast/pipeline/SellProduct/ChangeGoods.json Updates OCR expected entries and removes inline comments.
assets/resource_fast/pipeline/DeliveryJobs/Wuling.json Reorders/expands OCR expected list.
assets/resource_fast/pipeline/DeliveryJobs/ValleyIV.json Reorders/expands OCR expected lists in multiple nodes.
assets/resource_fast/pipeline/DeliveryJobs/TransferJob.json Reorders/expands OCR expected list.
assets/resource_fast/pipeline/DeliveryJobs/PackCargo.json Expands OCR expected list, preserving original shorter strings at end.
assets/resource_fast/pipeline/Common/Text.json Reorders/expands OCR expected list.
assets/resource_adb/pipeline/SeizeEntrustTask.json Expands OCR expected into CN/TC/EN/JP.
assets/resource_adb/pipeline/BAKER.json Expands OCR expected into CN/TC/EN/JP.
assets/resource/pipeline/Weapon.json Expands OCR expected into CN/TC/EN/JP.
assets/resource/pipeline/SimpleProductionBatch.json Expands/reorders OCR expected lists for multiple nodes.
assets/resource/pipeline/SeizeEntrustTask.json Expands/reorders OCR expected lists; appends unresolved originals.
assets/resource/pipeline/SceneManager/SceneValleyIV.json Expands/reorders OCR expected list.
assets/resource/pipeline/SceneManager/SceneMenu.json Expands/reorders OCR expected lists (including repeated “ENDFIELD”).
assets/resource/pipeline/SceneManager/SceneMap.json Expands OCR expected list with additional translations.
assets/resource/pipeline/SceneManager/SceneLogin.json Reorders/expands OCR expected list (note: introduces a trailing space).
assets/resource/pipeline/Resell/ResellROI.json Expands/reorders OCR expected list.
assets/resource/pipeline/ReadWiki.json Expands OCR expected list with TC/JP.
assets/resource/pipeline/ProtocolSpace/Prepare.json Expands OCR expected list with TC/JP.
assets/resource/pipeline/ProtocolSpace/OperationalManual.json Expands OCR expected list (adds JP string).
assets/resource/pipeline/ItemTransfer.json Expands/reorders OCR expected lists.
assets/resource/pipeline/ImportBluePrints.json Expands OCR expected list with TC/EN/JP.
assets/resource/pipeline/GiftOperator.json Expands OCR expected list with TC/EN/JP.
assets/resource/pipeline/GearAssembly.json Expands OCR expected lists with TC/EN/JP.
assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/RainbowFin.json Cleans inline comments and reorders OCR expected entries.
assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/IndoorCrops.json Cleans inline comments and reorders OCR expected entries.
assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/EternalSunset.json Cleans inline comments and reorders OCR expected entries.
assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/CollapsedTianshiPillar.json Cleans inline comments and reorders OCR expected entries.
assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/CleansingJade.json Cleans inline comments and reorders OCR expected entries.
assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/CisternOriginiumSlugs.json Cleans inline comments and reorders OCR expected entries.
assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/BeaconDamagedInBlightTide.json Cleans inline comments and reorders OCR expected entries.
assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/AncientTree.json Cleans inline comments and reorders OCR expected entries.
assets/resource/pipeline/DijiangRewards/Template/TextTemplate.json Expands OCR expected entries to multi-language set.
assets/resource/pipeline/DijiangRewards/Template/Status.json Expands/reorders OCR expected entries; adds time-format variants.
assets/resource/pipeline/DijiangRewards/Template/Location.json Expands OCR expected entries to multi-language set.
assets/resource/pipeline/DijiangRewards/RecoveryEmotion.json Expands/reorders OCR expected entries; appends unresolved originals.
assets/resource/pipeline/DijiangRewards/ReceptionRoom.json Expands OCR expected entries (adds JP).
assets/resource/pipeline/DijiangRewards/MainFlow.json Expands/reorders OCR expected entries; appends unresolved original.
assets/resource/pipeline/DijiangRewards/GrowthChamber.json Expands/reorders OCR expected entries; adds multiple English variants.
assets/resource/pipeline/DailyRewards/Tasks.json Expands OCR expected entries to multi-language set.
assets/resource/pipeline/DailyRewards/ProtocolPass.json Expands OCR expected entries to multi-language set.
assets/resource/pipeline/DailyRewards/Emails.json Expands OCR expected entries; adds multiple mailbox synonyms.
assets/resource/pipeline/DailyRewards/ClaimDeliveryJobs.json Adds EN/JP variants alongside existing expected items.
assets/resource/pipeline/CreditShopping/Reflash.json Expands OCR expected entries to multi-language set.
assets/resource/pipeline/CreditShopping/ClaimCredit.json Expands OCR expected entries to multi-language set.
assets/resource/pipeline/Crafting.json Expands OCR expected entries to multi-language set.
assets/resource/pipeline/BatchAddFriends.json Expands OCR expected entries to multi-language set.
assets/resource/pipeline/BAKER.json Expands/reorders OCR expected entries; appends unresolved original.
assets/resource/pipeline/AutoSimulationCollect.json Expands/reorders OCR expected entries; appends unresolved original.
.gitmodules Registers the new translation referrer submodule.
.github/workflows/i18n-sync.yml Adds a workflow to update submodule and run OCR expected sync in --write mode.
Comments suppressed due to low confidence (3)

tools/i18n/sync_ocr_expected.py:1

  • If a block comment /* ... */ is unterminated, the loop exits when i + 1 >= self.n, but the code still does i += 2, which can advance past EOF and cause downstream out-of-range indexing. Consider explicitly detecting the unterminated case and raising ValueError(\"Unterminated block comment\") before incrementing past the end.
    assets/resource/pipeline/SceneManager/SceneLogin.json:1
  • This string contains a trailing space, which can cause OCR matching to fail or reduce confidence because the expected token no longer matches the UI text exactly. Please remove the trailing whitespace so the expected value is stable.
    tools/i18n/sync_ocr_expected.py:1
  • This function signature exceeds common Python style line-length limits, which hurts readability and makes future diffs noisier. Consider wrapping parameters onto multiple lines (PEP 8 style) for consistency with the rest of the file.

…e and improve error handling in sync script

- Introduced an optional input to the i18n-sync workflow for updating the translation submodule to the latest commit.
- Added a step to display the current commit of the translation submodule.
- Enhanced error handling in the sync_ocr_expected.py script to report failed file processing.
@MistEO MistEO merged commit b9e6f83 into main Mar 8, 2026
17 checks passed
@MistEO MistEO deleted the chore/i18n branch March 8, 2026 15:19
MistEO pushed a commit that referenced this pull request Mar 9, 2026
MistEO pushed a commit that referenced this pull request Mar 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants