feat: update sync ocr for more adaptations #1164

Merged
MistEO merged 2 commits into v2 from chore/ocr-sync on Mar 9, 2026

Constrat (Member) commented Mar 9, 2026

This will likely cause merge conflicts with A LOT of ongoing PRs.

The reason I added them here anyway is that at least it won't run on the default branch as a workflow?
But the workflow will open a PR anyway... hmm, unsure what to do.

Also should we add KR?

Technically we can add every single language Endfield supports... Obviously some languages become impossible for the OCR to work with and managing more than the "most" used ones is also annoying as replaces / regex may become tedious.

Summary by Sourcery

Improve OCR expected-text synchronization and update pipeline resources for broader language adaptations.

Enhancements:

  • Add a table-aware fallback in OCR language ID resolution to handle ambiguous matches with identical rows across languages.
  • Ensure expanded OCR expected text values are deduplicated across languages to avoid repeated entries.
  • Update multiple pipeline JSON resources to align with the new OCR expectation synchronization behavior.
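
The table-aware fallback described above can be pictured roughly as follows. This is only a minimal sketch, not the code in `tools/i18n/sync_ocr_expected.py`: the table layout (`tables[lang][lang_id]`), the helper names `row_for` and `pick_if_rows_identical`, the sample data, and the assumption that IDs are numeric strings are all hypothetical.

```python
from typing import Optional

# Hypothetical language order and i18n tables; the real script's data may differ.
LANG_ORDER = ["CN", "TC", "EN", "JP"]

TABLES = {
    "CN": {"101": "刷新", "202": "刷新", "303": "更新"},
    "TC": {"101": "刷新", "202": "刷新", "303": "更新"},
    "EN": {"101": "Refresh", "202": "Refresh", "303": "Update"},
    "JP": {"101": "更新", "202": "更新", "303": "アップデート"},
}


def row_for(lang_id: str, tables: dict, lang_order: list) -> tuple:
    """Return the texts of one ID across all languages, in a fixed order."""
    return tuple(tables[lang].get(lang_id) for lang in lang_order)


def pick_if_rows_identical(candidates: list, tables: dict, lang_order: list) -> Optional[str]:
    """If every candidate ID has the same row in all languages, pick one ID.

    Returns None when the rows differ, so the caller can keep treating the
    text as unresolved.
    """
    rows = [row_for(c, tables, lang_order) for c in candidates]
    if len(set(rows)) == 1:
        # Assumption: IDs are numeric strings, so compare them as integers.
        return min(candidates, key=int)
    return None


# IDs 101 and 202 are interchangeable; 101 and 303 are not.
same = pick_if_rows_identical(["202", "101"], TABLES, LANG_ORDER)
different = pick_if_rows_identical(["101", "303"], TABLES, LANG_ORDER)
```

Because the candidates' rows are identical in every language, whichever ID is picked expands to the same expected texts, which is why an arbitrary but deterministic choice is safe here.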

@Constrat Constrat requested a review from overflow65537 March 9, 2026 14:07
@Constrat Constrat marked this pull request as ready for review March 9, 2026 14:09
Copilot AI review requested due to automatic review settings March 9, 2026 14:09
sourcery-ai bot (Contributor) left a comment

Hey - I've left some high level feedback:

  • In `resolve_lang_ids`, the new identical-row disambiguation logic relies on the global `LANG_ORDER` and recomputes `rows` for each ambiguous text; consider passing `LANG_ORDER` (or a precomputed per-lang-id row map) as an argument to make this easier to test and avoid repeated table lookups.
  • The `seen` set in `expand_expected_from_ids` is shared across all `lang_ids`, which changes behavior to globally deduplicate texts; if the intention was only to avoid duplicates within a single row, move `seen` inside the loop so it resets for each `lang_id`.
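
To make the second point concrete, here is a small sketch of the two scoping choices for `seen`. It does not reproduce the real `expand_expected_from_ids`; the function names, the `tables[lang][lang_id]` layout, and the sample data are assumptions made for illustration.

```python
# Hypothetical tables: two different IDs that happen to share the same texts.
LANG_ORDER = ["CN", "TC", "EN"]
TABLES = {
    "CN": {"1": "取消", "2": "取消"},
    "TC": {"1": "取消", "2": "取消"},
    "EN": {"1": "Cancel", "2": "Cancel"},
}


def expand_global_dedup(lang_ids, tables, lang_order):
    """`seen` is shared across all IDs: each text appears at most once overall."""
    seen = set()
    expected = []
    for lang_id in lang_ids:
        for lang in lang_order:
            text = tables[lang].get(lang_id)
            if text and text not in seen:
                seen.add(text)
                expected.append(text)
    return expected


def expand_per_id_dedup(lang_ids, tables, lang_order):
    """`seen` resets per ID: duplicates are removed only within a single row."""
    expected = []
    for lang_id in lang_ids:
        seen = set()  # re-initialized for each lang_id
        for lang in lang_order:
            text = tables[lang].get(lang_id)
            if text and text not in seen:
                seen.add(text)
                expected.append(text)
    return expected


global_result = expand_global_dedup(["1", "2"], TABLES, LANG_ORDER)
per_id_result = expand_per_id_dedup(["1", "2"], TABLES, LANG_ORDER)
```

With this data, the shared set yields two entries while the per-ID set yields four, so the choice depends on whether repeated texts across IDs are wanted in the final `expected` list.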

Copilot AI (Contributor) left a comment

Pull request overview

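This PR mainly bulk-updates the expected text of OCR nodes across the pipelines to cover more languages and phrasings. It also strengthens the fallback strategy that sync_ocr_expected.py uses when reverse-looking up i18n IDs and deduplicates its output, reducing the recognition noise caused by repeated text.

Changes:

  • Enhance tools/i18n/sync_ocr_expected.py: for ambiguous IDs, add a fallback that automatically picks one when the whole row is identical across languages, and deduplicate the expanded expected values.
  • Broadly update the OCR expected lists of pipelines under assets/resource*: add or adjust Simplified Chinese, Traditional Chinese, English, Japanese, and Korean text, remove duplicates, and fix ordering and coverage.
  • Also correct the expected content of some nodes (such as "Refresh/更新/刷新" and "探索等级/Exploration Level") to improve the cross-language recognition hit rate.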

Reviewed changes

Copilot reviewed 51 out of 51 changed files in this pull request and generated 2 comments.

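| File | Description |
| --- | --- |
| tools/i18n/sync_ocr_expected.py | OCR expected sync script: fallback for ambiguous IDs plus deduplication of expanded text |
| assets/resource_fast/pipeline/SellProduct/SellCore.json | Remove duplicate text from expected |
| assets/resource_fast/pipeline/SellProduct/EnterOutpost.json | Expand outpost-related OCR expected text (including multilingual variants and aliases) |
| assets/resource_fast/pipeline/DeliveryJobs/Wuling.json | Adjust and fill in the order of multilingual expected text for bidding and packing |
| assets/resource_fast/pipeline/DeliveryJobs/ValleyIV.json | Remove duplicates and adjust the order of multilingual expected text |
| assets/resource_fast/pipeline/Common/Text.json | Remove duplicates and adjust region text expectations (including Korean) |
| assets/resource_fast/pipeline/Common/Status.json | Fill in multilingual expected text for "Exploration Level" |
| assets/resource_fast/pipeline/Common/ChangeRegion.json | Remove duplicates and adjust region-switching OCR text (including Korean) |
| assets/resource_adb/pipeline/SeizeEntrustTask.json | Remove duplicate "Refresh/更新" expected text |
| assets/resource/pipeline/Weapon.json | Fill in multilingual expected text for quality and sorting |
| assets/resource/pipeline/SimpleProductionBatch.json | Expand multilingual expected text for insufficient-material and success prompts, and deduplicate |
| assets/resource/pipeline/SeizeEntrustTask.json | Fill in multilingual expected text for commission tasks and remove duplicates |
| assets/resource/pipeline/SceneManager/SceneValleyIV.json | Fill in Traditional Chinese for some locations and remove duplicates |
| assets/resource/pipeline/SceneManager/SceneMenu.json | Fill in Traditional Chinese for menu items and remove duplicates such as ENDFIELD |
| assets/resource/pipeline/Resell/ResellROI.json | Expand multilingual expected text such as "Friends" and "Loading" |
| assets/resource/pipeline/Resell.json | Fill in Traditional Chinese and multilingual expected text for "cannot purchase" |
| assets/resource/pipeline/ProtocolSpace/Prepare.json | Fill in multilingual expected text for protocol space and level |
| assets/resource/pipeline/ProtocolSpace/OperationalManual.json | Fill in Traditional Chinese for "Operational Manual" and remove duplicates |
| assets/resource/pipeline/ProdManual.json | Fill in Traditional Chinese, English, Japanese, and other expected text in several places, and adjust synonyms such as "back" |
| assets/resource/pipeline/OpenGame.json | Fill in multilingual expected text for continue, claim, rewards obtained, and similar |
| assets/resource/pipeline/ItemTransfer.json | Expand multilingual expected text for friends, warehouse switching, connection status, sorting, and similar |
| assets/resource/pipeline/ImportBluePrints.json | Expand multilingual expected text for shared blueprints and save status |
| assets/resource/pipeline/GiftOperator.json | Fill in multilingual expected text for operator contact, default, and similar |
| assets/resource/pipeline/GearAssembly.json | Fill in multilingual expected text for sorting, such as "DESC/降順" |
| assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/RainbowFin.json | Adjust the order of multilingual expected text for accepting and heading to tasks |
| assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/IndoorCrops.json | Same as above (IndoorCrops task) |
| assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/EternalSunset.json | Same as above (EternalSunset task) |
| assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/CollapsedTianshiPillar.json | Same as above (CollapsedTianshiPillar task) |
| assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/CleansingJade.json | Same as above (CleansingJade task) |
| assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/CisternOriginiumSlugs.json | Same as above (CisternOriginiumSlugs task) |
| assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/BeaconDamagedInBlightTide.json | Same as above (BeaconDamagedInBlightTide task) |
| assets/resource/pipeline/EnvironmentMonitoring/OutskirtsMonitoringTerminal/AncientTree.json | Same as above (AncientTree task) |
| assets/resource/pipeline/DijiangRewards/Template/TextTemplate.json | Fill in multilingual expected text for claim, cancel, rewards obtained, and similar, and deduplicate |
| assets/resource/pipeline/DijiangRewards/Template/Status.json | Remove duplicate timer text |
| assets/resource/pipeline/DijiangRewards/Template/Location.json | Fill in multilingual expected text for the reception room |
| assets/resource/pipeline/DijiangRewards/ReceptionRoom.json | Fill in and deduplicate nodes for reception room, gifting, already filled in, and similar |
| assets/resource/pipeline/DijiangRewards/NeedCredit.json | Fill in multilingual expected text for the reception room |
| assets/resource/pipeline/DijiangRewards/Manufacturing.json | Expand manufacturing-related multilingual expected text |
| assets/resource/pipeline/DijiangRewards/MainFlow.json | Fill in multilingual expected text for growth chamber and cancel |
| assets/resource/pipeline/DijiangRewards/GrowthChamber.json | Fill in multilingual text for growth chamber nodes, remove duplicates, and adjust sorting text |
| assets/resource/pipeline/DailyRewards/Tasks.json | Fill in multilingual expected text for operational manual, simple crafting, gear manufacturing, and similar |
| assets/resource/pipeline/DailyRewards/ProtocolPass.json | Fill in the order of multilingual expected text such as "claim all" and "quick claim" |
| assets/resource/pipeline/DailyRewards/Event.json | Fill in Traditional Chinese for the event center and remove duplicates |
| assets/resource/pipeline/DailyRewards/Emails.json | Fill in multilingual expected text for item obtained, collect all, claim all, and similar |
| assets/resource/pipeline/CreditShopping/ClaimCredit.json | Remove duplicate text entries |
| assets/resource/pipeline/Crafting.json | Fill in multilingual expected text for simple crafting and deduplicate |
| assets/resource/pipeline/BatchAddFriends.json | Fill in multilingual expected text for search, add friend, and similar |
| assets/resource/pipeline/BAKER.json | Fill in multilingual expected text for "show all" |
| assets/resource/pipeline/AutoSimulationCollect.json | Fill in multilingual expected text for operational manual, simulation, region, confirm, and similar, and deduplicate |
| assets/resource/pipeline/AutoEcoFarm/AutoEcoFarmFindFarmland.json | Fill in multilingual expected text for cancel and track |
| assets/resource/pipeline/AutoEcoFarm/AutoEcoFarmCommon.json | Fill in multilingual expected text for clear |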

Comment on lines +448 to +450
if len(set(rows)) == 1:
    lang_id = min(candidates)  # Choose smallest ID
    if lang_id not in resolved_set:

Copilot AI commented Mar 9, 2026

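Here `min(candidates)` takes the minimum in lexicographic string order; if `lang_id` is a numeric string (such as "10" and "2"), it returns the wrong result, which contradicts the "smallest ID" stated in the comment. Consider comparing numerically (for example, `key=int` for purely numeric IDs), or explicitly document and guarantee the ordering rule for IDs.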
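
A short illustration of the difference, offered as a sketch rather than the fix that was actually applied. The helper `smallest_id` and its handling of non-numeric IDs are assumptions; the real ID format in the i18n tables is not shown in this thread.

```python
candidates = ["10", "2", "33"]

# String comparison goes character by character, so "10" sorts before "2".
lexicographic = min(candidates)

# Comparing as integers gives the numerically smallest ID.
numeric = min(candidates, key=int)


def smallest_id(ids):
    """Pick the smallest ID, comparing decimal IDs by numeric value.

    Assumption: non-numeric IDs, if any exist, sort after numeric ones and are
    compared as plain strings among themselves.
    """
    def sort_key(lang_id):
        if lang_id.isdecimal():
            return (0, int(lang_id), "")
        return (1, 0, lang_id)

    return min(ids, key=sort_key)


mixed = smallest_id(["10", "2", "abc"])
```

Using a tuple key keeps the comparison well defined even if some IDs are not purely numeric, whereas a bare `key=int` would raise `ValueError` on such input.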

pass
else:
unresolved_texts.append(text)
# Third fallback: if ambigous IDs have identical rows in all languages, pick anyone (smallest ID)

Copilot AI commented Mar 9, 2026

The comment misspells "ambigous"; it should be "ambiguous".

@MistEO MistEO merged commit e103bb0 into v2 Mar 9, 2026
21 checks passed
@MistEO MistEO deleted the chore/ocr-sync branch March 9, 2026 15:50
4 participants