Proposal: Translation Memory (TM) based i18n workflow #202
Closed
Labels
enhancement (New feature or request)
Description
This issue summarizes the discussion between @em3s and @eazyhozy.
Motivation
We want to automate documentation translation (starting with Korean) while:
- Preserving MDX structure (code blocks, JSX, tables)
- Maintaining term consistency via a glossary
- Respecting human-contributed translations
- Keeping the process transparent and open to contributions
Architecture
The OSS CI has no external API dependencies — it only performs TM lookup. LLM translation runs on Kakao's infrastructure; the process and scripts are documented in TRANSLATION.md.
OSS CI — triggers on docs, TM, or glossary changes:
en/*.mdx changed ──┐
                   │    ┌─────────────┐    ┌─────────────┐    ┌──────────┐
tm/** changed ─────┼───▶│  Parse MDX  │───▶│  TM Lookup  │───▶│ Generate │──▶ PR
                   │    │   extract   │    │ exact match │    │ ko/*.mdx │
glossary changed ──┘    │  segments   │    │ HIT→target  │    └──────────┘
                        └─────────────┘    │ MISS→source │
                                           └─────────────┘
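The TM Lookup stage in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the actual `scripts/translate-map.py`: it assumes segments have already been extracted from the MDX, and that TM entries are records with `source` and `target` fields. The function names `build_index` and `translate_segments` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def build_index(entries: Iterable[dict]) -> Dict[str, str]:
    """Map each exact source string to its stored target translation."""
    return {e["source"]: e["target"] for e in entries}


def translate_segments(
    segments: List[str], index: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (output segments, MISS segments).

    HIT: the segment is replaced by its stored target.
    MISS: the English source is kept unchanged and recorded.
    """
    output: List[str] = []
    misses: List[str] = []
    for seg in segments:
        if seg in index:  # exact string match only, no normalization
            output.append(index[seg])
        else:
            output.append(seg)
            misses.append(seg)
    return output, misses


# Illustrative data: one known segment, one new segment.
tm_entries = [{"source": "Core Concepts", "target": "핵심 개념"}]
out, misses = translate_segments(
    ["Core Concepts", "A new sentence."], build_index(tm_entries)
)
print(out)     # ['핵심 개념', 'A new sentence.']
print(misses)  # ['A new sentence.']
```

Because the lookup is a plain dictionary access, even a change in capitalization or punctuation in the source produces a MISS, which matches the exact-match behavior the proposal describes.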
External pipeline — runs periodically on Kakao infrastructure:
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌──────────┐
│   Scan TM   │───▶│    Batch    │───▶│  Update TM  │───▶│  PR to   │
│   collect   │    │  translate  │    │ tm/ko/*.yaml│    │ OSS repo │
│ MISS entries│    │   (LLM +    │    │ contributors│    └──────────┘
└─────────────┘    │  glossary)  │    │  =[model]   │         │
                   └─────────────┘    └─────────────┘         │
                                                              ▼
                                                    OSS CI triggers again
TM Format
Per-document YAML files at tm/{lang}/{doc-path}.yaml:
- source: "Core Concepts"
  target: "핵심 개념"
  contributors: [dave.lake]
  context: title
- source: "Actionbase is a database for serving user interactions."
  target: "Actionbase는 사용자 인터랙션을 위한 데이터베이스입니다."
  contributors: [kanana-2]
  context: paragraph

- Matching is exact string match; any source change triggers re-translation
- Entries with human contributors are never overwritten by automation
- MISS segments keep the English original in the translated doc, so untranslated text is easy to spot
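The rule that human-contributed entries are never overwritten could be enforced in the update step along these lines. This is only a sketch under stated assumptions: the proposal does not say how automation tells a human contributor from a model, so the `MODEL_CONTRIBUTORS` set, the helper names, and the sample translations are all hypothetical.

```python
from typing import List, Set

# Assumption: contributor names that denote automated translators.
MODEL_CONTRIBUTORS: Set[str] = {"kanana-2"}


def is_human_entry(entry: dict) -> bool:
    """True if at least one contributor is not a known model name."""
    return any(c not in MODEL_CONTRIBUTORS for c in entry.get("contributors", []))


def merge_machine_translations(
    entries: List[dict], machine_entries: List[dict], model: str
) -> List[dict]:
    """Apply machine translations by exact source match, skipping human entries."""
    by_source = {e["source"]: e for e in machine_entries}
    merged: List[dict] = []
    seen = set()
    for entry in entries:
        src = entry["source"]
        seen.add(src)
        update = by_source.get(src)
        if update is not None and not is_human_entry(entry):
            # Only machine-owned entries are refreshed.
            entry = {**entry, "target": update["target"], "contributors": [model]}
        merged.append(entry)
    # Sources not yet in the TM are appended as new machine entries.
    for src, update in by_source.items():
        if src not in seen:
            merged.append({**update, "contributors": [model]})
    return merged


# Illustrative data only.
tm = [
    {"source": "Core Concepts", "target": "핵심 개념",
     "contributors": ["dave.lake"], "context": "title"},
    {"source": "Query", "target": "질의",
     "contributors": ["kanana-2"], "context": "title"},
]
machine = [
    {"source": "Core Concepts", "target": "코어 컨셉", "context": "title"},
    {"source": "Query", "target": "쿼리", "context": "title"},
]
result = merge_machine_translations(tm, machine, "kanana-2")
print(result[0]["target"])  # 핵심 개념 (human entry kept)
print(result[1]["target"])  # 쿼리 (machine entry refreshed)
```

In this sketch an entry with an empty `contributors` list counts as machine-owned; a real pipeline might prefer to treat it as an error instead.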
Glossary Format
# glossary/ko.yaml
version: v1
terms:
  Edge: "엣지"
  Query: "쿼리"
  Mutation: "뮤테이션"
preserve:
  - Actionbase
  - HBase
  - Kubernetes

File Structure
tm/ko/{doc-path}.yaml        # Translation Memory (per-document)
glossary/ko.yaml             # Glossary
scripts/translate-map.py     # TM mapping script
TRANSLATION.md               # Process documentation
.github/workflows/
  translation.yml            # CI: triggers on docs/TM/glossary changes
src/content/docs/
  *.mdx                      # English source
  ko/*.mdx                   # Translated output
Contributing Translations
Edit tm/ko/{doc}.yaml directly — add or fix entries with your GitHub username in contributors, then open a PR. The CI regenerates translated docs automatically.
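Before opening a PR, a contributor might sanity-check an edited entry with something like the following. The proposal does not define a validation step; this sketch simply assumes that every entry carries the four fields shown in the TM format examples (`source`, `target`, `contributors`, `context`), and `validate_entry` is a hypothetical helper.

```python
from typing import List

# Assumption: the four keys seen in the TM format examples are all required.
REQUIRED_KEYS = ("source", "target", "contributors", "context")


def validate_entry(entry: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means the entry looks fine."""
    problems: List[str] = []
    for key in REQUIRED_KEYS:
        if key not in entry:
            problems.append(f"missing key: {key}")
    if "contributors" in entry:
        contributors = entry["contributors"]
        if not isinstance(contributors, list) or not contributors:
            problems.append("contributors must be a non-empty list")
    for key in ("source", "target"):
        if key in entry and (not isinstance(entry[key], str) or not entry[key].strip()):
            problems.append(f"{key} must be a non-empty string")
    return problems


good = {
    "source": "Core Concepts",
    "target": "핵심 개념",
    "contributors": ["dave.lake"],
    "context": "title",
}
bad = {"source": "Core Concepts", "target": "", "contributors": []}

print(validate_entry(good))  # []
print(validate_entry(bad))
```

The entries are shown here as Python dictionaries to keep the sketch dependency-free; in practice they would be loaded from the `tm/ko/{doc}.yaml` file with a YAML parser.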
Work Items
- WI-1: TM mapping script + CI workflow + seed TM from existing translations
- WI-2: External TM update pipeline + TRANSLATION.md
- WI-3: Add translation contribution guide to CONTRIBUTING.md (TM editing workflow, glossary usage, PR conventions)