Proposal: Translation Memory (TM) based i18n workflow #202

@eazyhozy

This issue summarizes the discussion between @em3s and @eazyhozy.

Motivation

We want to automate documentation translation (starting with Korean) while:

  • Preserving MDX structure (code blocks, JSX, tables)
  • Maintaining term consistency via glossary
  • Respecting human-contributed translations
  • Keeping the process transparent and open to contributions

Architecture

The OSS CI has no external API dependencies — it only performs TM lookup. LLM translation runs on Kakao's infrastructure; the process and scripts are documented in TRANSLATION.md.

OSS CI — triggers on docs, TM, or glossary changes:

en/*.mdx changed ─┐
                   │    ┌─────────────┐    ┌─────────────┐    ┌──────────┐
tm/**    changed ──┼───▶│ Parse MDX   │───▶│ TM Lookup   │───▶│ Generate │──▶ PR
                   │    │ extract     │    │ exact match │    │ ko/*.mdx │
glossary changed ──┘    │ segments    │    │ HIT→target  │    └──────────┘
                        └─────────────┘    │ MISS→source │
                                           └─────────────┘
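
The lookup step in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the actual scripts/translate-map.py: it assumes MDX parsing has already produced a list of segment strings and that the TM file has been loaded into a list of dicts (for example with a YAML library). The function names `build_index` and `lookup_segments` are hypothetical.

```python
from typing import Dict, List, Tuple


def build_index(tm_entries: List[dict]) -> Dict[str, str]:
    """Index TM entries by their exact source string.

    Entries without a target are skipped, so they behave like a MISS.
    """
    return {e["source"]: e["target"] for e in tm_entries if e.get("target")}


def lookup_segments(
    segments: List[str], tm_entries: List[dict]
) -> Tuple[List[str], List[str]]:
    """Return (output_segments, misses).

    HIT  -> the stored target replaces the segment.
    MISS -> the English source is kept as-is and recorded.
    """
    index = build_index(tm_entries)
    output: List[str] = []
    misses: List[str] = []
    for segment in segments:
        if segment in index:       # exact string match only
            output.append(index[segment])
        else:
            output.append(segment)  # keep the English original
            misses.append(segment)
    return output, misses
```

Because nothing here calls out to an external service, a step like this can run in the OSS CI without credentials.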

External pipeline — runs periodically on Kakao infrastructure:

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌──────────┐
│ Scan TM     │───▶│ Batch       │───▶│ Update TM   │───▶│ PR to    │
│ collect     │    │ translate   │    │ tm/ko/*.yaml│    │ OSS repo │
│ MISS entries│    │ (LLM +      │    │ contributors│    └──────────┘
└─────────────┘    │  glossary)  │    │ =[model]    │         │
                   └─────────────┘    └─────────────┘         │
                                                              ▼
                                                   OSS CI triggers again
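
A rough sketch of the external pipeline's scan, translate, and update steps is given here. Two things are assumptions rather than part of the proposal: a MISS is represented as a TM entry whose `target` is empty or absent, and the LLM call is hidden behind a caller-supplied `translate_batch` function, so no real API is shown. The names `collect_misses` and `fill_misses` are hypothetical.

```python
from typing import Callable, List


def collect_misses(tm_entries: List[dict]) -> List[dict]:
    """Assumption: an entry with no target yet counts as a MISS."""
    return [e for e in tm_entries if not e.get("target")]


def fill_misses(
    tm_entries: List[dict],
    translate_batch: Callable[[List[str]], List[str]],
    model_name: str,
) -> int:
    """Translate all MISS entries in one batch and update them in place.

    translate_batch stands in for the LLM + glossary step; it must return
    one target string per source string, in the same order.
    Returns the number of entries that were filled.
    """
    misses = collect_misses(tm_entries)
    if not misses:
        return 0
    targets = translate_batch([e["source"] for e in misses])
    if len(targets) != len(misses):
        raise ValueError("translate_batch returned a different number of items")
    for entry, target in zip(misses, targets):
        entry["target"] = target
        entry["contributors"] = [model_name]  # contributors=[model]
    return len(misses)
```

The updated entries would then be written back to tm/ko/*.yaml and proposed as a PR, which in turn triggers the OSS CI.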

TM Format

Per-document YAML files at tm/{lang}/{doc-path}.yaml:

- source: "Core Concepts"
  target: "핵심 개념"
  contributors: [dave.lake]
  context: title

- source: "Actionbase is a database for serving user interactions."
  target: "Actionbase는 사용자 인터랙션을 위한 데이터베이스입니다."
  contributors: [kanana-2]
  context: paragraph

Rules:

  • Matching is by exact string: any change to a source segment makes it a MISS, so it is re-translated
  • Entries with human contributors are never overwritten by automation
  • MISS segments keep the English original in the translated doc, so untranslated text is easy to spot
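
The "never overwrite human entries" rule could be enforced with a guard like the one sketched here. How automation tells a model from a person is not specified in this issue; the sketch assumes a known set of model names (with `kanana-2` from the example above treated as a model) and considers any other contributor human. `MODEL_NAMES`, `has_human_contributor`, and `upsert_machine_entry` are hypothetical names.

```python
from typing import FrozenSet, List, Optional

# Assumption: automation knows which contributor names are models.
MODEL_NAMES: FrozenSet[str] = frozenset({"kanana-2"})


def has_human_contributor(
    entry: dict, model_names: FrozenSet[str] = MODEL_NAMES
) -> bool:
    """True if any contributor is not a known model name."""
    return any(c not in model_names for c in entry.get("contributors", []))


def upsert_machine_entry(
    tm_entries: List[dict],
    source: str,
    target: str,
    model: str,
    context: Optional[str] = None,
) -> bool:
    """Write a machine translation unless a human owns the entry.

    Returns True if the TM was changed, False if the entry was protected.
    """
    for entry in tm_entries:
        if entry["source"] == source:  # exact string match
            if has_human_contributor(entry):
                return False           # human translation wins
            entry["target"] = target
            entry["contributors"] = [model]
            return True
    new_entry = {"source": source, "target": target, "contributors": [model]}
    if context is not None:
        new_entry["context"] = context
    tm_entries.append(new_entry)
    return True
```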

Glossary Format

# glossary/ko.yaml
version: v1
terms:
  Edge: "엣지"
  Query: "쿼리"
  Mutation: "뮤테이션"
preserve:
  - Actionbase
  - HBase
  - Kubernetes
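
One way the glossary could be used to check term consistency is sketched here. This is only an illustration under assumptions: the glossary is already loaded into a dict with the same keys as the YAML above, English terms are matched case-insensitively on word boundaries (so plurals such as "Edges" are not caught), and a simple substring test is used on the Korean side because particles attach directly to words. `check_glossary` is a hypothetical helper.

```python
import re
from typing import List

# Same content as glossary/ko.yaml above, as an already-loaded dict.
GLOSSARY = {
    "version": "v1",
    "terms": {"Edge": "엣지", "Query": "쿼리", "Mutation": "뮤테이션"},
    "preserve": ["Actionbase", "HBase", "Kubernetes"],
}


def _mentions(text: str, term: str) -> bool:
    """Case-insensitive whole-word search for an English term."""
    pattern = rf"\b{re.escape(term)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def check_glossary(source: str, target: str, glossary: dict) -> List[str]:
    """Return human-readable problems for one source/target pair."""
    problems: List[str] = []
    for english, korean in glossary.get("terms", {}).items():
        if _mentions(source, english) and korean not in target:
            problems.append(f"term {english!r} should be rendered as {korean!r}")
    for name in glossary.get("preserve", []):
        if name in source and name not in target:
            problems.append(f"{name!r} must be kept untranslated")
    return problems
```

A check like this could run on both human and machine entries, since it only reads the TM and the glossary.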

File Structure

tm/ko/{doc-path}.yaml       # Translation Memory (per-document)
glossary/ko.yaml             # Glossary
scripts/translate-map.py     # TM mapping script
TRANSLATION.md               # Process documentation
.github/workflows/
  translation.yml            # CI: triggers on docs/TM/glossary changes
src/content/docs/
  *.mdx                      # English source
  ko/*.mdx                   # Translated output
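
The layout above implies a mapping from each English source file to its TM file and its translated output. A small sketch of that mapping is given here, assuming `{doc-path}` is the path relative to src/content/docs with the .mdx extension dropped (the issue does not spell this out). `tm_path`, `output_path`, and the example path guides/intro.mdx are hypothetical.

```python
from pathlib import PurePosixPath

DOCS_ROOT = PurePosixPath("src/content/docs")


def tm_path(doc: str, lang: str = "ko") -> PurePosixPath:
    """Map an English source doc to its per-document TM file."""
    relative = PurePosixPath(doc).relative_to(DOCS_ROOT)
    return PurePosixPath("tm") / lang / relative.with_suffix(".yaml")


def output_path(doc: str, lang: str = "ko") -> PurePosixPath:
    """Map an English source doc to its translated output file."""
    relative = PurePosixPath(doc).relative_to(DOCS_ROOT)
    return DOCS_ROOT / lang / relative


# Hypothetical document path, used only for illustration.
example = "src/content/docs/guides/intro.mdx"
print(tm_path(example))      # tm/ko/guides/intro.yaml
print(output_path(example))  # src/content/docs/ko/guides/intro.mdx
```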

Contributing Translations

Edit tm/ko/{doc}.yaml directly — add or fix entries with your GitHub username in contributors, then open a PR. The CI regenerates translated docs automatically.
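
Before opening a PR, an edited TM file could be sanity-checked with something like the sketch here. The specific checks (a non-empty `source`, `contributors` present whenever a `target` is set, no duplicate `source` strings) are assumptions about what a reviewer would want, not rules defined in this issue, and `validate_tm` is a hypothetical helper that works on entries already loaded from YAML.

```python
from typing import List


def validate_tm(entries: List[dict]) -> List[str]:
    """Return a list of problems found in a loaded TM file."""
    errors: List[str] = []
    seen = set()
    for index, entry in enumerate(entries):
        source = entry.get("source")
        if not source:
            errors.append(f"entry {index}: missing or empty 'source'")
            continue
        if source in seen:
            # Exact-match lookup needs one entry per source string.
            errors.append(f"entry {index}: duplicate source {source!r}")
        seen.add(source)
        if entry.get("target") and not entry.get("contributors"):
            errors.append(f"entry {index}: 'target' set but no 'contributors'")
    return errors
```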

Work Items

  • WI-1: TM mapping script + CI workflow + seed TM from existing translations
  • WI-2: External TM update pipeline + TRANSLATION.md
  • WI-3: Add translation contribution guide to CONTRIBUTING.md (TM editing workflow, glossary usage, PR conventions)

Labels: enhancement
