GitHub - yigitkonur/cli-localize: AI-powered localization CLI — SRT, JSON, PO, XML, ARB with token-aware batching

translation CLI built for AI agents. breaks localization files into token-sized batches, hands them to an LLM in a compact format, validates the output, and reconstructs the translated file. stateful and resumable — pick up where you left off.

cli-localize init -i messages.json -l "en>tr"
cli-localize batch -s .loc-*.json -b 1
cli-localize submit -s .loc-*.json -b 1 -p batch1.ibf
cli-localize finalize -s .loc-*.json

or in one shot:

cli-localize oneshot -i messages.json -l "en>tr"

what it does

7 localization formats — SRT, JSON (i18next/react-intl/vue-i18n), PO/POT, Android XML, iOS .strings, YAML (Rails/Symfony), Flutter ARB
token-aware batching — uses tiktoken (cl100k_base) to split files into batches that fit LLM context windows. estimates output length with 1.2x expansion factor
IBF (indexed block format) — compact wire format for LLM translation. one entry per line, IDs in brackets, newlines escaped. minimal token waste
context windows for subtitles — SRT batches include surrounding entries as read-only context so the LLM can maintain narrative coherence
5-layer validation — structural check, extraction, decode, content verification (ID matching, no hallucinated IDs), placeholder preservation
retry with 3 attempts — failed batches get re-queued automatically. after 3 failures, skip and move on
resumable sessions — all state persists to a .loc-*.json file next to your input. crash, restart, continue
graceful fallback — unfinished batches fall back to source text on finalize
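
the retry and fallback rules above can be sketched roughly as follows. this is an illustration, not the code in session.py: `BatchState`, `record_result`, and the status strings are hypothetical names, and the real session file may store things differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MAX_ATTEMPTS = 3  # from the list above: skip a batch after 3 failures


@dataclass
class BatchState:
    """Hypothetical per-batch record; field names are assumptions."""
    source: Dict[int, str]                     # id -> source text
    translated: Optional[Dict[int, str]] = None
    attempts: int = 0
    status: str = "pending"                    # pending | done | skipped


def record_result(batch: BatchState, translations, valid: bool) -> None:
    """Store a valid translation, or count a failed attempt."""
    if valid:
        batch.translated = translations
        batch.status = "done"
        return
    batch.attempts += 1
    if batch.attempts >= MAX_ATTEMPTS:
        batch.status = "skipped"               # give up and move on


def finalize(batches: List[BatchState]) -> Dict[int, str]:
    """Merge all batches; anything not done falls back to source text."""
    merged: Dict[int, str] = {}
    for batch in batches:
        merged.update(batch.translated if batch.status == "done" else batch.source)
    return merged
```

a batch that is still pending or was skipped contributes its source text, so the finalized file is always complete even when some translations are missing.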

supported formats

format	extensions	placeholder style
SRT	`.srt`	none (timecodes preserved)
JSON	`.json`	`{{name}}` (i18next), `{name}` (ICU)
PO/POT	`.po`, `.pot`	`%s`, `%(name)s` (printf)
Android XML	`.xml`	`%1$s`, `%2$d`
iOS .strings	`.strings`	`%@`, `%d`, `%ld`, `%f`
YAML	`.yml`, `.yaml`	`%{name}` (Ruby), `{{name}}`
Flutter ARB	`.arb`	`{name}`, `{count, plural, ...}` (ICU)

auto-detected by file extension. for .xml, content is sniffed for <resources> to confirm Android format.
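
a minimal sketch of that detection logic, assuming a plain extension lookup plus a substring check for `<resources`. `detect_format` and `EXTENSION_FORMATS` are hypothetical names; the format names reuse the values accepted by `--format`.

```python
from pathlib import Path
from typing import Optional

# Extension-to-format map, taken from the table above.
EXTENSION_FORMATS = {
    ".srt": "srt",
    ".json": "json",
    ".po": "po",
    ".pot": "po",
    ".xml": "android",
    ".strings": "strings",
    ".yml": "yaml",
    ".yaml": "yaml",
    ".arb": "arb",
}


def detect_format(path: str, content: str = "") -> Optional[str]:
    """Return a format name for `path`, or None if it is not recognized."""
    fmt = EXTENSION_FORMATS.get(Path(path).suffix.lower())
    # .xml alone is ambiguous, so only accept it when the content
    # contains a <resources> element (assumed: a simple substring check).
    if fmt == "android" and "<resources" not in content:
        return None
    return fmt
```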

install

pip install .

or with uv:

uv sync

for a standalone binary (no Python needed):

uv sync --extra dev
uv run python build.py --clean
# produces: dist/cli-localize-{platform}

requires Python 3.10+. only two runtime dependencies: tiktoken and pyyaml.

usage

step by step

# 1. parse file, create session
cli-localize init -i strings.json -l "en>de" -t 5000

# 2. get batch in IBF format (pipe to your LLM)
cli-localize batch -s .loc-a1b2-c3d4e5f6.json -b 1

# 3. submit the LLM's translation
cli-localize submit -s .loc-a1b2-c3d4e5f6.json -b 1 -p translated.ibf

# 4. repeat for remaining batches, then finalize
cli-localize finalize -s .loc-a1b2-c3d4e5f6.json
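
if you drive these steps from a script instead of a shell, something like the sketch below may work. it assumes `batch` writes the IBF request to stdout (the "pipe to your LLM" comment suggests so, but this is not confirmed here), and `translate` is a hypothetical callable that sends the request to your LLM and returns its reply. only flags from the CLI reference are used.

```python
import subprocess
from typing import Callable, List


def step_commands(session: str, batch: int, patch_path: str) -> List[List[str]]:
    """Return the argv lists for fetching and submitting one batch."""
    return [
        ["cli-localize", "batch", "-s", session, "-b", str(batch)],
        ["cli-localize", "submit", "-s", session, "-b", str(batch), "-p", patch_path],
    ]


def run_batch(session: str, batch: int, translate: Callable[[str], str],
              patch_path: str = "translated.ibf") -> None:
    """Fetch one batch, translate it, and submit the result."""
    fetch, submit = step_commands(session, batch, patch_path)
    # Assumption: the IBF request is printed on stdout.
    request = subprocess.run(fetch, capture_output=True, text=True, check=True).stdout
    with open(patch_path, "w", encoding="utf-8") as fh:
        fh.write(translate(request))
    subprocess.run(submit, check=True)
```

passing arguments as a list skips the shell entirely, so a language pair like `en>de` needs no extra quoting there.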

oneshot (for simple agent loops)

cli-localize oneshot -i strings.json -l "en>de"

auto-creates or resumes a session and returns the next pending batch. designed for single-turn agent workflows.

check progress

cli-localize status -s .loc-a1b2-c3d4e5f6.json

list supported formats

cli-localize formats

IBF format

the wire format between cli-localize and the LLM. minimal, line-oriented, token-efficient.

request (sent to LLM):

#TRANSLATE:v1:en>tr:batch=1/10:entries=2:ctx=10
@context_before
[47] previous subtitle for context
[48] another context entry
@translate
[49] text to translate
[50] another entry
@context_after
[51] following context
---

response (from LLM):

#TRANSLATED:v1:batch=1/10:count=2:status=ok
[49] translated text
[50] another translation
---

newlines in content are escaped as \n. empty translations are [id] with no trailing text.
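
to make the format concrete, here is a small encoder and parser written against the examples above. it is a sketch, not the code in ibf_format.py: the function names are hypothetical, IDs are assumed to be integers, and empty context sections are assumed to be omitted.

```python
import re

# "[id] text", or just "[id]" for an empty translation.
ENTRY_RE = re.compile(r"^\[(\d+)\](?: (.*))?$")


def encode_request(lang, batch, total, entries, before=(), after=(), ctx=10):
    """Build an IBF request; entries/before/after hold (id, text) pairs."""
    def rows(items):
        out = []
        for entry_id, text in items:
            escaped = text.replace("\n", "\\n")  # newlines become literal \n
            out.append(f"[{entry_id}] {escaped}" if escaped else f"[{entry_id}]")
        return out

    lines = [f"#TRANSLATE:v1:{lang}:batch={batch}/{total}:entries={len(entries)}:ctx={ctx}"]
    if before:
        lines += ["@context_before", *rows(before)]
    lines += ["@translate", *rows(entries)]
    if after:
        lines += ["@context_after", *rows(after)]
    lines.append("---")
    return "\n".join(lines)


def parse_response(text):
    """Return {id: translation} from the lines between the header and '---'."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith("#TRANSLATED:"):
        raise ValueError("missing #TRANSLATED header")
    result = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return result
        match = ENTRY_RE.match(line)
        if match is None:
            raise ValueError(f"malformed entry line: {line!r}")
        result[int(match.group(1))] = (match.group(2) or "").replace("\\n", "\n")
    raise ValueError("missing --- delimiter")
```

this parser expects the header on the first line; the real pipeline also strips LLM preamble and postamble before decoding, which is left out here.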

CLI reference

init

flag	default	description
`-i, --input`	required	input file path
`-o, --output`	`{lang}_{stem}{ext}`	output file path
`-l, --lang`	`en>tr`	language pair (quote it — `>` is shell redirection)
`-t, --target-tokens`	`5000`	target tokens per batch
`-c, --context`	`10`	context window size (SRT only)
`-f, --format`	`auto`	force format: `srt json po android strings yaml arb`

batch

flag	description
`-s, --session`	session state file
`-b, --batch`	batch number (1-indexed)
`-p, --with-prompt`	prepend a full translation prompt before the IBF block

submit

flag	description
`-s, --session`	session state file
`-b, --batch`	batch number
`-p, --patch`	path to `.ibf` file with the LLM's response

status / finalize

flag	description
`-s, --session`	session state file

how batching works

TokenBatcher uses tiktoken's cl100k_base encoding. for each entry it estimates output tokens as floor(source_tokens * 1.2) + 10 — 20% expansion for translated text, 10 tokens for IBF framing overhead. batches fill until the target token count would be exceeded. falls back to 25 entries per batch if tiktoken isn't available.
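
a rough sketch of that greedy fill, for illustration only. `split_batches` and `tiktoken_counter` are hypothetical names rather than the TokenBatcher API, and it assumes the estimated output tokens are what count against the target.

```python
import math
from typing import Callable, List, Optional


def tiktoken_counter() -> Optional[Callable[[str], int]]:
    """Return a cl100k_base token counter, or None if tiktoken is missing."""
    try:
        import tiktoken
    except ImportError:
        return None
    enc = tiktoken.get_encoding("cl100k_base")
    return lambda text: len(enc.encode(text))


def estimate_output_tokens(source_tokens: int) -> int:
    """20% expansion for translated text plus 10 tokens of IBF framing."""
    return math.floor(source_tokens * 1.2) + 10


def split_batches(texts: List[str], target_tokens: int = 5000,
                  count_tokens: Optional[Callable[[str], int]] = None,
                  fallback_size: int = 25) -> List[List[str]]:
    """Fill batches until the next entry would exceed the target."""
    if count_tokens is None:
        # No tokenizer available: fixed-size batches of 25 entries.
        return [texts[i:i + fallback_size] for i in range(0, len(texts), fallback_size)]
    batches, current, used = [], [], 0
    for text in texts:
        cost = estimate_output_tokens(count_tokens(text))
        if current and used + cost > target_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        batches.append(current)
    return batches
```

for example, a 10-token entry is budgeted at floor(10 * 1.2) + 10 = 22 tokens. an entry that is larger than the target on its own still gets a batch to itself in this sketch.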

validation pipeline

when you submit a translated batch, it goes through five checks:

structural — header regex, --- delimiter, line format matches [id] text
extraction — strips LLM preamble/postamble, finds #TRANSLATED: and --- markers
decode — parses metadata and entries, unescapes \n
content — count match, all expected IDs present, no hallucinated IDs, batch number matches
placeholder — checks source-format placeholders are preserved (warnings only, doesn't reject)
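
the last two checks can be sketched like this. the names are hypothetical and the placeholder regex is a rough union of the styles in the formats table, not the per-format patterns the handlers may use; ICU plural blocks like `{count, plural, ...}` are not covered.

```python
import re
from collections import Counter

# Rough union of placeholder styles; earlier alternatives win, so
# {{name}} and %{name} are tried before the plain {name} form.
PLACEHOLDER_RE = re.compile(
    r"\{\{\w+\}\}"        # {{name}}
    r"|%\{\w+\}"          # %{name}
    r"|%\(\w+\)[sd]"      # %(name)s
    r"|%\d+\$[sd]"        # %1$s, %2$d
    r"|%(?:ld|[@sdf])"    # %@, %s, %d, %f, %ld
    r"|\{\w+\}"           # {name}
)


def check_content(expected_ids, translations, declared_count):
    """Return a list of errors; an empty list means the batch passes."""
    errors = []
    if declared_count != len(translations):
        errors.append(f"count mismatch: header says {declared_count}, got {len(translations)}")
    expected, got = set(expected_ids), set(translations)
    missing = sorted(expected - got)
    if missing:
        errors.append(f"missing IDs: {missing}")
    extra = sorted(got - expected)  # IDs the LLM made up
    if extra:
        errors.append(f"unexpected IDs: {extra}")
    return errors


def check_placeholders(source, translation):
    """Return placeholders present in source but lost in translation."""
    lost = Counter(PLACEHOLDER_RE.findall(source)) - Counter(PLACEHOLDER_RE.findall(translation))
    return sorted(lost.elements())
```

`check_content` results would reject the batch, while `check_placeholders` results would only be reported as warnings, matching the behavior described above. comparing counts rather than positions lets the translation reorder placeholders freely.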

project structure

xlat/
  cli.py              — argparse entry point, command routing
  session.py          — stateful translation session logic
  ibf_format.py       — IBF encoder/decoder/validator
  batcher.py          — token-aware batch splitting
  format_handlers/
    base.py           — FormatHandler ABC, registry, TranslationEntry
    srt.py            — SubRip subtitles
    json_handler.py   — JSON (i18next, react-intl, vue-i18n)
    po.py             — GNU gettext PO/POT
    android_xml.py    — Android strings.xml
    ios_strings.py    — iOS/macOS .strings
    yaml_handler.py   — YAML (Rails/Symfony)
    arb.py            — Flutter ARB

license

MIT
