Philosophy: Don't reinvent the wheel. This project deliberately avoids building romanization logic from scratch. Instead, it composes best-in-class, community-maintained libraries — one for each script — and focuses on the orchestration layer: script detection, engine routing, dialect handling, and a unified API. Every romanization engine in the dependency list is a dedicated, battle-tested library maintained by domain experts. That's the point.
Script detection and local romanization engine for lyrics. Supports 12 scripts across Japanese, Chinese (Mandarin and Cantonese), Korean, Cyrillic, Indic, Tamil, and Thai — all running locally with zero API calls.
Extracted from Spotify Karaoke. Used by OpenKara.
- Zero API calls — all romanization runs locally
- Auto script detection — pass in text, get back the detected script
- 12 scripts: Japanese, Chinese, Korean, Cyrillic, 6 Indic scripts (Devanagari, Gujarati, Gurmukhi, Telugu, Kannada, Odia), Tamil, Thai
- Cantonese support — Jyutping alongside default Mandarin Pinyin
- Lightweight detector subpath — import only script detection without pulling in romanization engines
- Ukrainian-aware Cyrillic — auto-detects Ukrainian-specific characters and applies the correct transliteration preset
npm install lyric-romanizer
yarn add lyric-romanizer
pnpm add lyric-romanizer

import { createRomanizer, detectScript } from 'lyric-romanizer';
const romanizer = createRomanizer();
// Auto-detect script and romanize
const result = await romanizer.romanizeLines(['你好世界', 'こんにちは']);
// { script: 'chinese', lines: ['nǐ hǎo shì jiè', 'こんにちは'] }
// Romanize a single line
const line = await romanizer.romanizeLine('안녕하세요');
// 'annyeonghaseyo'

// Main entry: full romanization engine
import {
  createRomanizer,
  detectScript,
  isLatinScript,
  requiresExternalRomanization,
  UnsupportedRomanizationError,
} from 'lyric-romanizer';
// Detector-only subpath: lightweight, no romanization dependencies
import { detectScript, isLatinScript, NON_LATIN_SCRIPT_RE } from 'lyric-romanizer/detector';

type ScriptType =
| 'japanese' | 'chinese' | 'korean' | 'cyrillic'
| 'devanagari' | 'gujarati' | 'gurmukhi' | 'telugu'
| 'kannada' | 'odia' | 'tamil' | 'malayalam'
| 'bengali' | 'arabic' | 'hebrew' | 'thai'
| 'other';
interface Romanizer {
  romanizeLine(line: string, options?: RomanizeOptions): Promise<string>;
  romanizeLines(lines: readonly string[], options?: RomanizeOptions): Promise<RomanizeResult>;
}
type RomanizeOptions = { script?: ScriptType; dialect?: 'mandarin' | 'cantonese' };
type RomanizeResult = { script: ScriptType; lines: string[] };
type RomanizerOptions = { japaneseDictPath?: string };

createRomanizer is a factory that returns a Romanizer instance. The Kuroshiro engine (Japanese) is lazily initialized on first use and cached.
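Lazy initialization with caching is often implemented by storing the initialization promise and handing the same promise to every caller. The following is a minimal sketch of that pattern, not this library's code: the `lazy` helper, `getEngine`, and the stand-in engine object are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper: runs `init` on the first call only and caches the
// resulting promise, so concurrent callers share a single initialization.
function lazy<T>(init: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = init();
    }
    return cached;
  };
}

// Stand-in for an expensive engine setup (for example, loading a dictionary).
let initCount = 0;
const getEngine = lazy(async () => {
  initCount += 1;
  return { convert: (text: string) => text.toUpperCase() };
});
```

Caching the promise rather than the resolved value means two calls made before initialization finishes still trigger only one setup. A rejected promise would also stay cached in this sketch, so a production version might reset `cached` on failure.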
const romanizer = createRomanizer();
// Override the Kuromoji dictionary CDN path (e.g. for self-hosting)
const selfHostedRomanizer = createRomanizer({
  japaneseDictPath: 'https://my-cdn.com/kuromoji/dict',
});

detectScript detects the dominant script in the given text lines. It checks for Japanese kana first (definitive), then scores all other scripts by character count.
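The kana-first, then count-based order described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: `detectScriptSketch`, the reduced script list, and the Unicode ranges are all chosen for this example.

```typescript
// Simplified sketch of kana-first detection. The script list and Unicode
// ranges are illustrative assumptions, not the library's actual tables.
type SketchScript =
  | 'japanese' | 'chinese' | 'korean' | 'cyrillic' | 'thai' | 'latin' | 'other';

interface ScriptRange {
  script: SketchScript;
  from: number; // inclusive UTF-16 code unit
  to: number;   // inclusive UTF-16 code unit
}

const SCRIPT_RANGES: ScriptRange[] = [
  { script: 'chinese', from: 0x4e00, to: 0x9fff },  // CJK Unified Ideographs
  { script: 'korean', from: 0xac00, to: 0xd7a3 },   // Hangul syllables
  { script: 'cyrillic', from: 0x0400, to: 0x04ff }, // Cyrillic
  { script: 'thai', from: 0x0e00, to: 0x0e7f },     // Thai
  { script: 'latin', from: 0x0041, to: 0x005a },    // A-Z
  { script: 'latin', from: 0x0061, to: 0x007a },    // a-z
];

// Hiragana letters (U+3041..U+3096) or Katakana letters (U+30A1..U+30FA).
function isKana(code: number): boolean {
  return (code >= 0x3041 && code <= 0x3096) || (code >= 0x30a1 && code <= 0x30fa);
}

function detectScriptSketch(lines: readonly string[]): SketchScript {
  const text = lines.join('\n');

  // Step 1: any kana is treated as definitive, even when kanji outnumber it.
  for (let i = 0; i < text.length; i++) {
    if (isKana(text.charCodeAt(i))) return 'japanese';
  }

  // Step 2: tally characters per script.
  const counts: { [script: string]: number } = {};
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    for (const range of SCRIPT_RANGES) {
      if (code >= range.from && code <= range.to) {
        counts[range.script] = (counts[range.script] || 0) + 1;
        break;
      }
    }
  }

  // Step 3: keep the most frequent script, or 'other' if nothing matched.
  let best: SketchScript = 'other';
  let bestCount = 0;
  for (const range of SCRIPT_RANGES) {
    const count = counts[range.script] || 0;
    if (count > bestCount) {
      best = range.script;
      bestCount = count;
    }
  }
  return best;
}
```

Checking kana before scoring matters because Japanese lyrics mix kana with kanji, and the kanji share code points with Chinese; a pure character count could misclassify a kanji-heavy Japanese line as Chinese.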
detectScript(['こんにちは']); // 'japanese'
detectScript(['你好世界']); // 'chinese'
detectScript(['Привет']); // 'cyrillic'
detectScript(['Hello world']); // 'latin'
detectScript(['123 ???']); // 'other'

isLatinScript is a fast check: it returns true if the text contains only Latin letters (no CJK, Cyrillic, Indic, etc.) and false if it contains no letters at all. Useful for skipping romanization entirely.
isLatinScript(['Hello world']); // true
isLatinScript(['안녕하세요']); // false
isLatinScript(['♪♪♪']); // false (no letters)

requiresExternalRomanization returns true for scripts that cannot be romanized locally and require an external API.
requiresExternalRomanization('chinese'); // false
requiresExternalRomanization('arabic'); // true
requiresExternalRomanization('malayalam'); // true

romanizeLine romanizes a single line. If script is omitted, it is auto-detected via detectScript. It returns the original line unchanged for Latin text or non-letter content.
For Chinese text, the dialect option controls the romanization system: 'mandarin' (default) uses Pinyin, 'cantonese' uses Jyutping.
Throws UnsupportedRomanizationError for external scripts.
await romanizer.romanizeLine('你好世界');
// 'nǐ hǎo shì jiè' (default: Mandarin/Pinyin)
await romanizer.romanizeLine('你好', { dialect: 'cantonese' });
// 'nei5 hou2' (Jyutping)
await romanizer.romanizeLine('Привет мир');
// 'Privet mir'
await romanizer.romanizeLine('Hello world');
// 'Hello world' (no-op)
await romanizer.romanizeLine('مرحبا');
// throws UnsupportedRomanizationError { script: 'arabic' }

romanizeLines romanizes multiple lines in parallel. It returns the detected script and the romanized lines.

const { script, lines } = await romanizer.romanizeLines([
  'สวัสดี',
  'ชาวโลก',
]);
// { script: 'thai', lines: ['sawatdi', 'chaolok'] }

UnsupportedRomanizationError is thrown when attempting to romanize a script that requires an external API. It has a script property for programmatic handling.
try {
  await romanizer.romanizeLine('مرحبا');
} catch (err) {
  if (err instanceof UnsupportedRomanizationError) {
    console.log(err.script); // 'arabic'
    // fall back to external API
  }
}

| Script | Engine | Example |
|---|---|---|
| Universal (fallback) | transliteration | Привет → Privet |
| Japanese | kuroshiro + kuromoji | こんにちは → konnichiha |
| Mandarin | pinyin-pro | 你好 → nǐ hǎo |
| Cantonese | to-jyutping | 佢冇 → keoi5 mou5 |
| Korean | @romanize/korean | 안녕 → annyeong |
| Cyrillic | cyrillic-to-translit-js | Привет → Privet |
| Devanagari | sanscript | नमस्ते → namaste |
| Gujarati | sanscript | નમસ્તે → namaste |
| Gurmukhi | sanscript | ਨਮਸਤੇ → namaste |
| Telugu | sanscript | నమస్తే → namaste |
| Kannada | sanscript | ನಮಸ್ತೆ → namaste |
| Odia | sanscript | ନମସ୍ତେ → namaste |
| Tamil | tamil-romanizer | வணக்கம் → vanakkam |
| Thai | @dehoist/romanize-thai | สวัสดี → sawatdi |
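
The routing implied by the table can be sketched as a lookup from detected script (plus dialect, for Chinese) to an engine package. The package names below come from the table; `pickEngine`, `LocalScript`, and `ENGINE_BY_SCRIPT` are hypothetical names for illustration and may not match how the library organizes this internally.

```typescript
// Scripts the table lists as handled locally.
type LocalScript =
  | 'japanese' | 'chinese' | 'korean' | 'cyrillic'
  | 'devanagari' | 'gujarati' | 'gurmukhi' | 'telugu'
  | 'kannada' | 'odia' | 'tamil' | 'thai';
type Dialect = 'mandarin' | 'cantonese';

// One engine per script; Chinese is handled separately because it
// depends on the dialect.
const ENGINE_BY_SCRIPT: Record<Exclude<LocalScript, 'chinese'>, string> = {
  japanese: 'kuroshiro + kuromoji',
  korean: '@romanize/korean',
  cyrillic: 'cyrillic-to-translit-js',
  devanagari: 'sanscript',
  gujarati: 'sanscript',
  gurmukhi: 'sanscript',
  telugu: 'sanscript',
  kannada: 'sanscript',
  odia: 'sanscript',
  tamil: 'tamil-romanizer',
  thai: '@dehoist/romanize-thai',
};

// Hypothetical router: returns the package name that would handle the script.
function pickEngine(script: LocalScript, dialect: Dialect = 'mandarin'): string {
  if (script === 'chinese') {
    return dialect === 'cantonese' ? 'to-jyutping' : 'pinyin-pro';
  }
  return ENGINE_BY_SCRIPT[script];
}
```

Typing the lookup as a `Record` over the script union makes the compiler flag any script that is added to the type without a matching engine entry.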

The following scripts cannot be romanized locally and require an external API:

| Script | Method |
|---|---|
| Malayalam | Google Translate dt=rm |
| Bengali | Google Translate dt=rm |
| Arabic | Google Translate dt=rm |
| Hebrew | Google Translate dt=rm |
| Other | Google Translate dt=rm |

Use requiresExternalRomanization() to detect these and branch to your preferred API.
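
One way to structure that branch is to put both paths behind a single function. The sketch below is self-contained, so it uses a local stand-in for requiresExternalRomanization built from the table of external scripts, and takes the local and external romanizers as injected functions. `romanizeWithFallback`, `needsExternal`, `EXTERNAL_SCRIPTS`, and both callbacks are hypothetical names, not part of the library.

```typescript
type RomanizeFn = (line: string) => Promise<string>;

// Stand-in for requiresExternalRomanization, assuming the scripts listed
// in the table of externally romanized scripts.
const EXTERNAL_SCRIPTS = ['malayalam', 'bengali', 'arabic', 'hebrew', 'other'];

function needsExternal(script: string): boolean {
  return EXTERNAL_SCRIPTS.indexOf(script) !== -1;
}

// Routes a line to the external romanizer only when the script needs it.
async function romanizeWithFallback(
  line: string,
  script: string,
  local: RomanizeFn,
  external: RomanizeFn,
): Promise<string> {
  return needsExternal(script) ? external(line) : local(line);
}
```

With the real package, `local` could wrap `romanizer.romanizeLine(line, { script })`, and `external` would call whichever API you prefer. Checking up front avoids relying on a thrown UnsupportedRomanizationError for control flow, though catching it works as well.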
Cyrillic auto-detects Ukrainian-specific characters (і, ї, є, ґ) and applies the Ukrainian transliteration preset. All other Cyrillic text defaults to Russian.
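
A check of this kind can be sketched with a single character class. This is an illustration rather than the library's code: `pickCyrillicPreset` is a hypothetical helper, and the preset labels `'uk'` and `'ru'` are assumed names for the Ukrainian and Russian presets.

```typescript
// Letters used in Ukrainian but not in Russian: Є, І, Ї, Ґ and their
// lowercase forms є, і, ї, ґ, written as escapes to avoid confusion
// with look-alike Latin letters.
const UKRAINIAN_ONLY = /[\u0404\u0406\u0407\u0490\u0454\u0456\u0457\u0491]/;

type CyrillicPreset = 'uk' | 'ru';

// Hypothetical helper: choose the Ukrainian preset if any Ukrainian-specific
// letter appears, otherwise default to Russian.
function pickCyrillicPreset(text: string): CyrillicPreset {
  return UKRAINIAN_ONLY.test(text) ? 'uk' : 'ru';
}
```

Because the check looks only for these marker letters, a Ukrainian line that happens to contain none of them would be treated as Russian.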
Chinese text defaults to Mandarin (Pinyin). Pass dialect: 'cantonese' in RomanizeOptions to romanize Chinese text to Jyutping instead.
const { lines } = await romanizer.romanizeLines(['你好世界', '食飯'], {
  script: 'chinese',
  dialect: 'cantonese',
});
// ['nei5 hou2 sai3 gaai3', 'sik6 faan6']

License: MIT