lyric-romanizer

Philosophy: Don't reinvent the wheel. This project deliberately avoids building romanization logic from scratch. Instead, it composes best-in-class, community-maintained libraries — one for each script — and focuses on the orchestration layer: script detection, engine routing, dialect handling, and a unified API. Every romanization engine in the dependency list is a dedicated, battle-tested library maintained by domain experts. That's the point.

Script detection and local romanization engine for lyrics. Supports 12 scripts across Japanese, Chinese (Mandarin and Cantonese), Korean, Cyrillic, Indic, Tamil, and Thai — all running locally with zero API calls.

Extracted from Spotify Karaoke. Used by OpenKara.

Features

Zero API calls: all romanization runs locally
Auto script detection: pass in text, get back the detected script
12 scripts: Japanese, Chinese, Korean, Cyrillic, 6 Indic scripts (Devanagari, Gujarati, Gurmukhi, Telugu, Kannada, Odia), Tamil, Thai
Cantonese support: Jyutping alongside the default Mandarin Pinyin
Lightweight detector subpath: import only script detection without pulling in romanization engines
Ukrainian-aware Cyrillic: auto-detects Ukrainian-specific characters and applies the correct transliteration preset

Installation

npm install lyric-romanizer

yarn add lyric-romanizer

pnpm add lyric-romanizer

Quick Start

import { createRomanizer, detectScript } from 'lyric-romanizer';

const romanizer = createRomanizer();

// Auto-detect script and romanize (one script is detected for the whole batch)
const result = await romanizer.romanizeLines(['你好世界', '你好']);
// { script: 'chinese', lines: ['nǐ hǎo shì jiè', 'nǐ hǎo'] }

// Romanize a single line
const line = await romanizer.romanizeLine('안녕하세요');
// 'annyeonghaseyo'

API

Imports

// Main entry — full romanization engine
import {
  createRomanizer,
  detectScript,
  isLatinScript,
  requiresExternalRomanization,
  UnsupportedRomanizationError,
} from 'lyric-romanizer';

// Detector-only subpath — lightweight, no romanization dependencies
import { detectScript, isLatinScript, NON_LATIN_SCRIPT_RE } from 'lyric-romanizer/detector';

Types

type ScriptType =
  | 'japanese' | 'chinese' | 'korean' | 'cyrillic'
  | 'devanagari' | 'gujarati' | 'gurmukhi' | 'telugu'
  | 'kannada' | 'odia' | 'tamil' | 'malayalam'
  | 'bengali' | 'arabic' | 'hebrew' | 'thai'
  | 'latin' | 'other';

interface Romanizer {
  romanizeLine(line: string, options?: RomanizeOptions): Promise<string>;
  romanizeLines(lines: readonly string[], options?: RomanizeOptions): Promise<RomanizeResult>;
}

type RomanizeOptions = { script?: ScriptType; dialect?: 'mandarin' | 'cantonese' };
type RomanizeResult = { script: ScriptType; lines: string[] };
type RomanizerOptions = { japaneseDictPath?: string };

`createRomanizer(options?)`

Factory that returns a Romanizer instance. The Kuroshiro engine (Japanese) is lazily initialized on first use and cached.

// Default configuration
const romanizer = createRomanizer();

// Override the Kuromoji dictionary CDN path (e.g. for self-hosting)
const selfHostedRomanizer = createRomanizer({
  japaneseDictPath: 'https://my-cdn.com/kuromoji/dict',
});
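
Lazy initialization with caching, as described above for the Japanese engine, is often done by memoizing the initialization promise. The sketch below shows that general pattern only; it is not the library's code. The names `lazyOnce`, `getEngine`, and `initCount` are hypothetical, and the retry-on-failure behavior is an assumption of this sketch.

```typescript
// Generic lazy-initialization helper: the factory runs at most once,
// on the first call, and every caller shares the same promise.
function lazyOnce<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = factory().catch((err) => {
        cached = undefined; // assumption: allow a retry after a failed init
        throw err;
      });
    }
    return cached;
  };
}

// Hypothetical stand-in for an expensive engine (e.g. one that loads a dictionary).
let initCount = 0;
const getEngine = lazyOnce(async () => {
  initCount += 1;
  return { convert: (text: string) => text.toUpperCase() };
});
```

Callers that never touch the engine pay no startup cost, and concurrent first calls do not trigger duplicate initialization.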

`detectScript(lines)`

Detects the dominant script in the given text lines. Checks for Japanese kana first (definitive), then scores all other scripts by character count.

detectScript(['こんにちは']);          // 'japanese'
detectScript(['你好世界']);            // 'chinese'
detectScript(['Привет']);             // 'cyrillic'
detectScript(['Hello world']);        // 'latin'
detectScript(['123 ???']);            // 'other'
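
The two-stage logic described above (kana check first, then per-script character counts) could look roughly like the sketch below. It is a simplified illustration, not the library's implementation: `sketchDetectScript`, the reduced script set, and the Unicode ranges are all assumptions.

```typescript
// Hypothetical, simplified sketch of kana-first detection followed by
// per-script character counting. Covers only a handful of scripts.
type SketchScript =
  | 'japanese' | 'chinese' | 'korean' | 'cyrillic' | 'thai' | 'latin' | 'other';

const KANA_RE = /[\u3040-\u30ff]/; // Hiragana and Katakana blocks

// Assumed ranges; a real detector would cover more scripts and blocks.
const SCRIPT_RANGES: ReadonlyArray<[Exclude<SketchScript, 'japanese' | 'other'>, RegExp]> = [
  ['chinese', /[\u4e00-\u9fff]/g],
  ['korean', /[\uac00-\ud7af]/g],
  ['cyrillic', /[\u0400-\u04ff]/g],
  ['thai', /[\u0e00-\u0e7f]/g],
  ['latin', /[A-Za-z]/g],
];

function sketchDetectScript(lines: readonly string[]): SketchScript {
  const text = lines.join('\n');

  // Stage 1: any kana is treated as definitive, because Japanese text
  // mixes kana with Han characters that would otherwise count as Chinese.
  if (KANA_RE.test(text)) return 'japanese';

  // Stage 2: count matching characters per script and keep the largest.
  let best: SketchScript = 'other';
  let bestCount = 0;
  for (const [script, re] of SCRIPT_RANGES) {
    const count = (text.match(re) ?? []).length;
    if (count > bestCount) {
      best = script;
      bestCount = count;
    }
  }
  return best;
}
```

Checking kana before counting is what keeps a line such as `東京タワー` from being classified as Chinese on the strength of its two Han characters.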

`isLatinScript(lines)`

Fast check that returns true if the text contains Latin letters and no letters from other scripts (CJK, Cyrillic, Indic, etc.). Text with no letters at all returns false. Useful for skipping romanization entirely.

isLatinScript(['Hello world']);  // true
isLatinScript(['안녕하세요']);    // false
isLatinScript(['♪♪♪']);         // false (no letters)
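
A check with these semantics can be approximated with two character-class tests. The sketch below is an assumption-laden illustration, not the library's code: `sketchIsLatin` and both regular expressions are hypothetical, and the ranges cover only the script families named in this README.

```typescript
// Hypothetical ranges for scripts that should disqualify a text:
// Greek, Cyrillic, Hebrew/Arabic, Indic blocks, Thai, kana, CJK, Hangul.
const SKETCH_NON_LATIN_RE =
  /[\u0370-\u03ff\u0400-\u04ff\u0590-\u06ff\u0900-\u0dff\u0e00-\u0e7f\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]/;

// Basic Latin letters plus the common accented ranges (roughly; this
// range also contains a couple of non-letter symbols such as ×).
const SKETCH_LATIN_RE = /[A-Za-z\u00c0-\u024f]/;

function sketchIsLatin(lines: readonly string[]): boolean {
  const text = lines.join('\n');
  // Must contain at least one Latin letter, so symbol-only or
  // digit-only text is rejected, and no non-Latin script characters.
  return SKETCH_LATIN_RE.test(text) && !SKETCH_NON_LATIN_RE.test(text);
}
```

Because both tests stop at the first match, the check stays cheap even for long lyric sheets.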

`requiresExternalRomanization(script)`

Returns true for scripts that cannot be romanized locally and require an external API.

requiresExternalRomanization('chinese');   // false
requiresExternalRomanization('arabic');    // true
requiresExternalRomanization('malayalam'); // true

`romanizer.romanizeLine(line, options?)`

Romanizes a single line. If script is omitted, it is auto-detected via detectScript. Returns the original line unchanged for Latin text or non-letter content.

For Chinese text, the dialect option controls the romanization system: 'mandarin' (default) uses Pinyin, 'cantonese' uses Jyutping.

Throws UnsupportedRomanizationError for scripts that require an external API (those for which requiresExternalRomanization returns true).

await romanizer.romanizeLine('你好世界');
// 'nǐ hǎo shì jiè' (default: Mandarin/Pinyin)

await romanizer.romanizeLine('你好', { dialect: 'cantonese' });
// 'nei5 hou2' (Jyutping)

await romanizer.romanizeLine('Привет мир');
// 'Privet mir'

await romanizer.romanizeLine('Hello world');
// 'Hello world' (no-op)

await romanizer.romanizeLine('مرحبا');
// throws UnsupportedRomanizationError { script: 'arabic' }

`romanizer.romanizeLines(lines, options?)`

Romanizes multiple lines in parallel. Returns the detected script and romanized lines.

const { script, lines } = await romanizer.romanizeLines([
  'สวัสดี',
  'ชาวโลก',
]);
// { script: 'thai', lines: ['sawatdi', 'chaolok'] }
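
One plausible shape for batch romanization is to detect the script once and then convert every line concurrently. The sketch below assumes that shape; `sketchRomanizeLines` and its injected `detect` and `romanizeOne` parameters are hypothetical stand-ins, not the library's internals.

```typescript
// Hypothetical sketch of "detect once, romanize every line concurrently".
async function sketchRomanizeLines(
  lines: readonly string[],
  detect: (lines: readonly string[]) => string,
  romanizeOne: (line: string, script: string) => Promise<string>,
): Promise<{ script: string; lines: string[] }> {
  const script = detect(lines);
  // Promise.all keeps results in input order even when individual
  // conversions finish out of order.
  const romanized = await Promise.all(
    lines.map((line) => romanizeOne(line, script)),
  );
  return { script, lines: romanized };
}
```

Note that `Promise.all` rejects as soon as any single line fails, so a caller that wants partial results would need `Promise.allSettled` instead.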

`UnsupportedRomanizationError`

Thrown when attempting to romanize a script that requires an external API. Has a script property for programmatic handling.

try {
  await romanizer.romanizeLine('مرحبا');
} catch (err) {
  if (err instanceof UnsupportedRomanizationError) {
    console.log(err.script); // 'arabic'
    // fall back to external API
  }
}

Supported Scripts

Local (fully offline)

| Script | Engine | Example |
| --- | --- | --- |
| Universal (fallback) | transliteration | `Привет` → `Privet` |
| Japanese | kuroshiro + kuromoji | `こんにちは` → `konnichiha` |
| Mandarin | pinyin-pro | `你好` → `nǐ hǎo` |
| Cantonese | to-jyutping | `佢冇` → `keoi5 mou5` |
| Korean | @romanize/korean | `안녕` → `annyeong` |
| Cyrillic | cyrillic-to-translit-js | `Привет` → `Privet` |
| Devanagari | sanscript | `नमस्ते` → `namaste` |
| Gujarati | sanscript | `નમસ્તે` → `namaste` |
| Gurmukhi | sanscript | `ਨਮਸਤੇ` → `namaste` |
| Telugu | sanscript | `నమస్తే` → `namaste` |
| Kannada | sanscript | `ನಮಸ್ತೆ` → `namaste` |
| Odia | sanscript | `ନମସ୍ତେ` → `namaste` |
| Tamil | tamil-romanizer | `வணக்கம்` → `vanakkam` |
| Thai | @dehoist/romanize-thai | `สวัสดี` → `sawatdi` |

External (requires API)

| Script | Method |
| --- | --- |
| Malayalam | Google Translate `dt=rm` |
| Bengali | Google Translate `dt=rm` |
| Arabic | Google Translate `dt=rm` |
| Hebrew | Google Translate `dt=rm` |
| Other | Google Translate `dt=rm` |

Use requiresExternalRomanization() to detect these and branch to your preferred API.
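
The branching itself can be kept in one small wrapper. In the sketch below every dependency is injected so the example runs on its own: `RoutingDeps`, `romanizeWithFallback`, and `stubDeps` are hypothetical names, and the stubs merely stand in for `detectScript`, `requiresExternalRomanization`, `romanizer.romanizeLines`, and whichever external API client you choose.

```typescript
type Lines = readonly string[];

// Hypothetical wiring; none of these names are part of lyric-romanizer.
interface RoutingDeps {
  detect: (lines: Lines) => string;
  needsExternal: (script: string) => boolean;
  romanizeLocally: (lines: Lines) => Promise<string[]>;
  romanizeExternally: (lines: Lines, script: string) => Promise<string[]>;
}

async function romanizeWithFallback(lines: Lines, deps: RoutingDeps): Promise<string[]> {
  const script = deps.detect(lines);
  // Decide up front instead of catching an error after the fact.
  return deps.needsExternal(script)
    ? deps.romanizeExternally(lines, script)
    : deps.romanizeLocally(lines);
}

// Stubbed dependencies so the sketch is runnable without any packages.
const stubDeps: RoutingDeps = {
  detect: (lines) => (/[\u0600-\u06ff]/.test(lines.join('')) ? 'arabic' : 'latin'),
  needsExternal: (script) => script === 'arabic',
  romanizeLocally: async (lines) => [...lines],
  romanizeExternally: async (lines) => lines.map(() => '<from external API>'),
};
```

Checking before calling avoids using the thrown UnsupportedRomanizationError for control flow, although catching that error works equally well when the script is not known in advance.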

Script-Specific Notes

Cyrillic Detection

The Cyrillic engine checks the input for Ukrainian-specific characters (і, ї, є, ґ). If any are present, it applies the Ukrainian transliteration preset; all other Cyrillic text is transliterated with the Russian preset.
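
The character check described above amounts to a single regular-expression test. The sketch below is a hypothetical illustration: `pickCyrillicPreset` and the `'uk'` / `'ru'` labels are assumptions chosen for this example, not confirmed library internals.

```typescript
// Hypothetical sketch: choose a transliteration preset by looking for
// letters used in Ukrainian but not in Russian.
type CyrillicPreset = 'uk' | 'ru';

// і (U+0456), ї (U+0457), є (U+0454), ґ (U+0491) and their uppercase
// forms І (U+0406), Ї (U+0407), Є (U+0404), Ґ (U+0490).
const UKRAINIAN_ONLY_RE = /[\u0456\u0457\u0454\u0491\u0406\u0407\u0404\u0490]/;

function pickCyrillicPreset(text: string): CyrillicPreset {
  return UKRAINIAN_ONLY_RE.test(text) ? 'uk' : 'ru';
}
```

A heuristic like this misclassifies Ukrainian lines that happen to contain none of the four letters, and it sends other Cyrillic-script languages (Bulgarian, Serbian, and so on) down the Russian path.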

Cantonese Support

Chinese text defaults to Mandarin (Pinyin). Pass dialect: 'cantonese' in RomanizeOptions to romanize Chinese text to Jyutping instead.

const { lines } = await romanizer.romanizeLines(['你好世界', '食飯'], {
  script: 'chinese',
  dialect: 'cantonese',
});
// ['nei5 hou2 sai3 gaai3', 'sik6 faan6']

License

MIT
