uniwidth

package module
v0.1.0
Published: Nov 22, 2025 License: MIT Imports: 1 Imported by: 0

README

uniwidth - Modern Unicode Width Calculation for Go


uniwidth is a modern, high-performance Unicode width calculation library for Go 1.25+. It provides 3.9-46x faster width calculation compared to existing solutions through tiered lookup optimization and Go 1.25+ compiler features.

🚀 Performance

Based on comprehensive benchmarks vs go-runewidth:

  • ASCII strings: 15-46x faster
  • CJK strings: 4-14x faster
  • Mixed/Emoji strings: 6-8x faster
  • Zero allocations: 0 B/op, 0 allocs/op

Run benchmarks yourself: cd bench && go test -bench=. -benchmem

✨ Features

  • 🚀 3.9-46x faster than go-runewidth (proven in benchmarks)
  • 💎 Zero allocations (no GC pressure)
  • 🧵 Thread-safe (immutable design, no global state)
  • 🎯 Unicode 16.0 support
  • 🔧 Modern API (Go 1.25+, clean design)
  • 📊 Tiered lookup (O(1) for 90-95% of cases)

📦 Installation

go get github.com/unilibs/uniwidth

Requirements: Go 1.25 or later

🔧 Usage

Basic Usage
package main

import (
    "fmt"
    "github.com/unilibs/uniwidth"
)

func main() {
    // Calculate width of a string
    width := uniwidth.StringWidth("Hello 世界")
    fmt.Println(width) // Output: 10 (Hello=5, space=1, 世界=4)

    // Calculate width of a single rune
    w := uniwidth.RuneWidth('世')
    fmt.Println(w) // Output: 2

    // ASCII-only strings are super fast!
    width = uniwidth.StringWidth("Hello, World!")
    fmt.Println(width) // Output: 13
}
Options API (NEW!)

Configure handling of ambiguous-width characters:

package main

import (
    "fmt"
    "github.com/unilibs/uniwidth"
)

func main() {
    // East Asian locale (ambiguous characters are wide)
    opts := []uniwidth.Option{
        uniwidth.WithEastAsianAmbiguous(uniwidth.EAWide),
    }
    width := uniwidth.StringWidthWithOptions("±½", opts...)
    fmt.Println(width) // Output: 4 (each character is 2 columns)

    // Neutral locale (ambiguous characters are narrow) - DEFAULT
    opts = []uniwidth.Option{
        uniwidth.WithEastAsianAmbiguous(uniwidth.EANarrow),
    }
    width = uniwidth.StringWidthWithOptions("±½", opts...)
    fmt.Println(width) // Output: 2 (each character is 1 column)
}
Real-World TUI Examples
// Terminal prompt
prompt := "❯ Enter command: "
width := uniwidth.StringWidth(prompt)
fmt.Printf("Prompt width: %d columns\n", width)

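// Table cell padding (clamped so strings.Repeat never receives a negative count)
text := "Hello 世界"
padding := max(0, 20-uniwidth.StringWidth(text))
fmt.Printf("%s%s\n", text, strings.Repeat(" ", padding))

// Truncate to fit terminal width, reserving one column for the ellipsis
func truncate(s string, maxWidth int) string {
    if uniwidth.StringWidth(s) <= maxWidth {
        return s // already fits, no ellipsis needed
    }
    if maxWidth < 1 {
        return "" // no room even for the ellipsis
    }
    width := 0
    for i, r := range s {
        w := uniwidth.RuneWidth(r)
        if width+w > maxWidth-1 {
            return s[:i] + "…"
        }
        width += w
    }
    return s
}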
Performance-Critical Code
// ASCII fast path (46x faster than go-runewidth!)
text := "Hello, World!"
width := uniwidth.StringWidth(text) // ~4.6 ns/op

// CJK fast path (14x faster!)
text = "你好世界"
width = uniwidth.StringWidth(text) // ~33.7 ns/op

// Mixed content (8x faster!)
text = "Hello 👋 World"
width = uniwidth.StringWidth(text) // ~65.9 ns/op

// All with zero allocations!

🏗️ Architecture

Tiered Lookup Strategy

uniwidth uses a multi-tier approach for optimal performance:

  1. Tier 1: ASCII Fast Path (O(1))

    • Covers ~95% of typical terminal content
    • Uses simple len(s) for ASCII-only strings
    • 15-46x faster than binary search
  2. Tier 2: Common CJK & Emoji (O(1))

    • Range checks for frequent characters
    • CJK Unified Ideographs: 20,992 characters
    • Common emoji ranges
    • 4-14x faster than binary search
  3. Tier 3: Binary Search Fallback (O(log n))

    • For rare characters not in hot paths
    • Minimal overhead (~5-10% of cases)
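The three tiers above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not uniwidth's actual source: the name sketchRuneWidth, the runeRange type, and the three-entry rareWide table are assumptions made for the example, and combining marks and ambiguous-width characters are ignored entirely.

```go
package main

import (
	"fmt"
	"sort"
)

// runeRange is an inclusive range of code points.
type runeRange struct{ lo, hi rune }

// rareWide is a tiny illustrative table, sorted by lo. A real table
// generated from the Unicode data files would be far larger.
var rareWide = []runeRange{
	{0x1100, 0x115F}, // Hangul Jamo initial consonants
	{0x3041, 0x3096}, // Hiragana
	{0xAC00, 0xD7A3}, // Hangul syllables
}

// sketchRuneWidth is a hypothetical three-tier lookup, not uniwidth's code.
func sketchRuneWidth(r rune) int {
	// Tier 1: ASCII needs only comparisons.
	if r < 0x80 {
		if r < 0x20 || r == 0x7F {
			return 0 // control characters
		}
		return 1
	}
	// Tier 2: constant-time range checks for frequent wide blocks.
	if r >= 0x4E00 && r <= 0x9FFF { // CJK Unified Ideographs (20,992 code points)
		return 2
	}
	if r >= 0x1F600 && r <= 0x1F64F { // Emoticons block
		return 2
	}
	// Tier 3: binary search over the sorted table of rarer wide ranges.
	i := sort.Search(len(rareWide), func(i int) bool { return rareWide[i].hi >= r })
	if i < len(rareWide) && rareWide[i].lo <= r {
		return 2
	}
	return 1 // everything else is treated as narrow in this sketch
}

func main() {
	for _, r := range "A世😀あé" {
		fmt.Printf("%q -> %d\n", r, sketchRuneWidth(r))
	}
}
```

The ordering is the point of the design: the cheapest checks run first, so the binary search is reached only by runes that miss both fast paths.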
Go 1.25+ Optimizations
  • SIMD Auto-Vectorization: ASCII detection uses SSE2/AVX2
  • Aggressive Inlining: Hot paths compile to minimal instructions
  • Zero Allocations: No heap allocations, no GC pressure
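As an illustration of the ASCII detection step, here is a minimal sketch in plain Go that checks eight bytes per iteration by OR-ing them together and testing the high bit once. It is an assumption about how such a check might look, not uniwidth's code; it uses no SIMD intrinsics, and the name isASCII is hypothetical. It performs no heap allocation because it only indexes into the string.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isASCII reports whether s contains only bytes below 0x80.
func isASCII(s string) bool {
	i := 0
	// Main loop: OR eight bytes together and test the high bit once.
	for ; i+8 <= len(s); i += 8 {
		c := s[i : i+8]
		if (c[0]|c[1]|c[2]|c[3]|c[4]|c[5]|c[6]|c[7])&0x80 != 0 {
			return false
		}
	}
	// Tail: fewer than eight bytes remain.
	for ; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"Hello, World!", "Hello 世界"} {
		fmt.Printf("%q ascii=%t\n", s, isASCII(s))
	}
}
```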

📊 Benchmarks

BenchmarkStringWidth_ASCII_Short_Uniwidth-12     149590729   9.500 ns/op   0 B/op   0 allocs/op
BenchmarkStringWidth_ASCII_Short_GoRunewidth-12   10065044  150.1 ns/op   0 B/op   0 allocs/op
                                                             ^^^^^^^^^^
                                                             15.8x faster!

BenchmarkStringWidth_CJK_Short_Uniwidth-12        19064941   63.64 ns/op   0 B/op   0 allocs/op
BenchmarkStringWidth_CJK_Short_GoRunewidth-12      2771077  368.0 ns/op   0 B/op   0 allocs/op
                                                             ^^^^^^^^^^^
                                                             5.8x faster!

Run benchmarks yourself:

go test -bench=. -benchmem
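For a quick measurement outside of go test, the standard library's testing.Benchmark can be called from an ordinary program. The sketch below is only an outline under stated assumptions: byteLenWidth is a hypothetical stand-in so the example runs without third-party modules, and the helper names measure and widthFunc are invented here. To reproduce a comparison like the one above, pass uniwidth.StringWidth and runewidth.StringWidth instead.

```go
package main

import (
	"fmt"
	"testing"
)

// widthFunc is the shape shared by StringWidth-style functions.
type widthFunc func(string) int

// byteLenWidth is a stand-in measurement target (correct for printable ASCII only).
func byteLenWidth(s string) int { return len(s) }

// sink keeps the compiler from discarding the call under test.
var sink int

// measure runs f on input under the standard benchmark harness.
func measure(f widthFunc, input string) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = f(input)
		}
	})
}

func main() {
	res := measure(byteLenWidth, "Hello, World!")
	fmt.Printf("%d iterations, %d ns/op, %d allocs/op\n",
		res.N, res.NsPerOp(), res.AllocsPerOp())
}
```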

🎯 Use Cases

Perfect for:

  • TUI frameworks (terminal rendering hot paths)
  • Terminal emulators (text layout calculations)
  • CLI tools (table alignment, formatting)
  • Text editors (cursor positioning, column calculation)
  • Any high-performance text width calculation

🔄 Migration from go-runewidth

uniwidth provides a compatible API for easy migration:

// Before (go-runewidth)
import "github.com/mattn/go-runewidth"
width := runewidth.StringWidth(s)

// After (uniwidth) - same function name, new import path
import "github.com/unilibs/uniwidth"
width := uniwidth.StringWidth(s)

Performance improvement: 3.9-46x faster; only the import path and the package qualifier change.

📚 Documentation

🧪 Testing

# Run tests
go test -v

# Run benchmarks
go test -bench=. -benchmem

# Run with coverage
go test -cover

Current test coverage: 90.3% (exceeds 90% target ✅)

🚀 Development Status

Current: v0.1.0 (Stable Release)

✅ Stable Release: This library has completed beta testing. The API is stable and ready for production use. Minor version updates (v0.2.x) will maintain backward compatibility.

What Beta Means:

  • ✅ Feature-complete for core functionality
  • ✅ Production-quality code and performance
  • ⚠️ API may evolve based on community feedback
  • ⚠️ Edge cases still being discovered and fixed
  • 🎯 Goal: API freeze before v1.0.0-rc

Completed:

  • ✅ PoC (3 days) - 3.9-46x speedup proven
  • ✅ Complete Unicode 16.0 tables - Generated from official data
  • ✅ Options API - East Asian Width & emoji configuration
  • ✅ Comprehensive testing - 84.6% coverage, fuzzing, conformance tests
  • ✅ Bug fixes - Variation selectors, regional indicator flags
  • ✅ Documentation - README, ARCHITECTURE, CHANGELOG

Beta Goals (Before RC):

  • Community feedback integration
  • Edge case coverage >95%
  • API stability validation
  • Performance regression testing
  • Documentation refinement

Future Roadmap (v1.0+):

  • Grapheme cluster support (for complex emoji ZWJ sequences)
  • Additional locale support
  • Extended SIMD optimizations
  • Profile-Guided Optimization (PGO)

🀝 Contributing

Contributions welcome! This is part of the unilibs organization - modern Unicode libraries for Go.

📄 License

MIT License - see LICENSE file

Built by the Phoenix TUI Framework team.

Part of the unilibs ecosystem:

  • uniwidth - Unicode width calculation (this project)
  • unigrapheme - Grapheme clustering (planned)
  • More Unicode utilities coming soon!

📞 Support

🙏 Special Thanks

Professor Ancha Baranova - This project would not have been possible without her invaluable help and support. Her assistance was crucial in bringing uniwidth to life.


Made with ❤️ by the Phoenix team | Powered by Go 1.25+

Documentation

Overview

Package uniwidth provides modern Unicode width calculation for Go 1.25+.

uniwidth uses a tiered lookup strategy for optimal performance:

  • Tier 1: ASCII (O(1), ~95% of typical content)
  • Tier 2: Common CJK & Emoji (O(1), ~90% of non-ASCII)
  • Tier 3: Binary search for rare characters (O(log n))

This approach is 3-4x faster than traditional binary-search-only methods like go-runewidth, while maintaining full Unicode 16.0 compliance.

Functions

func RuneWidth

func RuneWidth(r rune) int

RuneWidth returns the visual width of a rune in monospace terminals.

Returns:

  • 0 for control characters, zero-width joiners, combining marks
  • 1 for most characters (ASCII, Latin, Cyrillic, etc.)
  • 2 for wide characters (CJK, Emoji, etc.)

This function uses a tiered lookup strategy:

  • O(1) for ASCII (most common case)
  • O(1) for common CJK and emoji (hot paths)
  • O(log n) for rare characters (fallback)

func RuneWidthWithOptions

func RuneWidthWithOptions(r rune, opts ...Option) int

RuneWidthWithOptions returns the visual width of a rune with custom options.

This function applies the same tiered lookup strategy as RuneWidth, but allows customization of ambiguous character handling and emoji presentation.

Example:

// East Asian locale (ambiguous characters are wide)
width := uniwidth.RuneWidthWithOptions('±', uniwidth.WithEastAsianAmbiguous(uniwidth.EAWide))
// width = 2

// Neutral locale (ambiguous characters are narrow)
width = uniwidth.RuneWidthWithOptions('±', uniwidth.WithEastAsianAmbiguous(uniwidth.EANarrow))
// width = 1

func StringWidth

func StringWidth(s string) int

StringWidth calculates the visual width of a string in monospace terminals.

This function provides a fast path for ASCII-only strings, and uses RuneWidth for strings containing Unicode characters.

Special handling:

  • Variation selectors (U+FE0E/U+FE0F) modify the width of the preceding character
  • Regional indicator pairs (flags) are counted as width 2, not 4
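The two special cases can be sketched as a single pass over the string. This is an illustration under stated assumptions, not the library's implementation: baseWidth is a simplified stand-in for the per-rune lookup, U+FE0F is assumed to force the preceding character to width 2 and U+FE0E to width 1, and the names sketchStringWidth and isRegionalIndicator are hypothetical.

```go
package main

import "fmt"

// isRegionalIndicator reports whether r is one of U+1F1E6..U+1F1FF.
func isRegionalIndicator(r rune) bool { return r >= 0x1F1E6 && r <= 0x1F1FF }

// baseWidth is a simplified stand-in for a per-rune lookup such as RuneWidth.
func baseWidth(r rune) int {
	switch {
	case r < 0x20 || r == 0x7F:
		return 0 // control characters
	case r >= 0x4E00 && r <= 0x9FFF, r >= 0x1F600 && r <= 0x1F64F:
		return 2 // CJK Unified Ideographs, Emoticons
	default:
		return 1
	}
}

// sketchStringWidth sums rune widths with the two special cases applied.
func sketchStringWidth(s string) int {
	total, prev := 0, 0 // prev: width counted for the preceding character
	openRI := false     // true while a regional indicator awaits its partner
	for _, r := range s {
		switch {
		case r == 0xFE0E || r == 0xFE0F:
			// Assumed rule: FE0E forces width 1, FE0F forces width 2.
			want := 1
			if r == 0xFE0F {
				want = 2
			}
			if prev > 0 {
				total += want - prev
				prev = want
			}
			openRI = false
		case isRegionalIndicator(r):
			if openRI {
				openRI = false // second half of a flag: already counted
			} else {
				openRI = true
				total += 2
				prev = 2
			}
		default:
			openRI = false
			prev = baseWidth(r)
			total += prev
		}
	}
	return total
}

func main() {
	for _, s := range []string{"🇯🇵", "❤\uFE0F", "❤\uFE0E", "Hi"} {
		fmt.Printf("%q -> %d\n", s, sketchStringWidth(s))
	}
}
```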

func StringWidthWithOptions

func StringWidthWithOptions(s string, opts ...Option) int

StringWidthWithOptions calculates the visual width of a string with custom options.

This function applies the same fast paths as StringWidth, but allows customization of ambiguous character handling and emoji presentation.

Example:

// East Asian locale (ambiguous characters are wide)
opts := []uniwidth.Option{
    uniwidth.WithEastAsianAmbiguous(uniwidth.EAWide),
}
width := uniwidth.StringWidthWithOptions("Hello ±½", opts...)
// width = 10 (Hello=5, space=1, ±=2, ½=2)

// Neutral locale (ambiguous characters are narrow)
opts = []uniwidth.Option{
    uniwidth.WithEastAsianAmbiguous(uniwidth.EANarrow),
}
width = uniwidth.StringWidthWithOptions("Hello ±½", opts...)
// width = 8 (Hello=5, space=1, ±=1, ½=1)

Types

type EAWidth

type EAWidth int

EAWidth represents the width for East Asian Ambiguous characters.

const (
	// EANarrow treats ambiguous characters as narrow (width 1).
	// This is the default for non-East Asian locales.
	EANarrow EAWidth = 1

	// EAWide treats ambiguous characters as wide (width 2).
	// This is appropriate for East Asian (CJK) locales.
	EAWide EAWidth = 2
)

type Option

type Option func(*Options)

Option is a functional option for configuring Unicode width calculation.

func WithEastAsianAmbiguous

func WithEastAsianAmbiguous(width EAWidth) Option

WithEastAsianAmbiguous sets the width for East Asian Ambiguous characters.

Example:

// Treat ambiguous characters as wide (East Asian locale)
width := uniwidth.StringWidthWithOptions("±½", uniwidth.WithEastAsianAmbiguous(uniwidth.EAWide))
// width = 4 (each character is 2 columns wide)

// Treat ambiguous characters as narrow (neutral locale)
width = uniwidth.StringWidthWithOptions("±½", uniwidth.WithEastAsianAmbiguous(uniwidth.EANarrow))
// width = 2 (each character is 1 column wide)

func WithEmojiPresentation

func WithEmojiPresentation(emoji bool) Option

WithEmojiPresentation sets whether emoji should be rendered as emoji (wide) or text (narrow).

Example:

// Emoji as emoji (wide, width 2) - default
width := uniwidth.StringWidthWithOptions("😀", uniwidth.WithEmojiPresentation(true))
// width = 2

// Emoji as text (narrow, width 1)
width = uniwidth.StringWidthWithOptions("😀", uniwidth.WithEmojiPresentation(false))
// width = 1

Note: This primarily affects emoji that have both text and emoji presentation variants. Most emoji are always rendered as wide regardless of this setting.

type Options

type Options struct {
	// EastAsianAmbiguous specifies how to handle ambiguous-width characters.
	// Default: EANarrow (width 1)
	EastAsianAmbiguous EAWidth

	// EmojiPresentation specifies whether emoji should be rendered as emoji (width 2)
	// or text (width 1). When true, emoji are treated as width 2.
	// Default: true (emoji presentation)
	EmojiPresentation bool
}

Options configures Unicode width calculation behavior.

Use the functional options pattern to create customized configurations:

opts := []uniwidth.Option{
    uniwidth.WithEastAsianAmbiguous(uniwidth.EAWide),
    uniwidth.WithEmojiPresentation(true),
}
width := uniwidth.StringWidthWithOptions("Hello 世界", opts...)
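For readers unfamiliar with the functional options pattern, the following self-contained sketch shows how options of this shape are typically applied. The type and constructor declarations mirror the ones documented above, but the function bodies and the resolve helper are assumptions for illustration, not the package's source.

```go
package main

import "fmt"

// The declarations below mirror the documented types; the function bodies
// are assumptions about how such options are typically implemented.
type EAWidth int

const (
	EANarrow EAWidth = 1
	EAWide   EAWidth = 2
)

type Options struct {
	EastAsianAmbiguous EAWidth
	EmojiPresentation  bool
}

type Option func(*Options)

func WithEastAsianAmbiguous(w EAWidth) Option {
	return func(o *Options) { o.EastAsianAmbiguous = w }
}

func WithEmojiPresentation(emoji bool) Option {
	return func(o *Options) { o.EmojiPresentation = emoji }
}

// resolve is a hypothetical helper: start from the documented defaults,
// then apply each option in order so later options win.
func resolve(opts ...Option) Options {
	o := Options{EastAsianAmbiguous: EANarrow, EmojiPresentation: true}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", resolve())
	fmt.Printf("%+v\n", resolve(WithEastAsianAmbiguous(EAWide), WithEmojiPresentation(false)))
}
```

Starting from a defaults struct means callers who pass no options get the documented defaults, and new options can be added later without changing existing call sites.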

Directories

Path                          Synopsis
cmd/generate-tables (command) generate-tables generates Unicode width tables from official Unicode 16.0 data.
