uawk

Version: v0.2.2
Published: Jan 14, 2026 License: MIT Imports: 8 Imported by: 0

README

uawk

An AWK interpreter written in Go, using the coregex regex engine.

Features

  • POSIX AWK compliant with GNU AWK extensions
  • Parallel file processing (-j N)
  • Embeddable Go API
  • Zero CGO dependencies

Installation

go install github.com/kolkov/uawk/cmd/uawk@latest

Usage

Command Line
# Basic usage
uawk '{ print $1 }' file.txt

# Field separator
uawk -F: '{ print $1 }' /etc/passwd

# Variables
uawk -v name=World 'BEGIN { print "Hello, " name }'

# Program from file
uawk -f script.awk input.txt

# Parallel processing
uawk -j 4 '{ sum += $1 } END { print sum }' *.log

# Non-POSIX regex mode
uawk --no-posix '/pattern/ { print }' file.txt
As a Library
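package main

import (
    "fmt"
    "io"
    "log"
    "strings"

    "github.com/kolkov/uawk"
)

func main() {
    // One-off execution with default settings.
    output, err := uawk.Run(`{ print $1 }`, strings.NewReader("hello world"), nil)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Print(output)

    // With configuration: colon-separated fields and a pre-defined variable.
    input := strings.NewReader("alice:150\nbob:50\n")
    config := &uawk.Config{
        FS:        ":",
        Variables: map[string]string{"threshold": "100"},
    }
    output, err = uawk.Run(`$2 > threshold { print $1 }`, input, config)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Print(output)

    // Compile once, run multiple times.
    prog, err := uawk.Compile(`{ sum += $1 } END { print sum }`)
    if err != nil {
        log.Fatal(err)
    }
    files := []io.Reader{strings.NewReader("1\n2\n"), strings.NewReader("3\n4\n")}
    for _, file := range files {
        result, err := prog.Run(file, nil)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Print(result)
    }
}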

Benchmarks

See uawk-bench for the benchmark suite and methodology.

Results vary by workload. Regex-heavy patterns benefit from coregex optimizations. I/O-bound workloads show smaller differences between implementations.

Building

git clone https://github.com/kolkov/uawk
cd uawk
go build -o uawk ./cmd/uawk

Requires Go 1.25+.

Architecture

Source → Lexer → Parser → AST → Semantic Analysis → Compiler → Optimizer → VM

Component  Description
Lexer      Context-sensitive tokenizer, UTF-8
Parser     Recursive descent
Compiler   Bytecode generation (~110 opcodes)
Optimizer  Peephole optimization
VM         Stack-based execution
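
To make the last three stages concrete, here is a minimal sketch of a stack-based bytecode VM with a toy peephole pass. The opcodes, the Instr type, and the fold and run functions are hypothetical and invented for illustration; they are not uawk's actual instruction set or internals.

```go
package main

import "fmt"

// Opcode is a hypothetical instruction set, not uawk's real one.
type Opcode int

const (
	OpPush Opcode = iota // push Arg onto the stack
	OpAdd                // pop two values, push their sum
	OpMul                // pop two values, push their product
)

// Instr pairs an opcode with an operand (used only by OpPush).
type Instr struct {
	Op  Opcode
	Arg float64
}

// fold is a toy peephole pass: it rewrites "push a; push b; add|mul"
// into a single push of the computed constant.
func fold(code []Instr) []Instr {
	var out []Instr
	for _, in := range code {
		n := len(out)
		if (in.Op == OpAdd || in.Op == OpMul) && n >= 2 &&
			out[n-1].Op == OpPush && out[n-2].Op == OpPush {
			a, b := out[n-2].Arg, out[n-1].Arg
			v := a + b
			if in.Op == OpMul {
				v = a * b
			}
			out = append(out[:n-2], Instr{OpPush, v})
			continue
		}
		out = append(out, in)
	}
	return out
}

// run executes the instructions on a value stack and returns the single result.
func run(code []Instr) (float64, error) {
	var stack []float64
	for _, in := range code {
		switch in.Op {
		case OpPush:
			stack = append(stack, in.Arg)
		case OpAdd, OpMul:
			if len(stack) < 2 {
				return 0, fmt.Errorf("stack underflow")
			}
			a, b := stack[len(stack)-2], stack[len(stack)-1]
			stack = stack[:len(stack)-2]
			if in.Op == OpAdd {
				stack = append(stack, a+b)
			} else {
				stack = append(stack, a*b)
			}
		default:
			return 0, fmt.Errorf("unknown opcode %d", in.Op)
		}
	}
	if len(stack) != 1 {
		return 0, fmt.Errorf("expected one result, got %d", len(stack))
	}
	return stack[0], nil
}

func main() {
	// Bytecode for the expression (2 + 3) * 4.
	code := []Instr{{OpPush, 2}, {OpPush, 3}, {OpAdd, 0}, {OpPush, 4}, {OpMul, 0}}
	optimized := fold(code)
	result, err := run(optimized)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(code), len(optimized), result) // prints 5 1 20
}
```

The peephole pass only looks at a small window of adjacent instructions, which is why it can run as a cheap step between compilation and execution.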

Supported Features

Standard AWK
  • Pattern-action rules, BEGIN/END blocks
  • Field splitting and assignment
  • Built-in variables (NR, NF, FS, RS, OFS, ORS, FILENAME, etc.)
  • Arithmetic, string, and regex operators
  • Control flow (if/else, while, for, do-while)
  • Associative arrays
  • Built-in functions (print, printf, sprintf, length, substr, split, sub, gsub, match, tolower, toupper, sin, cos, exp, log, sqrt, int, rand, srand, system, etc.)
  • User-defined functions
  • I/O redirection (>, >>, |, getline)
Extensions
  • -j N parallel execution
  • -c Unicode character operations
  • --posix / --no-posix regex mode
  • Debug flags (-d, -da, -dt)
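
The difference between the two regex modes can be sketched with Go's standard regexp package, which offers both leftmost-first (Perl-like) and POSIX leftmost-longest matching. This uses the standard library rather than coregex, and the firstMatch helper is a hypothetical name chosen for illustration; it only demonstrates the semantics that --posix / --no-posix select between.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the first match of pattern in s. With posix set it uses
// POSIX leftmost-longest semantics; otherwise leftmost-first (Perl-like).
func firstMatch(pattern, s string, posix bool) string {
	if posix {
		return regexp.MustCompilePOSIX(pattern).FindString(s)
	}
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	// Both alternatives match at position 0; the modes disagree on which wins.
	fmt.Println(firstMatch(`a|ab`, "ab", false)) // prints a
	fmt.Println(firstMatch(`a|ab`, "ab", true))  // prints ab
}
```

The two modes can only disagree when several alternatives match at the same starting position, so many everyday patterns behave identically under both.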

License

MIT

Acknowledgments

  • GoAWK by Ben Hoyt — reference implementation and test suite
  • coregex — regex engine

Documentation

Overview

Package uawk provides a high-performance AWK interpreter.

uawk is a modern AWK implementation written in Go, featuring:

  • Full POSIX AWK compatibility
  • High-performance regex engine (coregex)
  • Zero external dependencies for core functionality
  • Embeddable library for Go applications

Quick Start

For simple one-off execution:

output, err := uawk.Run(`{ print $1 }`, strings.NewReader("hello world"), nil)

With configuration:

output, err := uawk.Run(program, input, &uawk.Config{
    FS: ":",
    Variables: map[string]string{"threshold": "100"},
})

Compiled Programs

For repeated execution of the same program:

prog, err := uawk.Compile(`$1 > threshold { print $2 }`)
if err != nil {
    log.Fatal(err)
}

for _, file := range files {
    output, err := prog.Run(file, &uawk.Config{
        Variables: map[string]string{"threshold": "100"},
    })
    // ...
}

Configuration

The Config type allows customization of AWK execution:

  • Field and record separators (FS, RS, OFS, ORS)
  • Pre-defined variables
  • Custom I/O writers

Error Handling

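Errors are returned as specific types for detailed handling: ParseError (syntax errors in the AWK source), CompileError (semantic errors during compilation), RuntimeError (errors during execution), and ExitError (the program called exit with a status code).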

Thread Safety

Compiled Program objects are safe for concurrent use. Each call to Program.Run creates an independent execution context.

Constants

const Version = "0.1.0"

Version is the uawk version string.

Variables

This section is empty.

Functions

func Exec

func Exec(program string, input io.Reader, output io.Writer, config *Config) error

Exec is a simplified interface for running an AWK program. It reads from input, writes to output, and returns any error.

This function is useful for integration with I/O pipelines where you need control over the output writer.

Example:

err := uawk.Exec(`{ print toupper($0) }`, os.Stdin, os.Stdout, nil)

func IsExitError

func IsExitError(err error) (int, bool)

IsExitError reports whether err is an ExitError and returns the exit code. Returns (code, true) if err is an ExitError, or (0, false) otherwise.

func Run

func Run(program string, input io.Reader, config *Config) (string, error)

Run executes an AWK program with the given input. This is a convenience function for one-off execution. For repeated execution of the same program, use Compile followed by Program.Run.

Parameters:

  • program: AWK source code
  • input: input data reader (can be nil for programs without input)
  • config: execution configuration (can be nil for defaults)

Returns the program output as a string, or an error if parsing, compilation, or execution fails.

Example:

output, err := uawk.Run(`{ print $1 }`, strings.NewReader("hello world"), nil)
// output: "hello\n"
Example

Example functions for documentation

package main

import (
	"fmt"
	"strings"

	"github.com/kolkov/uawk"
)

func main() {
	output, _ := uawk.Run(`{ print $1 }`, strings.NewReader("hello world\n"), nil)
	fmt.Print(output)
}
Output:

hello

Types

type CompileError

type CompileError struct {
	Message string // Error description
}

CompileError represents a semantic error during compilation.

func (*CompileError) Error

func (e *CompileError) Error() string

type Config

type Config struct {
	// FS is the input field separator (default: " ").
	// When set to a single space, runs of whitespace are treated as separators.
	// Otherwise, each occurrence of the string is a separator.
	// Can also be a regular expression pattern.
	FS string

	// RS is the input record separator (default: "\n").
	// When set to empty string, records are separated by blank lines.
	RS string

	// OFS is the output field separator (default: " ").
	// Used when printing multiple values with print statement.
	OFS string

	// ORS is the output record separator (default: "\n").
	// Appended after each print statement.
	ORS string

	// Variables contains pre-defined variables.
	// These are set before BEGIN block execution.
	// Example: map[string]string{"threshold": "100", "prefix": "LOG:"}
	Variables map[string]string

	// Output is the writer for print/printf statements.
	// If nil, output is captured and returned from Run.
	Output io.Writer

	// Stderr is the writer for error output.
	// If nil, errors are discarded.
	Stderr io.Writer

	// Args contains command-line arguments (ARGV).
	// Args[0] is typically the program name.
	Args []string

	// POSIXRegex enables POSIX leftmost-longest regex matching.
	// When true (default), uses AWK/POSIX ERE semantics (slower but compliant).
	// When false, uses leftmost-first matching (faster, Perl-like).
	// Set to false for better performance when POSIX compliance is not required.
	POSIXRegex *bool

	// Parallel enables parallel execution with the specified number of workers.
	// When > 1, the program is executed in parallel if it is safe to do so.
	// When 0 or 1, sequential execution is used (default).
	// Note: Parallel execution has limitations - see CanParallelize().
	Parallel int

	// ChunkSize is the approximate size in bytes of each input chunk
	// when parallel execution is enabled. Default: 4MB (4 * 1024 * 1024).
	ChunkSize int
}

Config holds configuration options for AWK execution.
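
The two string cases described for FS can be sketched with the standard strings package. This is an illustration of the documented splitting rule only: the splitFields helper is hypothetical, regular-expression separators are not handled, and strings.Fields splits on all Unicode whitespace, which may be broader than what uawk treats as whitespace.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFields sketches the FS rule: a single space means "runs of whitespace
// separate fields"; any other string separates fields at each occurrence.
func splitFields(record, fs string) []string {
	if record == "" {
		return nil // an empty record has no fields
	}
	if fs == " " {
		return strings.Fields(record)
	}
	return strings.Split(record, fs)
}

func main() {
	fmt.Printf("%q\n", splitFields("  a   b\tc ", " ")) // prints ["a" "b" "c"]
	fmt.Printf("%q\n", splitFields("root:x::0", ":"))   // prints ["root" "x" "" "0"]
}
```

Note the empty third field in the second call: with an explicit separator, adjacent separators produce empty fields, whereas the single-space default collapses them.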

type ExitError

type ExitError struct {
	Code int // Exit status code (0 = success)
}

ExitError represents a normal exit with a status code. This is not an error condition; it indicates the AWK program called exit with the given status.

func (*ExitError) Error

func (e *ExitError) Error() string

type ParallelAnalysis added in v0.2.0

type ParallelAnalysis struct {
	Safety           ParallelSafety
	CanParallelize   bool
	HasAggregation   bool
	AggregatedVars   []int
	AggregatedArrays []int
}

ParallelAnalysis contains the results of parallel safety analysis.

type ParallelSafety added in v0.2.0

type ParallelSafety int

ParallelSafety represents the parallelization safety level.

const (
	// ParallelUnsafe indicates the program cannot be parallelized.
	ParallelUnsafe ParallelSafety = iota
	// ParallelStateless indicates the program is embarrassingly parallel.
	ParallelStateless
	// ParallelAggregatable indicates the program can be parallelized with aggregation.
	ParallelAggregatable
)
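
As an illustration of the aggregatable case, a program such as `{ sum += $1 } END { print sum }` can be evaluated by computing a partial sum per input chunk and merging the partial results at the end. The sketch below shows that idea with the standard library only. It is not uawk's implementation: the splitChunks, sumFirstField, and parallelSum names are hypothetical, one goroutine per chunk stands in for a worker pool, and non-numeric fields simply count as 0.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// splitChunks cuts data into pieces of roughly chunkSize bytes, extending each
// piece to the next newline so that no record is split across chunks.
func splitChunks(data string, chunkSize int) []string {
	var chunks []string
	for len(data) > 0 {
		end := chunkSize
		if end >= len(data) {
			chunks = append(chunks, data)
			break
		}
		if i := strings.IndexByte(data[end:], '\n'); i >= 0 {
			end += i + 1
		} else {
			end = len(data)
		}
		chunks = append(chunks, data[:end])
		data = data[end:]
	}
	return chunks
}

// sumFirstField is the per-chunk work, comparable to `{ sum += $1 }`.
func sumFirstField(chunk string) float64 {
	var sum float64
	for _, line := range strings.Split(chunk, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		v, _ := strconv.ParseFloat(fields[0], 64) // parse failures count as 0 here
		sum += v
	}
	return sum
}

// parallelSum processes each chunk in its own goroutine, then aggregates.
func parallelSum(data string, chunkSize int) float64 {
	chunks := splitChunks(data, chunkSize)
	partial := make([]float64, len(chunks))
	var wg sync.WaitGroup
	for i, c := range chunks {
		wg.Add(1)
		go func(i int, c string) {
			defer wg.Done()
			partial[i] = sumFirstField(c) // each goroutine writes its own slot
		}(i, c)
	}
	wg.Wait()
	var total float64
	for _, p := range partial {
		total += p // aggregation step, comparable to the END block
	}
	return total
}

func main() {
	fmt.Println(parallelSum("1 a\n2 b\n3 c\n4 d\n", 5)) // prints 10
}
```

A stateless program, such as one that only prints a field of each record, needs no merge step at all, while a program whose output depends on the order of earlier records cannot be split this way.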

type ParseError

type ParseError struct {
	Line    int    // 1-based line number
	Column  int    // 1-based column number
	Message string // Error description
}

ParseError represents a syntax error in AWK source code.

func (*ParseError) Error

func (e *ParseError) Error() string

type Program

type Program struct {
	// contains filtered or unexported fields
}

Program represents a compiled AWK program ready for execution. It is safe for concurrent use; each call to Run creates an independent execution context.

func Compile

func Compile(program string) (*Program, error)

Compile parses and compiles an AWK program for execution. The returned Program can be executed multiple times with different inputs.

Example:

prog, err := uawk.Compile(`{ sum += $1 } END { print sum }`)
if err != nil {
    log.Fatal(err)
}
output1, _ := prog.Run(file1, nil)
output2, _ := prog.Run(file2, nil)
Example
package main

import (
	"fmt"
	"strings"

	"github.com/kolkov/uawk"
)

func main() {
	prog, _ := uawk.Compile(`{ sum += $1 } END { print sum }`)
	output, _ := prog.Run(strings.NewReader("1\n2\n3\n"), nil)
	fmt.Print(output)
}
Output:

6

func MustCompile

func MustCompile(program string) *Program

MustCompile is like Compile but panics if the program cannot be compiled. It simplifies initialization of global program variables.

Example:

var sumProgram = uawk.MustCompile(`{ sum += $1 } END { print sum }`)

func (*Program) CanParallelize added in v0.2.0

func (p *Program) CanParallelize(rs string) *ParallelAnalysis

CanParallelize checks if this program can be safely parallelized. Returns a ParallelAnalysis struct with detailed information about why the program can or cannot be parallelized.

func (*Program) Disassemble

func (p *Program) Disassemble() string

Disassemble returns a human-readable representation of the compiled bytecode. Useful for debugging and understanding program structure.

func (*Program) Run

func (p *Program) Run(input io.Reader, config *Config) (string, error)

Run executes the compiled program with the given input and configuration. Returns the output as a string, or an error if execution fails.

If config is nil, default configuration is used. If config.Output is set, output is written there and the returned string will be empty. If config.Parallel > 1 and the program is parallelizable, it will be executed using multiple worker goroutines.

func (*Program) Source

func (p *Program) Source() string

Source returns the original AWK source code.

type RuntimeError

type RuntimeError struct {
	Message string // Error description
}

RuntimeError represents an error during AWK execution.

func (*RuntimeError) Error

func (e *RuntimeError) Error() string

Directories

Path               Synopsis
cmd/uawk           (command) uawk - Ultra AWK interpreter
internal/ast       Package ast defines the abstract syntax tree for AWK programs.
internal/compiler  Package compiler compiles an AST into bytecode for the VM.
internal/lexer     Package lexer provides AWK source code tokenization.
internal/parser    Package parser provides an AWK recursive descent parser.
internal/runtime   Package runtime provides AWK runtime support including regex operations.
internal/semantic  Package semantic provides semantic analysis for AWK programs.
internal/token     Package token defines lexical tokens for AWK.
internal/types     Package types defines runtime value types for uawk.
internal/vm        Package vm provides the AWK virtual machine implementation.
