tokendiff

package module

Version: v0.1.1 | Published: Dec 29, 2025 | License: MIT | Imports: 7 | Imported by: 0

Repository

github.com/dacharyc/tokendiff

README ¶

tokendiff

A Go library and CLI for token-level diffing with delimiter support.

tokendiff uses a histogram diff algorithm that groups semantically related changes together, producing more readable output than traditional Myers-based approaches for complex structural changes.

Motivation

Traditional diff tools operate at the line level. Word-based tools like wdiff improve on this but can produce suboptimal results when comparing code. For example, when comparing:

void someFunction(SomeType var)
void someFunction(SomeOtherType var)

wdiff reports that someFunction(SomeType changed to someFunction(SomeOtherType - grouping the function name with the parameter type.

tokendiff treats delimiter characters like ( as separate tokens, correctly identifying that only SomeType changed to SomeOtherType.
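The difference between the two tokenizations can be reproduced with a small standalone sketch. This is not tokendiff's implementation; the `tokenize` helper below is a hypothetical illustration that splits on whitespace and emits each delimiter rune as its own token.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenize splits text on whitespace and emits every rune found in
// delimiters as a separate token. Illustrative only.
func tokenize(text, delimiters string) []string {
	var tokens []string
	var word strings.Builder
	flush := func() {
		if word.Len() > 0 {
			tokens = append(tokens, word.String())
			word.Reset()
		}
	}
	for _, r := range text {
		switch {
		case strings.ContainsRune(" \t\n\r", r):
			flush() // whitespace ends the current word and is dropped
		case strings.ContainsRune(delimiters, r):
			flush()                            // a delimiter ends the current word
			tokens = append(tokens, string(r)) // and becomes its own token
		default:
			word.WriteRune(r)
		}
	}
	flush()
	return tokens
}

func main() {
	line := "void someFunction(SomeType var)"
	fmt.Printf("%q\n", tokenize(line, ""))   // ["void" "someFunction(SomeType" "var)"]
	fmt.Printf("%q\n", tokenize(line, "()")) // ["void" "someFunction" "(" "SomeType" "var" ")"]
}
```

With no delimiters, the function name and parameter type end up in one token, which is the wdiff-style result described above. With `(` and `)` as delimiters, `SomeType` is isolated and can be diffed on its own.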

Algorithm

This library uses the histogram diff algorithm via diffx. The histogram algorithm is a variant of the patience diff algorithm that performs well on real-world text by:

Finding unique tokens that appear exactly once in each input (strong anchors)
Using frequency analysis to avoid matching common tokens that would create confusing output
Recursively diffing the regions between anchors

This approach produces output that groups semantically related changes together, making diffs easier to read than traditional Myers-based algorithms when comparing files with significant structural changes.
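As a rough illustration of the first step only (finding tokens that occur exactly once in each input), consider the sketch below. It is a simplified stand-in written for this README, not the diffx implementation; `uniqueAnchors` is a hypothetical helper name.

```go
package main

import "fmt"

// uniqueAnchors returns the tokens that occur exactly once in a and exactly
// once in b, in the order they appear in a.
func uniqueAnchors(a, b []string) []string {
	countA := map[string]int{}
	countB := map[string]int{}
	for _, t := range a {
		countA[t]++
	}
	for _, t := range b {
		countB[t]++
	}
	var anchors []string
	for _, t := range a {
		if countA[t] == 1 && countB[t] == 1 {
			anchors = append(anchors, t)
		}
	}
	return anchors
}

func main() {
	a := []string{"if", "(", "x", ")", "return", "(", "y", ")"}
	b := []string{"if", "(", "x", ")", "{", "return", "(", "z", ")", "}"}
	fmt.Println(uniqueAnchors(a, b)) // [if x return]
}
```

The parentheses occur twice on each side, so they are not used as anchors. That is the behavior that keeps common tokens from producing confusing matches.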

Installation

Library

go get github.com/dacharyc/tokendiff

CLI Tool

go install github.com/dacharyc/tokendiff/cmd/tokendiff@latest

CLI Usage

tokendiff [options] file1 file2
tokendiff [options] -stdin file2

Options

Input/Output:

Flag	Description
`-d "..."`	Custom delimiter characters
`-P, --punctuation`	Use Unicode punctuation as delimiters
`-W, --white-space "..."`	Custom whitespace characters
`--line-mode`	Compare files line by line
`-C N`	Show N lines of context (implies --line-mode)
`-L N, --line-numbers N`	Show line numbers with width N (0 for auto)
`-stdin`	Read first input from stdin
`--diff-input`	Read unified diff from stdin and apply token-level diff

Output Formatting:

Flag	Description
`-w "..."`	String to mark start of deleted text (default: `[-`)
`-x "..."`	String to mark end of deleted text (default: `-]`)
`-y "..."`	String to mark start of inserted text (default: `{+`)
`-z "..."`	String to mark end of inserted text (default: `+}`)
`-c, --color SPEC`	Set colors (format: `del_fg[:bg],ins_fg[:bg]`, or `list`)
`--no-color`	Disable colored output
`-l, --less-mode`	Use overstrike for `less -r` viewing
`-p, --printer`	Use overstrike for printing
`-R, --repeat-markers`	Repeat markers at line boundaries
`-a, --aggregate-changes`	Combine adjacent insertions/deletions

Output Suppression:

Flag	Description
`-1`	Suppress deleted words
`-2`	Suppress inserted words
`-3`	Suppress common words

Comparison:

Flag	Description
`-i, --ignore-case`	Case-insensitive comparison
`-m N, --match-context N`	Minimum matching words between changes

Other:

Flag	Description
`-s, --statistics`	Print diff statistics
`--profile NAME`	Use settings from `~/.tokendiffrc.<NAME>`
`-v, --version`	Show version
`-h`	Show help

The CLI respects the NO_COLOR environment variable.

Configuration Files

tokendiff supports configuration files to set default options:

~/.tokendiffrc - Default configuration (loaded automatically)
~/.config/tokendiff/config - XDG-compliant location (fallback)
~/.tokendiffrc.<profile> - Named profile (use with --profile)

Config file format:

# Comment
option-name
option-name=value

Example ~/.tokendiffrc.html:

# HTML output profile
start-delete=<del>
stop-delete=</del>
start-insert=<ins>
stop-insert=</ins>
no-color

Usage:

tokendiff --profile=html old.txt new.txt

Command-line options override configuration file settings.
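The file format and the precedence rule can be sketched in a few lines of Go. This is an assumption-laden illustration, not the CLI's parser: `parseConfig` and `overlay` are hypothetical helpers, and the whitespace trimming is a guess at reasonable behavior.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseConfig reads "option-name" and "option-name=value" lines, skipping
// blank lines and lines that start with '#'. Options without a value map to "".
func parseConfig(content string) map[string]string {
	settings := map[string]string{}
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		name, value, _ := strings.Cut(line, "=")
		settings[strings.TrimSpace(name)] = strings.TrimSpace(value)
	}
	return settings
}

// overlay copies command-line values over file values, so the command line wins.
func overlay(file, cli map[string]string) map[string]string {
	merged := map[string]string{}
	for k, v := range file {
		merged[k] = v
	}
	for k, v := range cli {
		merged[k] = v
	}
	return merged
}

func main() {
	file := parseConfig("# HTML output profile\nstart-delete=<del>\nstop-delete=</del>\nno-color\n")
	merged := overlay(file, map[string]string{"start-delete": "[-"})
	fmt.Println(merged["start-delete"], merged["stop-delete"]) // [- </del>
	_, noColor := merged["no-color"]
	fmt.Println(noColor) // true
}
```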

Exit Codes

Code	Meaning
0	Files are identical
1	Files differ
2	Error occurred

Examples

# Compare two files
tokendiff old.txt new.txt

# Line-by-line with context
tokendiff --line-mode -C 3 old.go new.go

# Compare git versions
git show HEAD~1:file.go | tokendiff -stdin file.go

# Custom delimiters
tokendiff -d "(){}[]" file1.txt file2.txt

# Case-insensitive comparison with statistics
tokendiff -i -s old.txt new.txt

# HTML-style markers
tokendiff -w '<del>' -x '</del>' -y '<ins>' -z '</ins>' old.txt new.txt

# View in less with overstrike highlighting
tokendiff -l old.txt new.txt | less -r

# Apply token-level diff to a unified diff
git diff | tokendiff --diff-input
diff -u old.txt new.txt | tokendiff --diff-input

Library Usage

Basic Usage

package main

import (
    "fmt"

    "github.com/dacharyc/tokendiff"
)

func main() {
    oldText := "void someFunction(SomeType var)"
    newText := "void someFunction(SomeOtherType var)"

    diffs := tokendiff.DiffStrings(oldText, newText, tokendiff.DefaultOptions())
    fmt.Println(tokendiff.FormatDiff(diffs))
    // Output: void someFunction([-SomeType-]{+SomeOtherType+} var)
}

Working with Tokens

// Tokenize text with delimiter awareness
tokens := tokendiff.Tokenize("foo(bar, baz)", tokendiff.DefaultOptions())
// tokens = ["foo", "(", "bar", ",", "baz", ")"]

// Diff pre-tokenized content
diffs := tokendiff.DiffTokens(tokens1, tokens2)

Custom Delimiters

opts := tokendiff.Options{
    Delimiters: "|:-",  // Custom delimiter set
}
diffs := tokendiff.DiffStrings(text1, text2, opts)

Preserving Whitespace

opts := tokendiff.Options{
    Delimiters:         tokendiff.DefaultDelimiters,
    PreserveWhitespace: true,  // Include whitespace as tokens
}

API

Types

type Operation int
const (
    Equal  Operation = iota  // Token unchanged
    Insert                   // Token was added
    Delete                   // Token was removed
)

type Diff struct {
    Type  Operation
    Token string
}

type Options struct {
    Delimiters         string  // Characters to treat as separate tokens
    Whitespace         string  // Characters to treat as whitespace
    UsePunctuation     bool    // Use Unicode punctuation as delimiters
    PreserveWhitespace bool    // Include whitespace as tokens
    IgnoreCase         bool    // Case-insensitive comparison
}

type FormatOptions struct {
    StartDelete string  // Marker for start of deleted text (default: "[-")
    StopDelete  string  // Marker for end of deleted text (default: "-]")
    StartInsert string  // Marker for start of inserted text (default: "{+")
    StopInsert  string  // Marker for end of inserted text (default: "+}")
    NoDeleted   bool    // Suppress deleted tokens
    NoInserted  bool    // Suppress inserted tokens
    NoCommon    bool    // Suppress unchanged tokens
}

Functions

Tokenizing and Diffing:

Tokenize(text string, opts Options) []string - Split text into tokens
DiffTokens(tokens1, tokens2 []string) []Diff - Diff two token slices
DiffStrings(text1, text2 string, opts Options) []Diff - Tokenize and diff two strings
DefaultOptions() Options - Get default options

Diff Transformations:

AggregateDiffs(diffs []Diff) []Diff - Combine adjacent same-type operations
ApplyMatchContext(diffs []Diff, minContext int) []Diff - Require minimum matching words between changes

Formatting:

FormatDiff(diffs []Diff) string - Format diff with default markers
FormatDiffWithOptions(diffs []Diff, opts FormatOptions) string - Format with custom markers
DefaultFormatOptions() FormatOptions - Get default format options
HasChanges(diffs []Diff) bool - Check if diff contains any changes
NeedsSpaceBefore(token string) bool - Check if space should precede token
NeedsSpaceAfter(token string) bool - Check if space should follow token

Unified Diff Parsing:

ParseUnifiedDiff(input string) ([]UnifiedDiff, error) - Parse unified diff format
ApplyWordDiff(hunk DiffHunk, opts Options) []Diff - Apply token-level diff to a hunk

Default Delimiters

(){}[]<>,.;:!?"'`@#$%^&*+-=/\|~

Performance

Benchmarks on Apple M1:

BenchmarkTokenize      ~2.5 µs/op
BenchmarkDiffStrings   ~10.5 µs/op

License

MIT

Documentation ¶

Overview ¶

Package tokendiff provides token-level diffing with delimiter support.

Unlike traditional line-based diff tools, tokendiff operates at the token level and treats configurable delimiter characters as separate tokens. This allows for more precise diffs when comparing code or structured text.

For example, when comparing:

someFunction(SomeType var)
someFunction(SomeOtherType var)

A line-based diff would show the entire line changed. A word-based diff without delimiter awareness might show "someFunction(SomeType" changed to "someFunction(SomeOtherType". But tokendiff correctly identifies that only "SomeType" changed to "SomeOtherType" because it treats "(" as a delimiter.

This package uses github.com/dacharyc/diffx to provide this functionality.

Index ¶

Constants
Variables
func ColorCode(fg, bg string, bold bool) (string, error)
func ColorNames() []string
func ComputeTokenSimilarity(text1, text2 string, opts Options) float64
func DiscardConfusingTokens(tokens1, tokens2 []string) (filtered1, filtered2 []string, map1, map2 []int)
func FormatDiff(diffs []Diff) string
func FormatDiffResultAdvanced(result DiffResult, opts FormatOptions) string
func FormatDiffWithOptions(diffs []Diff, opts FormatOptions) string
func FormatDiffsAdvanced(diffs []Diff, opts FormatOptions) string
func HasChanges(diffs []Diff) bool
func NeedsSpaceAfter(token string) bool
func NeedsSpaceBefore(token string) bool
func OverstrikeBold(text string) string
func OverstrikeUnderline(text string) string
func ParseColor(spec string) (string, error)
func ParseColorSpec(spec string) (deleteColor, insertColor string, err error)
func ProcessUnifiedDiff(input io.Reader, output io.Writer, opts Options, fmtOpts FormatOptions) error
func Tokenize(text string, opts Options) []string
type Diff
- func AggregateDiffs(diffs []Diff) []Diff
- func ApplyMatchContext(diffs []Diff, minContext int) []Diff
- func ApplyWordDiff(hunk DiffHunk, opts Options) []Diff
- func DiffStrings(text1, text2 string, opts Options) []Diff
- func DiffStringsWithPreprocessing(text1, text2 string, opts Options) []Diff
- func DiffTokens(tokens1, tokens2 []string) []Diff
- func DiffTokensRaw(tokens1, tokens2 []string) []Diff
- func DiffTokensWithPreprocessing(tokens1, tokens2 []string) []Diff
- func EliminateStopwordAnchors(diffs []Diff) []Diff
- func InterleaveDiffs(diffs []Diff) []Diff
- func ShiftBoundaries(diffs []Diff) []Diff
type DiffHunk
type DiffResult
- func DiffStringsWithPositions(text1, text2 string, opts Options) DiffResult
- func DiffStringsWithPositionsAndPreprocessing(text1, text2 string, opts Options) DiffResult
type DiffStatistics
- func ComputeStatistics(text1, text2 string, diffs []Diff, opts Options) DiffStatistics
type FormatOptions
- func DefaultFormatOptions() FormatOptions
type LineDiffOutput
- func DiffLineByLine(text1, text2 string, opts Options, fmtOpts FormatOptions, algorithm string, ...) LineDiffOutput
type LineDiffResult
- func FilterWithContext(lines []LineDiffResult, contextLines int) []LineDiffResult
type LinePairing
- func FindPositionalPairings(deletes, inserts []string) []LinePairing
- func FindSimilarityPairings(deletes, inserts []string, opts Options, threshold float64) []LinePairing
type Operation
- func (o Operation) String() string
type Options
- func DefaultOptions() Options
type TokenPos
- func TokenizeWithPositions(text string, opts Options) ([]string, []TokenPos)
type UnifiedDiff
- func ParseUnifiedDiff(input string) ([]UnifiedDiff, error)
type WholeFileDiffResult
- func DiffWholeFiles(text1, text2 string, opts Options, fmtOpts FormatOptions) WholeFileDiffResult

Constants ¶

const (
	ANSIReset       = "\033[0m"
	ANSIClearEOL    = "\033[K"
	ANSIDeleteColor = "\033[0;31;1m" // bold red
	ANSIInsertColor = "\033[0;32;1m" // bold green
	ANSIBold        = "\033[1m"
)

ANSI escape code constants

const DefaultDelimiters = ""

DefaultDelimiters contains the default set of delimiter characters. These are characters that are treated as separate tokens even when not surrounded by whitespace. NOTE: Original dwdiff has NO default delimiters (empty string). Words are split only on whitespace unless -d or -P is specified.

const DefaultWhitespace = " \t\n\r"

DefaultWhitespace contains the default set of whitespace characters.

Variables ¶

var BackgroundColors = map[string]string{
	"black":         "\033[40m",
	"red":           "\033[41m",
	"green":         "\033[42m",
	"yellow":        "\033[43m",
	"blue":          "\033[44m",
	"magenta":       "\033[45m",
	"cyan":          "\033[46m",
	"white":         "\033[47m",
	"brightblack":   "\033[100m",
	"brightred":     "\033[101m",
	"brightgreen":   "\033[102m",
	"brightyellow":  "\033[103m",
	"brightblue":    "\033[104m",
	"brightmagenta": "\033[105m",
	"brightcyan":    "\033[106m",
	"brightwhite":   "\033[107m",
}

BackgroundColors maps color names to ANSI background escape codes.

var ForegroundColors = map[string]string{
	"black":         "\033[30m",
	"red":           "\033[31m",
	"green":         "\033[32m",
	"yellow":        "\033[33m",
	"blue":          "\033[34m",
	"magenta":       "\033[35m",
	"cyan":          "\033[36m",
	"white":         "\033[37m",
	"brightblack":   "\033[90m",
	"brightred":     "\033[91m",
	"brightgreen":   "\033[92m",
	"brightyellow":  "\033[93m",
	"brightblue":    "\033[94m",
	"brightmagenta": "\033[95m",
	"brightcyan":    "\033[96m",
	"brightwhite":   "\033[97m",
}

ForegroundColors maps color names to ANSI foreground escape codes.

Functions ¶

func ColorCode ¶

func ColorCode(fg, bg string, bold bool) (string, error)

ColorCode builds an ANSI escape sequence from component parts. fg is the foreground color name (or empty for default). bg is the background color name (or empty for none). bold adds the bold attribute if true. Returns an error if any color name is not recognized.

func ColorNames ¶

func ColorNames() []string

ColorNames returns a list of all available color names.

func ComputeTokenSimilarity ¶

func ComputeTokenSimilarity(text1, text2 string, opts Options) float64

ComputeTokenSimilarity calculates similarity between two strings based on shared tokens. Returns a value between 0.0 (no similarity) and 1.0 (identical). Similarity is computed as the ratio of Equal tokens to total diff operations.
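The ratio described here can be computed directly from a diff slice. The sketch below redeclares minimal `Operation` and `Diff` types so it runs on its own; treating an empty diff as fully similar is an assumption, not documented behavior.

```go
package main

import "fmt"

type Operation int

const (
	Equal Operation = iota
	Insert
	Delete
)

type Diff struct {
	Type  Operation
	Token string
}

// similarity returns the share of Equal operations among all operations.
func similarity(diffs []Diff) float64 {
	if len(diffs) == 0 {
		return 1.0 // assumption: two empty inputs count as identical
	}
	equal := 0
	for _, d := range diffs {
		if d.Type == Equal {
			equal++
		}
	}
	return float64(equal) / float64(len(diffs))
}

func main() {
	diffs := []Diff{
		{Equal, "void"}, {Equal, "someFunction"}, {Equal, "("},
		{Delete, "SomeType"}, {Insert, "SomeOtherType"},
		{Equal, "var"}, {Equal, ")"},
	}
	fmt.Printf("%.3f\n", similarity(diffs)) // 0.714
}
```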

func DiscardConfusingTokens ¶

func DiscardConfusingTokens(tokens1, tokens2 []string) (filtered1, filtered2 []string, map1, map2 []int)

DiscardConfusingTokens filters tokens that appear too frequently, which would create many spurious match points during diff calculation. Returns filtered token slices and index maps back to original positions.

Algorithm (inspired by GNU diff's discard_confusing_lines):

1. Count occurrences of each token in the OTHER file (equiv_count).
2. Mark tokens:
   - equiv_count == 0: definitely discard (can't match anything)
   - equiv_count > √n: provisionally discard (too many potential matches)
   - else: keep for matching
3. Apply provisional discard rules: provisional tokens are kept only if they form runs with non-provisional endpoints AND at least 25% of the run is provisional.

Note: Filtered tokens are still included in the final diff output - they're just excluded from the LCS matching to prevent spurious anchoring.
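Steps 1 and 2 of the algorithm above can be sketched as a classification pass. The documentation does not define n; this sketch assumes n is the number of tokens in the other input. The `classify` function and `verdict` type are hypothetical, and step 3 (the run-based rules) is left out.

```go
package main

import (
	"fmt"
	"math"
)

type verdict int

const (
	keep        verdict = iota // usable for matching
	provisional                // too frequent in the other input
	discard                    // absent from the other input
)

// classify marks each token by how often it occurs in the other input.
func classify(tokens, other []string) []verdict {
	counts := map[string]int{}
	for _, t := range other {
		counts[t]++
	}
	// Assumption: n is the length of the other token slice.
	threshold := math.Sqrt(float64(len(other)))
	out := make([]verdict, len(tokens))
	for i, t := range tokens {
		c := counts[t]
		switch {
		case c == 0:
			out[i] = discard
		case float64(c) > threshold:
			out[i] = provisional
		default:
			out[i] = keep
		}
	}
	return out
}

func main() {
	// 16 tokens, so the threshold is 4; "the" occurs 5 times.
	other := []string{"the", "a1", "the", "a2", "the", "a3", "the", "a4",
		"the", "parser", "a5", "a6", "a7", "a8", "a9", "a10"}
	fmt.Println(classify([]string{"the", "parser", "missing"}, other)) // [1 0 2]
}
```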

func FormatDiff ¶

func FormatDiff(diffs []Diff) string

FormatDiff returns a human-readable representation of the diff. Deleted tokens are wrapped in [-...-] and inserted tokens in {+...+}.

func FormatDiffResultAdvanced ¶

func FormatDiffResultAdvanced(result DiffResult, opts FormatOptions) string

FormatDiffResultAdvanced formats a DiffResult preserving original spacing for Equal content. This uses position information to extract original text for Equal runs instead of reconstructing from tokens, which loses whitespace information. When opts.ShowLineNumbers is true, it tracks and displays line numbers based on SOURCE positions in the original texts.

func FormatDiffWithOptions ¶

func FormatDiffWithOptions(diffs []Diff, opts FormatOptions) string

FormatDiffWithOptions returns a formatted representation of the diff using the specified formatting options.

func FormatDiffsAdvanced ¶

func FormatDiffsAdvanced(diffs []Diff, opts FormatOptions) string

FormatDiffsAdvanced formats diffs with comprehensive options including colors, line numbers, overstrike modes, and marker repetition. This is a more feature-rich alternative to FormatDiffWithOptions.

func HasChanges ¶

func HasChanges(diffs []Diff) bool

HasChanges returns true if the diff slice contains any non-Equal operations.

func NeedsSpaceAfter ¶

func NeedsSpaceAfter(token string) bool

NeedsSpaceAfter returns true if a space should follow this token when formatting diff output. Used internally by FormatDiff.

func NeedsSpaceBefore ¶

func NeedsSpaceBefore(token string) bool

NeedsSpaceBefore returns true if a space should precede this token when formatting diff output. Used internally by FormatDiff.

func OverstrikeBold ¶

func OverstrikeBold(text string) string

OverstrikeBold returns text with overstrike bold (char\bchar for each char). This is used for printer mode to highlight inserted text.

func OverstrikeUnderline ¶

func OverstrikeUnderline(text string) string

OverstrikeUnderline returns text with overstrike underlining (_\bchar for each char). This is used for less -r mode to highlight deleted text.
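The two overstrike encodings described above are simple to reproduce. The sketch below is a standalone approximation, not the package's code; whether the real functions treat whitespace or newlines specially is not stated, so this version encodes every rune the same way.

```go
package main

import (
	"fmt"
	"strings"
)

// overstrikeBold emits each rune as "char, backspace, char".
func overstrikeBold(text string) string {
	var b strings.Builder
	for _, r := range text {
		b.WriteRune(r)
		b.WriteByte('\b')
		b.WriteRune(r)
	}
	return b.String()
}

// overstrikeUnderline emits each rune as "underscore, backspace, char".
func overstrikeUnderline(text string) string {
	var b strings.Builder
	for _, r := range text {
		b.WriteString("_\b")
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", overstrikeBold("ab"))      // "a\bab\bb"
	fmt.Printf("%q\n", overstrikeUnderline("ab")) // "_\ba_\bb"
}
```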

func ParseColor ¶

func ParseColor(spec string) (string, error)

ParseColor parses a color specification and returns the ANSI escape sequence. The spec can be:

A single color name: "red" -> foreground red
Foreground:background: "red:white" -> red text on white background
Empty string returns empty string (no color)

Returns an error if the color name is not recognized.

func ParseColorSpec ¶

func ParseColorSpec(spec string) (deleteColor, insertColor string, err error)

ParseColorSpec parses a color specification for diff output. The format is: "delete_color,insert_color" where each color can be "fg" or "fg:bg" (e.g., "red,green" or "red:white,green:black").

If only one color is specified, it's used for deletions and the default insert color (bold green) is used for insertions.

Returns the ANSI escape sequences for delete and insert colors.

func ProcessUnifiedDiff ¶

func ProcessUnifiedDiff(input io.Reader, output io.Writer, opts Options, fmtOpts FormatOptions) error

ProcessUnifiedDiff reads a unified diff from input and applies word-level diffing to each hunk. The result is written to output with diff headers preserved and hunk content replaced with word-level diff output.

func Tokenize ¶

func Tokenize(text string, opts Options) []string

Tokenize splits text into tokens, treating delimiters as separate tokens. Whitespace separates words but is not included in output unless PreserveWhitespace is true.

Types ¶

type Diff ¶

type Diff struct {
	Type  Operation
	Token string
}

Diff represents a single diff operation on a token.

func AggregateDiffs ¶

func AggregateDiffs(diffs []Diff) []Diff

AggregateDiffs combines adjacent diffs of the same type into single tokens. For example, consecutive Delete operations are merged into one Delete with tokens joined appropriately (spaces between words, no spaces between punctuation/delimiters).

func ApplyMatchContext ¶

func ApplyMatchContext(diffs []Diff, minContext int) []Diff

ApplyMatchContext processes diffs to require minimum context between changes. Equal tokens that appear between changes with fewer than minContext matches are converted to both Delete and Insert operations. This reduces noise from coincidental short matches between larger changes.

If minContext is 0 or negative, the diffs are returned unchanged.
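A minimal sketch of this rule, assuming "between changes" means an Equal run with a non-Equal operation on both sides. The types are redeclared locally and `applyMatchContext` is an illustrative reimplementation, not the package's code.

```go
package main

import "fmt"

type Operation int

const (
	Equal Operation = iota
	Insert
	Delete
)

type Diff struct {
	Type  Operation
	Token string
}

// applyMatchContext rewrites short Equal runs that sit between two changes
// as Delete+Insert pairs.
func applyMatchContext(diffs []Diff, minContext int) []Diff {
	if minContext <= 0 {
		return diffs
	}
	var out []Diff
	for i := 0; i < len(diffs); {
		if diffs[i].Type != Equal {
			out = append(out, diffs[i])
			i++
			continue
		}
		j := i // find the end of this Equal run
		for j < len(diffs) && diffs[j].Type == Equal {
			j++
		}
		sandwiched := i > 0 && j < len(diffs)
		if sandwiched && j-i < minContext {
			for _, d := range diffs[i:j] {
				out = append(out, Diff{Type: Delete, Token: d.Token}, Diff{Type: Insert, Token: d.Token})
			}
		} else {
			out = append(out, diffs[i:j]...)
		}
		i = j
	}
	return out
}

func main() {
	in := []Diff{{Delete, "a"}, {Insert, "b"}, {Equal, "of"}, {Delete, "c"}, {Insert, "d"}}
	prefix := map[Operation]string{Equal: "=", Insert: "+", Delete: "-"}
	var parts []string
	for _, d := range applyMatchContext(in, 2) {
		parts = append(parts, prefix[d.Type]+d.Token)
	}
	fmt.Println(parts) // [-a +b -of +of -c +d]
}
```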

func ApplyWordDiff ¶

func ApplyWordDiff(hunk DiffHunk, opts Options) []Diff

ApplyWordDiff applies word-level diffing to a unified diff hunk. It returns the word-level diff result for the changed lines.

func DiffStrings ¶

func DiffStrings(text1, text2 string, opts Options) []Diff

DiffStrings tokenizes both strings and computes their diff.

func DiffStringsWithPreprocessing ¶

func DiffStringsWithPreprocessing(text1, text2 string, opts Options) []Diff

DiffStringsWithPreprocessing tokenizes both strings and computes their diff using histogram-based preprocessing that filters confusing tokens.

func DiffTokens ¶

func DiffTokens(tokens1, tokens2 []string) []Diff

DiffTokens computes the diff between two token slices. It uses the Myers diff algorithm via diffx.

func DiffTokensRaw ¶

func DiffTokensRaw(tokens1, tokens2 []string) []Diff

DiffTokensRaw computes the diff without semantic cleanup. Use this when you need the raw Myers diff output.

func DiffTokensWithPreprocessing ¶

func DiffTokensWithPreprocessing(tokens1, tokens2 []string) []Diff

DiffTokensWithPreprocessing computes the diff using histogram-style preprocessing. This uses diffx's histogram diff algorithm, which:

1. Filters stopwords (common words like "the", "for", "in") from anchor selection
2. Uses low-frequency tokens as anchors for divide-and-conquer
3. Produces cleaner output without spurious matches on common words

This produces readable output that groups semantically related changes together.

func EliminateStopwordAnchors ¶

func EliminateStopwordAnchors(diffs []Diff) []Diff

EliminateStopwordAnchors converts stopword Equal tokens to Delete+Insert when they appear as single tokens sandwiched between changes. Unlike ApplyMatchContext, this only affects specific stopwords, preserving meaningful single-token Equals like "support", "config", etc.

The stopword is added to both the preceding Delete run and the following Insert run, so they merge together during formatting instead of appearing as separate `[---] {+-+}` markers.

func InterleaveDiffs ¶

func InterleaveDiffs(diffs []Diff) []Diff

InterleaveDiffs reorders diffs so that Delete/Insert pairs are interleaved. When there's a sequence of Deletes followed by Inserts, this function pairs them positionally: Delete[0] Insert[0] Delete[1] Insert[1], etc. Excess Deletes or Inserts (if the counts don't match) are output at the end.
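The positional pairing can be sketched as follows. The types are redeclared so the example is self-contained, and `interleave` is an illustrative version rather than the package's implementation; in particular, placing leftover Deletes before leftover Inserts is an assumption.

```go
package main

import "fmt"

type Operation int

const (
	Equal Operation = iota
	Insert
	Delete
)

type Diff struct {
	Type  Operation
	Token string
}

// interleave rewrites each "Deletes then Inserts" block as
// Delete[0] Insert[0] Delete[1] Insert[1] ..., followed by any excess.
func interleave(diffs []Diff) []Diff {
	var out []Diff
	for i := 0; i < len(diffs); {
		if diffs[i].Type != Delete {
			out = append(out, diffs[i])
			i++
			continue
		}
		delEnd := i
		for delEnd < len(diffs) && diffs[delEnd].Type == Delete {
			delEnd++
		}
		insEnd := delEnd
		for insEnd < len(diffs) && diffs[insEnd].Type == Insert {
			insEnd++
		}
		dels, inss := diffs[i:delEnd], diffs[delEnd:insEnd]
		n := len(dels)
		if len(inss) < n {
			n = len(inss)
		}
		for k := 0; k < n; k++ {
			out = append(out, dels[k], inss[k])
		}
		out = append(out, dels[n:]...) // excess Deletes, if any
		out = append(out, inss[n:]...) // excess Inserts, if any
		i = insEnd
	}
	return out
}

func main() {
	in := []Diff{{Delete, "a"}, {Delete, "b"}, {Delete, "c"}, {Insert, "x"}, {Insert, "y"}, {Equal, "z"}}
	var tokens []string
	for _, d := range interleave(in) {
		tokens = append(tokens, d.Token)
	}
	fmt.Println(tokens) // [a x b y c z]
}
```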

func ShiftBoundaries ¶

func ShiftBoundaries(diffs []Diff) []Diff

ShiftBoundaries adjusts diff boundaries to create cleaner output. When a deleted token matches an adjacent equal token, shift the boundary.

This is a standard diff post-processing step (similar to GNU diff's shift_boundaries).

Patterns detected and shifted:

EQUAL[...x] DELETE[x] INSERT[y] → EQUAL[...x] INSERT[y] (shift delete into equal)
DELETE[x] INSERT[x...] EQUAL[...] → INSERT[...] EQUAL[x...] (shift common prefix)

type DiffHunk ¶

type DiffHunk struct {
	// OldStart is the starting line number in the old file.
	OldStart int
	// OldCount is the number of lines from the old file.
	OldCount int
	// NewStart is the starting line number in the new file.
	NewStart int
	// NewCount is the number of lines in the new file.
	NewCount int
	// OldLines contains the removed lines (without the leading "-").
	OldLines []string
	// NewLines contains the added lines (without the leading "+").
	NewLines []string
	// ContextBefore contains context lines before the change.
	ContextBefore []string
	// ContextAfter contains context lines after the change.
	ContextAfter []string
}

DiffHunk represents a single hunk from a unified diff.

type DiffResult ¶

type DiffResult struct {
	Diffs      []Diff
	Text1      string     // original old text
	Text2      string     // original new text
	Positions1 []TokenPos // token positions in text1
	Positions2 []TokenPos // token positions in text2
}

DiffResult contains diff output along with position information needed to reconstruct original spacing for Equal content.

func DiffStringsWithPositions ¶

func DiffStringsWithPositions(text1, text2 string, opts Options) DiffResult

DiffStringsWithPositions tokenizes and diffs strings, returning position info. This allows formatters to preserve original spacing for Equal content.

func DiffStringsWithPositionsAndPreprocessing ¶

func DiffStringsWithPositionsAndPreprocessing(text1, text2 string, opts Options) DiffResult

DiffStringsWithPositionsAndPreprocessing tokenizes and diffs strings using histogram-based preprocessing, returning position info for formatting. This allows formatters to preserve original spacing for Equal content.

type DiffStatistics ¶

type DiffStatistics struct {
	OldWords      int // total words in old text
	NewWords      int // total words in new text
	DeletedWords  int // words deleted (present in old but not new)
	InsertedWords int // words inserted (present in new but not old)
	CommonWords   int // words common to both texts
}

DiffStatistics holds statistics about a diff operation.

func ComputeStatistics ¶

func ComputeStatistics(text1, text2 string, diffs []Diff, opts Options) DiffStatistics

ComputeStatistics calculates statistics for a diff.

type FormatOptions ¶

type FormatOptions struct {
	// StartDelete is the string to mark the beginning of deleted text.
	// Default: "[-"
	StartDelete string

	// StopDelete is the string to mark the end of deleted text.
	// Default: "-]"
	StopDelete string

	// StartInsert is the string to mark the beginning of inserted text.
	// Default: "{+"
	StartInsert string

	// StopInsert is the string to mark the end of inserted text.
	// Default: "+}"
	StopInsert string

	// NoDeleted, when true, suppresses deleted tokens from output.
	NoDeleted bool

	// NoInserted, when true, suppresses inserted tokens from output.
	NoInserted bool

	// NoCommon, when true, suppresses unchanged tokens from output.
	NoCommon bool

	// UseColor enables ANSI color output. When true, DeleteColor and InsertColor
	// are used instead of text markers.
	UseColor bool

	// DeleteColor is the ANSI escape sequence for deleted text color.
	// Example: "\033[31m" for red
	DeleteColor string

	// InsertColor is the ANSI escape sequence for inserted text color.
	// Example: "\033[32m" for green
	InsertColor string

	// ColorReset is the ANSI escape sequence to reset colors.
	// Default: "\033[0m"
	ColorReset string

	// ClearToEOL is the ANSI escape sequence to clear to end of line.
	// Default: "\033[K"
	ClearToEOL string

	// RepeatMarkers, when true, repeats markers at line boundaries for
	// multi-line changes.
	RepeatMarkers bool

	// AggregateChanges, when true, combines adjacent changes of the same type.
	AggregateChanges bool

	// LessMode uses overstrike underlining for deleted text (for less -r).
	LessMode bool

	// PrinterMode uses overstrike bold for inserted text (for printing).
	PrinterMode bool

	// MatchContext is the minimum number of matching words between changes.
	// Equal tokens sandwiched between changes with fewer than this many
	// matches are converted to Delete+Insert pairs. 0 disables this feature.
	MatchContext int

	// ShowLineNumbers enables dual line number display (old:new format).
	ShowLineNumbers bool

	// LineNumWidth is the minimum width for line numbers. 0 means auto-calculate.
	LineNumWidth int

	// HeuristicSpacing uses NeedsSpaceBefore/After heuristics for spacing
	// when PreserveWhitespace is false. When true, spaces are not tokens and
	// spacing is determined heuristically.
	HeuristicSpacing bool
}

FormatOptions configures diff output formatting.

func DefaultFormatOptions ¶

func DefaultFormatOptions() FormatOptions

DefaultFormatOptions returns FormatOptions with default settings.

type LineDiffOutput ¶

type LineDiffOutput struct {
	Lines      []LineDiffResult // individual line results
	HasChanges bool             // true if there are any differences
	Statistics DiffStatistics   // aggregate statistics
}

LineDiffOutput holds the results of a line-by-line diff operation.

func DiffLineByLine ¶

func DiffLineByLine(text1, text2 string, opts Options, fmtOpts FormatOptions, algorithm string, threshold float64) LineDiffOutput

DiffLineByLine compares files line by line with proper line-level diff tracking. This correctly tracks dual line numbers:

- For equal lines: both old and new line numbers increment
- For deleted lines: only old line number increments
- For inserted lines: only new line number increments

The algorithm parameter controls how deleted and inserted lines are paired:

- "best": similarity-based matching (pairs lines with highest token overlap)
- "normal" or "fast": positional matching (pairs lines by position)

type LineDiffResult ¶

type LineDiffResult struct {
	OldLineNum int    // line number in old file
	NewLineNum int    // line number in new file
	HasChanges bool   // true if this line contains changes
	Output     string // formatted output for this line
}

LineDiffResult holds diff results for a single line in line-by-line mode.
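The dual line-number rule documented for DiffLineByLine (equal lines advance both counters, deleted lines only the old one, inserted lines only the new one) can be sketched as below. The `lineOp` and `numbered` types are hypothetical, and using 0 for "no number on this side" is an assumption about representation.

```go
package main

import "fmt"

type lineOp int

const (
	equalLine lineOp = iota
	deletedLine
	insertedLine
)

// numbered holds the old/new line numbers for one output line.
// 0 means the line does not exist on that side (assumption).
type numbered struct {
	Old, New int
}

// numberLines assigns dual line numbers to a sequence of line operations.
func numberLines(ops []lineOp) []numbered {
	oldNum, newNum := 0, 0
	out := make([]numbered, 0, len(ops))
	for _, op := range ops {
		switch op {
		case equalLine:
			oldNum++
			newNum++
			out = append(out, numbered{Old: oldNum, New: newNum})
		case deletedLine:
			oldNum++
			out = append(out, numbered{Old: oldNum})
		case insertedLine:
			newNum++
			out = append(out, numbered{New: newNum})
		}
	}
	return out
}

func main() {
	ops := []lineOp{equalLine, deletedLine, insertedLine, insertedLine, equalLine}
	fmt.Println(numberLines(ops)) // [{1 1} {2 0} {0 2} {0 3} {3 4}]
}
```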

func FilterWithContext ¶

func FilterWithContext(lines []LineDiffResult, contextLines int) []LineDiffResult

FilterWithContext returns only the lines that are changes or within contextLines of a change.
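A context filter of this kind can be sketched with a boolean mask. The `line` type is a simplified stand-in for LineDiffResult, and `filterWithContext` is an illustrative version, not the package's code.

```go
package main

import "fmt"

// line is a reduced stand-in for LineDiffResult.
type line struct {
	Num     int
	Changed bool
}

// filterWithContext keeps changed lines plus up to context lines on each side.
func filterWithContext(lines []line, context int) []line {
	if context < 0 {
		context = 0
	}
	keep := make([]bool, len(lines))
	for i, l := range lines {
		if !l.Changed {
			continue
		}
		lo, hi := i-context, i+context
		if lo < 0 {
			lo = 0
		}
		if hi > len(lines)-1 {
			hi = len(lines) - 1
		}
		for k := lo; k <= hi; k++ {
			keep[k] = true
		}
	}
	var out []line
	for i, l := range lines {
		if keep[i] {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	var lines []line
	for n := 1; n <= 10; n++ {
		lines = append(lines, line{Num: n, Changed: n == 5})
	}
	var nums []int
	for _, l := range filterWithContext(lines, 1) {
		nums = append(nums, l.Num)
	}
	fmt.Println(nums) // [4 5 6]
}
```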

type LinePairing ¶

type LinePairing struct {
	DeleteIndex int     // index in deletes slice
	InsertIndex int     // index in inserts slice
	Similarity  float64 // similarity score (0.0-1.0)
}

LinePairing represents a pairing between a deleted line and an inserted line.

func FindPositionalPairings ¶

func FindPositionalPairings(deletes, inserts []string) []LinePairing

FindPositionalPairings pairs deleted and inserted lines by position. Delete[0] pairs with Insert[0], Delete[1] with Insert[1], etc. Returns pairings only up to min(len(deletes), len(inserts)).

func FindSimilarityPairings ¶

func FindSimilarityPairings(deletes, inserts []string, opts Options, threshold float64) []LinePairing

FindSimilarityPairings pairs deleted and inserted lines by content similarity. Uses a greedy algorithm: for each deleted line, find the most similar unmatched inserted line. Lines with similarity below threshold are left unpaired.
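The greedy strategy can be sketched independently of the package. The scoring function below (`wordOverlap`, a Jaccard ratio over whitespace-separated words) is a stand-in chosen for the example; the package's own scoring may differ, and the `pairing` type and `greedyPairings` function are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

type pairing struct {
	DeleteIndex, InsertIndex int
	Similarity               float64
}

// greedyPairings pairs each deleted line with its most similar unused
// inserted line, skipping pairs that score below threshold.
func greedyPairings(deletes, inserts []string, sim func(a, b string) float64, threshold float64) []pairing {
	used := make([]bool, len(inserts))
	var out []pairing
	for di, d := range deletes {
		best, bestScore := -1, 0.0
		for ii, ins := range inserts {
			if used[ii] {
				continue
			}
			if s := sim(d, ins); best == -1 || s > bestScore {
				best, bestScore = ii, s
			}
		}
		if best >= 0 && bestScore >= threshold {
			used[best] = true
			out = append(out, pairing{DeleteIndex: di, InsertIndex: best, Similarity: bestScore})
		}
	}
	return out
}

// wordOverlap is an assumed scoring function: shared words / all distinct words.
func wordOverlap(a, b string) float64 {
	setA, setB := map[string]bool{}, map[string]bool{}
	for _, w := range strings.Fields(a) {
		setA[w] = true
	}
	for _, w := range strings.Fields(b) {
		setB[w] = true
	}
	shared := 0
	for w := range setA {
		if setB[w] {
			shared++
		}
	}
	union := len(setA) + len(setB) - shared
	if union == 0 {
		return 1
	}
	return float64(shared) / float64(union)
}

func main() {
	deletes := []string{"x := compute(a)", "return x"}
	inserts := []string{"return y", "x := compute(b)"}
	for _, p := range greedyPairings(deletes, inserts, wordOverlap, 0.3) {
		fmt.Printf("delete %d <-> insert %d (%.2f)\n", p.DeleteIndex, p.InsertIndex, p.Similarity)
	}
}
```

Because the pass is greedy, an early deleted line can claim an inserted line that a later deleted line would have matched better. That is the usual trade-off against an optimal assignment.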

type Operation ¶

type Operation int

Operation represents a diff operation type.

const (
	// Equal indicates the token is unchanged.
	Equal Operation = iota
	// Insert indicates the token was added.
	Insert
	// Delete indicates the token was removed.
	Delete
)

func (Operation) String ¶

func (o Operation) String() string

String returns a human-readable representation of the operation.

type Options ¶

type Options struct {
	// Delimiters is the set of characters to treat as separate tokens.
	// If empty, DefaultDelimiters is used.
	// This is ignored if UsePunctuation is true.
	Delimiters string

	// Whitespace is the set of characters to treat as whitespace (word separators).
	// If empty, DefaultWhitespace is used.
	Whitespace string

	// UsePunctuation, when true, uses Unicode punctuation characters as
	// delimiters instead of the Delimiters string. This matches dwdiff's
	// -P/--punctuation flag behavior.
	UsePunctuation bool

	// PreserveWhitespace, when true, includes whitespace as separate tokens.
	// When false (default), whitespace is used only to separate words and
	// is not included in the diff output.
	PreserveWhitespace bool

	// IgnoreCase, when true, performs case-insensitive comparison.
	// The original case is preserved in the output.
	IgnoreCase bool
}

Options configures the diff behavior.

func DefaultOptions ¶

func DefaultOptions() Options

DefaultOptions returns Options with default settings.

type TokenPos ¶

type TokenPos struct {
	Start int // byte offset of token start
	End   int // byte offset of token end (exclusive)
}

TokenPos represents a token's position in the original text.

func TokenizeWithPositions ¶

func TokenizeWithPositions(text string, opts Options) ([]string, []TokenPos)

TokenizeWithPositions splits text into tokens and tracks their positions. This allows reconstructing original spacing for Equal content in diffs.

type UnifiedDiff ¶

type UnifiedDiff struct {
	// OldFile is the name of the old file (from "---" line).
	OldFile string
	// NewFile is the name of the new file (from "+++" line).
	NewFile string
	// Hunks contains all the diff hunks.
	Hunks []DiffHunk
}

UnifiedDiff represents a parsed unified diff.

func ParseUnifiedDiff ¶

func ParseUnifiedDiff(input string) ([]UnifiedDiff, error)

ParseUnifiedDiff parses a unified diff string into structured data. It handles standard unified diff format as produced by diff -u or git diff.

type WholeFileDiffResult ¶

type WholeFileDiffResult struct {
	Result     DiffResult     // the raw diff result
	Formatted  string         // formatted output
	HasChanges bool           // true if there are any differences
	Statistics DiffStatistics // statistics about the diff
}

WholeFileDiffResult holds the result of a whole-file diff operation.

func DiffWholeFiles ¶

func DiffWholeFiles(text1, text2 string, opts Options, fmtOpts FormatOptions) WholeFileDiffResult

DiffWholeFiles performs a whole-file word-level diff and returns structured results. This is the main API for comparing two complete texts.

Directories ¶

Path	Synopsis
cmd/tokendiff	Command tokendiff performs token-level diffs with delimiter support.
