Empty character class [^\S\s] matches instead of failing #88

@kolkov

Bug Description

Empty character classes like [^\S\s] (which logically match nothing) are compiled as an empty match, so they succeed at any position in any input instead of never matching.

Reproduction

package main

import (
    "fmt"
    "regexp"

    "github.com/coregx/coregex"
)

func main() {
    pattern := `[^\S\s]`
    input := "abc"

    // stdlib - correct (no match)
    reStd := regexp.MustCompile(pattern)
    fmt.Println("stdlib:", reStd.MatchString(input)) // false

    // coregex - incorrect (matches!)
    reCg := coregex.MustCompile(pattern)
    fmt.Println("coregex:", reCg.MatchString(input)) // true
}
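
The reproduction covers the bare class only. As a rough sketch of how an empty class should compose with other operators, the program below uses the standard library as the reference and prints its result for a few related patterns. The helper name stdlibMatches and the choice of patterns are illustrative assumptions, not part of coregex or of the original report.

```go
package main

import (
	"fmt"
	"regexp"
)

// stdlibMatches reports whether the standard library finds a match for
// pattern anywhere in input. It serves as the reference behaviour here.
// (Hypothetical helper name, introduced for this sketch only.)
func stdlibMatches(pattern, input string) bool {
	return regexp.MustCompile(pattern).MatchString(input)
}

func main() {
	cases := []struct{ pattern, input string }{
		{`[^\S\s]`, "abc"},   // bare empty class: should never match
		{`[^\S\s]`, ""},      // not even on empty input
		{`a[^\S\s]`, "abc"},  // concatenation with an impossible step
		{`[^\S\s]|b`, "abc"}, // alternation: the other branch can still match
		{`[^\S\s]*`, "abc"},  // zero repetitions is still an empty match
	}
	for _, c := range cases {
		fmt.Printf("%q on %q: %v\n", c.pattern, c.input, stdlibMatches(c.pattern, c.input))
	}
}
```

Running the same patterns through coregex.MustCompile and comparing the results line by line would give a small regression check for this issue.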

Root Cause

In nfa/compile.go:365-367, when compileCharClass receives an empty rune slice, it calls compileEmptyMatch():

if len(ranges) == 0 {
    return c.compileEmptyMatch()  // WRONG!
}

compileEmptyMatch() creates an epsilon transition, which matches the empty string. But an empty character class should never match, because no rune can satisfy it.

Semantics

  • [\S\s] = any character = OpAnyChar
  • [^\S\s] = NOT (any character) = empty set = matches nothing
  • Go's parser correctly sets Rune: [] for empty classes

Proposed Fix

Add a compileNoMatch() function that creates an NFA fragment with no path from start to end, making it impossible to match. Use it for empty character classes instead of compileEmptyMatch().
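
To illustrate the difference between the two fragments, here is a minimal toy model. It is not coregex's actual code: the state, fragment, and compiler types, the finish and acceptsEmpty helpers, and the state kinds are all assumptions made up for this sketch, and the model only tracks zero-width paths.

```go
package main

import "fmt"

type kind int

const (
	kindEpsilon kind = iota // moves to next without consuming input
	kindFail                // dead end: no outgoing transitions
	kindMatch               // accepting state
)

type state struct {
	kind kind
	next int // target for kindEpsilon; -1 until patched
}

// fragment is a partial NFA with one entry and one exit to be patched.
type fragment struct{ start, end int }

type compiler struct{ states []state }

func (c *compiler) add(s state) int {
	c.states = append(c.states, s)
	return len(c.states) - 1
}

// compileEmptyMatch returns a fragment whose end is reachable from its
// start without consuming input.
func (c *compiler) compileEmptyMatch() fragment {
	id := c.add(state{kind: kindEpsilon, next: -1})
	return fragment{start: id, end: id}
}

// compileNoMatch returns a fragment whose start is a dead end. The end
// state exists so callers can still patch it, but nothing leads to it.
func (c *compiler) compileNoMatch() fragment {
	start := c.add(state{kind: kindFail})
	end := c.add(state{kind: kindEpsilon, next: -1})
	return fragment{start: start, end: end}
}

// finish patches the fragment's end to a fresh match state and returns
// the entry state.
func (c *compiler) finish(f fragment) int {
	m := c.add(state{kind: kindMatch})
	c.states[f.end].next = m
	return f.start
}

// acceptsEmpty follows epsilon transitions from start and reports whether
// a match state is reachable without consuming input.
func (c *compiler) acceptsEmpty(start int) bool {
	for id, steps := start, 0; id >= 0 && steps <= len(c.states); steps++ {
		switch c.states[id].kind {
		case kindMatch:
			return true
		case kindFail:
			return false
		case kindEpsilon:
			id = c.states[id].next
		}
	}
	return false
}

func emptyMatchAccepts() bool {
	c := &compiler{}
	return c.acceptsEmpty(c.finish(c.compileEmptyMatch()))
}

func noMatchAccepts() bool {
	c := &compiler{}
	return c.acceptsEmpty(c.finish(c.compileNoMatch()))
}

func main() {
	fmt.Println("empty match fragment accepts:", emptyMatchAccepts()) // true
	fmt.Println("no-match fragment accepts:", noMatchAccepts())       // false
}
```

Keeping a patchable end state in the no-match fragment lets the surrounding concatenation and alternation code treat it like any other fragment, at the cost of one unreachable state. Whether coregex should do that or special-case the fragment is a design choice for the actual fix.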
