Case-insensitive patterns don't match lowercase when unanchored #87

@kolkov

Bug Description

Case-insensitive patterns like (?i)HELLO fail to match lowercase text hello when unanchored.

Reproduction

package main

import (
    "fmt"
    "regexp"
    "github.com/coregx/coregex"
)

func main() {
    pattern := `(?i)HELLO`
    input := "hello"

    // stdlib - WORKS
    reStd := regexp.MustCompile(pattern)
    fmt.Println("stdlib:", reStd.FindAllString(input, -1)) // [hello]

    // coregex - FAILS
    reCg := coregex.MustCompile(pattern)
    fmt.Println("coregex:", reCg.FindAllString(input, -1)) // []
}

Root Cause

The literal extractor in literal/extractor.go:129-135 ignores the FoldCase flag from syntax.Regexp. It extracts HELLO as a literal prefix, and the prefilter searches for HELLO case-sensitively, missing hello.
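
To illustrate where the flag lives, here is a minimal sketch using only the standard `regexp/syntax` package. It parses a pattern and reports the literal runes and whether `FoldCase` is set on the node. The helper name `describeLiteral` is made up for this example and is not part of coregex.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// describeLiteral is a hypothetical helper for illustration only.
// It parses pattern and, if the whole pattern is one literal node,
// returns its runes and whether the FoldCase flag is set on it.
func describeLiteral(pattern string) (lit string, foldCase bool, err error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return "", false, err
	}
	if re.Op != syntax.OpLiteral {
		return "", false, fmt.Errorf("pattern %q is not a single literal", pattern)
	}
	return string(re.Rune), re.Flags&syntax.FoldCase != 0, nil
}

func main() {
	for _, p := range []string{`HELLO`, `(?i)HELLO`} {
		lit, fold, err := describeLiteral(p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-12s literal=%q foldCase=%v\n", p, lit, fold)
	}
}
```

Both patterns carry the same rune sequence, so an extractor that reads only `re.Rune` cannot tell them apart; the difference is entirely in `re.Flags`.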

Impact

All unanchored case-insensitive patterns are affected:

  • (?i)abc on ABC works (finds first match)
  • (?i)abc on ABCabc fails (misses lowercase abc)

Anchored patterns work correctly because they bypass the prefilter.

Proposed Fix

Option 1: Don't extract literals when FoldCase flag is set (safe, simple)
Option 2: Expand case-insensitive literals into all variations (complex)
