Incorrect Regex matching in Turkish culture when ignoring case #58958

@veanes

Description

Combining RegexOptions.IgnoreCase with character-class intervals that involve \u0130 (İ, capital I with dot above) and \u0131 (ı, small dotless i) gives wrong matching results under the Turkish culture, as the repro below shows.

Configuration

.NET 6.0 preview

Regression?

Seems so. At least, the code below works correctly in .NET 5.0.

Other information

Expected behavior: the following code prints True. Actual behavior: it prints False.
The pattern below must trivially match the input because each letter of the input falls in the corresponding interval.
IgnoreCase can only add letters to a character class (never remove them), so the match must hold.
If the IgnoreCase option is omitted, the code works correctly.

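using System;
using System.Globalization;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Each input character lies inside the interval at the same position:
            // 'I' in [H-J], \u0131 in [\u0131-\u0140], \u0130 in [\u0120-\u0130], 'i' in [h-j].
            string input = "I\u0131\u0130i";
            string pattern = "[H-J][\u0131-\u0140][\u0120-\u0130][h-j]";

            // Construct the regex while the current culture is Turkish, then restore it.
            var culture = CultureInfo.CurrentCulture;
            CultureInfo.CurrentCulture = new CultureInfo("tr-TR");
            Regex re = new Regex(pattern, RegexOptions.IgnoreCase);
            CultureInfo.CurrentCulture = culture;

            // Expected: True. Observed on .NET 6.0 preview: False.
            Console.WriteLine(re.IsMatch(input));
        }
    }
}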
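
As a control for the statement that the code works when IgnoreCase is omitted, the following minimal sketch builds the same pattern twice under tr-TR, once without options and once with RegexOptions.IgnoreCase, and prints both results side by side. The class name CaseSensitiveControl and the output labels are illustrative only and not part of the original repro.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

class CaseSensitiveControl
{
    static void Main()
    {
        string input = "I\u0131\u0130i";
        string pattern = "[H-J][\u0131-\u0140][\u0120-\u0130][h-j]";

        Regex caseSensitive;
        Regex ignoreCase;

        // Construct both regexes under the Turkish culture, as in the repro,
        // and restore the previous culture even if construction throws.
        CultureInfo saved = CultureInfo.CurrentCulture;
        try
        {
            CultureInfo.CurrentCulture = new CultureInfo("tr-TR");
            caseSensitive = new Regex(pattern);
            ignoreCase = new Regex(pattern, RegexOptions.IgnoreCase);
        }
        finally
        {
            CultureInfo.CurrentCulture = saved;
        }

        // The only difference between the two lines is RegexOptions.IgnoreCase.
        Console.WriteLine($"case-sensitive: {caseSensitive.IsMatch(input)}");
        Console.WriteLine($"IgnoreCase:     {ignoreCase.IsMatch(input)}");
    }
}
```

The case-sensitive line should print True, since every input character lies inside its interval. A False on the IgnoreCase line reproduces the bug reported here.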
