Regex Match, Split and Matches should support RegexOptions.AnyNewLine as (?=\r\z|\n\z|\r\n\z|\z)

From [MSDN](https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-options#Multiline):

> By default, $ matches only the end of the input string. If you specify the RegexOptions.Multiline option, it matches either the newline character (\n) or the end of the input string. It does not, however, match the carriage return/line feed character combination. To successfully match them, use the subexpression \r?$ instead of just $.

This is one of the biggest gotchas when using the .NET `Regex` class.
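A minimal sketch of the gotcha, using only the existing `Regex` API. The sample strings and variable names are illustrative assumptions, not taken from the docs:

```csharp
using System;
using System.Text.RegularExpressions;

// Two lines of text, once with Unix (\n) and once with Windows (\r\n) line endings.
string unixText = "first\nsecond\n";
string windowsText = "first\r\nsecond\r\n";

// With Multiline, $ matches before \n, so both Unix lines are found.
int plainOnUnix = Regex.Matches(unixText, @"^\w+$", RegexOptions.Multiline).Count;

// On Windows text the \r sits between the word and the \n, so $ never matches.
int plainOnWindows = Regex.Matches(windowsText, @"^\w+$", RegexOptions.Multiline).Count;

// The documented workaround \r?$ consumes the optional \r first.
int workaroundOnWindows = Regex.Matches(windowsText, @"^\w+\r?$", RegexOptions.Multiline).Count;

Console.WriteLine($"{plainOnUnix} {plainOnWindows} {workaroundOnWindows}"); // prints "2 0 2"
```

The same pattern silently finds nothing once the input carries `\r\n` line endings, which is what makes this easy to miss.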

I suggest adding a `RegexOptions.AnyNewLine` option that makes `$` match _both_ the Windows and the Unix value of `Environment.NewLine`, regardless of the environment running corefx.

*Portability concerns*
According to Wikipedia, operating systems use many different line-ending conventions. The current implementation hardcodes the Unix line-ending style. `RegexOptions.AnyNewLine`, defined as `(?=[\r\n]|\z)`, would add support for the Windows line-ending style.

The advice in the current docs is not portable to Unix, which is becoming a more popular target. As suggested, `\r?$` will capture one or two lines on Unix, and one on Windows. If you run Windows assemblies that use this hack on Linux, you change the semantics of the programs.
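One directly observable effect of the `\r?$` workaround is that the matched text itself changes with the line-ending style of the input. The sketch below assumes simple two-line sample strings; the lookahead variant at the end is an assumption offered for comparison, not something the docs recommend:

```csharp
using System;
using System.Text.RegularExpressions;

const string workaround = @"^\w+\r?$";

// \r? is part of the match, so the matched text depends on the input's line endings.
string fromWindows = Regex.Match("first\r\nsecond", workaround, RegexOptions.Multiline).Value;
string fromUnix = Regex.Match("first\nsecond", workaround, RegexOptions.Multiline).Value;

Console.WriteLine(fromWindows.Length); // prints 6: "first" plus the trailing \r
Console.WriteLine(fromUnix.Length);    // prints 5: "first" only

// Assumption: moving \r? into a lookahead keeps it out of the match on both inputs.
const string lookahead = @"^\w+(?=\r?$)";
string trimmed = Regex.Match("first\r\nsecond", lookahead, RegexOptions.Multiline).Value;
Console.WriteLine(trimmed.Length);     // prints 5
```

Code that compares or stores the matched value therefore behaves differently depending on where the text came from.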

*Backward compatibility concerns*
Fully backward compatible: this `RegexOptions` enum extension would not be a default, so it would not break any clients with reasonably written code. The only existing code that might behave differently is reflection code that sets every option on a `RegexOptions` variable. I really can't envision anyone doing this on purpose.

Here is a summary by Petr Onderka (@svick):

OS | Line-ending style | Current | Environment.NewLine | AnyNewLine
-- | -- | -- | -- | --
Windows | Windows | ✗ | ✓ | ✓
Windows | Unix | ✓ | ✗ | ✓
Unix | Windows | ✗ | ✗ | ✓
Unix | Unix | ✓ | ✓ | ✓
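The table can be reproduced with a small sketch. Since `RegexOptions.AnyNewLine` is only proposed here, both the `Environment.NewLine` column and the `AnyNewLine` column are emulated with lookaheads. The helper `FindsBothLines` and the simulated `osNewLine` values are hypothetical, and the `AnyNewLine` lookahead is the `(?=[\r\n]|\z)` definition from this proposal:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: does "\w+" followed by the given end-of-line anchor find both lines?
static bool FindsBothLines(string text, string endOfLine) =>
    Regex.Matches(text, @"\w+" + endOfLine, RegexOptions.Multiline).Count == 2;

string windowsText = "a\r\nb\r\n";
string unixText = "a\nb\n";

// Current behavior: the built-in $ under RegexOptions.Multiline.
const string current = "$";
// Emulated AnyNewLine, using the lookahead from this proposal.
const string anyNewLine = @"(?=[\r\n]|\z)";

// Simulate both values of Environment.NewLine instead of reading the real one.
foreach (string osNewLine in new[] { "\r\n", "\n" })
{
    // Emulated "Environment.NewLine" column: $ matches only the OS line ending.
    string envNewLine = "(?=" + Regex.Escape(osNewLine) + @"|\z)";
    foreach (string text in new[] { windowsText, unixText })
    {
        Console.WriteLine(
            $"{FindsBothLines(text, current)} {FindsBothLines(text, envNewLine)} {FindsBothLines(text, anyNewLine)}");
    }
}
```

The four printed rows follow the order of the table (Windows OS first, Windows line endings first), with `True` standing for ✓ and `False` for ✗.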

## API Proposal
edit by @ViktorHofer

```diff
namespace System.Text.RegularExpressions
{
    [Flags]
    public enum RegexOptions
    {
        None                    = 0x0000,
        IgnoreCase              = 0x0001, // "i"
        Multiline               = 0x0002, // "m"
        ExplicitCapture         = 0x0004, // "n"
        Compiled                = 0x0008, // "c"
        Singleline              = 0x0010, // "s"
        IgnorePatternWhitespace = 0x0020, // "x"
        RightToLeft             = 0x0040, // "r"

#if DEBUG
        Debug                   = 0x0080, // "d"
#endif

        ECMAScript              = 0x0100, // "e"
        CultureInvariant        = 0x0200,
+       AnyNewLine              = 0x0400 // Treat "$" as (?=[\r\n]|\z)
    }
}
```

## [API Review Notes](https://github.com/dotnet/corefx/issues/28410#issuecomment-392863409)

[Video](https://www.youtube.com/watch?v=ZHKLi8qWTCs&t=0h0m0s)

Looks good. A few comments:

* ~~We cannot use the proposed value of 128 because it's already taken (see #if DBG in code)~~ Spec updated so that `AnyNewLine = 0x400` (1024).
* ~~The table looks wrong (Windows on Windows on the Current should work IMHO)~~ The table is correct. The fact this trips up experts just speaks to why this is a profound GOTCHA in the Core SDK.
* ~~May be AcceptAllLineEndings?~~ Some hallway testing I've done indicates AnyNewLine is a good name. Plus (an argument made after the final name was chosen), this enum value will be surfaced as a checkbox in regular expression visualization tools like Regex Hero, so a concise name is preferable to save screen space.

## PR Review Notes
After work had started on the approved proposal, @danmosemsft [asked if the scope of this feature should be changed](https://github.com/dotnet/corefx/pull/41195#issuecomment-538600562) to also adjust the meaning of the `\Z` anchor. @jzabroski suggested drafting how the end-user documentation would look after this change, since good docs will show whether it is a step-function improvement in usability and in reducing gotchas.

During the PR, @shishirchawla also proposed AnyEndZ as a way to use AnyNewLine as an "Anchor Modifier", which would alter the meaning of the `\Z` anchor **in addition to altering the meaning of the `$` anchor**. The intent appears to be to remove all platform-specific language from the [Anchors documentation](https://docs.microsoft.com/en-us/dotnet/standard/base-types/anchors-in-regular-expressions), which seems like a great improvement.

# AnyNewLine as Anchor Modifier to \Z and $ Anchors

| flags | $ is treated as | $ documentation | \Z is treated as | \Z documentation |
| ----- | --------------- | ------------------- | ----------------- | -------------------- |
| neither | `(?=\n\z\|\z)` | The match must occur at the end of the string or before `\n` at the end of the string. | (Same as `$` with this option.) | (Same as `$` with this option.) |
| `RegexOptions.Multiline` | `(?=\n\|\n\z\|\z)` | The match must occur at the end of the string or before `\n` anywhere in the string. | `(?=\n\z\|\z)` | The match must occur at the end of the string or before `\n` at the end of the string. |
| `RegexOptions.Multiline \| RegexOptions.AnyNewLine` | `(?=\r\n\|\r\|\n\|\r\n\z\|\r\z\|\n\z\|\z)` | The match must occur at the end of the string or before `\r\n`, `\n` or `\r` anywhere in the string. | `(?=\r\n\z\|\r\z\|\n\z\|\z)` | The match must occur at the end of the string or before `\r\n`, `\n` or `\r` at the end of the string. |
| `RegexOptions.AnyNewLine` | `(?=\r\n\z\|\r\z\|\n\z\|\z)` | The match must occur at the end of the string or before `\r\n`, `\n` or `\r` at the end of the string. | (Same as `$` with this option.) | (Same as `$` with this option.) |
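The `$` column of the table can be exercised today by writing the lookahead equivalents out by hand. This is only a sketch: the constant names and the `CountWords` helper are hypothetical, and the lookaheads are copied from the table rather than taken from any shipped implementation:

```csharp
using System;
using System.Text.RegularExpressions;

// Lookahead equivalents for $ copied from the table; the constant names are illustrative.
const string DollarNeither       = @"(?=\n\z|\z)";
const string DollarMultiline     = @"(?=\n|\n\z|\z)";
const string DollarMultilineAny  = @"(?=\r\n|\r|\n|\r\n\z|\r\z|\n\z|\z)";
const string DollarAnyNewLine    = @"(?=\r\n\z|\r\z|\n\z|\z)";

// Hypothetical helper: counts words that end at a position accepted by the anchor.
static int CountWords(string text, string anchor) =>
    Regex.Matches(text, @"\w+" + anchor).Count;

string windowsText = "one\r\ntwo\r\n";

Console.WriteLine(CountWords(windowsText, DollarNeither));      // prints 0
Console.WriteLine(CountWords(windowsText, DollarMultiline));    // prints 0
Console.WriteLine(CountWords(windowsText, DollarMultilineAny)); // prints 2
Console.WriteLine(CountWords(windowsText, DollarAnyNewLine));   // prints 1

// The first two rows describe today's behavior, so the built-in $ should agree with them.
Console.WriteLine(Regex.Matches(windowsText, @"\w+$").Count);                         // prints 0
Console.WriteLine(Regex.Matches(windowsText, @"\w+$", RegexOptions.Multiline).Count); // prints 0
```

On this `\r\n` input, only the two rows that include `AnyNewLine` find a line end, which is the behavior the table is meant to document.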


