Split strings by Regular Expressions in SQL Server 2025

Problem

SQL Server users have been asking for native regular expression support for over two decades. There are third-party Common Language Runtime (CLR) modules that offer this functionality, but these can be complicated to install and simply aren’t possible in some environments. I want to split a string using a regular expression instead of a static string. Will that be possible in SQL Server 2025, without CLR?

Solution

Several of the new Regex features have been described already:

But, one function that hasn’t been overly flaunted publicly is REGEXP_SPLIT_TO_TABLE. So I thought I would take a closer look.

Let’s say you have a string that could have multiple delimiters, like this:

Gretzky|Orr,Lemieux;Pastrnak,Bucyk|Esposito

Or a unioned set from different sources that each has its own delimiter, like this:

Gretzky|Orr
Lemieux,Pastrnak
Bucyk;Esposito

Using STRING_SPLIT

You can’t pass a LIKE or other wildcard expression or pattern to STRING_SPLIT – it requires a single, fixed string. You could do something like this:

SELECT value
  FROM STRING_SPLIT(TRANSLATE(input, '|;', ',,'), ',');

Yuck. This falls apart quickly when you have more possible delimiters or if you want to do anything more complex.

Using REGEXP_SPLIT_TO_TABLE

The new function lets you pass in a simpler expression with no translations or counting required:

SELECT value
  FROM REGEXP_SPLIT_TO_TABLE(input, '[|;,]');

And what if you want two or three or 17 consecutive delimiters to be ignored and treated as a single delimiter? This requires one additional character:

SELECT value
  FROM REGEXP_SPLIT_TO_TABLE(input, '[|;,]+');

String Split on Spaces, Tabs, Carriage Returns and Line Feeds

How about splitting on any white space – spaces, tabs, carriage returns, line feeds, etc.? Sure, you could construct a string that contains all the relevant characters, and transform them all to a single delimiter using TRANSLATE or REPLACE. But this is so ugly, I’m not even going to show the code.

Look how easy this is with the new function:

SELECT value
  FROM REGEXP_SPLIT_TO_TABLE(input, '\s');

And if you want consecutive white space characters to be lumped together:

SELECT value
  FROM REGEXP_SPLIT_TO_TABLE(input, '\s+');

So, for example:

SELECT value
  FROM REGEXP_SPLIT_TO_TABLE(N'space: fourspaces:    tab:	crlf:
end', '\s+');

Yields:

value
----------
space:
fourspaces:
tab:
crlf:
end

Split String on Non-alphanumeric Characters

What about splitting on any non-alphanumeric character? All the white space already mentioned, or any punctuation, or any emojis? You can do this by saying any pattern that does not match the property of being a letter or a number:

SELECT value
  FROM REGEXP_SPLIT_TO_TABLE(input, '[^\p{L}\p{N}]+');

So, for example:

SELECT value
  FROM REGEXP_SPLIT_TO_TABLE (N'hello👋world🌍 🌏tab:  end', '[^\p{L}\p{N}]+');

Yields:

value
----------
hello
world
tab
end

It’s powerful to be able to pre-emptively exclude any character outside of a defined set without even having to list out that defined set. And most of these examples – and so many others that are possible – are things I just don’t want to piece together how to accomplish with STRING_SPLIT.

A Wrapper Function

In my case, I found it useful to create a multi-purpose function that could perform either type of split. This can be useful if you have specific rules you want to apply to the output and don’t want to duplicate that business logic in multiple functions:

CREATE OR ALTER FUNCTION dbo.split
(
  @string  nvarchar(4000), 
  @pattern nvarchar(4000), 
  @type    char(6) /* regexp, legacy */
)
RETURNS TABLE
WITH SCHEMABINDING
AS
  RETURN
  (
    SELECT value, ordinal 
      FROM STRING_SPLIT(@string, @pattern, 1)
     WHERE @type = 'legacy'
     UNION ALL
    SELECT value, ordinal 
      FROM REGEXP_SPLIT_TO_TABLE(@string, @pattern)
     WHERE @type = 'regexp'
  );

The union all maybe suggests that SQL Server would waste time trying to evaluate both functions before the predicate is applied, but the execution plan shows that short-circuiting is in full effect:

Plans showing efficiency of union all.

Caveats

The new function doesn’t support lookaheads, for example, and that’s not Microsoft’s fault. The implementation is based on the ICU (International Components for Unicode) library, which doesn’t support advanced functionality like lookaheads, lookbehinds, or conditionals.

Here is what I wanted to be able to do – ignore delimiters that also happen to exist inside quoted strings, a common problem when dealing with data from external sources:

DECLARE @str nvarchar(4000), @regex nvarchar(4000);
SET @str   = N'Aaron,B,"33 Front St, Suite 4B",Whitehorse,YT,Y1A 3BH';
SET @regex = N',(?=(?:[^"]*"[^"]*")*[^"]*$)';
SELECT value FROM REGEXP_SPLIT_TO_TABLE(@str, @regex);

However, when I try that, I receive the following error:

Msg 19300, Level 16, State 2
An invalid Pattern ',(?=(?:[^"]*"[^"]*")*[^"]*$)' was provided. 
Error 'invalid perl operator: (?=' occurred during evaluation of the Pattern.

Another issue I came across is that, in spite of the documentation directly stating that LOB types are supported (up to 2MB), even data less than 2MB will yield this error:

Msg 19304, Level 16, State 5
Currently, 'REGEXP_SPLIT_TO_TABLE' function does not support NVARCHAR(max)/VARCHAR(max) inputs.

Based on the “Currently,” bit, I can only assume they’ll fix this one soon, but I don’t have any inside information on that.

Conclusion

This is a welcome addition to SQL Server, and while not perfect, I can already envision how it will help solve some interesting parsing problems I’ve encountered as recently as this year.

Next Steps

Review the following tips and other resources:

Leave a Reply

Your email address will not be published. Required fields are marked *