While REGEXP_REPLACE is for changing text, what if you just want to extract text that matches a pattern? For this, Oracle provides the powerful REGEXP_SUBSTR function.
REGEXP_SUBSTR stands for "Regular Expression Substring." It searches a string for a regular expression pattern and returns the actual text that it finds.
This is perfect for:
- Pulling an email address from a block of text.
- Extracting a zip code from an address string.
- Getting the text from inside parentheses or brackets.
REGEXP_SUBSTR Function Syntax
The syntax has several parts, but you'll often just use the first two or three.
REGEXP_SUBSTR(source_char, pattern, [position], [occurrence], [match_param], [subexpr])
Let's break that down:
source_char: The original string or column you want to search (e.g.,'Notes: (confidential)'oruser_comments).pattern: The regular expression pattern you are looking for (e.g.,'[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}'to find an email).[position](Optional): An integer telling Oracle where to start searching. The default is1(the beginning of the string).[occurrence](Optional): An integer telling Oracle which match to return. The default is1(it returns the first match).[match_param](Optional): A text string that changes the matching behavior. For example,'i'makes the search case-insensitive.[subexpr](Optional): An integer (0-9) to return a specific "capture group" (a part of the pattern inside parentheses) instead of the whole match.
Oracle REGEXP_SUBSTR Function Examples
Here are two practical examples of how to use REGEXP_SUBSTR.
Example 1: Extracting an Email Address
This is a classic use case. You have a free-text field, and you need to pull out the email address.
The pattern '[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}' is a basic regular expression for an email.
Query:
SELECT
REGEXP_SUBSTR('Contact us at support@example.com for help.',
'[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}') AS "Email"
FROM DUAL;
Result:
Email ------------------- support@example.com
Example 2: Extracting Text from Parentheses
Imagine you want to extract only the text from inside a set of parentheses. We can do this using the subexpr argument.
- Pattern:
\(([^)]+)\)\(: Matches the literal(.([^)]+): This is our capture group (Group 1). It matches one or more characters that are not a).\): Matches the literal).
subexpr: We set this to1to return only what Group 1 captured.
Query:
SELECT
REGEXP_SUBSTR('Order #123 (SKU: 45A-B)',
'\(([^)]+)\)', 1, 1, NULL, 1) AS "SKU"
FROM DUAL;
Result:
SKU ----------- SKU: 45A-B
If we had left out the subexpr argument, the result would have been (SKU: 45A-B), including the parentheses.
