Oracle REGEXP_SUBSTR Function: A Simple Guide

While REGEXP_REPLACE is for changing text, what if you just want to extract text that matches a pattern? For this, Oracle provides the powerful REGEXP_SUBSTR function.

REGEXP_SUBSTR stands for "Regular Expression Substring." It searches a string for a regular expression pattern and returns the actual text that it finds.

This is perfect for:

  • Pulling an email address from a block of text.
  • Extracting a zip code from an address string.
  • Getting the text from inside parentheses or brackets.

REGEXP_SUBSTR Function Syntax

The syntax has several parts, but you'll often just use the first two or three.

REGEXP_SUBSTR(source_char, pattern, [position], [occurrence], [match_param], [subexpr])

Let's break that down:

  • source_char: The original string or column you want to search (e.g., 'Notes: (confidential)' or user_comments).
  • pattern: The regular expression pattern you are looking for (e.g., '[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}' to find an email).
  • [position] (Optional): An integer telling Oracle where to start searching. The default is 1 (the beginning of the string).
  • [occurrence] (Optional): An integer telling Oracle which match to return. The default is 1 (it returns the first match).
  • [match_param] (Optional): A text string that changes the matching behavior. For example, 'i' makes the search case-insensitive.
  • [subexpr] (Optional): An integer (0-9) to return a specific "capture group" (a part of the pattern inside parentheses) instead of the whole match.

Oracle REGEXP_SUBSTR Function Examples

Here are two practical examples of how to use REGEXP_SUBSTR.

Example 1: Extracting an Email Address

This is a classic use case. You have a free-text field, and you need to pull out the email address.

The pattern '[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}' is a basic regular expression for an email.

Query:

SELECT 
  REGEXP_SUBSTR('Contact us at support@example.com for help.',
                '[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}') AS "Email"
FROM DUAL;

Result:

Email
-------------------
support@example.com

Example 2: Extracting Text from Parentheses

Imagine you want to extract only the text from inside a set of parentheses. We can do this using the subexpr argument.

  • Pattern:\(([^)]+)\)
    • \(: Matches the literal (.
    • ([^)]+): This is our capture group (Group 1). It matches one or more characters that are not a ).
    • \): Matches the literal ).
  • subexpr: We set this to 1 to return only what Group 1 captured.

Query:

SELECT 
  REGEXP_SUBSTR('Order #123 (SKU: 45A-B)',
                '\(([^)]+)\)', 1, 1, NULL, 1) AS "SKU"
FROM DUAL;

Result:

SKU
-----------
SKU: 45A-B

If we had left out the subexpr argument, the result would have been (SKU: 45A-B), including the parentheses.

Vinish Kapoor
Vinish Kapoor

Vinish Kapoor is a seasoned software development professional and a fervent enthusiast of artificial intelligence (AI). His impressive career spans over 25+ years, marked by a relentless pursuit of innovation and excellence in the field of information technology. As an Oracle ACE, Vinish has distinguished himself as a leading expert in Oracle technologies, a title awarded to individuals who have demonstrated their deep commitment, leadership, and expertise in the Oracle community.

guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments