MongoDB has become one of the most popular NoSQL document databases thanks to its flexibility, scalability and high performance. But seasoned SQL developers often miss the LIKE operator for pattern matching on strings, which MongoDB does not provide.

In this comprehensive guide of 3,200+ words, we will use regular expressions in MongoDB to emulate much of the LIKE operator's capabilities and go beyond them.

We will start with a comparison of SQL LIKE and MongoDB regex, then cover the syntax and options for writing expressive, efficient search queries, and finish with the performance and indexing best practices that every full-stack developer should know.

SQL LIKE Operator vs MongoDB Regular Expressions

The SQL LIKE operator provides wildcard-based string matching on text columns in query statements like:

SELECT * FROM users WHERE name LIKE 'J%'

This matches names starting with the letter 'J'.

In contrast, MongoDB provides regular expression matching through the $regex operator:

db.users.find({name: {$regex: /^J/}})  

That's quite powerful, but it can seem complex for simple use cases that LIKE handles easily.

Let's analyze some key differences:

Criteria | SQL LIKE | MongoDB Regex
Pattern syntax | Simple wildcards (% and _) | Perl Compatible Regular Expressions (PCRE)
Case sensitivity | Depends on the database and collation | Case sensitive unless the i option is set
Search data types | Text strings only | String values only
Standardization | Common SQL standard | PCRE syntax, not part of any database standard
Extensibility | Limited | Highly extensible

So while LIKE provides simpler, user-friendly string matching, regex is far more versatile with advanced pattern capabilities, though it takes a little more effort to get started.

Let's explore how to bridge this gap for developers familiar with SQL LIKE.

Overview of Regular Expression Operators

The two key operators used for regular expressions in MongoDB are:

1. $regex: Defines the regular expression pattern to match in the documents of a collection. Required parameter.

2. $options: Specifies optional flags to control search behavior – like case insensitivity.

For example:

// Case insensitive search 
db.users.find({name: {$regex: /john/i}})

Let's understand them in detail.

$regex expression syntax

The parameter to $regex can be any valid regular expression pattern according to Perl Compatible Regular Expression (PCRE) syntax.

Some commonly used special characters:

Character | Description | Example
^ | Start of string anchor | /^J/
$ | End of string anchor | /end$/
. | Match any single character | /c.t/
[] | Match a range or set of characters | /[Jj]ohn/

$options for search configuration

$options controls case insensitivity, multiline matching and other search behaviors through flags:

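Flag | Description
i | Case-insensitive match
m | Multiline match: ^ and $ also match at the start and end of each line
x | Extended mode: ignores unescaped whitespace in the pattern and allows # comments
s | Lets the dot (.) match newline characters as well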

For example:

// Case insensitive multiline search
db.data.find({text: {$regex: /tree/im}})
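The effect of the i, m and s flags can be tried out locally before running a query. The sketch below uses plain JavaScript RegExp only as an approximation of the server's PCRE engine, and the sample text is made up; the x flag is left out because JavaScript has no equivalent.

```javascript
// Plain JavaScript RegExp is used here only to illustrate what the flags mean;
// the server evaluates $regex with its own regex engine.
const text = "Oak\ntree house";

// Without m, ^ only matches at the very start of the string.
const startOnly = /^tree/.test(text);        // false
// With m, ^ also matches right after each newline.
const multiline = /^tree/m.test(text);       // true
// i ignores letter case.
const ignoreCase = /TREE/i.test(text);       // true
// Without s, the dot does not match a newline; with s it does.
const dotNoNewline = /Oak.tree/.test(text);  // false
const dotAll = /Oak.tree/s.test(text);       // true

console.log({ startOnly, multiline, ignoreCase, dotNoNewline, dotAll });
```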

Now that we have covered the basics, let's implement common LIKE use cases.

Match String Starting with Text

SQL LIKE:

SELECT * FROM users WHERE name LIKE 'John%';

MongoDB $regex equivalent:

db.users.find({name: {$regex: /^John/}});

The caret ^ ensures the match occurs at the start of the string value.

Match String Ending Pattern

To match a pattern at the end of a string:

SQL LIKE:

SELECT * FROM inventory WHERE product LIKE '%beans'

MongoDB $regex:

db.inventory.find({product: {$regex: /beans$/}})

The $ sign anchors the regex so that it matches only at the end of the value.

Match Strings Containing Substring

Fetch records that contain a specific substring:

SQL LIKE:

SELECT * FROM articles WHERE body LIKE '%tutorial%';

MongoDB $regex:

db.articles.find({body: {$regex: /tutorial/}});

Without anchors, the pattern matches the substring at any position.

Single Character Wildcard Match

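SQL provides the _ wildcard to match exactly one character.

For example, phone numbers with a specific pattern:

SELECT * FROM users WHERE phone LIKE '___-__3-____';

The equivalent MongoDB regex would be:

db.users.find({phone: {$regex: /^.{3}-.{2}3-.{4}$/}});

The .{n} notation matches exactly n repetitions of ., the regex counterpart of _. The ^ and $ anchors are needed because LIKE compares against the whole value, while an unanchored regex would also match longer strings that merely contain the pattern.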
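The translation rules covered so far (% for any run of characters, _ for a single character, everything else literal) can be collected into one small helper. This is a minimal sketch: likeToRegex is a hypothetical name, not a MongoDB or driver function, and it ignores SQL's ESCAPE clause.

```javascript
// Hypothetical helper (not part of MongoDB): translate a SQL LIKE pattern
// into an anchored JavaScript RegExp that can be passed as a $regex value.
function likeToRegex(likePattern, flags = "") {
  let source = "";
  for (const ch of likePattern) {
    if (ch === "%") {
      source += ".*";            // % matches any run of characters
    } else if (ch === "_") {
      source += ".";             // _ matches exactly one character
    } else {
      // Escape regex metacharacters so they match literally.
      source += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  // LIKE compares against the whole value, so anchor both ends.
  return new RegExp("^" + source + "$", flags);
}

console.log(likeToRegex("J%").test("John"));                      // true
console.log(likeToRegex("___-__3-____").test("555-123-4567"));    // true
console.log(likeToRegex("___-__3-____").test("x555-123-4567"));   // false

// In mongosh the result could be used as:
// db.users.find({ phone: { $regex: likeToRegex("___-__3-____") } })
```

For a pattern like '%tutorial%' the helper produces ^.*tutorial.*$, which matches the same values as /tutorial/ but is more verbose, so hand-written patterns are usually preferable for the simple cases.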

Case-Insensitive Search

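Whether LIKE ignores case depends on the database and its collation: MySQL's default collations are case-insensitive, while PostgreSQL's LIKE is case-sensitive and ILIKE is the case-insensitive variant. MongoDB regex is case-sensitive unless you ask otherwise.

SQL LIKE:

SELECT * FROM users WHERE name LIKE '%John%' /* case-insensitive under a case-insensitive collation */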

MongoDB $regex with i flag:

db.users.find({name: {$regex: /john/i}});
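The same query can also be written with a string pattern plus $options, which is handy when the filter has to be built from text or serialized as JSON. The sketch below converts a JavaScript RegExp into that explicit form; toRegexFilter is a hypothetical helper name, and only the flags covered in this guide (i, m, s) are carried over.

```javascript
// Hypothetical helper: turn a JavaScript RegExp into the explicit
// {$regex, $options} form. Only i, m and s are kept; JavaScript-only
// flags such as g and y have no meaning for a query filter.
function toRegexFilter(field, re) {
  const options = [...re.flags].filter((f) => "ims".includes(f)).join("");
  return { [field]: { $regex: re.source, $options: options } };
}

console.log(JSON.stringify(toRegexFilter("name", /john/i)));
// {"name":{"$regex":"john","$options":"i"}}
```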

Negative Search with NOT LIKE

To fetch non-matching records:

SQL LIKE:

SELECT * FROM users WHERE name NOT LIKE '%John%';

MongoDB with $not:

db.users.find({name: {$not: /John/}}); 
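One difference is worth knowing: $not also returns documents in which the field is missing, whereas NOT LIKE in SQL skips NULL values. The following is an in-memory simulation in plain JavaScript, with made-up documents, that mimics this behavior; it is not a MongoDB API.

```javascript
// In-memory stand-in for a collection; the documents are made up.
const users = [
  { _id: 1, name: "John Smith" },
  { _id: 2, name: "Alice" },
  { _id: 3 },                      // no name field
];

// Mimics {name: {$not: /John/}}: keep every document that does NOT
// have a string name matching the pattern, including ones with no name.
const notJohn = users.filter(
  (doc) => !(typeof doc.name === "string" && /John/.test(doc.name))
);

// Mimics {name: {$exists: true, $not: /John/}}: additionally require the field.
const notJohnWithName = notJohn.filter((doc) => "name" in doc);

console.log(notJohn.map((d) => d._id));         // [ 2, 3 ]
console.log(notJohnWithName.map((d) => d._id)); // [ 2 ]
```

To get results closer to NOT LIKE, the query can require the field as well: db.users.find({name: {$exists: true, $not: /John/}}).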

Match against Multiple Patterns

To search for multiple patterns:

SQL LIKE:

SELECT * FROM articles WHERE title LIKE '%mongo%' OR title LIKE '%postgres%';

MongoDB provides greater flexibility to combine expressions.

For example, match titles containing either 'mongo' or 'postgres':

db.articles.find({
  title: {
     $in: [/mongo/, /postgres/] 
  }
});

We can specify even more complex logic with $or and $and operators!
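As an alternative to $in, several patterns can be folded into a single regex with alternation, or spelled out with $or. The sketch below builds both filter shapes; anyOf is a hypothetical helper name, and it assumes the input patterns carry no flags of their own.

```javascript
// Combine several patterns into one alternation, e.g. /mongo|postgres/.
// Assumes each input is a RegExp without flags of its own.
function anyOf(patterns, flags = "") {
  const source = patterns.map((p) => `(?:${p.source})`).join("|");
  return new RegExp(source, flags);
}

const combined = anyOf([/mongo/, /postgres/], "i");

// Two equivalent ways to express the same filter:
const withAlternation = { title: combined };
const withOr = { $or: [{ title: /mongo/i }, { title: /postgres/i }] };

console.log(combined.source);                     // (?:mongo)|(?:postgres)
console.log(combined.test("Intro to MongoDB"));   // true
console.log(combined.test("Redis basics"));       // false
```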

Escaping Special Characters

LIKE treats _ and % as wildcards, so matching them literally requires an escape character declared with the ESCAPE clause.

Similarly, in MongoDB regex we need to escape metacharacters manually using \.

For example, to match .com literally:

db.links.find({link: {$regex: /\.com/}})

Other examples:

\. => Match . character
\/ => Match / character 
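When the search text comes from user input, escaping by hand is error-prone, so it helps to escape every metacharacter programmatically before building the pattern. A minimal sketch, where escapeRegex is a hypothetical helper name rather than a built-in:

```javascript
// Hypothetical helper: escape regex metacharacters so arbitrary text
// (for example user input) is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const literal = escapeRegex("example.com");
console.log(literal); // example\.com

const pattern = new RegExp(escapeRegex(".com") + "$");
console.log(pattern.test("https://example.com")); // true
console.log(pattern.test("https://example-com")); // false

// In mongosh the escaped text could be used as:
// db.links.find({ link: { $regex: escapeRegex(".com") + "$" } })
```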

Some additional examples:

Match phone numbers in the format (123)456-7890:

const phoneRegEx = /\(\d{3}\)\d{3}-\d{4}/

db.users.find({phone: {$regex: phoneRegEx}})

Match URLs that start with http:// or https:// and end with .com:

// Starts with http:// or https:// and ends with .com
const urlRegEx = /^https?:\/\/.*\.com$/

db.links.find({url: {$regex: urlRegEx}})
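Before running patterns like these against a collection, it can be worth checking them against a few sample values locally. The sketch below does that with plain JavaScript; the sample strings are made up, and JavaScript's regex engine is only an approximation of the one on the server, although these particular patterns use syntax common to both.

```javascript
const phoneRegEx = /\(\d{3}\)\d{3}-\d{4}/;
const urlRegEx = /^https?:\/\/.*\.com$/;

// Made-up sample values paired with the expected result.
const cases = [
  [phoneRegEx, "(555)123-4567", true],
  [phoneRegEx, "555-123-4567", false],
  [urlRegEx, "https://example.com", true],
  [urlRegEx, "https://example.com/docs", false], // does not END with .com
  [urlRegEx, "ftp://example.com", false],
];

// Collect every case whose actual result differs from the expectation.
const failures = cases.filter(
  ([re, value, expected]) => re.test(value) !== expected
);
console.log(failures.length === 0 ? "all patterns behave as expected" : failures);
```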

Benchmarks on Regular Expression Performance

LIKE performance depends on the position of wildcards, since that determines whether an index can be used. A leading wildcard such as '%foo' prevents an index range scan on the column.

The performance of $regex likewise varies with:

  • Structure of the pattern – a ^ prefix anchor vs an unanchored or case-insensitive pattern
  • Whether an index exists on the queried field
  • Dataset characteristics like selectivity

Some sample figures follow. Treat them as rough, illustrative ratios rather than guarantees, and measure on your own data:

Average slowdown vs normal queries:

Regex Query | Slowdown
StartsWith | 2x
EndsWith | 3x
Contains | 6x
Complex regex | 12x

Relative slowdown with an index:

Indexed Query | Slowdown
StartsWith regex | 1.5x
EndsWith regex | 2x
Contains regex | 4x

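So a regex anchored at the start with ^ performs best: with an index on the field, MongoDB can limit the scan to the range of values that share the prefix. A trailing $ anchor or an unanchored pattern still has to be tested against every indexed value.

Text indexes do not speed up $regex queries. They serve the $text operator, which matches whole words rather than arbitrary substrings, so they are an alternative to contains-style regex searches rather than an accelerator for them.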

Best Practices for Optimal Performance

Here are some key best practices that can optimize and scale regex queries by leveraging indexes:

Use anchored regular expressions

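As we saw earlier, a case-sensitive pattern anchored with ^ carries the lowest penalty, because an index on the field can narrow the scan to the matching prefix. A $ anchor alone does not give the same benefit, and leading wildcards such as /^.*foo/ cancel it out.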
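The reason a prefix pattern is cheap is that "starts with John" is equivalent to a range of index values, roughly from "John" up to but not including "Joho". The sketch below computes such a range by hand to make the idea concrete; prefixRange is a hypothetical helper, and this is an illustration of the principle rather than the server's actual implementation.

```javascript
// Hypothetical helper: compute the half-open string range [lower, upper)
// that covers every value starting with `prefix`.
function prefixRange(prefix) {
  if (prefix.length === 0) {
    throw new Error("prefix must not be empty");
  }
  const last = prefix.charCodeAt(prefix.length - 1);
  // Bump the final UTF-16 code unit by one to get the exclusive upper bound.
  // Sketch only: does not handle a final code unit of 0xFFFF.
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { $gte: prefix, $lt: upper };
}

console.log(JSON.stringify(prefixRange("John")));
// {"$gte":"John","$lt":"Joho"}

// Roughly the same set of documents as {name: /^John/}:
// db.users.find({ name: prefixRange("John") })
```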

Create Compound Indexes

An index that contains the field targeted by $regex will improve speed.

Additionally, put other commonly queried fields first in a compound index so that an equality match narrows the scan before the regex is applied.

db.logs.createIndex({app: 1, message: 1})

db.logs.find({app: "payments", message: {$regex: /error/}})

// The equality match on app narrows the index scan before the regex is checked

Utilize selective queries

Combine the regex with selective equality or range conditions so that fewer documents are examined, and project only the fields you need instead of returning whole documents.

Text indexes

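Text indexes support the $text operator, not $regex. If most of your regex queries are really word searches, a text index queried with $text may serve them better than a contains-style regex.

The gain depends on the data and the queries, so compare both approaches with explain() before committing to one.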

Sample Regex Usage By Industry

Let's take a look at some real-world examples of leveraging regex across different domains:

Ecommerce

Match product titles containing terms like 'shirt' or 'jeans' (this also covers 'shirts' and 'tshirts'):

db.products.find({
  title: {
    $regex: /shirt|jeans/,
    $options: 'i'
  }
})

Log Analysis

Fetch errors from payment app logs:

const paymentErrorsRegex = /payments\..*\:(error|exception)/im  

db.logs.find({
  app: 'payments',
  log: {$regex: paymentErrorsRegex} 
})

Banking

Validate IFSC codes like 'SBIN0001234':

// 4 capital letters, then a 0, then 6 alphanumeric characters

const ifscRegex = /^[A-Z]{4}0[A-Z0-9]{6}$/

db.branches.find({ifsc: {$regex: ifscRegex}})

Healthcare

Patient names starting with 'Mc' or 'Mac':

db.patients.find({name: {$regex: /^(Mc|Mac)/}})  

These showcase just some samples. Regular expressions are widely applicable for pattern matching use cases across verticals.

Conclusion

In this comprehensive guide, we bridged the gap between the familiar SQL LIKE and unfamiliar regexes in MongoDB for developers getting started with the document database.

We understood the syntax, parameters like $regex, $options and how to construct expressions for common LIKE use cases involving anchors, character classes and more. We also explored best practices around performance tuning and indexing of regex queries.

Some key takeaways in using MongoDB regular expressions:

  • Requires more precision vs simple LIKE wildcards

  • Provides advanced capabilities not possible via LIKE

  • Needs query and index tuning for optimal speed

I hope you now have clarity and confidence in wielding the versatility of MongoDB‘s regex pattern matching like an expert! Let me know if you have any other specific use cases that need regex mastery.
