{"id":340,"date":"2024-01-28T11:28:13","date_gmt":"2024-01-28T11:28:13","guid":{"rendered":"https:\/\/learnpython.elegantwallp.com\/?p=340"},"modified":"2024-01-28T11:28:14","modified_gmt":"2024-01-28T11:28:14","slug":"python-regular-expressions","status":"publish","type":"post","link":"https:\/\/learnpython.elegantwallp.com\/2024\/01\/28\/python-regular-expressions\/","title":{"rendered":"Python Regular Expressions"},"content":{"rendered":"\n<p><strong>Summary<\/strong>: in this tutorial, you\u2019ll learn about Python regular expressions and how to use the most commonly used regular expression functions.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction to the Python regular expressions<\/h2>\n\n\n\n<p>Regular expressions (called regex or regexp) specify search patterns. Typical examples of regular expressions are the patterns for matching email addresses, phone numbers, and credit card numbers.<\/p>\n\n\n\n<p>Regular expressions are essentially a specialized programming language embedded in Python. And you can interact with regular expressions via the built-in&nbsp;<code>re<\/code>&nbsp;module in Python.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.pythontutorial.net\/wp-content\/uploads\/2021\/11\/python-regular-expressions.svg\" alt=\"\" class=\"wp-image-3131\"\/><\/figure>\n\n\n\n<p>The following shows an example of a simple regular expression:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>'\\d'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>In this example, a regular expression is a string that contains a search pattern. The\u00a0<code>'\\d'<\/code>\u00a0is a digit\u00a0character set\u00a0that matches any single digit from 0 to 9.<\/p>\n\n\n\n<p>Note that you\u2019ll learn how to construct more complex and advanced patterns in the next tutorials. 
This tutorial focuses on the functions that deal with regular expressions.<\/p>\n\n\n\n<p>To use this regular expression, you follow these steps:<\/p>\n\n\n\n<p>First, import the\u00a0<code>re<\/code>\u00a0module:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Second, compile the regular expression into a\u00a0<code>Pattern<\/code>\u00a0object:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>p = re.compile('\\d')<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Third, use one of the methods of the\u00a0<code>Pattern<\/code>\u00a0object to match a string:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>s = \"Python 3.10 was released on October 04, 2021\"\nresult = p.findall(s)\nprint(result)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&#91;'3', '1', '0', '0', '4', '2', '0', '2', '1']<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>The&nbsp;<code>findall()<\/code>&nbsp;method returns a list of single digits in the string&nbsp;<code>s<\/code>.<\/p>\n\n\n\n<p>The following shows the complete program:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\np = re.compile('\\d')\ns = \"Python 3.10 was released on October 04, 2021\"\n\nresults = p.findall(s)\nprint(results)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Besides the&nbsp;<code>findall()<\/code>&nbsp;method, the&nbsp;<code>Pattern<\/code>&nbsp;object has other essential methods that allow you to match a string:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Method<\/th><th>Purpose<\/th><\/tr><\/thead><tbody><tr><td><code><a href=\"https:\/\/www.pythontutorial.net\/python-regex\/python-regex-match\/\">match()<\/a><\/code><\/td><td>Find the pattern at the beginning of a 
string<\/td><\/tr><tr><td><code>search()<\/code><\/td><td>Return the first match of a pattern in a string<\/td><\/tr><tr><td><code>findall()<\/code><\/td><td>Return all matches of a pattern in a string<\/td><\/tr><tr><td><code>finditer()<\/code><\/td><td>Return all matches of a pattern as an&nbsp;<a href=\"https:\/\/www.pythontutorial.net\/advanced-python\/python-iterators\/\">iterator<\/a><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Python regular expression functions<\/h2>\n\n\n\n<p>Besides the&nbsp;<code>Pattern<\/code>&nbsp;class, the&nbsp;<code>re<\/code>&nbsp;module has some functions that match a pattern against a string:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>match()<\/code><\/li>\n\n\n\n<li><code>search()<\/code><\/li>\n\n\n\n<li><code>findall()<\/code><\/li>\n\n\n\n<li><code>finditer()<\/code><\/li>\n<\/ul>\n\n\n\n<p>These functions have the same names as the methods of the&nbsp;<code>Pattern<\/code>&nbsp;object. They take the pattern as their first argument, followed by the string to match, just like the corresponding methods of the&nbsp;<code>Pattern<\/code>&nbsp;object. However, you don\u2019t have to manually compile the regular expression before using it.<\/p>\n\n\n\n<p>The following example shows the same program that uses the\u00a0<code>findall()<\/code>\u00a0function instead of the\u00a0<code>findall()<\/code>\u00a0method of a\u00a0<code>Pattern<\/code>\u00a0object:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = \"Python 3.10 was released on October 04, 2021.\"\n\nresults = re.findall('\\d', s)\nprint(results)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Using the functions in the&nbsp;<code>re<\/code>&nbsp;module is more concise than using the methods of the&nbsp;<code>Pattern<\/code>&nbsp;object because you don\u2019t have to compile regular expressions manually.<\/p>\n\n\n\n<p>Under the hood, these functions create a&nbsp;<code>Pattern<\/code>&nbsp;object and call the appropriate method on it. 
They also store the compiled regular expression in a cache for speed optimization.<\/p>\n\n\n\n<p>It means that when you use the same regular expression a second time, these functions do not need to recompile it. Instead, they get the compiled regular expression from the cache.<\/p>\n\n\n\n<p>Should you use the&nbsp;<code>re<\/code>&nbsp;functions or methods of the&nbsp;<code>Pattern<\/code>&nbsp;object?<\/p>\n\n\n\n<p>If you use a regular expression within a\u00a0loop, the\u00a0<code>Pattern<\/code>\u00a0object may save a few function calls. However, if you use it outside of loops, the difference is negligible due to the internal cache.<\/p>\n\n\n\n<p>The following sections discuss the most commonly used functions in the&nbsp;<code>re<\/code>&nbsp;module including&nbsp;<code>search()<\/code>,&nbsp;<code>match()<\/code>, and&nbsp;<code>fullmatch()<\/code>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">search() function<\/h3>\n\n\n\n<p>The\u00a0<code>search()<\/code>\u00a0function searches for a pattern within a string. If there is a match, it returns a\u00a0<code>Match<\/code>\u00a0object for the first match; otherwise, it returns\u00a0<code>None<\/code>. For example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = \"Python 3.10 was released on October 04, 2021.\"\npattern = '\\d{2}'\n\nmatch = re.search(pattern, s)\nprint(type(match))\nprint(match)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&lt;class 're.Match'>\n&lt;re.Match object; span=(9, 11), match='10'><\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>In this example, the&nbsp;<code>search()<\/code>&nbsp;function returns a&nbsp;<code>Match<\/code>&nbsp;object for the first two consecutive digits in the string&nbsp;<code>s<\/code>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Match object<\/h3>\n\n\n\n<p>The&nbsp;<code>Match<\/code>&nbsp;object provides information about the matched string. 
It has the following important methods:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Method<\/th><th>Description<\/th><\/tr><\/thead><tbody><tr><td><code>group()<\/code><\/td><td>Return the matched string<\/td><\/tr><tr><td><code>start()<\/code><\/td><td>Return the starting position of the match<\/td><\/tr><tr><td><code>end()<\/code><\/td><td>Return the ending position of the match (the index just past the last matched character)<\/td><\/tr><tr><td><code>span()<\/code><\/td><td>Return a tuple (start, end) that specifies the positions of the match<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The following example examines the\u00a0<code>Match<\/code>\u00a0object:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = \"Python 3.10 was released on October 04, 2021.\"\nresult = re.search('\\d', s)\n\nprint('Matched string:', result.group())\nprint('Starting position:', result.start())\nprint('Ending position:', result.end())\nprint('Positions:', result.span())<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>Matched string: 3\nStarting position: 7\nEnding position: 8\nPositions: (7, 8)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">match() function<\/h3>\n\n\n\n<p>The\u00a0<code>match()<\/code>\u00a0function returns a\u00a0<code>Match<\/code>\u00a0object if it finds a pattern at the beginning of a string, or\u00a0<code>None<\/code>\u00a0otherwise. 
For example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\nl = &#91;'Python',\n     'CPython is an implementation of Python written in C',\n     'Jython is a Java implementation of Python',\n     'IronPython is Python on .NET framework']\n\npattern = '\\wython'\nfor s in l:\n    result = re.match(pattern, s)\n    print(result)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&lt;re.Match object; span=(0, 6), match='Python'>\nNone\n&lt;re.Match object; span=(0, 6), match='Jython'>\nNone<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>In this example,&nbsp;<code>\\w<\/code>&nbsp;is the word character set that matches any single word character (a letter, a digit, or an underscore).<\/p>\n\n\n\n<p>The pattern&nbsp;<code>\\wython<\/code>&nbsp;matches any single word character followed by the literal string&nbsp;<code>ython<\/code>, for example,&nbsp;<code>Python<\/code>.<\/p>\n\n\n\n<p>Since the\u00a0<code>match()<\/code>\u00a0function only finds the pattern at the beginning of a string, the following strings match the pattern:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>'Python'\n'Jython is a Java implementation of Python'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>And the following strings don\u2019t match:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>'CPython is an implementation of Python written in C'\n'IronPython is Python on .NET framework'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">fullmatch() function<\/h3>\n\n\n\n<p>The\u00a0<code>fullmatch()<\/code>\u00a0function returns a\u00a0<code>Match<\/code>\u00a0object if the whole string matches a pattern or\u00a0<code>None<\/code>\u00a0otherwise. 
The following example uses the\u00a0<code>fullmatch()<\/code>\u00a0function to match a string with four digits:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = \"2021\"\npattern = '\\d{4}'\n\nresult = re.fullmatch(pattern, s)\nprint(result)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&lt;re.Match object; span=(0, 4), match='2021'><\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>The pattern&nbsp;<code>'\\d{4}'<\/code>&nbsp;matches a string with four digits. Therefore, the&nbsp;<code>fullmatch()<\/code>&nbsp;function returns a&nbsp;<code>Match<\/code>&nbsp;object for the string&nbsp;<code>2021<\/code>.<\/p>\n\n\n\n<p>If you place the number\u00a0<code>2021<\/code>\u00a0in the middle or at the end of a longer string,\u00a0<code>fullmatch()<\/code>\u00a0will return\u00a0<code>None<\/code>. For example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = \"Python 3.10 released in 2021\"\npattern = '\\d{4}'\n\nresult = re.fullmatch(pattern, s)\nprint(result)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>None<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Regular expressions and raw strings<\/h2>\n\n\n\n<p>It\u2019s important to note that Python and regular expressions are different programming languages. They have their own syntaxes.<\/p>\n\n\n\n<p>The&nbsp;<code>re<\/code>&nbsp;module is the interface between Python and the regular expression language. It behaves like an interpreter between them.<\/p>\n\n\n\n<p>To construct a pattern, regular expressions often use a backslash (<code>'\\'<\/code>), for example,&nbsp;<code>\\d<\/code>&nbsp;and&nbsp;<code>\\w<\/code>. 
But this collides with Python\u2019s usage of the backslash for the same purpose in string literals.<\/p>\n\n\n\n<p>For example, suppose you need to match the following string:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>s = '\\section'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>In regular expressions, the\u00a0backslash\u00a0(<code>\\<\/code>) is a special character. To match a literal backslash, you need to escape it by preceding it with another backslash (<code>\\<\/code>):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>pattern = '\\\\section'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>In regular expressions, the pattern must be\u00a0<code>'\\\\section'<\/code>. However, to express this pattern in a regular string literal in Python, you need to use two more backslashes to escape both backslashes again:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>pattern = '\\\\\\\\section'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Simply put, to match a literal backslash (<code>'\\'<\/code>), you have to write&nbsp;<code>'\\\\\\\\'<\/code>&nbsp;because the regular expression must be&nbsp;<code>'\\\\'<\/code>&nbsp;and each backslash must be expressed as&nbsp;<code>'\\\\'<\/code>&nbsp;inside a string literal in Python.<\/p>\n\n\n\n<p>This results in lots of repeated backslashes, which make regular expressions difficult to read and understand.<\/p>\n\n\n\n<p>A solution is to use\u00a0raw strings\u00a0in Python for regular expressions because raw strings treat the backslash (<code>\\<\/code>) as a literal character, not a special character.<\/p>\n\n\n\n<p>To turn a regular string into a raw string, you prefix it with the letter\u00a0<code>r<\/code>\u00a0or\u00a0<code>R<\/code>. 
For example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = '\\section'\npattern = r'\\\\section'\n\nresult = re.findall(pattern, s)\nprint(result)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&#91;'\\\\section']<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Note that in Python the string literals <code>'\\section'<\/code> and <code>'\\\\section'<\/code> are equal, because <code>\\s<\/code> is not a recognized escape sequence in a string literal, so the backslash is kept as-is (recent Python versions emit a warning for it):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>p1 = '\\\\section'\np2 = '\\section'\n\nprint(p1 == p2) <em># True<\/em><\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>In practice, regular expressions in Python are usually written as raw strings.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Summary: in this tutorial, you\u2019ll learn about Python regular expressions and how to use the most commonly used regular expression functions. Introduction to the Python regular expressions Regular expressions (called regex or regexp) specify search patterns. Typical examples of regular expressions are the patterns for matching email addresses, phone numbers, and credit card numbers. 
Regular [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[42],"tags":[],"class_list":["post-340","post","type-post","status-publish","format-standard","hentry","category-2-python-regex"],"_links":{"self":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts\/340","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/comments?post=340"}],"version-history":[{"count":1,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts\/340\/revisions"}],"predecessor-version":[{"id":341,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts\/340\/revisions\/341"}],"wp:attachment":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/media?parent=340"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/categories?post=340"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/tags?post=340"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}