{"id":394,"date":"2024-01-28T12:18:30","date_gmt":"2024-01-28T12:18:30","guid":{"rendered":"https:\/\/learnpython.elegantwallp.com\/?p=394"},"modified":"2024-01-28T12:18:31","modified_gmt":"2024-01-28T12:18:31","slug":"python-regex-sub","status":"publish","type":"post","link":"https:\/\/learnpython.elegantwallp.com\/2024\/01\/28\/python-regex-sub\/","title":{"rendered":"Python Regex sub()"},"content":{"rendered":"\n<p><strong>Summary<\/strong>: in this tutorial, you\u2019ll learn about the Python regex&nbsp;<code>sub()<\/code>&nbsp;function, which returns a new string with the matches of a pattern replaced by a replacement.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction to the Python regex sub function<\/h2>\n\n\n\n<p>The\u00a0<code>sub()<\/code>\u00a0function belongs to the built-in\u00a0<code>re<\/code>\u00a0module, which handles\u00a0regular expressions. It has the following syntax:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>re.sub(pattern, repl, string, count=0, flags=0)<\/code><\/code><\/pre>\n\n\n\n<p>In this syntax:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>pattern<\/code>&nbsp;is the regular expression that you want to match. Instead of a pattern string, you can also pass a compiled&nbsp;<code>Pattern<\/code>&nbsp;object.<\/li>\n\n\n\n<li><code>repl<\/code>&nbsp;is the replacement, which can be either a string or a function.<\/li>\n\n\n\n<li><code>string<\/code>&nbsp;is the input string.<\/li>\n\n\n\n<li><code>count<\/code>&nbsp;specifies the maximum number of matches that the&nbsp;<code>sub()<\/code>&nbsp;function should replace. 
If you pass zero to the&nbsp;<code>count<\/code>&nbsp;parameter or omit it, the&nbsp;<code>sub()<\/code>&nbsp;function replaces all the matches.<\/li>\n\n\n\n<li><code>flags<\/code>\u00a0is one or more\u00a0regex flags\u00a0(combined with&nbsp;<code>|<\/code>) that modify the standard behavior of the pattern.<\/li>\n<\/ul>\n\n\n\n<p>The&nbsp;<code>sub()<\/code>&nbsp;function searches for the pattern in the string and replaces the matched substrings with the replacement (<code>repl<\/code>).<\/p>\n\n\n\n<p>If the&nbsp;<code>sub()<\/code>&nbsp;function doesn\u2019t find a match, it returns the string unchanged. Otherwise, it returns a new string with the matches replaced.<\/p>\n\n\n\n<p>Note that the&nbsp;<code>sub()<\/code>&nbsp;function replaces the leftmost non-overlapping occurrences of the pattern. The second example below shows this in detail.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Python regex sub function examples<\/h2>\n\n\n\n<p>Let\u2019s look at some examples of using the regex&nbsp;<code>sub()<\/code>&nbsp;function.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Using the regex&nbsp;<code>sub()<\/code>&nbsp;function to get a plain phone number<\/h3>\n\n\n\n<p>The following example uses the\u00a0<code>sub()<\/code>\u00a0function to turn the phone number\u00a0<code>(212)-456-7890<\/code>\u00a0into\u00a0<code>2124567890<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\nphone_no = '(212)-456-7890'\n# \\D matches any single character that is not a digit\npattern = r'\\D'\nresult = re.sub(pattern, '', phone_no)\nprint(result)<\/code><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>2124567890<\/code><\/code><\/pre>\n\n\n\n<p>In this example,\u00a0<code>\\D<\/code>\u00a0is the inverse of the digit\u00a0character set\u00a0<code>\\d<\/code>: it matches any single character that is not a digit. 
Therefore, the\u00a0<code>sub()<\/code>\u00a0function replaces all non-digit characters with the empty string\u00a0<code>''<\/code>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2) Using the regex&nbsp;<code>sub()<\/code>&nbsp;function to replace the leftmost non-overlapping occurrences of a pattern<\/h3>\n\n\n\n<p>The following example replaces each occurrence of\u00a0<code>00<\/code>\u00a0with the empty string\u00a0<code>''<\/code>\u00a0in the string\u00a0<code>'00000'<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\npattern = '00'\ns = '00000'\nresult = re.sub(pattern, '', s)\nprint(result)<\/code><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>0<\/code><\/code><\/pre>\n\n\n\n<p>In this example, the pattern matches two consecutive zeros. The&nbsp;<code>sub()<\/code>&nbsp;function scans the string from left to right: it matches and removes the first two zeros, then the next two zeros, and the fifth zero remains unchanged because no zero follows it to complete another match.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3) Using the regex&nbsp;<code>sub()<\/code>&nbsp;function with a backreference<\/h3>\n\n\n\n<p>The following example uses the\u00a0<code>sub()<\/code>\u00a0function to replace text enclosed in asterisks (<code>*<\/code>), a Markdown-style emphasis syntax, with the same text wrapped in the HTML\u00a0<code>&lt;b><\/code>\u00a0tag:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = 'Make the World a *Better Place*'\n# Capture the text between a pair of asterisks\npattern = r'\\*(.*?)\\*'\n# \\1 inserts the text captured by the first group\nreplacement = r'&lt;b>\\1&lt;\/b>'\nhtml = re.sub(pattern, replacement, s)\nprint(html)<\/code><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>Make the World a &lt;b>Better Place&lt;\/b><\/code><\/code><\/pre>\n\n\n\n<p>In this example, the pattern&nbsp;<code>r'\\*(.*?)\\*'<\/code>&nbsp;finds text that begins and ends with an asterisk (<code>*<\/code>). Its capturing group&nbsp;<code>(.*?)<\/code>&nbsp;captures the text between the asterisks. The lazy quantifier&nbsp;<code>*?<\/code>&nbsp;matches as few characters as possible, so each match stops at the nearest closing asterisk.<\/p>\n\n\n\n<p>The replacement string contains a\u00a0backreference. The backreference\u00a0<code>\\1<\/code>\u00a0refers to the first group in the pattern, which is the text between the asterisks (<code>*<\/code>).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4) Using the regex&nbsp;<code>sub()<\/code>&nbsp;function with the replacement as a function<\/h3>\n\n\n\n<p>Suppose you have a list of strings where each element contains a letter followed by a number:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>l = &#91;'A1','A2','A3']<\/code><\/code><\/pre>\n\n\n\n<p>Your goal is to square the number in each list element. For example, A1 stays A1, A2 becomes A4, and A3 becomes A9. To do this, you can use the&nbsp;<code>sub()<\/code>&nbsp;function.<\/p>\n\n\n\n<p>The second argument of the&nbsp;<code>sub()<\/code>&nbsp;function (<code>repl<\/code>) can be a function. 
In this case, the&nbsp;<code>sub()<\/code>&nbsp;function calls this function for every non-overlapping occurrence of the pattern.<\/p>\n\n\n\n<p>This function (<code>repl<\/code>) takes a single&nbsp;<code>Match<\/code>&nbsp;object argument and returns the replacement string.<\/p>\n\n\n\n<p>The following program shows how to pass a function as the second argument:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\n\ndef square(match):\n    # match.group() returns the matched digits as a string\n    num = int(match.group())\n    # repl must return a string\n    return str(num * num)\n\n\nl = &#91;'A1','A2','A3']\npattern = r'\\d+'\nnew_l = &#91;re.sub(pattern, square, s) for s in l]\nprint(new_l)<\/code><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&#91;'A1', 'A4', 'A9']<\/code><\/code><\/pre>\n\n\n\n<p>How it works:<\/p>\n\n\n\n<p>First, define a list of strings:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>l = &#91;'A1','A2','A3']<\/code><\/code><\/pre>\n\n\n\n<p>Second, define a pattern\u00a0<code>\\d+<\/code>\u00a0that matches one or more digits:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>pattern = r'\\d+'<\/code><\/code><\/pre>\n\n\n\n<p>Third, replace the digits with their squares by calling the\u00a0<code>sub()<\/code>\u00a0function and passing the\u00a0<code>square()<\/code>\u00a0function as the replacement:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>new_l = &#91;re.sub(pattern, square, s) for s in l]<\/code><\/code><\/pre>\n\n\n\n<p>Finally, the\u00a0<code>square()<\/code>\u00a0function, defined at the top of the program, converts the matched digits to an integer, squares it, and returns the result as a string:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>def square(match):\n    num = int(match.group())\n    return str(num * num)<\/code><\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Summary: in this tutorial, you\u2019ll learn about the Python 
regex&nbsp;sub()&nbsp;function that returns a string after replacing the matched pattern in a string with a replacement. Introduction to the Python regex sub function The\u00a0sub()\u00a0is a function in the built-in\u00a0re\u00a0module that handles\u00a0regular expressions. The\u00a0sub()\u00a0function has the following syntax: In this syntax: The&nbsp;sub()&nbsp;function searches for the pattern in [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[42],"tags":[],"class_list":["post-394","post","type-post","status-publish","format-standard","hentry","category-2-python-regex"],"_links":{"self":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts\/394","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/comments?post=394"}],"version-history":[{"count":1,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts\/394\/revisions"}],"predecessor-version":[{"id":395,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts\/394\/revisions\/395"}],"wp:attachment":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/media?parent=394"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/categories?post=394"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/tags?post=394"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}