{"id":358,"date":"2024-01-28T11:46:16","date_gmt":"2024-01-28T11:46:16","guid":{"rendered":"https:\/\/learnpython.elegantwallp.com\/?p=358"},"modified":"2024-01-28T11:46:17","modified_gmt":"2024-01-28T11:46:17","slug":"python-regex-backreferences","status":"publish","type":"post","link":"https:\/\/learnpython.elegantwallp.com\/2024\/01\/28\/python-regex-backreferences\/","title":{"rendered":"Python Regex Backreferences"},"content":{"rendered":"\n<p><strong>Summary<\/strong>: in this tutorial, you\u2019ll learn about Python regex backreferences and how to apply them effectively.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction to Python regex backreferences<\/h2>\n\n\n\n<p>Backreferences are like\u00a0variables\u00a0in Python. A backreference lets you refer to the text matched by an earlier\u00a0capturing group\u00a0within the same\u00a0regular expression.<\/p>\n\n\n\n<p>The following shows the syntax of a backreference:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>\\N<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Alternatively, in a replacement string such as the second argument of&nbsp;<code>re.sub()<\/code>, you can use the following syntax:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>\\g&lt;N&gt;<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>In both forms,&nbsp;<code>N<\/code>&nbsp;is a number (1, 2, 3, etc.) 
that represents the corresponding capturing group.<\/p>\n\n\n\n<p>Note that in a replacement string,&nbsp;<code>\\g&lt;0&gt;<\/code>&nbsp;refers to the entire match, which has the same value as&nbsp;<code>match.group(0)<\/code>.<\/p>\n\n\n\n<p>Suppose you have a string with the duplicate word\u00a0<code>Python<\/code>\u00a0like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>s = 'Python Python is awesome'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>And you want to remove the duplicate word (<code>Python<\/code>) so that the resulting string is:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>Python is awesome<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>To do that, you can use a regular expression with a backreference.<\/p>\n\n\n\n<p>First, match a word (one or more word characters) followed by one or more whitespace characters:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>'\\w+\\s+'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Second, create a capturing group that contains only the word characters:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>'(\\w+)\\s+'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Third, add a backreference that references the first capturing group:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>'(\\w+)\\s+\\1'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>In this pattern,&nbsp;<code>\\1<\/code>&nbsp;is a backreference that references the&nbsp;<code>(\\w+)<\/code>&nbsp;capturing group.<\/p>\n\n\n\n<p>Finally, replace the entire match with the first capturing group using the\u00a0<code>sub()<\/code>\u00a0function from the\u00a0<code>re<\/code>\u00a0module:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = 'Python Python is awesome'\n# Replace the repeated word with a single copy of it\nnew_s = re.sub(r'(\\w+)\\s+\\1', r'\\1', s)\nprint(new_s)<\/code><small>Code language: Python 
(python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>Python is awesome<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">More Python regex backreference examples<\/h2>\n\n\n\n<p>Let\u2019s look at some more examples of using backreferences.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Using Python regex backreferences to get text inside quotes<\/h3>\n\n\n\n<p>Suppose you want to get the text within double quotes:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>\"This is regex backreference example\"<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Or within single quotes:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>'This is regex backreference example'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>But not within a mix of single and double quotes. The following should not match:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>'not match\"<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>To do this, you might try the following pattern:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>'&#91;\\'\"](.*?)&#91;\\'\"]'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>However, this pattern also matches text that starts with a single quote (<code>'<\/code>) and ends with a double quote (<code>\"<\/code>), or vice versa. For example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = '\"Python\\'s awesome\". 
She said'\npattern = '&#91;\\'\"].*?&#91;\\'\"]'\nmatch = re.search(pattern, s)\nprint(match.group(0))<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>It returns\u00a0<code>\"Python'<\/code>, not\u00a0<code>\"Python's awesome\"<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>\"Python'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>To fix it, you can use a backreference:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>r'(&#91;\\'\"])(.*?)\\1'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>The backreference&nbsp;<code>\\1<\/code>&nbsp;refers to the first capturing group. So if the match starts with a single quote,&nbsp;<code>\\1<\/code>&nbsp;matches only a single quote. And if the match starts with a double quote,&nbsp;<code>\\1<\/code>&nbsp;matches only a double quote.<\/p>\n\n\n\n<p>For example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = '\"Python\\'s awesome\". She said'\npattern = r'(&#91;\\'\"])(.*?)\\1'\nmatch = re.search(pattern, s)\nprint(match.group())<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>\"Python's awesome\"<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">2) Using Python regex backreferences to find words that have at least one consecutive repeated character<\/h3>\n\n\n\n<p>The following example uses a backreference to find words that have at least one consecutive repeated character:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\nwords = &#91;'apple', 'orange', 'strawberry']\n# (\\w)\\1 matches any word character immediately followed by itself\npattern = r'\\b\\w*(\\w)\\1\\w*\\b'\nresults = &#91;w for w in words if re.search(pattern, w)]\nprint(results)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&#91;'apple', 'strawberry']<\/code><\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Summary: in this tutorial, you\u2019ll learn about Python regex backreferences and how to apply them effectively. Introduction to the Python regex backreferences Backreferences are like\u00a0variables\u00a0in Python. The backreferences allow you to reference\u00a0capturing groups\u00a0within a\u00a0regular expression. 
The following shows the syntax of a backreference: Alternatively, you can use the following syntax: In this syntax,&nbsp;N&nbsp;can be 1, 2, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[42],"tags":[],"class_list":["post-358","post","type-post","status-publish","format-standard","hentry","category-2-python-regex"],"_links":{"self":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts\/358","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/comments?post=358"}],"version-history":[{"count":1,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts\/358\/revisions"}],"predecessor-version":[{"id":359,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts\/358\/revisions\/359"}],"wp:attachment":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/media?parent=358"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/categories?post=358"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/tags?post=358"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}