{"id":356,"date":"2024-01-28T11:43:42","date_gmt":"2024-01-28T11:43:42","guid":{"rendered":"https:\/\/learnpython.elegantwallp.com\/?p=356"},"modified":"2024-01-28T11:43:46","modified_gmt":"2024-01-28T11:43:46","slug":"python-regex-capturing-group","status":"publish","type":"post","link":"https:\/\/learnpython.elegantwallp.com\/2024\/01\/28\/python-regex-capturing-group\/","title":{"rendered":"Python Regex Capturing Group"},"content":{"rendered":"\n<p><strong>Summary<\/strong>: in this tutorial, you\u2019ll learn about Python regex capturing groups to create subgroups for a match.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"introduction-to-the-python-regex-capturing-groups\">Introduction to the Python regex capturing groups<\/h2>\n\n\n\n<p>Suppose you have the following path that shows the news with the id 100 on a website:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>news\/100<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>The following\u00a0regular expression\u00a0matches the above path:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>\\w+\/\\d+<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Note that the above regular expression also matches any path that starts with one or more word characters, e.g.,&nbsp;<code>posts<\/code>,&nbsp;<code>todos<\/code>, etc. 
not just&nbsp;<code>news<\/code>.<\/p>\n\n\n\n<p>In this pattern:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>\\w+<\/code>\u00a0is a word\u00a0character set\u00a0with a\u00a0quantifier\u00a0(<code>+<\/code>) that matches one or more word characters.<\/li>\n\n\n\n<li><code>\/<\/code>&nbsp;matches the forward slash&nbsp;<code>\/<\/code>&nbsp;character.<\/li>\n\n\n\n<li><code>\\d+<\/code>&nbsp;is a digit character set with a quantifier (<code>+<\/code>) that matches one or more digits.<\/li>\n<\/ul>\n\n\n\n<p>The following program uses the <code>\\w+\/\\d+<\/code> pattern to match the string <code>'news\/100'<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = 'news\/100'\npattern = r'\\w+\/\\d+'\n\n# finditer() yields a Match object for each non-overlapping match\nmatches = re.finditer(pattern, s)\nfor match in matches:\n    print(match)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&lt;re.Match object; span=(0, 8), match='news\/100'><\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>It shows one match as expected.<\/p>\n\n\n\n<p>To get the\u00a0<code>id<\/code>\u00a0from the path, you use a capturing group. To define a capturing group for a pattern, you place the rule in parentheses:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>(rule)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>For example, to create a capturing group that captures the\u00a0<code>id<\/code>\u00a0from the path, you use the following pattern:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>'\\w+\/(\\d+)'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>In this pattern, we place the rule\u00a0<code>\\d+<\/code>\u00a0inside the parentheses\u00a0<code>()<\/code>. 
If you run the program with the new pattern, you\u2019ll see that it still displays one match:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = 'news\/100'\npattern = r'\\w+\/(\\d+)'\n\nmatches = re.finditer(pattern, s)\nfor match in matches:\n    print(match)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&lt;re.Match object; span=(0, 8), match='news\/100'><\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>To get the capturing groups from a match, you use the\u00a0<code>group()<\/code>\u00a0method of the\u00a0<code>Match<\/code>\u00a0object:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>match.group(index)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p><code>group(0)<\/code>&nbsp;returns the entire match, while&nbsp;<code>group(1)<\/code>,&nbsp;<code>group(2)<\/code>, etc., return the first, second, \u2026 groups.<\/p>\n\n\n\n<p>The\u00a0<code>lastindex<\/code>\u00a0attribute of the\u00a0<code>Match<\/code>\u00a0object returns the index of the last matched capturing group, or <code>None<\/code> if no group was matched. 
The following program shows the entire match (<code>group(0)<\/code>) and all the subgroups:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = 'news\/100'\npattern = r'\\w+\/(\\d+)'\n\nmatches = re.finditer(pattern, s)\nfor match in matches:\n    # index 0 is the entire match; 1..lastindex are the subgroups\n    for index in range(0, match.lastindex + 1):\n        print(match.group(index))<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>news\/100\n100<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>In the output,&nbsp;<code>news\/100<\/code>&nbsp;is the entire match while&nbsp;<code>100<\/code>&nbsp;is the subgroup.<\/p>\n\n\n\n<p>If you also want to capture the resource (<code>news<\/code>) in the path (<code>news\/100<\/code>), you can create an additional capturing group like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>'(\\w+)\/(\\d+)'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>In this pattern, we have two capturing groups: one for\u00a0<code>\\w+<\/code>\u00a0and the other for\u00a0<code>\\d+<\/code>. 
The following program shows the entire match and all the subgroups:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = 'news\/100'\npattern = r'(\\w+)\/(\\d+)'\n\nmatches = re.finditer(pattern, s)\nfor match in matches:\n    # index 0 is the entire match; 1..lastindex are the subgroups\n    for index in range(0, match.lastindex + 1):\n        print(match.group(index))<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>news\/100\nnews\n100<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>In the output,&nbsp;<code>news\/100<\/code>&nbsp;is the entire match while&nbsp;<code>news<\/code>&nbsp;and&nbsp;<code>100<\/code>&nbsp;are the subgroups.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"named-capturing-groups\">Named capturing groups<\/h2>\n\n\n\n<p>By default, you can access a subgroup in a match using an index, for example,&nbsp;<code>match.group(1)<\/code>. Sometimes, accessing a subgroup by a meaningful name is more convenient.<\/p>\n\n\n\n<p>You use a named capturing group to assign a name to a group. 
The following shows the syntax for assigning a name to a capturing group:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>(?P&lt;name>rule)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>In this syntax:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>()<\/code>&nbsp;indicates a capturing group.<\/li>\n\n\n\n<li><code>?P&lt;name&gt;<\/code>&nbsp;specifies the name of the capturing group.<\/li>\n\n\n\n<li><code>rule<\/code>&nbsp;is a rule in the pattern.<\/li>\n<\/ul>\n\n\n\n<p>For example, the following pattern names both capturing groups:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>'(?P&lt;resource>\\w+)\/(?P&lt;id>\\d+)'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>In this pattern,&nbsp;<code>resource<\/code>&nbsp;is the name of the first capturing group and&nbsp;<code>id<\/code>&nbsp;is the name of the second capturing group.<\/p>\n\n\n\n<p>To get all the named subgroups of a match, you use the\u00a0<code>groupdict()<\/code>\u00a0method of the\u00a0<code>Match<\/code>\u00a0object. 
For example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = 'news\/100'\npattern = r'(?P&lt;resource>\\w+)\/(?P&lt;id>\\d+)'\n\nmatches = re.finditer(pattern, s)\nfor match in matches:\n    # groupdict() maps each group name to the text it captured\n    print(match.groupdict())<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>{'resource': 'news', 'id': '100'}<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>In this example, the&nbsp;<code>groupdict()<\/code>&nbsp;method returns a dictionary where the keys are the group names and the values are the subgroups.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"more-named-capturing-group-example\">Another named capturing group example<\/h3>\n\n\n\n<p>The following pattern:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>\\w+\/\\d{4}\/\\d{2}\/\\d{2}<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>matches this path:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>news\/2021\/12\/31<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>And you can add named capturing groups to the pattern (here also allowing one- or two-digit months and days) like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>'(?P&lt;resource>\\w+)\/(?P&lt;year>\\d{4})\/(?P&lt;month>\\d{1,2})\/(?P&lt;day>\\d{1,2})'<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>This program uses the pattern to match the path and shows all the subgroups:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = 'news\/2021\/12\/31'\npattern = r'(?P&lt;resource>\\w+)\/(?P&lt;year>\\d{4})\/(?P&lt;month>\\d{1,2})\/(?P&lt;day>\\d{1,2})'\n\nmatches = re.finditer(pattern, s)\nfor match in matches:\n    print(match.groupdict())<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>{'resource': 'news', 'year': '2021', 'month': '12', 'day': 
'31'}<\/code><\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Summary: in this tutorial, you\u2019ll learn about Python regex capturing groups to create subgroups for a match. Introduction to the Python regex capturing groups Suppose you have the following path that shows the news with the id 100 on a website: The following\u00a0regular expression\u00a0matches the above path: Note that the above regular expression also matches [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[42],"tags":[],"class_list":["post-356","post","type-post","status-publish","format-standard","hentry","category-2-python-regex"],"_links":{"self":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts\/356","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/comments?post=356"}],"version-history":[{"count":1,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts\/356\/revisions"}],"predecessor-version":[{"id":357,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts\/356\/revisions\/357"}],"wp:attachment":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/media?parent=356"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/categories?post=356"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/tags?post=356"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}