{"id":398,"date":"2024-01-28T12:22:31","date_gmt":"2024-01-28T12:22:31","guid":{"rendered":"https:\/\/learnpython.elegantwallp.com\/?p=398"},"modified":"2024-01-28T12:22:32","modified_gmt":"2024-01-28T12:22:32","slug":"python-regex-flags","status":"publish","type":"post","link":"https:\/\/learnpython.elegantwallp.com\/2024\/01\/28\/python-regex-flags\/","title":{"rendered":"Python Regex Flags"},"content":{"rendered":"\n<p><strong>Summary<\/strong>: in this tutorial, you\u2019ll learn about the Python regex flags and how they change the behavior of the regex engine during pattern matching.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction to the Python regex flags<\/h2>\n\n\n\n<p>Regular expression functions such as <code>findall()<\/code>, <code>finditer()<\/code>, <code>search()<\/code>, <code>match()<\/code>, <code>split()<\/code>, and <code>sub()<\/code> accept a <code>flags<\/code> parameter that takes one or more regex flags.<\/p>\n\n\n\n<p>Since Python 3.6, regex flags are instances of the <code>RegexFlag<\/code> enumeration class in the <code>re<\/code> module. The following table shows the most commonly used regex flags and their meanings:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Flag<\/th><th>Alias<\/th><th>Inline Flag<\/th><th>Meaning<\/th><\/tr><\/thead><tbody><tr><td><code>re.ASCII<\/code><\/td><td><code>re.A<\/code><\/td><td><code>(?a)<\/code><\/td><td>The <code>re.ASCII<\/code> flag is meaningful for Unicode (<code>str<\/code>) patterns only and is ignored for bytes patterns. It makes <code>\\w<\/code>, <code>\\W<\/code>, <code>\\b<\/code>, <code>\\B<\/code>, <code>\\d<\/code>, <code>\\D<\/code>, <code>\\s<\/code>, and <code>\\S<\/code> perform ASCII-only matching instead of full Unicode matching.<\/td><\/tr><tr><td><code>re.DEBUG<\/code><\/td><td>N\/A<\/td><td>N\/A<\/td><td>The <code>re.DEBUG<\/code> flag displays debug information about the compiled pattern.<\/td><\/tr><tr><td><code>re.IGNORECASE<\/code><\/td><td><code>re.I<\/code><\/td><td><code>(?i)<\/code><\/td><td>Performs case-insensitive matching. 
For example, a pattern such as <code>[A-Z]<\/code> also matches lowercase letters.<\/td><\/tr><tr><td><code>re.LOCALE<\/code><\/td><td><code>re.L<\/code><\/td><td><code>(?L)<\/code><\/td><td>The <code>re.LOCALE<\/code> flag can be used with bytes patterns only. It makes <code>\\w<\/code>, <code>\\W<\/code>, <code>\\b<\/code>, <code>\\B<\/code>, and case-insensitive matching dependent on the current locale. The <code>re.LOCALE<\/code> flag is not compatible with the <code>re.ASCII<\/code> flag.<\/td><\/tr><tr><td><code>re.MULTILINE<\/code><\/td><td><code>re.M<\/code><\/td><td><code>(?m)<\/code><\/td><td>The <code>re.MULTILINE<\/code> flag makes <code>^<\/code> match at the beginning of the string and at the beginning of each line, and makes <code>$<\/code> match at the end of the string and at the end of each line.<\/td><\/tr><tr><td><code>re.DOTALL<\/code><\/td><td><code>re.S<\/code><\/td><td><code>(?s)<\/code><\/td><td>By default, the dot (<code>.<\/code>) matches any character except a newline. The <code>re.DOTALL<\/code> flag makes the dot match any character, including a newline.<\/td><\/tr><tr><td><code>re.VERBOSE<\/code><\/td><td><code>re.X<\/code><\/td><td><code>(?x)<\/code><\/td><td>The <code>re.VERBOSE<\/code> flag lets you split a pattern into visually separate logical sections and add comments. Whitespace in the pattern is ignored (except inside a character class or when escaped with a backslash), and an unescaped <code>#<\/code> outside a character class starts a comment that runs to the end of the line.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>To combine two or more flags, use the <code>|<\/code> operator, for example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>re.A | re.M | re.S<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Alternatively, you can write the inline flags from the table at the start of the pattern itself. For example, a pattern that begins with <code>(?ms)<\/code> behaves as if you had passed <code>re.M | re.S<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Python regex flags<\/h2>\n\n\n\n<p>Let\u2019s look at some examples of using the Python regex flags.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) The re.IGNORECASE flag example<\/h3>\n\n\n\n<p>The following example uses the <code>findall()<\/code> function to find all runs of lowercase letters from the set <code>[a-z]<\/code> in a string:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = 'Python is awesome'\npattern = '&#91;a-z]+'\nmatches = re.findall(pattern, s)\nprint(matches)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&#91;'ython', 'is', 'awesome']<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Note that the letter <code>P<\/code> is not included in the result because it is not in the set <code>[a-z]<\/code>.<\/p>\n\n\n\n<p>The following example uses the <code>re.IGNORECASE<\/code> flag:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = 'Python is awesome'\npattern = '&#91;a-z]+'\nmatches = re.findall(pattern, s, re.IGNORECASE)\nprint(matches)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&#91;'Python', 'is', 'awesome']<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Even though the pattern lists only the characters in the set <code>[a-z]<\/code>, the <code>re.IGNORECASE<\/code> flag instructs the regex engine to also match the uppercase letters in <code>[A-Z]<\/code>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2) The re.MULTILINE flag example<\/h3>\n\n\n\n<p>The following example uses the <code>^<\/code> anchor to match one or more word characters at the beginning of a string:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = '''Regex\nFlags'''\npattern = r'^\\w+'\nmatches = re.findall(pattern, s)\nprint(matches)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&#91;'Regex']<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>The string <code>s<\/code> has two lines. Without the flag, <code>^<\/code> matches only at the beginning of the whole string, so only the first word is returned.<\/p>\n\n\n\n<p>If you use the <code>re.MULTILINE<\/code> flag, <code>^<\/code> also matches at the beginning of each line. For example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = '''Regex\nFlags'''\npattern = r'^\\w+'\nmatches = re.findall(pattern, s, re.MULTILINE)\nprint(matches)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&#91;'Regex', 'Flags']<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">3) The re.DOTALL flag example<\/h3>\n\n\n\n<p>In this example, the <code>.+<\/code> pattern matches one or more characters other than a newline, so each line becomes a separate match:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = '''Regex\nFlags'''\npattern = '.+'\nmatches = re.findall(pattern, s)\nprint(matches)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&#91;'Regex', 'Flags']<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>If you use the <code>re.DOTALL<\/code> flag, <code>.+<\/code> also matches the newline, so the whole string becomes a single match:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = '''Regex\nFlags'''\npattern = '.+'\nmatches = re.findall(pattern, s, re.DOTALL)\nprint(matches)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&#91;'Regex\\nFlags']<\/code><small>Code language: Python 
(python)<\/small><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">4) The re.VERBOSE flag example<\/h3>\n\n\n\n<p>The following example shows how to use the <code>re.VERBOSE<\/code> flag to write a pattern in sections with comments:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = 'Python 3'\npattern = r'''^(\\w+)  # match one or more word characters at the beginning of the string\n\\s*     # match zero or more whitespace characters\n(\\d+)$  # match one or more digits at the end of the string'''\nmatches = re.findall(pattern, s, re.VERBOSE)\nprint(matches)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&#91;('Python', '3')]<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>In this example, the <code>re.VERBOSE<\/code> flag lets us place each part of the regular expression on its own line with a comment that explains it. Because each comment runs to the end of its line, the line breaks inside the pattern are required.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">5) The re.ASCII flag example<\/h3>\n\n\n\n<p>The following example matches words that consist of exactly two characters:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = '\u4f5c\u6cd5 is Pythonic in Japanese'\npattern = r'\\b\\w{2}\\b'\nmatches = re.findall(pattern, s)\nprint(matches)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&#91;'\u4f5c\u6cd5', 'is', 'in']<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>By default, <code>\\w<\/code> matches Unicode word characters, so the two Japanese characters count as a two-character word. However, if you use the <code>re.ASCII<\/code> flag, <code>\\w<\/code> matches only ASCII letters, digits, and the underscore, so the matches contain only ASCII characters:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>import re\n\ns = '\u4f5c\u6cd5 is Pythonic in Japanese'\npattern = r'\\b\\w{2}\\b'\nmatches = re.findall(pattern, s, re.ASCII)\nprint(matches)<\/code><small>Code language: Python (python)<\/small><\/code><\/pre>\n\n\n\n<p>Output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><code>&#91;'is', 
'in']<\/code><\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Summary: in this tutorial, you\u2019ll learn about the Python regex flags and how they change the behavior of the regex engine for pattern matching. Introduction to the Python regex flags The regular expression functions like\u00a0findall,\u00a0finditer,\u00a0search,\u00a0match,\u00a0split,\u00a0sub, \u2026 have the parameter (flags) that accepts one or more regex flags. Since Python 3.6, regex flags are instances of [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[42],"tags":[],"class_list":["post-398","post","type-post","status-publish","format-standard","hentry","category-2-python-regex"],"_links":{"self":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts\/398","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/comments?post=398"}],"version-history":[{"count":1,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts\/398\/revisions"}],"predecessor-version":[{"id":399,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/posts\/398\/revisions\/399"}],"wp:attachment":[{"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/media?parent=398"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/categories?post=398"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/learnpython.elegantwallp.com\/wp-json\/wp\/v2\/tags?post=398"}],"curies":[{"name":"wp","href"
:"https:\/\/api.w.org\/{rel}","templated":true}]}}