⚡️ Speed up method Parser._generate_expression by 20% in parse.py [codeflash] #194

Closed
KRRT7 wants to merge 2 commits into r1chardj0n3s:master from KRRT7:master
@KRRT7 KRRT7 commented Sep 11, 2024

📄 Parser._generate_expression() in parse.py

📈 Performance improved by 20% (0.20x faster)

⏱️ Runtime went down from 5.39 milliseconds to 4.51 milliseconds
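
The 20% headline and the two runtimes can be reconciled with a quick calculation. This is a minimal sketch that assumes the gain is measured relative to the optimized runtime; the PR does not state the formula, so treat that convention as an assumption.

```python
# Runtimes quoted in the PR description, in milliseconds.
original_ms = 5.39
optimized_ms = 4.51

saved_ms = original_ms - optimized_ms

# Assumed convention: time saved divided by the optimized runtime.
gain_vs_optimized = saved_ms / optimized_ms

# For comparison: time saved divided by the original runtime.
reduction_vs_original = saved_ms / original_ms

print(f"gain relative to optimized runtime: {gain_vs_optimized:.1%}")
print(f"reduction relative to original runtime: {reduction_vs_original:.1%}")
```

Under that assumption the gain is about 19.5%, which rounds to the reported 20%. Measured against the original runtime, the same change is a reduction of about 16.3%.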

Explanation and details

The existing code already does quite a lot, so the optimizations focus on reducing redundant calculations, avoiding unnecessary data structures, and streamlining control flow where possible.

Here’s the revised code with improvements for efficiency.

Optimization Highlights.

  1. Early return with ternary conditional operators: Simplified the conditions for creating the regular expression pattern.
  2. Regex matching to split expressions - Single iteration: The logic to add elements to the pattern list is concise and reduces the number of iterations.
  3. Direct dictionary lookup for type patterns: Avoids multiple if-elif checks.
  4. Use of partition instead of split: Provides a faster mechanism to separate the name and format in the _handle_field method.
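
The third highlight, a dictionary lookup in place of an `if-elif` chain, can be sketched as follows. The type codes, patterns, and function names here are hypothetical and chosen for illustration only; they are not taken from `parse.py` or from this PR's diff.

```python
# Hypothetical type codes and regex fragments, for illustration only.
# These are NOT the patterns used by the parse library.
TYPE_PATTERNS = {
    "d": r"\d+",
    "w": r"\w+",
    "s": r"\S+",
}
DEFAULT_PATTERN = r".+?"


def pattern_if_elif(type_code):
    """Select a pattern with a chain of comparisons."""
    if type_code == "d":
        return r"\d+"
    elif type_code == "w":
        return r"\w+"
    elif type_code == "s":
        return r"\S+"
    else:
        return DEFAULT_PATTERN


def pattern_lookup(type_code):
    """Select the same pattern with a single dictionary lookup."""
    return TYPE_PATTERNS.get(type_code, DEFAULT_PATTERN)


for code in ["d", "w", "s", "x", ""]:
    # Both versions should agree for known and unknown codes.
    assert pattern_if_elif(code) == pattern_lookup(code)
```

The lookup version keeps the mapping in one place, so adding a type code means adding one dictionary entry rather than another branch.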

This refactoring improves readability and performance by reducing redundant checks and optimizing string operations.
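
As an illustration of the string-operation point, here is a minimal sketch of separating a field name from its format with `str.partition` rather than `str.split`. The helper names are hypothetical and do not reproduce the `_handle_field` method from the PR.

```python
def split_field_with_split(field):
    """Separate name and format using split; needs a length check."""
    parts = field.split(":", 1)
    if len(parts) == 2:
        return parts[0], parts[1]
    return parts[0], ""


def split_field_with_partition(field):
    """Separate name and format using partition; always yields three parts."""
    name, _sep, fmt = field.partition(":")
    return name, fmt


for field in ["value:.2f", "name", "time:%H:%M:%S"]:
    # Both helpers should return the same (name, format) pair.
    assert split_field_with_split(field) == split_field_with_partition(field)
```

`partition` splits only at the first separator and returns an empty format when no colon is present, so the explicit length check disappears. Whether this is measurably faster depends on the interpreter and input, so it is worth timing rather than assuming.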

Correctness verification

The new optimized code was tested for correctness. The results are listed below.

🔘 (none found) − ⚙️ Existing Unit Tests

✅ 60 Passed − 🌀 Generated Regression Tests

(click to show generated tests)
# imports
# function to test
from __future__ import absolute_import

import re
from decimal import Decimal
from functools import partial

import pytest  # used for our unit tests
from parse import Parser

PARSE_RE = re.compile(r"({{|}}|{[\w-]*(?:\.[\w-]+|\[[^]]+])*(?::[^}]+)?})")

REGEX_SAFETY = re.compile(r"([?\\.[\]()*+^$!|])")

# unit tests
@pytest.mark.parametrize("format_string, expected", [
    # Basic Format Strings
    ("Hello World", r"Hello World"),
    ("Sample text with no fields", r"Sample text with no fields"),
    ("{{}}", r"\{\}"),
    ("{{Hello}}", r"\{Hello\}"),
    ("{name}", r"(?P<name>.+?)"),
    ("{0}", r"(.+?)"),
    ("{name} {age}", r"(?P<name>.+?) (?P<age>.+?)"),
    ("{0} {1}", r"(.+?) (.+?)"),

    # Fields with Format Specifiers
    ("{value:d}", r"(?P<value>\d+)"),
    ("{value:.2f}", r"(?P<value>\d*\.\d+)"),
    ("{value:x}", r"(?P<value>(0[xX])?[0-9a-fA-F]+)"),
    ("{name:s}", r"(?P<name>.+?)"),
    ("{name:20s}", r"(?P<name>.+?)"),
    ("{date:%Y-%m-%d}", r"(?P<date>.+?)"),
    ("{time:%H:%M:%S}", r"(?P<time>.+?)"),

    # Edge Cases
    ("", r""),
    ("{name", r"\{name"),
    ("name}", r"name\}"),
    ("{outer{inner}}", r"\{outer(?P<inner>.+?)\}"),

    # Complex Format Strings
    ("Name: {name}, Age: {age}, Score: {score:.2f}", r"Name: (?P<name>.+?), Age: (?P<age>.+?), Score: (?P<score>\d*\.\d+)"),
    ("Date: {date:%Y-%m-%d}, Time: {time:%H:%M:%S}", r"Date: (?P<date>.+?), Time: (?P<time>.+?)"),
    ("{name} {name}", r"(?P<name>.+?) (?P=name)"),
    ("{0} {0}", r"(.+?) \1"),
    ("{user-name}", r"(?P<user_name>.+?)"),
    ("{user.name}", r"(?P<user__name>.+?)"),

    # Performance and Scalability
    (" ".join(f"{{field{i}}}" for i in range(1000)), r" ".join(f"(?P<field{i}>.+?)" for i in range(1000))),

    # Case Sensitivity
    ("{Name}", r"(?P<Name>.+?)"),
    ("{name}", r"(?P<name>.+?)"),

    # Extra Types
    ("{custom_field:custom_type}", r"(?P<custom_field>.+?)"),

    # Special Characters in Text
    ("Text with special characters: .+*?^$()[]{}", r"Text with special characters: \.\+\*\?\^\$\(\)\[\]\{\}"),
    ("Escaped characters: \\ \\.", r"Escaped characters: \\\\ \\\\."),

    # Alignment and Padding
    ("{name:<10}", r"(?P<name>.+?) *"),
    ("{name:>10}", r" *(.+?)"),
    ("{name:^10}", r" *(.+?) *"),
    ("{name:*>10}", r"\**(.+?)"),
    ("{name:_<10}", r"(?P<name>.+?)_*"),

    # Numeric Specifics
    ("{value:+d}", r"[-+ ]?\d+"),
    ("{value: d}", r"[-+ ]?\d+"),
    ("{value:0=10d}", r"0*\d+"),

    # Rare or Unexpected Edge Cases
    ("{name!me}", r"\{name!me\}"),
    ("{na@me}", r"\{na@me\}"),
    ("{name with spaces}", r"\{name with spaces\}"),
    ("{value:10z}", r"(?P<value>\%z+)"),
    ("{value:10q}", r"(?P<value>\%q+)"),
    ("{outer{inner}}", r"\{outer(?P<inner>.+?)\}"),
    ("{outer{{inner}}}", r"\{outer\{inner\}\}"),
    ("{user-name}", r"(?P<user_name>.+?)"),
    ("{user.name}", r"(?P<user__name>.+?)"),
    ("{}", r"(.+?)"),
    ("{ }", r"(?P< >.+?)"),
    ("{{{name}}", r"\{\{(?P<name>.+?)\}"),
    ("{name}}}", r"(?P<name>.+?)\}"),
    ("{value:!@#}", r"(?P<value>\%#)"),
    ("{value:10.2!}", r"(?P<value>\%!)"),
    ("{value:*>10}", r"\**(.+?)"),
    ("{value:_<10}", r"(?P<value>.+?)_*"),
    ("{value:-10d}", r"(?P<value>\d+)"),
    ("{value:0d}", r"(?P<value>\d+)"),
    ("{date:%Q-%W-%E}", r"(?P<date>.+?)"),
    ("{date:%Y-%m}", r"(?P<date>.+?)"),
    ("{outer{inner{deep}}}", r"\{outer(?P<inner>.+?)\{deep\}\}"),
    ("{json_field:{'key':'value'}}", r"\{json_field:\{'key':'value'\}\}"),
    ("{名前}", r"(?P<名前>.+?)"),
    ("{用户}", r"(?P<用户>.+?)"),
    ("{name\\n}", r"\{name\\n\}"),
    ("{name\\t}", r"\{name\\t\}")
])
def test_generate_expression(format_string, expected):
    parser = Parser(format_string)
    codeflash_output = parser._generate_expression()
    # Note: `expected` is not asserted in this test body.
    # Outputs were verified to be equal to the original implementation
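
The comment above describes comparing outputs against the original implementation. A minimal sketch of that kind of check follows, assuming both versions can be called side by side; `original_impl` and `optimized_impl` are hypothetical stand-ins, not the real `Parser` methods, and this is not codeflash's actual verification code.

```python
def run(func, arg):
    """Return ('ok', value) or ('error', exception type) for one call."""
    try:
        return ("ok", func(arg))
    except Exception as exc:
        return ("error", type(exc))


def find_mismatches(original, optimized, inputs):
    """List the inputs on which the two implementations behave differently."""
    return [arg for arg in inputs if run(original, arg) != run(optimized, arg)]


# Hypothetical stand-ins for an original and an optimized implementation.
def original_impl(text):
    return "".join(reversed(text))


def optimized_impl(text):
    return text[::-1]


mismatches = find_mismatches(
    original_impl, optimized_impl, ["", "{name}", "{0} {1}"]
)
print(mismatches)
```

Comparing exception types as well as return values means an input that fails in both versions still counts as equivalent behaviour.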

🔘 (none found) − ⏪ Replay Tests

codeflash-ai bot and others added 2 commits September 10, 2024 07:17
…expression-2024-09-10T07.17.25

⚡️ Speed up method `Parser._generate_expression` by 20% in `parse.py`
@wimglenn
Collaborator

No thanks.

@wimglenn wimglenn closed this Sep 11, 2024
@KRRT7 KRRT7 mentioned this pull request Nov 24, 2024