
[SECURITY][SONAR][LOW]: ReDoS vulnerability in plugin regex patterns #2370

@crivetimihai

Severity: LOW
Files:

  • plugins/html_to_markdown/html_to_markdown.py (lines 74, 76)
  • plugins/robots_license_guard/robots_license_guard.py (line 37)

Rule: Security - ReDoS

Description

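Several plugin regex patterns chain multiple overlapping quantified segments (for example, repeated [^>]* runs inside a single tag match), which can cause polynomial backtracking on malformed input such as tags that are never closed.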

Vulnerable Code

html_to_markdown.py:

# Line 74 - Link extraction
text = re.sub(r"<a[^>]*href=\"([^\"]+)\"[^>]*>(.*?)</a>", ..., text, flags=re.IGNORECASE | re.DOTALL)

# Line 76 - Image extraction
text = re.sub(r"<img[^>]*alt=\"([^\"]*)\"[^>]*src=\"([^\"]+)\"[^>]*>", ..., text, flags=re.IGNORECASE)

robots_license_guard.py:

# Line 37 - Meta tag extraction
META_PATTERN = re.compile(
    r"<meta\s+[^>]*name=\"(?P<name>robots|x-robots-tag|genai|permissions-policy|license)\"[^>]*content=\"(?P<content>[^\"]+)\"[^>]*>",
    re.IGNORECASE,
)

Impact

Lower risk than the validators.py ReDoS because:

  • Input is HTML from resource fetches, not direct user validation
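  • Patterns use negated character classes [^>]* instead of unbounded .*, so each segment stops at the first > and backtracking stays polynomial rather than exponential
  • Non-greedy .*? stops at the first </a>, which keeps the common case cheap (an <a> tag with no closing </a> still forces a scan to the end of the input)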
  • Plugin context means fewer attack vectors

However, a malicious MCP server could return crafted HTML to cause CPU spikes during resource transformation.
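To make the scenario above concrete, here is a minimal sketch that times META_PATTERN (copied from the listing in this issue) against a payload that repeats the name/content attributes and never closes the tag. The helper names crafted_html and time_search, the payload shape, and the repeat counts are assumptions chosen for illustration, not taken from the plugin or from a real attack.

```python
import re
import time

# Pattern copied from robots_license_guard.py as quoted in this issue.
META_PATTERN = re.compile(
    r"<meta\s+[^>]*name=\"(?P<name>robots|x-robots-tag|genai|permissions-policy|license)\"[^>]*content=\"(?P<content>[^\"]+)\"[^>]*>",
    re.IGNORECASE,
)


def crafted_html(repeats: int) -> str:
    """Hypothetical payload: many name/content pairs and no closing '>'."""
    return "<meta " + 'name="robots" content="x" ' * repeats


def time_search(repeats: int) -> float:
    """Return the seconds spent on one failed search over the payload."""
    html = crafted_html(repeats)
    start = time.perf_counter()
    result = META_PATTERN.search(html)
    elapsed = time.perf_counter() - start
    # The tag is never closed, so the engine must exhaust every combination
    # of name= and content= positions before it can report "no match".
    assert result is None
    return elapsed


if __name__ == "__main__":
    # Doubling the payload should more than double the time if the
    # backtracking is polynomial; exact numbers depend on the machine.
    for repeats in (50, 100, 200):
        print(f"{repeats:>4} repeats: {time_search(repeats):.4f}s")
```

The repeat counts are kept small on purpose so the script finishes quickly. The point is the growth rate between rows, not the absolute timings.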

Suggested Fix

Use more specific patterns or restructure to avoid multiple [^>]* segments.
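One possible way to restructure the meta tag extraction is a two-stage match: first isolate a single tag with one [^<>]* segment, then read the attributes from the isolated text. The sketch below is an illustration under assumptions, not the plugin's actual fix: the names META_TAG, ATTR, WANTED and extract_meta are hypothetical, and the issue does not show how the plugin consumes META_PATTERN.

```python
import re

# Stage 1: isolate one <meta ...> tag. [^<>]* cannot run past the next
# "<" or ">", so an unclosed tag fails after a single forward scan.
META_TAG = re.compile(r"<meta\b([^<>]*)>", re.IGNORECASE)

# Stage 2: read key="value" pairs from the isolated attribute text.
# The lookbehind only lets a key start at a token boundary, so a long run
# of name characters is not re-scanned from every offset.
ATTR = re.compile(r'(?<![-\w:.])([A-Za-z_:][-\w:.]*)\s*=\s*"([^"]*)"')

# Same meta names as the alternation in the original META_PATTERN.
WANTED = {"robots", "x-robots-tag", "genai", "permissions-policy", "license"}


def extract_meta(html: str) -> dict:
    """Return {name: content} for the meta tags the guard cares about."""
    found = {}
    for tag in META_TAG.finditer(html):
        attrs = {key.lower(): value for key, value in ATTR.findall(tag.group(1))}
        name = attrs.get("name", "").lower()
        content = attrs.get("content", "")
        # Mirror the original pattern: a known name and a non-empty content.
        if name in WANTED and content:
            found[name] = content
    return found
```

This differs from the original pattern in two ways that would need checking against the plugin's tests: it accepts content before name, and it keeps only the last value when a name appears more than once. The same idea should carry over to the <a> and <img> patterns in html_to_markdown.py.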

Labels:

  • MUST (P1: Non-negotiable, critical requirements without which the product is non-functional or unsafe)
  • plugins
  • python (Python / backend development (FastAPI))
  • security (Improves security)
  • sonar (SonarQube code quality findings)
