Add more robust support for HTML5 anchor tags.#16

Merged
k3a merged 1 commit into k3a:master from xStrom:anchor on May 10, 2023
xStrom commented May 4, 2023

Contributor

This PR makes the anchor tag regex more robust so it can handle more of the HTML5 spec.

Old   a.*href=('([^']*?)'|"([^"]*?)")
New   ^(?i:a)(?:$|\s).*(?i:href)\s*=\s*('([^']*?)'|"([^"]*?)"|([^\s"'`=<>]+))
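
To make the difference between the two patterns concrete, here is a minimal Go sketch that runs both against a handful of tag strings. It assumes Go's `regexp` package, and it assumes the pattern is applied to the tag contents with the angle brackets already stripped, which the `^` anchor suggests. The sample inputs and the names `oldAnchorRE` and `newAnchorRE` are made up for illustration and are not taken from the library.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are copied from the PR description. The new pattern contains
// a backtick, so the raw string literal has to be split around it.
var (
	oldAnchorRE = regexp.MustCompile(`a.*href=('([^']*?)'|"([^"]*?)")`)
	newAnchorRE = regexp.MustCompile(`^(?i:a)(?:$|\s).*(?i:href)\s*=\s*('([^']*?)'|"([^"]*?)"|([^\s"'` + "`" + `=<>]+))`)
)

func main() {
	// Hypothetical tag contents (assumption: angle brackets already removed).
	samples := []string{
		`a href="x"`,            // plain anchor
		`A HREF="x"`,            // upper-case tag and attribute names
		`a href = 'x'`,          // whitespace around the equals sign
		`a href=x`,              // unquoted attribute value
		`article data-href="x"`, // not an anchor, but contains "a" and "href="
		`base href="/"`,         // not an anchor either
	}
	for _, s := range samples {
		fmt.Printf("%-24q old=%-5t new=%t\n",
			s, oldAnchorRE.MatchString(s), newAnchorRE.MatchString(s))
	}
}
```

The first sample is accepted by both patterns. The next three are accepted only by the new one, and the last two only by the old one.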

To break down the changes, in order:

Regex Reasoning
^ Check that the tag name actually starts with the letter a, rather than merely containing it, as head does.
(?i:a) Tag and attribute names are case-insensitive, so the check must be case-insensitive too.
(?:$|\s) We want only the tag a and not, say, article, so we check for the end of the tag name the same way badTagnamesRE does.
(?i:href) Case-insensitive check for the attribute name as well.
\s*=\s* The equals sign can be surrounded by zero or more whitespace characters.
|([^\s"'`=<>]+) Attribute values don't have to be enclosed in quotes, as long as they are non-empty and contain no whitespace, ", ', `, =, <, or >.
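
Because the new pattern has three alternatives for the attribute value, the URL can land in any of three capture groups. The following Go sketch shows one way to pull it out. The helper `hrefValue` and the variable `anchorRE` are hypothetical names for illustration, not the library's actual code, and the input is again assumed to be the tag contents without angle brackets.

```go
package main

import (
	"fmt"
	"regexp"
)

// New pattern from the PR description; the raw string is split around the backtick.
var anchorRE = regexp.MustCompile(`^(?i:a)(?:$|\s).*(?i:href)\s*=\s*('([^']*?)'|"([^"]*?)"|([^\s"'` + "`" + `=<>]+))`)

// hrefValue returns the href value of an anchor tag and whether the tag
// matched at all. Group 1 is the value including any quotes; groups 2, 3
// and 4 hold the single-quoted, double-quoted and unquoted value, and at
// most one of them is non-empty.
func hrefValue(tag string) (string, bool) {
	m := anchorRE.FindStringSubmatch(tag)
	if m == nil {
		return "", false
	}
	for _, g := range m[2:] {
		if g != "" {
			return g, true
		}
	}
	// The tag matched but the value is empty, e.g. href="".
	return "", true
}

func main() {
	for _, tag := range []string{
		`a class="btn" href="https://example.com/"`,
		`A HREF='/path'`,
		`a href=foo`,
		`article href="x"`,
	} {
		v, ok := hrefValue(tag)
		fmt.Printf("%q -> %q (matched: %t)\n", tag, v, ok)
	}
}
```

Ranging over groups 2 to 4 keeps the caller independent of which quoting style the tag used.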

k3a commented May 10, 2023

Owner

Thanks for the nice PR! I will make the other tag regexes case-insensitive as well in follow-up commits.

k3a merged commit a58537e into k3a:master on May 10, 2023
xStrom deleted the anchor branch on May 10, 2023 21:23