Ensure that angle brackets in pyscript tag are escaped before parsing #684

Merged
madhur-tandon merged 3 commits into main from angle_bracket_escape on Aug 16, 2022

@philippjfr philippjfr commented Aug 12, 2022

Contributor

Without escaping angle brackets (< and >), the DOMParser strips out anything that looks like an HTML tag.

  • Add test

@philippjfr philippjfr added the type: bug Something isn't working label Aug 12, 2022
Comment thread pyscriptjs/tests/test_01_basic.py Outdated
Co-authored-by: James A. Bednar <jbednar@users.noreply.github.com>
@philippjfr philippjfr requested a review from fpliger August 13, 2022 08:55
@madhur-tandon madhur-tandon merged commit 8275aa2 into main Aug 16, 2022
@madhur-tandon madhur-tandon deleted the angle_bracket_escape branch August 16, 2022 16:11
Comment thread pyscriptjs/src/utils.ts
function escape(str: string): string {
    return str.replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

Contributor

I think we should escape more?
I'm not an expert in this area, but a quick search turned up this:
https://stackoverflow.com/a/6234804

I guess we should probably escape ', " and & as well?
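
For illustration, a broader escape along the lines of the suggestion above might look like the sketch below. The name `escapeHtml` is hypothetical and is not part of this PR; the entity chosen for the single quote (`&#39;`) is one common option. The one detail that matters for correctness is that `&` has to be replaced first, otherwise the ampersands introduced by the later replacements would be escaped a second time.

```typescript
// Hypothetical helper, not part of the PR: escapes the five characters
// commonly treated as special in HTML text and attribute values.
function escapeHtml(str: string): string {
    return str
        .replace(/&/g, "&amp;") // must run first to avoid double-escaping
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;");
}

// Example input: a line of Python containing all five characters.
console.log(escapeHtml(`print("a < b" if x > 1 else 'a & b')`));
```

Whether the quote characters need escaping at all depends on where the result ends up: inside text content only `&` and `<` are strictly significant to the parser, while quotes matter inside attribute values.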

Contributor Author

I was indeed quite conservative here. However, I think < and > may be special in that they absolutely break the parser, while the others are generally parsed correctly. It might be best to simply write some tests to confirm.

Contributor

For example, consider the following code:

<py-script>
js.console.info("a &amp; b");
</py-script>

I would expect it to print a &amp; b literally, but it actually prints a & b.
Trying to print "a &quot b" is even worse: the entity is parsed as a quote character ("), so Python reads "a " b", which results in a Python SyntaxError.

Contributor Author

That's indeed bad. It sounds like we actually have to unescape those HTML entities.

Contributor

Hmm, right, for those it's the opposite direction.
By the way, I just checked what JS does:

<script>
    console.info("a &amp; b");
</script>

This prints a &amp; b, so we should probably do the same.
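
One way to get the `<script>`-like behaviour discussed in this thread is sketched below, under stated assumptions. `decodeEntities` is a hypothetical, simplified stand-in for the entity decoding an HTML parser applies to text content (it only handles five semicolon-terminated entities), and `escapeForParser` is a hypothetical variant of the PR's `escape` that also escapes `&`. Neither is confirmed to be what PyScript ended up doing. The point is only that escaping `&` before parsing lets the decoded text come back identical to what the user typed.

```typescript
// Simplified stand-in for the entity decoding an HTML parser performs on
// text content. Assumption: only these five semicolon-terminated forms.
function decodeEntities(str: string): string {
    const table: Record<string, string> = {
        "&lt;": "<",
        "&gt;": ">",
        "&quot;": '"',
        "&#39;": "'",
        "&amp;": "&",
    };
    // Single pass, so "&amp;lt;" decodes to "&lt;" and not to "<".
    return str.replace(/&(?:lt|gt|quot|#39|amp);/g, (m) => table[m] ?? m);
}

// Hypothetical variant of the PR's escape() that also escapes "&".
function escapeForParser(str: string): string {
    return str.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

const source = 'js.console.info("a &amp; b")';

// Without escaping "&", the entity is decoded and the literal text is lost.
console.log(decodeEntities(source));

// With "&" escaped first, decoding restores the text exactly as written.
console.log(decodeEntities(escapeForParser(source)) === source);
```

The same round trip also keeps a string such as "a &quot; b" intact, which avoids the SyntaxError case described earlier in the thread.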

4 participants