Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize ANSII control characters returned from the server #6916

Merged
merged 4 commits into from Feb 1, 2023

Conversation

samcoe
Copy link
Member

@samcoe samcoe commented Jan 25, 2023

@samcoe samcoe requested a review from a team as a code owner January 25, 2023 17:51
@samcoe samcoe self-assigned this Jan 25, 2023
@samcoe samcoe requested review from mislav and removed request for a team January 25, 2023 17:51
@cliAutomation cliAutomation added this to Needs review 🤔 in The GitHub CLI Jan 25, 2023
@samcoe samcoe requested a review from vilmibm January 25, 2023 18:05
Copy link
Member

@mislav mislav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks quite elegant! Thanks!

// the values of \u0080 to \u009F represent C1 ASCII control characters. These control
// characters will be interpreted by the terminal, this behaviour can be used maliciously
// as an attack vector, especially the control character \u001B. This function escapes
// all non-printable characters between \u0000 and \u00FF so that the terminal will
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. My understanding is that non-printable ASCII characters are in the range \u0000 and \u001F. If we escape every character up until \u00FF, wouldn't that also affect all printable characters in ASCII and beyond?
  2. Would we explicitly allow-list some control ASCII characters such as LF and CR so that they don't get mangled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The server only returns \u string sequence for non-printable characters so we do not need to worry about escaping printable characters.
  2. LF and CR are printable characters and the server does not return them as \u sequences.

@@ -64,6 +68,8 @@ func NewHTTPClient(opts HTTPClientOptions) (*http.Client, error) {
client.Transport = AddAuthTokenHeader(client.Transport, opts.Config)
}

client.Transport = SanitizeASCIIControlCharacters(client.Transport)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this middleware also affect the gh api command? Since gh api is meant for low-level, programmatic use, maybe it would be good if it remained unaffected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does currently affect the gh api command. In my mind I think we should sanitize those requests as well as we want to protect our users throughout our tool. I can envision people copying and pasting gh api commands that request a URL with malicious data in the response. If there is a valid use case for leaving these requests un-sanitized then I am open to it but I have been unable to think of one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I see it, gh api is meant for scripts, not necessarily to implement presentation. But agreed that flags like --jq and especially --template can be used to format data for the terminal. In those cases it's good if ANSI escape codes are disabled. Maybe it's okay that this middleware sanitizes all those.

var sanitized bytes.Buffer
err = replaceControlCharacters(res.Body, &sanitized)
if err != nil {
err = fmt.Errorf("ascii control characters sanitization error: %w", err)
}
res.Body.Close()
res.Body = io.NopCloser(&sanitized)
Copy link
Member

@mislav mislav Jan 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One consequence of this middleware is that by default, all Bodies of every response will first be fully read into a Buffer before the middleware yields control to other middleware or to other response processing code. This prevents any kind of streaming processing of response JSON.

Would it be feasible to replace resp.Body with an io.ReadCloser that does escaping on the fly as it forwards reads to the original response body and returns each result? That way, every response from the server will be immediately available to the code that makes the request without this middleware potentially delaying that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea. I will investigate implementing it in that manner.

// mapControlCharacterToCaret maps C0 control sequences to caret notation and
// C1 control sequences to hex notation. C1 control sequences do
// not have caret notation representation.
func mapControlCharacterToCaret(b []byte) ([]byte, bool) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the bit of added UX by mapping unsafe escape codes to their printable equivalents 👍

@mislav
Copy link
Member

mislav commented Jan 30, 2023

Forgot one comment in my last review: could there be a test that confirms that this middleware will not try to escape already escaped unicode codepoints in JSON? E.g. that \\u001b should be allowed to stay \\u001b without being converted to \\\u001b

@samcoe
Copy link
Member Author

samcoe commented Jan 31, 2023

@mislav I changed the implementation to allow for streaming JSON responses, and I also moved the sanitization code to its own files for clarity.

The test I have does include some escaped sequences as well, let me know if you want me to make it more explicit.

mislav
mislav approved these changes Feb 1, 2023
Copy link
Member

@mislav mislav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great improvement; thank you! Only style nits remain

break
}

if win[0] == 92 && win[1] == 117 && win[2] == 48 && win[3] == 48 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: for clarity, these numbers could be saved into consts and expressed using Go single quotes instead of ASCII codes. For example, '\\' and 'u' should be more clear than 92 and 117, respectively.

However, in this case, maybe bytes.HasPrefix could suffice?

Suggested change
if win[0] == 92 && win[1] == 117 && win[2] == 48 && win[3] == 48 {
if bytes.HasPrefix(win, []byte("\\u00")) {

}

// Close closes the wrapped ReadCloser.
func (s *sanitizeASCIIReadCloser) Close() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: since your struct is embedding an io.ReadCloser interface, I don't think you need to explicitly declare this method. Your type should already automatically have a Close() method and forward it to the underlying ReadCloser.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave the ReadCloser wrapped by sanitizeASCIIReadCloser a name in the struct so Go won't promote the Close method to be on sanitizeASCIIReadCloser. Having said that I think it makes sense to making the ReadCloser actually embedded.

The GitHub CLI automation moved this from Needs review 🤔 to Needs to be merged 🎉 Feb 1, 2023
@samcoe samcoe enabled auto-merge (squash) February 1, 2023 21:08
@samcoe samcoe merged commit ced071f into trunk Feb 1, 2023
10 checks passed
The GitHub CLI automation moved this from Needs to be merged 🎉 to Pending Release 🥚 Feb 1, 2023
@samcoe samcoe deleted the ansi-escaping branch February 1, 2023 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
The GitHub CLI
  
Pending Release 🥚
Development

Successfully merging this pull request may close these issues.

Escape ANSI escape sequences when displaying user-generated content
2 participants