New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sanitize ANSII control characters returned from the server #6916
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks quite elegant! Thanks!
api/http_client.go
Outdated
| // the values of \u0080 to \u009F represent C1 ASCII control characters. These control | ||
| // characters will be interpreted by the terminal, this behaviour can be used maliciously | ||
| // as an attack vector, especially the control character \u001B. This function escapes | ||
| // all non-printable characters between \u0000 and \u00FF so that the terminal will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- My understanding is that non-printable ASCII characters are in the range \u0000 and \u001F. If we escape every character up until \u00FF, wouldn't that also affect all printable characters in ASCII and beyond?
- Would we explicitly allow-list some control ASCII characters such as LF and CR so that they don't get mangled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The server only returns
\ustring sequence for non-printable characters so we do not need to worry about escaping printable characters. - LF and CR are printable characters and the server does not return them as
\usequences.
api/http_client.go
Outdated
| @@ -64,6 +68,8 @@ func NewHTTPClient(opts HTTPClientOptions) (*http.Client, error) { | |||
| client.Transport = AddAuthTokenHeader(client.Transport, opts.Config) | |||
| } | |||
|
|
|||
| client.Transport = SanitizeASCIIControlCharacters(client.Transport) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this middleware also affect the gh api command? Since gh api is meant for low-level, programmatic use, maybe it would be good if it remained unaffected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does currently affect the gh api command. In my mind I think we should sanitize those requests as well as we want to protect our users throughout our tool. I can envision people copying and pasting gh api commands that request a URL with malicious data in the response. If there is a valid use case for leaving these requests un-sanitized then I am open to it but I have been unable to think of one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I see it, gh api is meant for scripts, not necessarily to implement presentation. But agreed that flags like --jq and especially --template can be used to format data for the terminal. In those cases it's good if ANSI escape codes are disabled. Maybe it's okay that this middleware sanitizes all those.
api/http_client.go
Outdated
| var sanitized bytes.Buffer | ||
| err = replaceControlCharacters(res.Body, &sanitized) | ||
| if err != nil { | ||
| err = fmt.Errorf("ascii control characters sanitization error: %w", err) | ||
| } | ||
| res.Body.Close() | ||
| res.Body = io.NopCloser(&sanitized) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One consequence of this middleware is that by default, all Bodies of every response will first be fully read into a Buffer before the middleware yields control to other middleware or to other response processing code. This prevents any kind of streaming processing of response JSON.
Would it be feasible to replace resp.Body with an io.ReadCloser that does escaping on the fly as it forwards reads to the original response body and returns each result? That way, every response from the server will be immediately available to the code that makes the request without this middleware potentially delaying that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that idea. I will investigate implementing it in that manner.
api/http_client.go
Outdated
| // mapControlCharacterToCaret maps C0 control sequences to caret notation and | ||
| // C1 control sequences to hex notation. C1 control sequences do | ||
| // not have caret notation representation. | ||
| func mapControlCharacterToCaret(b []byte) ([]byte, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the bit of added UX by mapping unsafe escape codes to their printable equivalents
|
Forgot one comment in my last review: could there be a test that confirms that this middleware will not try to escape already escaped unicode codepoints in JSON? E.g. that |
|
@mislav I changed the implementation to allow for streaming JSON responses, and I also moved the sanitization code to its own files for clarity. The test I have does include some escaped sequences as well, let me know if you want me to make it more explicit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great improvement; thank you! Only style nits remain
api/sanitize_ascii.go
Outdated
| break | ||
| } | ||
|
|
||
| if win[0] == 92 && win[1] == 117 && win[2] == 48 && win[3] == 48 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: for clarity, these numbers could be saved into consts and expressed using Go single quotes instead of ASCII codes. For example, '\\' and 'u' should be more clear than 92 and 117, respectively.
However, in this case, maybe bytes.HasPrefix could suffice?
| if win[0] == 92 && win[1] == 117 && win[2] == 48 && win[3] == 48 { | |
| if bytes.HasPrefix(win, []byte("\\u00")) { |
api/sanitize_ascii.go
Outdated
| } | ||
|
|
||
| // Close closes the wrapped ReadCloser. | ||
| func (s *sanitizeASCIIReadCloser) Close() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: since your struct is embedding an io.ReadCloser interface, I don't think you need to explicitly declare this method. Your type should already automatically have a Close() method and forward it to the underlying ReadCloser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave the ReadCloser wrapped by sanitizeASCIIReadCloser a name in the struct so Go won't promote the Close method to be on sanitizeASCIIReadCloser. Having said that I think it makes sense to making the ReadCloser actually embedded.
Fixes https://github.com/github/cli/issues/149
Fixes #932