Unescape regex captures#1446
Conversation
| regexp_exists = pattern.is_a?(Mustermann::Regular) || (pattern.respond_to?(:patterns) && pattern.patterns.any? {|subpattern| subpattern.is_a?(Mustermann::Regular)} ) | ||
| if regexp_exists | ||
| captures = pattern.match(route).captures | ||
| captures = pattern.match(route).captures.map { |c| force_encoding URI_INSTANCE.unescape(c) if c } |
There was a problem hiding this comment.
I can't seem to get the tests running here, but I think this might not be correct. It seems like force_encoding will set the strings to UTF-8, but you can encode any arbitrary bytes with URI encoding. force_encoding means that we'll end up with strings that are tagged as UTF-8 but aren't actually UTF-8 data.
This problem is actually demonstrated by the test in this PR. \x80 is not valid unicode:
irb(main):005:0> x = '/foo%80/bar%e2%80%8cbaz'
=> "/foo%80/bar%e2%80%8cbaz"
irb(main):006:0> y = URI.unescape x
=> "/foo\x80/barbaz"
irb(main):007:0> y.force_encoding "UTF-8"
=> "/foo\x80/barbaz"
irb(main):008:0> y.valid_encoding?
=> false
There was a problem hiding this comment.
Hm good point. I was trying to take into account how things were done in v1.4.8:
https://github.com/sinatra/sinatra/blob/v1.4.8/lib/sinatra/base.rb#L1006
There was a problem hiding this comment.
Wow! Good job tracking that down!!
I think just not calling force_encoding is the right thing to do (that's what we're doing in a monkey patch), but it looks like calling force_encoding is what Sinatra used to do, so maybe everyone has written their apps based on that behavior.
I'm not sure what is the best path forward (pun intended but also a true statement).
There was a problem hiding this comment.
It turns out URI::Parser#unescape already calls force_encoding, so that was redundant anyway. You're right though, the results are tagged UTF-8 but valid_encoding? returns false.
There was a problem hiding this comment.
Yes, it will call force_encoding, but it uses the encoding of the string that was passed in. When we read the path from the socket, the socket will return a string that is encoded in binary, not utf-8.
For example, a path encoded with UTF-8:
irb(main):002:0> path = "/foo%80/bar%e2%80%8cbaz"
=> "/foo%80/bar%e2%80%8cbaz"
irb(main):003:0> path.encoding
=> #<Encoding:UTF-8>
irb(main):004:0> x = URI.unescape path
=> "/foo\x80/barbaz"
irb(main):005:0> x.encoding
=> #<Encoding:UTF-8>
irb(main):006:0> x.valid_encoding?
=> false
Now if the path is binary:
irb(main):007:0> path = "/foo%80/bar%e2%80%8cbaz".b
=> "/foo%80/bar%e2%80%8cbaz"
irb(main):008:0> path.encoding
=> #<Encoding:ASCII-8BIT>
irb(main):009:0> x = URI.unescape path
=> "/foo\x80/bar\xE2\x80\x8Cbaz"
irb(main):010:0> x.encoding
=> #<Encoding:ASCII-8BIT>
irb(main):011:0> x.valid_encoding?
=> true
The test differs from production here because the default encoding of strings inside a file is UTF-8 (so the string literal will be tagged with UTF-8), but the strings we read from the socket will be tagged as binary.
Doing this might be more closely represent production:
get '/foo%80/bar%e2%80%8cbaz'.b
| it 'unescapes named parameters and splats' do | ||
| mock_app { | ||
| get '/:foo/*' do |a, b| | ||
| assert_equal "foo\x80", params['foo'] |
There was a problem hiding this comment.
Here we should also do:
assert_predicate params['foo'], :valid_encoding?I suspect it will fail.
There was a problem hiding this comment.
You were right on, this assertion fails.
|
I think in Rails, we default to force encoding things to UTF-8 since that is the most common case. At GitHub we have to deal with non UTF-8 data, so I added a thing to Rails such that a particular controller / action can override the encoding. Maybe the right thing to do here is to call Thanks for looking in to this! |
|
After a little more digging, I see Sinatra provides an option to set This doesn't help though if you only need it for particular routes, I'll try to see what that takes 👍 |
tenderlove
left a comment
There was a problem hiding this comment.
For what it's worth, this looks good to me. 😀
|
I just realized I was confusing code for handling "regular" routes with regex routes. I think we're probably good on the regex front with this PR, but I realized a relatively recent commit is conflicting a bit: I think we'll need to decide on a consistent approach for handling both. @namusyaka any thoughts? |
| mock_app { | ||
| get '/:foo/*' do |a, b| | ||
| assert_equal "foo\x80", params['foo'] | ||
| assert_predicate params['foo'], :valid_encoding? |
There was a problem hiding this comment.
This fails because params are force encoded with the default encoding here:
https://github.com/sinatra/sinatra/blob/master/lib/sinatra/base.rb#L1026
There was a problem hiding this comment.
Just droping the permalink version of that link here as the line it points to might change in the future:
Line 1026 in 1b35eaa
|
@namusyaka I don't know all of the history of #1412, but the question has come up if we should try to force encode route params the default_encoding if it potentially results in an invalid encoding? |
|
@jkowens Sorry, I'm busy for a few days. Thanks for your patch. |
namusyaka
left a comment
There was a problem hiding this comment.
@tenderlove @jkowens Thank you for your contributions.
I'm really pleasure to see your discussions and fixes.
The current change set looks good to me.
If you have potential invalid encoding case, please open a new issue.
## 2.0.4 / 2018-09-15 * Don't blow up when passing frozen string to `send_file` disposition [#1137](sinatra/sinatra#1137) by Andrew Selder * Fix ubygems LoadError [#1436](sinatra/sinatra#1436) by Pavel Rosick�«ò * Unescape regex captures [#1446](sinatra/sinatra#1446) by Jordan Owens * Slight performance improvements for IndifferentHash [#1427](sinatra/sinatra#1427) by Mike Pastore * Improve development support and documentation and source code by Will Yang, Jake Craige, Grey Baker and Guilherme Goettems Schneider ## 2.0.3 / 2018-06-09 * Fix the backports gem regression [#1442](sinatra/sinatra#1442) by Marc-Andr�«± Lafortune ## 2.0.2 / 2018-06-05 * Escape invalid query parameters [#1432](sinatra/sinatra#1432) by Kunpei Sakai * The patch fixes [CVE-2018-11627](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-11627). * Fix undefined method error for `Sinatra::RequiredParams` with hash key [#1431](sinatra/sinatra#1431) by Arpit Chauhan * Add xml content-types to valid html_types for Rack::Protection [#1413](sinatra/sinatra#1413) by Reenan Arbitrario * Encode route parameters using :default_encoding setting [#1412](sinatra/sinatra#1412) by Brian m. Carlson * Fix unpredictable behaviour from Sinatra::ConfigFile [#1244](sinatra/sinatra#1244) by John Hope * Add Sinatra::IndifferentHash#slice [#1405](sinatra/sinatra#1405) by Shota Iguchi * Remove status code 205 from drop body response [#1398](sinatra/sinatra#1398) by Shota Iguchi * Ignore empty captures from params [#1390](sinatra/sinatra#1390) by Shota Iguchi * Improve development support and documentation and source code by Zp Yuan, Andreas Finger, Olle Jonsson, Shota Iguchi, Nikita Bulai and Joshua O'Brien
## 2.0.4 / 2018-09-15 * Don't blow up when passing frozen string to `send_file` disposition [#1137](sinatra/sinatra#1137) by Andrew Selder * Fix ubygems LoadError [#1436](sinatra/sinatra#1436) by Pavel Rosick�«ò * Unescape regex captures [#1446](sinatra/sinatra#1446) by Jordan Owens * Slight performance improvements for IndifferentHash [#1427](sinatra/sinatra#1427) by Mike Pastore * Improve development support and documentation and source code by Will Yang, Jake Craige, Grey Baker and Guilherme Goettems Schneider ## 2.0.3 / 2018-06-09 * Fix the backports gem regression [#1442](sinatra/sinatra#1442) by Marc-Andr�«± Lafortune ## 2.0.2 / 2018-06-05 * Escape invalid query parameters [#1432](sinatra/sinatra#1432) by Kunpei Sakai * The patch fixes [CVE-2018-11627](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-11627). * Fix undefined method error for `Sinatra::RequiredParams` with hash key [#1431](sinatra/sinatra#1431) by Arpit Chauhan * Add xml content-types to valid html_types for Rack::Protection [#1413](sinatra/sinatra#1413) by Reenan Arbitrario * Encode route parameters using :default_encoding setting [#1412](sinatra/sinatra#1412) by Brian m. Carlson * Fix unpredictable behaviour from Sinatra::ConfigFile [#1244](sinatra/sinatra#1244) by John Hope * Add Sinatra::IndifferentHash#slice [#1405](sinatra/sinatra#1405) by Shota Iguchi * Remove status code 205 from drop body response [#1398](sinatra/sinatra#1398) by Shota Iguchi * Ignore empty captures from params [#1390](sinatra/sinatra#1390) by Shota Iguchi * Improve development support and documentation and source code by Zp Yuan, Andreas Finger, Olle Jonsson, Shota Iguchi, Nikita Bulai and Joshua O'Brien
## 2.0.4 / 2018-09-15 * Don't blow up when passing frozen string to `send_file` disposition [#1137](sinatra/sinatra#1137) by Andrew Selder * Fix ubygems LoadError [#1436](sinatra/sinatra#1436) by Pavel Rosick�«ò * Unescape regex captures [#1446](sinatra/sinatra#1446) by Jordan Owens * Slight performance improvements for IndifferentHash [#1427](sinatra/sinatra#1427) by Mike Pastore * Improve development support and documentation and source code by Will Yang, Jake Craige, Grey Baker and Guilherme Goettems Schneider ## 2.0.3 / 2018-06-09 * Fix the backports gem regression [#1442](sinatra/sinatra#1442) by Marc-Andr�«± Lafortune ## 2.0.2 / 2018-06-05 * Escape invalid query parameters [#1432](sinatra/sinatra#1432) by Kunpei Sakai * The patch fixes [CVE-2018-11627](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-11627). * Fix undefined method error for `Sinatra::RequiredParams` with hash key [#1431](sinatra/sinatra#1431) by Arpit Chauhan * Add xml content-types to valid html_types for Rack::Protection [#1413](sinatra/sinatra#1413) by Reenan Arbitrario * Encode route parameters using :default_encoding setting [#1412](sinatra/sinatra#1412) by Brian m. Carlson * Fix unpredictable behaviour from Sinatra::ConfigFile [#1244](sinatra/sinatra#1244) by John Hope * Add Sinatra::IndifferentHash#slice [#1405](sinatra/sinatra#1405) by Shota Iguchi * Remove status code 205 from drop body response [#1398](sinatra/sinatra#1398) by Shota Iguchi * Ignore empty captures from params [#1390](sinatra/sinatra#1390) by Shota Iguchi * Improve development support and documentation and source code by Zp Yuan, Andreas Finger, Olle Jonsson, Shota Iguchi, Nikita Bulai and Joshua O'Brien
This adds tests to make sure named parameters and regex captures are unescaped, and also provides a potential fix for #1443. I'm not familiar enough with Mustermann to know if it should handle unescaping captures.