Skip to content

Unescape regex captures#1446

Merged
namusyaka merged 2 commits into
sinatra:masterfrom
jkowens:fix-1443
Aug 4, 2018
Merged

Unescape regex captures#1446
namusyaka merged 2 commits into
sinatra:masterfrom
jkowens:fix-1443

Conversation

@jkowens

@jkowens jkowens commented Jun 11, 2018

Copy link
Copy Markdown
Member

This adds tests to make sure named parameters and regex captures are unescaped, and also provides a potential fix for #1443. I'm not familiar enough with Mustermann to know if it should handle unescaping captures.

Comment thread lib/sinatra/base.rb Outdated
regexp_exists = pattern.is_a?(Mustermann::Regular) || (pattern.respond_to?(:patterns) && pattern.patterns.any? {|subpattern| subpattern.is_a?(Mustermann::Regular)} )
if regexp_exists
captures = pattern.match(route).captures
captures = pattern.match(route).captures.map { |c| force_encoding URI_INSTANCE.unescape(c) if c }

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't seem to get the tests running here, but I think this might not be correct. It seems like force_encoding will set the strings to UTF-8, but you can encode any arbitrary bytes with URI encoding. force_encoding means that we'll end up with strings that are tagged as UTF-8 but aren't actually UTF-8 data.

This problem is actually demonstrated by the test in this PR. \x80 is not valid unicode:

irb(main):005:0> x = '/foo%80/bar%e2%80%8cbaz'
=> "/foo%80/bar%e2%80%8cbaz"
irb(main):006:0> y = URI.unescape x
=> "/foo\x80/bar‌baz"
irb(main):007:0> y.force_encoding "UTF-8"
=> "/foo\x80/bar‌baz"
irb(main):008:0> y.valid_encoding?
=> false

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm good point. I was trying to take into account how things were done in v1.4.8:

https://github.com/sinatra/sinatra/blob/v1.4.8/lib/sinatra/base.rb#L1006

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! Good job tracking that down!!

I think just not calling force_encoding is the right thing to do (that's what we're doing in a monkey patch), but it looks like calling force_encoding is what Sinatra used to do, so maybe everyone has written their apps based on that behavior.

I'm not sure what is the best path forward (pun intended but also a true statement).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out URI::Parser#unescape already calls force_encoding, so that was redundant anyway. You're right though, the results are tagged UTF-8 but valid_encoding? returns false.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will call force_encoding, but it uses the encoding of the string that was passed in. When we read the path from the socket, the socket will return a string that is encoded in binary, not utf-8.

For example, a path encoded with UTF-8:

irb(main):002:0> path = "/foo%80/bar%e2%80%8cbaz"
=> "/foo%80/bar%e2%80%8cbaz"
irb(main):003:0> path.encoding
=> #<Encoding:UTF-8>
irb(main):004:0> x = URI.unescape path
=> "/foo\x80/bar‌baz"
irb(main):005:0> x.encoding
=> #<Encoding:UTF-8>
irb(main):006:0> x.valid_encoding?
=> false

Now if the path is binary:

irb(main):007:0> path = "/foo%80/bar%e2%80%8cbaz".b
=> "/foo%80/bar%e2%80%8cbaz"
irb(main):008:0> path.encoding
=> #<Encoding:ASCII-8BIT>
irb(main):009:0> x = URI.unescape path
=> "/foo\x80/bar\xE2\x80\x8Cbaz"
irb(main):010:0> x.encoding
=> #<Encoding:ASCII-8BIT>
irb(main):011:0> x.valid_encoding?
=> true

The test differs from production here because the default encoding of strings inside a file is UTF-8 (so the string literal will be tagged with UTF-8), but the strings we read from the socket will be tagged as binary.

Doing this might be more closely represent production:

get '/foo%80/bar%e2%80%8cbaz'.b

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha!

Comment thread test/routing_test.rb Outdated
it 'unescapes named parameters and splats' do
mock_app {
get '/:foo/*' do |a, b|
assert_equal "foo\x80", params['foo']

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we should also do:

assert_predicate params['foo'], :valid_encoding?

I suspect it will fail.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You were right on, this assertion fails.

@tenderlove

Copy link
Copy Markdown

I think in Rails, we default to force encoding things to UTF-8 since that is the most common case. At GitHub we have to deal with non UTF-8 data, so I added a thing to Rails such that a particular controller / action can override the encoding.

Maybe the right thing to do here is to call force_encoding, but also provide a way to override which encoding gets used on a particular route.

Thanks for looking in to this!

@jkowens

jkowens commented Jun 14, 2018

Copy link
Copy Markdown
Member Author

After a little more digging, I see Sinatra provides an option to set default_encoding for the app:

set :default_encoding,  'ascii-8bit'

This doesn't help though if you only need it for particular routes, I'll try to see what that takes 👍

@tenderlove tenderlove left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, this looks good to me. 😀

@jkowens

jkowens commented Jun 14, 2018

Copy link
Copy Markdown
Member Author

I just realized I was confusing code for handling "regular" routes with regex routes. I think we're probably good on the regex front with this PR, but I realized a relatively recent commit is conflicting a bit:

b058835

I think we'll need to decide on a consistent approach for handling both. @namusyaka any thoughts?

Comment thread test/routing_test.rb
mock_app {
get '/:foo/*' do |a, b|
assert_equal "foo\x80", params['foo']
assert_predicate params['foo'], :valid_encoding?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails because params are force encoded with the default encoding here:

https://github.com/sinatra/sinatra/blob/master/lib/sinatra/base.rb#L1026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just droping the permalink version of that link here as the line it points to might change in the future:

force_encoding(params)

@jkowens

jkowens commented Jun 15, 2018

Copy link
Copy Markdown
Member Author

@namusyaka I don't know all of the history of #1412, but the question has come up if we should try to force encode route params the default_encoding if it potentially results in an invalid encoding?

@namusyaka

Copy link
Copy Markdown
Member

@jkowens Sorry, I'm busy for a few days.
I'm going to review this carefuly next week.

Thanks for your patch.

@namusyaka namusyaka added this to the v2.0.4 milestone Jun 24, 2018

@namusyaka namusyaka left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tenderlove @jkowens Thank you for your contributions.
I'm really pleasure to see your discussions and fixes.
The current change set looks good to me.

If you have potential invalid encoding case, please open a new issue.

@namusyaka namusyaka merged commit 9590706 into sinatra:master Aug 4, 2018
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Sep 23, 2018
## 2.0.4 / 2018-09-15

* Don't blow up when passing frozen string to `send_file` disposition [#1137](sinatra/sinatra#1137) by Andrew Selder

* Fix ubygems LoadError [#1436](sinatra/sinatra#1436) by Pavel Rosick�«ò

* Unescape regex captures [#1446](sinatra/sinatra#1446) by Jordan Owens

* Slight performance improvements for IndifferentHash [#1427](sinatra/sinatra#1427) by Mike Pastore

* Improve development support and documentation and source code by Will Yang, Jake Craige, Grey Baker and Guilherme Goettems Schneider

## 2.0.3 / 2018-06-09

* Fix the backports gem regression [#1442](sinatra/sinatra#1442) by Marc-Andr�«± Lafortune

## 2.0.2 / 2018-06-05

* Escape invalid query parameters [#1432](sinatra/sinatra#1432) by Kunpei Sakai
  * The patch fixes [CVE-2018-11627](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-11627).

* Fix undefined method error for `Sinatra::RequiredParams` with hash key [#1431](sinatra/sinatra#1431) by Arpit Chauhan

* Add xml content-types to valid html_types for Rack::Protection [#1413](sinatra/sinatra#1413) by Reenan Arbitrario

* Encode route parameters using :default_encoding setting [#1412](sinatra/sinatra#1412) by Brian m. Carlson

* Fix unpredictable behaviour from Sinatra::ConfigFile [#1244](sinatra/sinatra#1244) by John Hope

* Add Sinatra::IndifferentHash#slice [#1405](sinatra/sinatra#1405) by Shota Iguchi

* Remove status code 205 from drop body response [#1398](sinatra/sinatra#1398) by Shota Iguchi

* Ignore empty captures from params [#1390](sinatra/sinatra#1390) by Shota Iguchi

* Improve development support and documentation and source code by Zp Yuan, Andreas Finger, Olle Jonsson, Shota Iguchi, Nikita Bulai and Joshua O'Brien
msk pushed a commit to msk/pkgsrc that referenced this pull request May 11, 2026
## 2.0.4 / 2018-09-15

* Don't blow up when passing frozen string to `send_file` disposition [#1137](sinatra/sinatra#1137) by Andrew Selder

* Fix ubygems LoadError [#1436](sinatra/sinatra#1436) by Pavel Rosick�«ò

* Unescape regex captures [#1446](sinatra/sinatra#1446) by Jordan Owens

* Slight performance improvements for IndifferentHash [#1427](sinatra/sinatra#1427) by Mike Pastore

* Improve development support and documentation and source code by Will Yang, Jake Craige, Grey Baker and Guilherme Goettems Schneider

## 2.0.3 / 2018-06-09

* Fix the backports gem regression [#1442](sinatra/sinatra#1442) by Marc-Andr�«± Lafortune

## 2.0.2 / 2018-06-05

* Escape invalid query parameters [#1432](sinatra/sinatra#1432) by Kunpei Sakai
  * The patch fixes [CVE-2018-11627](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-11627).

* Fix undefined method error for `Sinatra::RequiredParams` with hash key [#1431](sinatra/sinatra#1431) by Arpit Chauhan

* Add xml content-types to valid html_types for Rack::Protection [#1413](sinatra/sinatra#1413) by Reenan Arbitrario

* Encode route parameters using :default_encoding setting [#1412](sinatra/sinatra#1412) by Brian m. Carlson

* Fix unpredictable behaviour from Sinatra::ConfigFile [#1244](sinatra/sinatra#1244) by John Hope

* Add Sinatra::IndifferentHash#slice [#1405](sinatra/sinatra#1405) by Shota Iguchi

* Remove status code 205 from drop body response [#1398](sinatra/sinatra#1398) by Shota Iguchi

* Ignore empty captures from params [#1390](sinatra/sinatra#1390) by Shota Iguchi

* Improve development support and documentation and source code by Zp Yuan, Andreas Finger, Olle Jonsson, Shota Iguchi, Nikita Bulai and Joshua O'Brien
jperkin pushed a commit to TritonDataCenter/pkgsrc that referenced this pull request May 14, 2026
## 2.0.4 / 2018-09-15

* Don't blow up when passing frozen string to `send_file` disposition [#1137](sinatra/sinatra#1137) by Andrew Selder

* Fix ubygems LoadError [#1436](sinatra/sinatra#1436) by Pavel Rosick�«ò

* Unescape regex captures [#1446](sinatra/sinatra#1446) by Jordan Owens

* Slight performance improvements for IndifferentHash [#1427](sinatra/sinatra#1427) by Mike Pastore

* Improve development support and documentation and source code by Will Yang, Jake Craige, Grey Baker and Guilherme Goettems Schneider

## 2.0.3 / 2018-06-09

* Fix the backports gem regression [#1442](sinatra/sinatra#1442) by Marc-Andr�«± Lafortune

## 2.0.2 / 2018-06-05

* Escape invalid query parameters [#1432](sinatra/sinatra#1432) by Kunpei Sakai
  * The patch fixes [CVE-2018-11627](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-11627).

* Fix undefined method error for `Sinatra::RequiredParams` with hash key [#1431](sinatra/sinatra#1431) by Arpit Chauhan

* Add xml content-types to valid html_types for Rack::Protection [#1413](sinatra/sinatra#1413) by Reenan Arbitrario

* Encode route parameters using :default_encoding setting [#1412](sinatra/sinatra#1412) by Brian m. Carlson

* Fix unpredictable behaviour from Sinatra::ConfigFile [#1244](sinatra/sinatra#1244) by John Hope

* Add Sinatra::IndifferentHash#slice [#1405](sinatra/sinatra#1405) by Shota Iguchi

* Remove status code 205 from drop body response [#1398](sinatra/sinatra#1398) by Shota Iguchi

* Ignore empty captures from params [#1390](sinatra/sinatra#1390) by Shota Iguchi

* Improve development support and documentation and source code by Zp Yuan, Andreas Finger, Olle Jonsson, Shota Iguchi, Nikita Bulai and Joshua O'Brien
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants