Skip to content

Windows: Detect scheme presence in proxy URLs more robustly#268

Merged
janbrummer merged 1 commit intolibproxy:mainfrom
davids-work:windows-scheme-detection
Jan 23, 2024
Merged

Windows: Detect scheme presence in proxy URLs more robustly#268
janbrummer merged 1 commit intolibproxy:mainfrom
davids-work:windows-scheme-detection

Conversation

@davids-work
Copy link
Copy Markdown
Contributor

Windows proxy servers can be set either with or without a scheme where without implies http, and with or without a port where without implies the default port for the scheme.

Unfortunately, g_uri_parse_scheme("proxy.example.com:3128") returns "proxy.example.com" rather than NULL (while
g_uri_parse_scheme("1.2.3.4:3128") returns NULL), meaning that such proxy configurations did not get http:// prepended to them, causing e.g. GProxy to reject them.

With this change we simply search for the presence of :// in the URI instead. While a URI technically doesn't have to contain // after the scheme separator, such URIs should not occur in this context

Windows proxy servers can be set either with or without a scheme where without
implies http, and with or without a port where without implies the
default port for the scheme.

Unfortunately, g_uri_parse_scheme("proxy.example.com:3128") returns
"proxy.example.com" rather than NULL (while
g_uri_parse_scheme("1.2.3.4:3128") returns NULL), meaning that such
proxy configurations did not get http:// prepended to them, causing
e.g. GProxy to reject them.

With this change we simply search for the presence of :// in the URI
instead. While a URI technically doesn't have to contain // after
the scheme separator, such URIs should not occur in this context
@codecov
Copy link
Copy Markdown

codecov bot commented Jan 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (01b8a24) 73.19% compared to head (d78662e) 73.19%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #268   +/-   ##
=======================================
  Coverage   73.19%   73.19%           
=======================================
  Files          16       16           
  Lines         843      843           
  Branches      238      238           
=======================================
  Hits          617      617           
  Misses        131      131           
  Partials       95       95           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@janbrummer
Copy link
Copy Markdown
Contributor

Sounds more like a bug which needs to be fixed in glib then.

@janbrummer
Copy link
Copy Markdown
Contributor

@mcatanzaro This looks really like a bug in glib as it assumes that the port is the real hostname and everything before is the scheme. I'm not sure whether it should return NULL in both cases or it should default to http?

@janbrummer
Copy link
Copy Markdown
Contributor

FTR: The error is in g_uri_scheme_length

@mcatanzaro
Copy link
Copy Markdown
Contributor

This looks really like a bug in glib as it assumes that the port is the real hostname and everything before is the scheme. I'm not sure whether it should return NULL in both cases or it should default to http?

I'm not too familiar with the differences between URIs and URLs, but I suspect everything before the first : is the scheme. Certainly the scheme is not an optional URI component, and without it you don't have a URI that you can pass to GUri.

I would prepend the http:// manually if desired. E.g. Epiphany does this in ephy_embed_utils_normalize_address().

@davids-work
Copy link
Copy Markdown
Contributor Author

I think you could argue that the behavior of glib here is technically correct according to RFC 3986. Key parts:

URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]`
hier-part   = "//" authority path-abempty
                  / path-absolute
                  / path-rootless
                  / path-empty

The scheme and path components are required, though the path may be
empty (no characters).

Scheme names consist of a sequence of characters beginning with a
letter and followed by any combination of letters, digits, plus
("+"), period ("."), or hyphen ("-").

Thus I think a valid interpretation of proxy.example.com:3128 is scheme=proxy.example.com, path=3128. It's certainly not intuitive though. g_uri_scheme_length also has a restriction that a scheme can't start with a numeric character, which explains the different behavior for 1.2.3.4:3128. I can't find justification for this in the RFC, but I might be missing something.

As a comparison, here's what the urlparse function from Python's standard library does:

>>> urllib.parse.urlparse("proxy.example.com:3128")
ParseResult(scheme='proxy.example.com', netloc='', path='3128', params='', query='', fragment='')
>>> urllib.parse.urlparse("1.2.3.4:3128")
ParseResult(scheme='1.2.3.4', netloc='', path='3128', params='', query='', fragment='')

As @mcatanzaro says, a string without a scheme is not a valid URI, thus using a URI parsing function to detect the presence/absence of a scheme was probably not the best choice (mea culpa).

@janbrummer
Copy link
Copy Markdown
Contributor

Good research! I'm merging this one. Thanks for your support.

@janbrummer janbrummer merged commit f1a1c30 into libproxy:main Jan 23, 2024
@davids-work davids-work deleted the windows-scheme-detection branch February 14, 2024 10:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants