Allows finding and replacing http:// with exclusions on URLs that https:// cannot be used.
Today it is important for everything, including static sites, to be over https.
It can be difficult to switch to https:// and then to maintain using https://.
This is the core project that allows finding and replacing http:// URLs.
For additional integrations refer to https://github.com/spring-io/nohttp
The recommended process for determining if it is ok to use http:// is:
-
If
httpsis possible then usehttps -
If you cannot use
https, then consider the following:-
If your project uses the URL to make a request over a network, then you need to use
https -
It is acceptable to use
localhostto make a request usinghttpsince it does not leave the machine -
If you need to test URLs that use
http, then consider using TLD oftest,example,invalid, orlocalhostas defined by rfc2606. -
Links that users click on should prefer
https, but if the site does not supporthttpsyou may decide to allow the URL -
If the link is an XML namespace name (which is just an identifier), then you can use
http. The XML namespace location should still behttps.
-
There are times when URLs cannot use https:// that are beyond our control.
Fortunately, nohttp provides a default allowlist and allowing additional URLs.
The default allowlist allows HTTP urls that impact these primary categories:
-
localhost
-
URLs that use a TLD defined in https://tools.ietf.org/html/rfc2606 (i.e. tld of test, .example, invalid, localhost)
-
XML Namespace names (not the locations)
-
Java specific URLs that do not work over
http. For example, Java Properties hard codes usinghttp.
We will not add an entry to allow for arbitrary sites that do not support https.
The reasoning is that context is important.
A specific project may provide a link to a site that does not support https.
Ideally that site is updated to support https, but it is not considered a vulnerability.
This means that the URL could be added to your custom allowlist.
However, another project may be downloading from the same site that does not support https.
In this case it is a vulnerability and needs to be fixed.
Since nohttp cannot understand the context, it will not allow arbitrary sites that do not support https.
This project provides a default allowlist. However, other projects may end up with their own usecases. Fortunately, nohttp supports custom allowlist as well.
You can invoke RegexHttpMatcher.addHttpUrlAllowlist(Predicate<String>) to add additional rules to determine if a URL is allowed or not.
The input to the Predicate is the URL that was found and should be checked as allowed or not.
The simplest way to add custom allowlist is to use RegexPredicate.createAllowlist(InputStream).
The format of the InputStream is defined as:
-
Each line contains a regular expression that should be allowed
-
Lines can begin with
//to create a comment within the file -
Lines are trimmed for whitespace
-
Lines that are empty are ignored
For example:
// Ignore Maven XML Namespace id of http://maven.apache.org/POM/4.0.0
^http://maven\.apache\.org/POM/4.0.0$
// Allow Company XML namespace names but not the locations (which end in .xsd)
^http://mycompany.test/xml/.*(?<!\.(xsd))$