Handle paths with spaces and hashes especially with nested JARs #805
Merged
lukehutch merged 1 commit on Nov 2, 2023
With nested JARs, the path is not usable as a `java.nio.file.Path` instance, so two other mechanisms are tried. First, the resulting nested path (a path like `jar:file:....!/some/nested/path`) is converted to a `URL`. If that fails with a `MalformedURLException`, the path is converted to a `URI` instead. If the URI fallback also fails, an `IOException` is thrown, which eventually bubbles up and discards the whole classpath entry. With verbose output enabled during scanning, this results in a message like the following:

```
2023-11-02T12:51:42.719+0100 ClassGraph -- Skipping invalid classpath entry .../spring-boot-fully-executable-jar.jar!/BOOT-INF/lib/... : java.io.IOException: Malformed URI: ...
```

Most of the time nothing is discarded, because most paths can be converted to a URL in the first step, or at least succeed when converted to a URI.

However, for paths containing both spaces and the hash symbol, URL conversion and URI conversion can both fail, so the classpath entry is discarded even though every path involved is valid and usable.

Let us assume a Spring Boot executable JAR located in a directory named `ci-build main #123`, which is a valid directory name on Windows and Linux. When ClassGraph reaches a nested library here, it constructs the paths to the nested JARs in the form `jar:file:<path>!/<nested-path>`. In this case we end up with something like `jar:file:/opt/ci-build main #123!/BOOT-INF/lib/my-lib.jar`.

When ClassGraph reaches the conversion code, it first tries to convert the path to a URL. This fails with the following message: `java.net.MalformedURLException: no !/ in spec`

If we then fall back to the URI conversion, the path is rejected as well, because it contains spaces: `java.net.URISyntaxException: Illegal character in opaque part at index 66: jar:file:...`

The index points to the first space in the path being converted. So we can construct nested paths that are neither valid `URL` instances nor valid `URI` instances.

To solve this issue, we encode spaces and hash symbols when the path is handled as a URL or multi-section path, so that conversion can succeed. This also seems to be what the `java.nio.file.Path` API does when asked for the URI of the same path.

So this commit encodes spaces as `%20` and hash symbols as `%23` when going into the URL/multi-section branch.

Fixes #804
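The failure mode and the effect of the encoding can be reproduced with plain JDK classes. The following is a minimal sketch, not ClassGraph's actual code: the class `NestedJarPathDemo` and its helper methods are hypothetical names introduced for illustration, and the path string is the example from the description above. The URL failure most likely comes from the unencoded `#` being read as the start of the URL fragment, which leaves no `!/` in the part that the `jar:` handler parses.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Paths;

class NestedJarPathDemo {

    /** Hypothetical helper: percent-encode only spaces and hash symbols. */
    static String encodeSpacesAndHashes(String path) {
        return path.replace(" ", "%20").replace("#", "%23");
    }

    /** Returns true if the string can be parsed by the java.net.URL constructor. */
    @SuppressWarnings("deprecation") // new URL(String) is deprecated on recent JDKs
    static boolean parsesAsUrl(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    /** Returns true if the string can be parsed by the java.net.URI constructor. */
    static boolean parsesAsUri(String spec) {
        try {
            new URI(spec);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Example nested path taken from the PR description.
        String raw = "jar:file:/opt/ci-build main #123!/BOOT-INF/lib/my-lib.jar";
        String encoded = encodeSpacesAndHashes(raw);

        System.out.println("raw     as URL: " + parsesAsUrl(raw));
        System.out.println("raw     as URI: " + parsesAsUri(raw));
        System.out.println("encoded       : " + encoded);
        System.out.println("encoded as URL: " + parsesAsUrl(encoded));
        System.out.println("encoded as URI: " + parsesAsUri(encoded));

        // For comparison: the URI that java.nio.file.Path produces for the
        // directory itself (the exact form depends on the platform).
        System.out.println("Path.toUri()  : " + Paths.get("/opt/ci-build main #123").toUri());
    }
}
```

Note that this sketch encodes every `#` unconditionally, so it only suits strings that are known to be file system paths and not URLs that carry a genuine fragment.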
Member
I really appreciate your detailed analysis on this! The change looks good to me. Thank you!