Fix: map license URLs to SPDX IDs for machine readable format#4244

Closed
Avadhut03 wants to merge 2 commits into anchore:main from Avadhut03:fix-license-url

Conversation

@Avadhut03

This PR fixes an issue in Syft where Java project licenses with URLs were not properly mapped to SPDX license IDs.
Previously, license URLs (whether a single URL or several) were reported as LicenseRef-http---... instead of their proper SPDX identifiers, which made the output hard for machines to consume.

With this change:

License URLs such as http://www.eclipse.org/legal/epl-v10.html are now correctly mapped to EPL-1.0.

Deprecated or older license URLs like http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html are mapped to LGPL-2.1-only.

This ensures the licenseDeclared and licenseConcluded fields in SPDX and CycloneDX outputs are properly machine-readable.
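As a rough illustration of the behavior described above, here is a minimal Go sketch of a URL lookup with a LicenseRef fallback. The names `knownURLs`, `licenseRefFromURL`, and `licenseIDForURL` are hypothetical and are not Syft's API. The sanitization rule (replace any character outside letters, digits, `.` and `-` with `-`) is an assumption based on the characters SPDX allows in a LicenseRef, not a copy of Syft's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// knownURLs is a hypothetical stand-in for a URL-to-SPDX-ID map.
// The single entry is taken from the example in this description.
var knownURLs = map[string]string{
	"http://www.eclipse.org/legal/epl-v10.html": "EPL-1.0",
}

// licenseRefFromURL builds a LicenseRef-style identifier from a URL.
// Assumption: every character outside [A-Za-z0-9.-] is replaced by '-'.
func licenseRefFromURL(u string) string {
	var b strings.Builder
	b.WriteString("LicenseRef-")
	for _, r := range u {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '.', r == '-':
			b.WriteRune(r)
		default:
			b.WriteByte('-')
		}
	}
	return b.String()
}

// licenseIDForURL returns the SPDX ID for a known URL and falls back
// to a LicenseRef identifier when the URL is not in the map.
func licenseIDForURL(u string) string {
	if id, ok := knownURLs[u]; ok {
		return id
	}
	return licenseRefFromURL(u)
}

func main() {
	fmt.Println(licenseIDForURL("http://www.eclipse.org/legal/epl-v10.html"))
	fmt.Println(licenseIDForURL("http://example.com/license"))
}
```

In this sketch, a URL that is missing from the map always produces the fallback form, which is why adding URL entries changes the reported identifier.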

This addresses the issues reported when analyzing Java dependencies in projects such as spring-petclinic.

Fixes #4233

Type of change

Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have added unit tests for LicenseByURL covering the new URL mappings

  • I have tested the changes in common scenarios (Java Maven projects with single/multiple license URLs)

Signed-off-by: Avadhut03 <avadhutkul60@gmail.com>
@spiffcs
Contributor

spiffcs commented Sep 26, 2025

Thanks for the PR @Avadhut03! I think we need this to be in a separate area since internal/spdxlicense/license_list.go is a generated file.

I think I'm open to having two maps here. One generated from the official SPDX source and the other contributed by users who see areas where we can map the URL and get better license answers.

cc @wagoodman for when he gets back, to get a +1 on adding a maintainer map that we merge with the generated SPDX map at compile time for a single lookup
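A minimal sketch of the two-map idea, assuming hypothetical names (`generatedURLToLicense`, `maintainerURLToLicense`, `mergeURLMaps`, `lookupLicenseByURL`) that do not exist in Syft today. It merges at package initialization rather than at code generation, and it lets the generated SPDX entries win on conflict; both choices are assumptions for illustration.

```go
package main

import "fmt"

// generatedURLToLicense stands in for the map generated from the
// official SPDX license list (contents abbreviated, hypothetical name).
var generatedURLToLicense = map[string]string{
	"http://apache.org/licenses/LICENSE-1.1": "Apache-1.1",
}

// maintainerURLToLicense is a hypothetical hand-maintained map for
// URLs that the SPDX list does not cover yet.
var maintainerURLToLicense = map[string]string{
	"http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html": "LGPL-2.1-only",
}

// mergeURLMaps combines both maps into one. Generated entries are
// written last, so they take precedence over maintainer entries.
func mergeURLMaps(generated, maintained map[string]string) map[string]string {
	merged := make(map[string]string, len(generated)+len(maintained))
	for u, id := range maintained {
		merged[u] = id
	}
	for u, id := range generated {
		merged[u] = id
	}
	return merged
}

// mergedURLToLicense is built once, so callers do a single lookup.
var mergedURLToLicense = mergeURLMaps(generatedURLToLicense, maintainerURLToLicense)

// lookupLicenseByURL returns the SPDX ID for a URL, if one is known.
func lookupLicenseByURL(url string) (string, bool) {
	id, ok := mergedURLToLicense[url]
	return id, ok
}

func main() {
	id, ok := lookupLicenseByURL("http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html")
	fmt.Println(id, ok)
}
```

Keeping the maintainer map in its own file would leave the generated file untouched when the SPDX list is regenerated.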

@Avadhut03
Author

Thanks for the feedback @spiffcs. That makes sense. I can update the PR to add a separate map for maintainer/user-contributed URLs and merge it with the generated SPDX map during compile time. Will wait for @wagoodman’s thoughts as well before making the changes.

@spiffcs spiffcs self-assigned this Oct 6, 2025
@spiffcs spiffcs added this to OSS Oct 20, 2025
@spiffcs spiffcs moved this to In Progress in OSS Oct 20, 2025
@spiffcs spiffcs moved this from In Progress to In Review in OSS Oct 20, 2025
@wagoodman wagoodman changed the title from "Fix: #4233 map license URLs to SPDX IDs for machine readable format" to "Fix: map license URLs to SPDX IDs for machine readable format" Oct 20, 2025
var urlToLicense = map[string]string{
"ftp://ftp.tin.org/pub/news/utils/newsx/newsx-1.6.tar.gz": "Zeeff",
"http://apache.org/licenses/LICENSE-1.1": "Apache-1.1",
"http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html": "LGPL-2.1-only",
Contributor

The SPDX license list is really the source of truth for these kinds of changes. If they don't want to accept a contribution for these URLs then we can update this code to account for manual adjustments, but we couldn't take this as-is since it's manually updating a file that is automatically generated.

We pull in the SPDX license list when generating this code. The list is maintained in a GitHub repo and gets regular releases. It looks like adding these URLs would be a small update to an XML file, and this kind of change appears to be accepted regularly.

That way once your URL enhancements are accepted and released in the SPDX license list it would flow downstream to us (and other users of this list, which there are a lot of, get the benefit too).

if name == "" && url == "" {
	continue
}
if licInfo, ok := spdxlicense.LicenseByURL(url); ok {
Contributor

It looks like this is already covered here, but I might be missing a nuance with the caller.

@whereIsMyDipp

Hi @Avadhut03! Are you going to continue on this PR? Just asking, as your last comment was some time ago. We could also offer some help if needed.

@spiffcs
Contributor

spiffcs commented Jan 29, 2026

@whereIsMyDipp - let me update this PR so that we can get what I outlined here:

I think I'm open to having two maps here. One generated from the official SPDX source and the other contributed by users who see areas where we can map the URL and get better license answers.

After we merge I'll take some time to make some PRs against the repo that @wagoodman suggested and see how long that takes. Apologies for the staleness of this issue. It got lost in the pile of work 😢

@spiffcs
Contributor

spiffcs commented Jan 29, 2026

spdx/license-list-XML#2935

When I get feedback on this PR I'll pull the trigger on adding http://www.eclipse.org/org/documents/edl-v10.php as a PR too and then we'll get this with the next SPDX update

Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
@spiffcs
Contributor

spiffcs commented Jan 30, 2026

Alright!

So with #4588 we should fix:
"http://www.eclipse.org/org/documents/edl-v10.php": "BSD-3-Clause",

Given the https link already exists in the upstream list:
https://github.com/spdx/license-list-XML/blob/297da51b1b0ea5aab7de4a35faea34ffc43323a0/src/BSD-3-Clause.xml#L5-L9
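A minimal sketch of a scheme-insensitive lookup, to illustrate why an existing https entry could answer an http query. This is not necessarily how #4588 implements it. The names `stripScheme`, `buildSchemelessIndex`, and `licenseByURLAnyScheme` are hypothetical, and the https key below assumes the upstream entry is the https variant of the same URL.

```go
package main

import (
	"fmt"
	"strings"
)

// urlToLicense stands in for the generated map. Assumption: the
// upstream list carries only the https form of this URL.
var urlToLicense = map[string]string{
	"https://www.eclipse.org/org/documents/edl-v10.php": "BSD-3-Clause",
}

// stripScheme removes a leading "https://" or "http://" so both
// variants of a URL normalize to the same key.
func stripScheme(u string) string {
	u = strings.TrimSpace(u)
	for _, prefix := range []string{"https://", "http://"} {
		if strings.HasPrefix(u, prefix) {
			return u[len(prefix):]
		}
	}
	return u
}

// buildSchemelessIndex re-keys the map by scheme-less URL. If the http
// and https forms of one URL mapped to different IDs, the surviving
// entry would depend on map iteration order; real code should
// detect and resolve such conflicts.
func buildSchemelessIndex(m map[string]string) map[string]string {
	idx := make(map[string]string, len(m))
	for u, id := range m {
		idx[stripScheme(u)] = id
	}
	return idx
}

var schemelessIndex = buildSchemelessIndex(urlToLicense)

// licenseByURLAnyScheme looks up a URL while ignoring http vs https.
func licenseByURLAnyScheme(u string) (string, bool) {
	id, ok := schemelessIndex[stripScheme(u)]
	return id, ok
}

func main() {
	id, ok := licenseByURLAnyScheme("http://www.eclipse.org/org/documents/edl-v10.php")
	fmt.Println(id, ok)
}
```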

spdx/license-list-XML#2935 was accepted so we should have a fix for http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html when the next list comes out

I'm going to close this PR since we're not doing this exact solution anymore. But thank you to the original author @Avadhut03 and to @whereIsMyDipp for contributing to the discussion here and helping find a solution to these gaps in URL lookup. With the next release of syft, both of these URLs should be supported (one via the upstream update, the other via the scheme lookup fix).

@spiffcs spiffcs closed this Jan 30, 2026
@github-project-automation github-project-automation bot moved this from In Review to Done in OSS Jan 30, 2026
@spiffcs spiffcs mentioned this pull request Feb 3, 2026
Development

Successfully merging this pull request may close these issues.

Wrong format in license

4 participants