Skip to content

feat: Add license enrichment from pypi to python packages#4295

Merged
spiffcs merged 4 commits intoanchore:mainfrom
timols:feat/add-python-license-enrichment
Nov 6, 2025
Merged

feat: Add license enrichment from pypi to python packages#4295
spiffcs merged 4 commits intoanchore:mainfrom
timols:feat/add-python-license-enrichment

Conversation

@timols
Copy link
Copy Markdown
Contributor

@timols timols commented Oct 17, 2025

Description

This PR introduces license enrichment from PyPI for python-related packages (excluding source based packages defined by a wheel or egg since license info should typically be present on the file system)

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have added unit tests that cover changed behavior
  • I have tested my code in common scenarios and confirmed there are no regressions
  • I have added comments to my code, particularly in hard-to-understand sections

Comment thread syft/pkg/cataloger/python/package.go Outdated
Signed-off-by: Tim Olshansky <456103+timols@users.noreply.github.com>
Signed-off-by: Tim Olshansky <456103+timols@users.noreply.github.com>
@timols timols force-pushed the feat/add-python-license-enrichment branch from cf95dc2 to 519e9f0 Compare October 17, 2025 17:56
Signed-off-by: Tim Olshansky <456103+timols@users.noreply.github.com>
@timols timols requested a review from kzantow October 17, 2025 18:14
Copy link
Copy Markdown
Contributor

@kzantow kzantow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small things I think should be updated

if resp.StatusCode != 200 {
return "", fmt.Errorf("unable to get package from pypi registry")
}
defer func() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this precede the status code check? there can be response bodies that need to be closed with other statuses, I think

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right.

Comment on lines +116 to +118
require.Equal(t, tc.expectedError.Error(), err.Error())
} else {
require.Equal(t, tc.expectedError, err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tc.expectedError could be nil here. there are some require helpers for errors like require.NoError and require.ErrorContains. I think this could be a bit better written like:

if tc.expectedError != nil {
  require.ErrorContains(t, err, tc.expectedError.Error())
} else {
  require.NoError(t, err)

or similar

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In reality we control what the expected error is and this would blow up in a very obvious way when running unit tests. That said, it's a small change to have an even better error message. 😄

return "", fmt.Errorf("unable to format remote request: %w", err)
}

httpClient := &http.Client{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have some common configuration for timeout options, retry behavior possibly, rate limiting, etc.. This is becoming more important as we add more online resolution. We also might think about adding these features in a way that could be used in parallel and to enhance existing SBOMs rather than only at creation time. I don't think that has to happen as part of this PR, but I think we're getting to the point that we need to start being conscious of some problems people will start running into more frequently like the Maven rate limiting.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that makes sense.

Signed-off-by: Tim Olshansky <456103+timols@users.noreply.github.com>
@timols timols requested a review from kzantow October 18, 2025 03:14
@timols
Copy link
Copy Markdown
Contributor Author

timols commented Oct 23, 2025

Hi @kzantow - let me know if there is anything remaining to get this across the line

@timols
Copy link
Copy Markdown
Contributor Author

timols commented Nov 3, 2025

Hi @kzantow anything holding this back from being approved and merged?

@spiffcs
Copy link
Copy Markdown
Contributor

spiffcs commented Nov 6, 2025

LGTM TY @timols these contributions have been excellent and I really appreciate all the thought and effort you put into both PR

@spiffcs spiffcs merged commit bbef262 into anchore:main Nov 6, 2025
12 checks passed
@timols
Copy link
Copy Markdown
Contributor Author

timols commented Nov 6, 2025

Thanks @spiffcs - happy to contribute! 😄

@timols timols deleted the feat/add-python-license-enrichment branch November 6, 2025 22:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants