Skip to content

refactor & restructure LicenseEvidenceFetcher#1

Merged
AugustusKling merged 7 commits intoAugustusKling:gather-license-textsfrom
CycloneDX:jk_gather-license-texts_changes
Dec 4, 2024
Merged

refactor & restructure LicenseEvidenceFetcher#1
AugustusKling merged 7 commits intoAugustusKling:gather-license-textsfrom
CycloneDX:jk_gather-license-texts_changes

Conversation

@jkowalleck
Copy link
Copy Markdown

@jkowalleck jkowalleck commented Nov 8, 2024

this copies the existing art from https://github.com/CycloneDX/cyclonedx-webpack-plugin/blob/master/src/extractor.ts

I'd rather take it as is - with all possible flaws - and improve it in a later issue/PR, than mixing too many things at a time.

Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>
@jkowalleck jkowalleck marked this pull request as draft November 8, 2024 10:26
@jkowalleck jkowalleck changed the title refactor & restructure LicenseEvidenceFetcher [WIP] refactor & restructure LicenseEvidenceFetcher Nov 8, 2024
Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>
Comment thread src/_helpers.ts
Comment thread src/builders.ts
}
const LICENSE_FILENAME_PATTERN = this.#LICENSE_FILENAME_PATTERN
const logger = this.console
return async function * (pkg: Package): AsyncGenerator<License> {
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coppied from webpack implementation with the needed adjustments for yarn

Comment thread tests/_data/snapshots/license-evidence-dev_juice-shop.json.bin
Comment thread tests/_data/snapshots/license-evidence-dev_juice-shop.json.bin
Comment thread tests/_data/snapshots/license-evidence-dev_juice-shop.json.bin
@jkowalleck jkowalleck marked this pull request as ready for review November 8, 2024 11:23
@jkowalleck
Copy link
Copy Markdown
Author

@AugustusKling reverted most of your work and dump copied over the existing art.

it showes some minor flaws, that i would love to tackle in an independent issue/PR

@jkowalleck jkowalleck changed the title [WIP] refactor & restructure LicenseEvidenceFetcher refactor & restructure LicenseEvidenceFetcher Nov 8, 2024
Comment thread src/_helpers.ts
Comment thread src/builders.ts
reproducible?: BomBuilder['reproducible']
shortPURLs?: BomBuilder['shortPURLs']
gatherLicenseTexts: BomBuilder['gatherLicenseTexts']
gatherLicenseTexts?: BomBuilder['gatherLicenseTexts']
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to have this optionally. The parameter definition of Clipanion has a default value of false set for the gather-license-texts parameter. Therefore the boolean is always present, either true or false.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

understood.
I went with the code "style" the existing interface had.
I think it should be changed/improved in a dedicated pullrequest.

Changing everything at once makes review hard; keeping up the changes and intended outcome gets harder when modifying out-of-scope code.

Comment thread src/builders.ts
async buildFromWorkspace (workspace: Workspace): Promise<Bom> {
// @TODO make switch to disable load from fs
const fetchManifest: ManifestFetcher = await this.makeManifestFetcher(workspace.project)
const fetchLicenseEvidences: LicenseEvidenceFetcher = await this.makeLicenseEvidenceFetcher(workspace.project)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get your argument from CycloneDX#193 (review) but 2 separate fetchers means that all packages have to be extracted twice. This is a waste resource-wise.

We should therefore have only 1 fetcher but not call it manifest fetcher nor license evidence fetcher. The fetcher is something general that makes package contents available. Depending on the linker used it goes to a real file system or a virtual one like the content of ZIP files.

What do you think about letting the fetcher take multiple extractors? Then the fetcher could open the packages (using fetcher.fetch(...)), call all extractors and close the packages (using releaseFs()). In between one extractor could get the package.json while the other could search for license evidence files.

Copy link
Copy Markdown
Author

@jkowalleck jkowalleck Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally understand. During development of this patch, I've made attempts of extracting/refactoring the fetcher. I rolled back these results, as I thought they were premature optimizations.

I would like to have the unoptimized most-basic solution in place first, so any optimization is measurable easily, and can be introduced in an isolated pullrequest.
A process for this would be:

  1. introduce a new feature as a minimum viable product (MVP)- merge it
  2. test/benchmark this product and baseline your outcome
  3. identify what can/must be improved - and write a very brief ticket/report for it
  4. pullrequest a scoped fix for the issues

Changing all at once might be appealing, but it makes reviews harder and collaboration processes slower.
This is why I encourage small/scoped increments. :-D

Comment thread src/builders.ts Outdated
// option `withFileTypes:true` is not supported and casues crashes
files = packageFs.readdirSync(prefixPath)
} catch (e) {
logger?.warn('collecting license evidence in', prefixPath, 'failed:', e)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whilst I prefer to simply call console.warn(...) like here, it does not follow the style of cyclonedx-node-yarn which uses console.warn('WARN | ...) in the other code.

Copy link
Copy Markdown
Author

@jkowalleck jkowalleck Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for spotting this.
fixed it via 2973e59

Comment thread src/builders.ts Outdated
try {
// option `withFileTypes:true` is not supported and casues crashes
files = packageFs.readdirSync(prefixPath)
} catch (e) {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This catch should not be here as it hides errors. The warning message is not enough.

I'd assume users of cyclonedx-node-yarn would call it as part of a build pipeline and expect a non zero exit code if anything fails that affects the produced SBOM. It's better to explode than to log a human-friendly message.

Copy link
Copy Markdown
Author

@jkowalleck jkowalleck Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this over from webpack implementation, without thinking too much.
With webpack, the SBOM is a side-artifact that is generated during build time, and it is designed to not break build processes.

the yarn plugin is called explicitly, and - like you said - it is expected to explode on errors.
I will change this accordingly.
PS: done via 643f95e

Comment thread src/builders.ts

const contentType = getMimeForTextFile(file)
if (contentType === undefined) {
continue
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not silently skip over files matching LICENSE_FILENAME_PATTERN.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current implementation is a copy/past of existing art.
lets discuss improvements in a different issue/pullrequest (see #1 (comment))

Comment thread src/builders.ts Outdated
manifest.name = pkg.scope ? `@${pkg.scope}/${pkg.name}` : pkg.name
manifest.version = pkg.version
const component = this.makeComponent(pkg, manifest, type)
if (component instanceof Component) {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how makeComponent could conditionally return an instance. I'd guess something is wrong if we do not get an Component instance here and thus the code should abort execution.

Copy link
Copy Markdown
Author

@jkowalleck jkowalleck Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets discuss the out-of-scope improvements in a different issue/pullrequest (see #1 (comment))

Comment thread tests/_data/snapshots/license-evidence-dev_juice-shop.json.bin
Comment thread tests/_data/snapshots/plain_alternative-package-registry.json.bin Outdated
Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>
Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>
Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>
Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>
@jkowalleck
Copy link
Copy Markdown
Author

exising art got some fixes: CycloneDX/cyclonedx-webpack-plugin#1339

will update this very PR accordingly, soon

@jkowalleck
Copy link
Copy Markdown
Author

updated to match latest existing art - via 1dad58e

@jkowalleck
Copy link
Copy Markdown
Author

@AugustusKling ,

how do you see the current state of this PR? Would you be okay with merging it and have it published and tested downstream?

I mean, this software is not meant to be final, it is intended to be improved step-by-step over time, right?

"licenses": [
{
"license": {
"name": "file: LICENSE.APACHE2",
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be found. The library allows using it with BSD-2-Clause OR MIT OR Apache-2.0

@AugustusKling
Copy link
Copy Markdown
Owner

@AugustusKling ,

how do you see the current state of this PR? Would you be okay with merging it and have it published and tested downstream?

I mean, this software is not meant to be final, it is intended to be improved step-by-step over time, right?

Firstly, sorry for leaving this for many days. I would like to spend more time to improve this tool but time is sadly limited.

We can merge this in my opinion as it is. When I compare with my PR to your repo https://github.com/CycloneDX/cyclonedx-node-yarn/pull/193/files then I think these things should be addressed soon (before or after merge):

  • Add --gather-license-texts to /README.md file, maybe with a warning that it's not reliable yet
  • Take over changes from /tests/README.md file

These improvement could be addressed in follow-up PRs:

Just make sure to call the license extraction experimental until we can trust it does not silently skip over errors.

@jkowalleck
Copy link
Copy Markdown
Author

[...] sorry for leaving this for many days.

No need to apologize, I totally understand that everybody has their own time/budget for working on OSS. You are doing great.

We can merge this in my opinion as it is. [...] I think these things should be addressed soon (before or after merge):

Feel free to merge my changes to your own branch, and apply the mentioned changes as needed.
We then might continue discussions in CycloneDX#193 - to identify follow-up tasks and changes for improvement.

merging this PR is expected to automatically update the existing CycloneDX#193

These improvement could be addressed in follow-up PRs. [...]

Fully agree.

Just make sure to call the license extraction experimental until we can trust it does not silently skip over errors.

Thank you for calling this out.
Stating the experimental nature in the README and in the CLI help page is probably the best we can do.

@AugustusKling AugustusKling merged commit 7696ebf into AugustusKling:gather-license-texts Dec 4, 2024
@jkowalleck jkowalleck deleted the jk_gather-license-texts_changes branch December 4, 2024 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants