Add REUSE compliance for machine-readable licensing and SBOM generation #3968

Merged
zuiderkwast merged 5 commits into valkey-io:unstable from zuiderkwast:reuse
Jun 16, 2026

@zuiderkwast zuiderkwast commented Jun 11, 2026

Contributor

Adds REUSE 3.3 structure (REUSE.toml, LICENSES/) covering all source files and vendored dependencies.

Replaces the non-standard dual-license COPYING file with a standard BSD-3-Clause text.

Updates the description of custom patches to Lua and Jemalloc in deps/README.md.

Benefits:

  • GitHub and OpenSSF Scorecard correctly detect the project license
  • reuse spdx generates a complete SPDX SBOM covering first-party code and all vendored deps
  • CI check prevents future contributors from introducing invalid SPDX identifiers
  • Per-file license clarity for downstream consumers (distro packagers, enterprises, embedded users)

The jemalloc section was stale: it still described custom source patches
for active defragmentation that were removed in valkey-io#1266. The only remaining
source modification is the VALKEY_VENDORED_JEMALLOC macro in jemalloc.sh.

The Lua section was incomplete: it was missing readonly tables, globals
protection, and CVE patches that have been applied over the years.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>

Replace the non-standard dual-license COPYING file with a single
standard BSD-3-Clause text. The previous format (two full license
blocks in one file) was not recognized by GitHub's license detection,
OpenSSF Scorecard, or other automated compliance tools.

Add REUSE structure (.reuse/dep5 and LICENSES/) to provide
machine-readable per-file copyright and license annotations, covering
both Valkey source code and all vendored dependencies.

Fix invalid SPDX-License-Identifier headers in 6 source files that
used 'BSD 3-Clause' (with space) instead of 'BSD-3-Clause'.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>

Runs 'reuse lint' on push and pull requests using the official
fsfe/reuse-action to catch invalid SPDX identifiers and missing
license/copyright annotations.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
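
The invalid-identifier problem described in the commits above ('BSD 3-Clause' with a space instead of 'BSD-3-Clause') can be illustrated with a small sketch. This is not how 'reuse lint' is implemented; the function name find_invalid_spdx, the regular expression, and the sample header are assumptions made for illustration, and the set of known identifiers is limited to the license files added under LICENSES/ in this PR.

```python
import re

# Illustrative subset of SPDX short identifiers: the license texts this PR
# adds under LICENSES/. A real check would use the full SPDX license list.
KNOWN_IDS = {
    "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "BSL-1.0",
    "CC0-1.0", "ISC", "MIT", "Zlib",
}

# Captures the text after the tag, ignoring a trailing C comment terminator.
HEADER_RE = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(?:\*/)?\s*$")


def find_invalid_spdx(text):
    """Return (line_number, identifier) pairs for unknown identifiers.

    Simplification: only single identifiers are handled, so compound SPDX
    expressions such as "MIT OR Apache-2.0" would be reported as unknown.
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = HEADER_RE.search(line)
        if match and match.group(1) not in KNOWN_IDS:
            problems.append((lineno, match.group(1)))
    return problems


# Hypothetical header using the spelling that this PR corrects.
sample = "/*\n * SPDX-License-Identifier: BSD 3-Clause\n */\n"
fixed = sample.replace("BSD 3-Clause", "BSD-3-Clause")

print(find_invalid_spdx(sample))  # reports the header on line 2
print(find_invalid_spdx(fixed))   # reports nothing
```

The point of the sketch is only that the identifier is an exact token: a space in place of a hyphen makes it unrecognizable to tools that match against the SPDX license list.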
@github-actions

❌ Provenance Check Alert

Potential code similarities detected with upstream repository.

  • 2026-06-11 08:42:47 [INFO] - matches redis/redis PR #14609 (similarity: 0.980, method: file_simhash+deep); file pairs: COPYING <- REDISCONTRIBUTIONS.txt
  • 2026-06-11 08:42:47 [INFO] - matches redis/redis PR #13157 (similarity: 0.980, method: file_simhash+deep); file pairs: COPYING <- REDISCONTRIBUTIONS.txt
  • 2026-06-11 08:42:47 [INFO] - matches redis/redis PR #15162 (similarity: 0.976, method: file_simhash+deep); file pairs: LICENSES/BSD-2-Clause.txt <- deps/tre/LICENSE
  • 2026-06-11 08:42:47 [INFO] - matches redis/redis PR #15214 (similarity: 0.976, method: file_simhash+deep); file pairs: LICENSES/BSD-2-Clause.txt <- deps/tre/LICENSE
  • 2026-06-11 08:42:47 [INFO] - matches redis/redis PR #14433 (similarity: 0.887, method: file_simhash+deep); file pairs: LICENSES/BSD-2-Clause.txt <- deps/xxhash/LICENSE

This check was performed automatically by the Provenance Guard Action.

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

📝 Walkthrough


This pull request establishes REUSE compliance infrastructure by adding a GitHub Actions workflow for automated checks, populating license text files, normalizing SPDX headers across source files to BSD-3-Clause format, and updating dependency documentation for Jemalloc and Lua integration.

Changes

REUSE Compliance and License Standardization

| Layer / File(s) | Summary |
|---|---|
| REUSE Infrastructure and License Files: .github/workflows/reuse.yml, REUSE.toml, LICENSES/Apache-2.0.txt, LICENSES/BSD-2-Clause.txt, LICENSES/BSD-3-Clause.txt, LICENSES/BSL-1.0.txt, LICENSES/CC0-1.0.txt, LICENSES/ISC.txt, LICENSES/MIT.txt, LICENSES/Zlib.txt | Adds REUSE GitHub Actions workflow that runs on push/pull_request (excluding Markdown files), configures read-only permissions and branch-scoped concurrency, and populates REUSE.toml manifest with SPDX copyright and license annotations. Introduces license text files for Apache-2.0, BSD-2-Clause, BSD-3-Clause, BSL-1.0, CC0-1.0, ISC, MIT, and Zlib. |
| Source License Header Standardization: COPYING, src/cluster_migrateslots.c, src/endianconv.h, src/fmtargs.h, src/hashtable.c, src/valkey-benchmark-dataset.c, src/valkey-benchmark-dataset.h | Reformats COPYING to present BSD 3-Clause license text with updated spacing and removes SPDX/bullet-style formatting. Normalizes SPDX-License-Identifier headers across six source files from BSD 3-Clause to BSD-3-Clause format. |
| Dependency Documentation Updates: deps/README.md | Simplifies Jemalloc upgrade guidance to emphasize upstream usage with minimal vendoring (build-time detection macro and CMakeLists.txt only) and updates Lua 5.1 differences list to document security patches and feature modifications including readonly table support and globals-whitelist protection. |

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately and specifically describes the main change: adding REUSE compliance structure for machine-readable licensing and SBOM generation, which aligns with the addition of REUSE.toml, LICENSES/ files, workflow, and license identifier updates throughout the codebase. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description clearly relates to the changeset, explaining REUSE 3.3 structure additions, COPYING file replacement, and deps/README.md updates. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
.github/workflows/reuse.yml (1)

24-25: ⚡ Quick win

Consider adding persist-credentials: false to the checkout action.

The actions/checkout action defaults to persisting GitHub credentials in the local .git/config, which can be a security risk if subsequent steps are compromised. Since this workflow only performs a read-only compliance check, explicitly disabling credential persistence is a security best practice.

🔒 Proposed fix
       - name: Checkout code
         uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          persist-credentials: false
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/reuse.yml around lines 24 - 25, Update the
actions/checkout step to explicitly disable persisting GitHub credentials by
adding persist-credentials: false to the checkout action configuration; locate
the checkout step that uses actions/checkout@de0fac2e... (the "Checkout code"
step) and add the persist-credentials: false key under it so the workflow does
not store repo credentials in .git/config during this read-only compliance run.
deps/README.md (2)

31-31: ⚡ Quick win

Capitalize "GitHub" per proper noun convention.

The platform name should be capitalized as "GitHub" rather than "github".

📝 Proposed fix
-The jemalloc directory is pulled as a subtree from the upstream jemalloc github repo. To update it you should run from the project root:
+The jemalloc directory is pulled as a subtree from the upstream jemalloc GitHub repo. To update it you should run from the project root:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deps/README.md` at line 31, Change the lowercase platform name "github" in
the sentence "The jemalloc directory is pulled as a subtree from the upstream
jemalloc github repo." to the proper-cased "GitHub" (i.e., replace "github" with
"GitHub") so the README uses the correct proper noun casing.

Source: Linters/SAST tools


29-29: ⚡ Quick win

Consider adjusting heading level for proper hierarchy.

The heading uses #### (h4), but should use ### (h3) since it follows the --- (h2) "Jemalloc" section. Markdown heading levels should increment by one level at a time for proper document structure.

📝 Proposed fix
-#### Updating/upgrading jemalloc
+### Updating/upgrading jemalloc
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deps/README.md` at line 29, Change the heading "#### Updating/upgrading
jemalloc" to one level higher ("### Updating/upgrading jemalloc") so it properly
follows the H2 "Jemalloc" section; update the heading text in the README (look
for the exact string "#### Updating/upgrading jemalloc") to maintain correct
markdown hierarchy.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cluster_migrateslots.c`:
- Line 4: This change touches cluster_migrateslots.c which requires an
architectural review by `@core-team`; update the PR by adding an explicit request
for `@core-team` review (e.g., add the `@core-team` mention in the PR description
and/or add the “needs-architectural-review” label), and update the commit
message or PR checklist to state “Request `@core-team` architectural review for
changes to cluster_migrateslots.c” so the mandatory review is clearly flagged
before merging.

---

Nitpick comments:
In @.github/workflows/reuse.yml:
- Around line 24-25: Update the actions/checkout step to explicitly disable
persisting GitHub credentials by adding persist-credentials: false to the
checkout action configuration; locate the checkout step that uses
actions/checkout@de0fac2e... (the "Checkout code" step) and add the
persist-credentials: false key under it so the workflow does not store repo
credentials in .git/config during this read-only compliance run.

In `@deps/README.md`:
- Line 31: Change the lowercase platform name "github" in the sentence "The
jemalloc directory is pulled as a subtree from the upstream jemalloc github
repo." to the proper-cased "GitHub" (i.e., replace "github" with "GitHub") so
the README uses the correct proper noun casing.
- Line 29: Change the heading "#### Updating/upgrading jemalloc" to one level
higher ("### Updating/upgrading jemalloc") so it properly follows the H2
"Jemalloc" section; update the heading text in the README (look for the exact
string "#### Updating/upgrading jemalloc") to maintain correct markdown
hierarchy.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: bc0c986f-83c3-4a97-a1d0-3985d485225a

📥 Commits

Reviewing files that changed from the base of the PR and between f3bdf50 and 27fcc02.

📒 Files selected for processing (18)
  • .github/workflows/reuse.yml
  • .reuse/dep5
  • COPYING
  • LICENSES/Apache-2.0.txt
  • LICENSES/BSD-2-Clause.txt
  • LICENSES/BSD-3-Clause.txt
  • LICENSES/BSL-1.0.txt
  • LICENSES/CC0-1.0.txt
  • LICENSES/ISC.txt
  • LICENSES/MIT.txt
  • LICENSES/Zlib.txt
  • deps/README.md
  • src/cluster_migrateslots.c
  • src/endianconv.h
  • src/fmtargs.h
  • src/hashtable.c
  • src/valkey-benchmark-dataset.c
  • src/valkey-benchmark-dataset.h

Comment thread src/cluster_migrateslots.c
@codecov

codecov Bot commented Jun 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.69%. Comparing base (f769037) to head (7ff1a6a).
⚠️ Report is 2 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3968      +/-   ##
============================================
+ Coverage     76.58%   76.69%   +0.10%     
============================================
  Files           162      162              
  Lines         80753    80752       -1     
============================================
+ Hits          61844    61932      +88     
+ Misses        18909    18820      -89     
| Files with missing lines | Coverage Δ |
|---|---|
| src/cluster_migrateslots.c | 91.98% <ø> (ø) |
| src/endianconv.h | 100.00% <ø> (ø) |
| src/hashtable.c | 97.88% <ø> (ø) |
| src/valkey-benchmark-dataset.c | 85.54% <ø> (ø) |

... and 19 files with indirect coverage changes

@zuiderkwast zuiderkwast requested a review from a team June 11, 2026 09:45
@zuiderkwast

Contributor Author

REUSE is from the Free Software Foundation Europe (FSFE), https://reuse.software/

Why does it matter? So far it matters mostly in Europe, but similar requirements may come to other continents.

Motivation: EU Cyber Resilience Act (CRA) and SBOMs

The EU Cyber Resilience Act (CRA) requires manufacturers of "products with digital elements" placed on the EU market to provide a Software Bill of Materials (SBOM) as part of their technical documentation.

What the CRA requires

  • An SBOM documenting at minimum the top-level dependencies of the product
  • In a "commonly used and machine-readable format" (SPDX or CycloneDX)
  • Shared with market surveillance authorities on request (not required to be public)

Timeline

  • September 2026: Reporting obligations apply
  • December 2027: Full compliance required, including SBOM

Open source exemption

Free and open source software developed outside the course of a commercial activity is exempt from CRA obligations. However, when a company takes Valkey and sells a product or service based on it, that company becomes the manufacturer and must produce the SBOM.

Why this matters for Valkey

Valkey itself has no obligation, but our downstream commercial users (cloud providers, appliance vendors, embedded integrators) will need SBOMs for their products containing Valkey. The REUSE structure introduced in this PR means they can generate a complete SPDX SBOM with a single command (reuse spdx) rather than doing manual license archaeology across our vendored dependencies.
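
As a rough illustration of what a downstream consumer might do with such an SBOM, the sketch below summarizes licenses from SPDX tag-value text. The sample is hand-written and heavily abbreviated, not real output of reuse spdx: it assumes file entries carry FileName and LicenseInfoInFile tags, the path deps/example/example.c is hypothetical, and the helper names licenses_by_file and license_counts are made up for this example.

```python
from collections import Counter


def licenses_by_file(spdx_text):
    """Map each FileName entry to the LicenseInfoInFile values that follow it."""
    result = {}
    current = None
    for line in spdx_text.splitlines():
        tag, sep, value = line.partition(": ")
        if not sep:
            continue  # not a "Tag: value" line
        if tag == "FileName":
            current = value.strip()
            result[current] = []
        elif tag == "LicenseInfoInFile" and current is not None:
            result[current].append(value.strip())
    return result


def license_counts(spdx_text):
    """Count how many files declare each license identifier."""
    counts = Counter()
    for licenses in licenses_by_file(spdx_text).values():
        counts.update(licenses)
    return counts


# Hand-written, abbreviated sample (assumption, not real tool output).
SAMPLE = """\
FileName: ./src/hashtable.c
LicenseInfoInFile: BSD-3-Clause
FileName: ./deps/example/example.c
LicenseInfoInFile: MIT
"""

for path, licenses in licenses_by_file(SAMPLE).items():
    print(path, licenses)
print(dict(license_counts(SAMPLE)))
```

A real SBOM contains many more tags per file (checksums, copyright text, and so on); the sketch skips everything it does not recognize.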

@dvkashapov dvkashapov left a comment

Member

Awesome, thank you! Posted some suggestions and questions.

Comment thread .reuse/dep5 Outdated
Comment thread .reuse/dep5 Outdated
Comment thread LICENSES/ISC.txt
Comment thread COPYING
@lucasyonge

Contributor

Does every module related to Valkey, such as Lua, vectorsearch, and json, need to have this?

@zuiderkwast

Contributor Author

> Does every module related to Valkey, such as Lua, vectorsearch, and json, need to have this?

Not really, and Valkey doesn't need it either, but it helps companies that want to include Valkey in a product to track the dependencies and their licenses. The company that sells a product or service is the one who needs to provide the SBOM stuff to their customers.

Add Mark Pulford (lua_cjson, strbuf) and Mike Pall (lua_bit) to the
Lua stanza. Add Florian Loitsch to the fpconv stanza.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>

Use REUSE.toml instead of the deprecated .reuse/dep5 format. REUSE.toml
is the current recommended format, supports richer metadata such as
per-annotation comments, and propagates fields to SPDX output that dep5
did not.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>

@zuiderkwast zuiderkwast merged commit 6774c09 into valkey-io:unstable Jun 16, 2026
63 of 64 checks passed
@zuiderkwast zuiderkwast deleted the reuse branch June 16, 2026 18:09
@zuiderkwast zuiderkwast added the release-notes label (This issue should get a line item in the release notes) Jun 17, 2026