
[BUG] Incorrect Maven PURL generation: Automatic-Module-Name should not be used as Maven groupId #4611

@debasishbsws

What happened:

Summary

Syft incorrectly uses the Automatic-Module-Name MANIFEST.MF field as a Maven groupId when generating PURLs for Java artifacts that lack Maven metadata (pom.properties/pom.xml). This results in incorrect Maven PURLs that don't match the actual Maven Central coordinates, causing downstream issues in vulnerability scanning and dependency management.

Affected Code

The bug exists in the fallback logic for extracting Maven groupIds from MANIFEST.MF files:

File: syft/pkg/cataloger/internal/cpegenerate/java.go

SecondaryJavaManifestGroupIDFields = []string{
    "Automatic-Module-Name",  // ← THIS IS THE PROBLEM
    "Main-Class",
    "Package",
}

This array is used in the groupId resolution hierarchy defined in:

File: syft/pkg/cataloger/java/package_url.go

func groupIDFromJavaManifest(manifest *pkg.JavaManifest) (groupID string) {
    if manifest == nil {
        return groupID
    }

    groupIDs := cpegenerate.GetManifestFieldGroupIDs(manifest,
        cpegenerate.PrimaryJavaManifestGroupIDFields)
    // assumes primaryJavaManifestNameFields ordered by priority
    if len(groupIDs) != 0 {
        return groupIDs[0]
    }

    groupIDs = cpegenerate.GetManifestFieldGroupIDs(manifest,
        cpegenerate.SecondaryJavaManifestGroupIDFields)  // ← Fallback to secondary fields

    if len(groupIDs) != 0 {
        return groupIDs[0]  // ← Returns Automatic-Module-Name if found
    }

    return groupID
}
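
The fallback can be reproduced in isolation with a minimal sketch. Everything here is a simplified stand-in, not Syft's actual code: the `manifest` type, the contents of the primary field list, and the `looksLikeGroupID` prefix check are assumptions made for illustration. Only the secondary field list is taken from the excerpt above.

```go
package main

import (
	"fmt"
	"strings"
)

// manifest is a hypothetical stand-in for pkg.JavaManifest: the main-section
// key/value pairs of MANIFEST.MF.
type manifest map[string]string

var (
	// Assumed for illustration; the issue does not list the primary fields.
	primaryFields = []string{"Bundle-SymbolicName", "Extension-Name", "Specification-Vendor"}
	// Quoted from the issue.
	secondaryFields = []string{"Automatic-Module-Name", "Main-Class", "Package"}
)

// looksLikeGroupID is an assumed approximation of the "valid top-level domain
// prefix" check mentioned under Affected Packages.
func looksLikeGroupID(v string) bool {
	for _, prefix := range []string{"com.", "org.", "net.", "io."} {
		if strings.HasPrefix(v, prefix) {
			return true
		}
	}
	return false
}

// firstGroupID returns the first field value, in list order, that passes the
// prefix check, or "" if none does.
func firstGroupID(m manifest, fields []string) string {
	for _, f := range fields {
		if v := strings.TrimSpace(m[f]); looksLikeGroupID(v) {
			return v
		}
	}
	return ""
}

// groupIDFromManifest mirrors the two-tier lookup: primary fields first,
// then the secondary fields as a fallback.
func groupIDFromManifest(m manifest) string {
	if g := firstGroupID(m, primaryFields); g != "" {
		return g
	}
	return firstGroupID(m, secondaryFields)
}

func main() {
	lz4 := manifest{
		"Automatic-Module-Name": "org.lz4.java",
		"Bundle-SymbolicName":   "lz4-java",
	}
	// Bundle-SymbolicName fails the prefix check, so the module name wins.
	fmt.Println(groupIDFromManifest(lz4)) // org.lz4.java
}
```

Under these assumptions, the module name is returned as soon as no primary field looks like a groupId, which is the situation described for lz4-java.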

Concrete Example: lz4-java

Expected Behavior

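For the lz4-java library version 1.8.0, the Maven coordinates are groupId org.lz4 and artifactId lz4-java, so the expected PURL is pkg:maven/org.lz4/lz4-java@1.8.0.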

Actual Behavior

Syft generates: pkg:maven/org.lz4.java/lz4-java@1.8.0

Root Cause

The lz4-java-1.8.0.jar contains no Maven metadata (no pom.properties or pom.xml inside the JAR). The MANIFEST.MF contains:

Automatic-Module-Name: org.lz4.java
Bundle-SymbolicName: lz4-java

Issue

The Automatic-Module-Name field is a Java 9+ module system identifier, not a Maven groupId:

  • Purpose: Identifies a module in the Java Platform Module System (JPMS)
  • Format: Often follows reverse-DNS notation but represents the full module name, not just the organization/group
  • Relationship to Maven: None defined; a module name carries no information about Maven coordinates
  • Example: For lz4-java, the module name is org.lz4.java but the Maven groupId is just org.lz4

Using Automatic-Module-Name as a Maven groupId is semantically incorrect. The resulting PURLs are well-formed but name coordinates that do not exist on Maven Central.
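
To make the mismatch concrete, here is a small sketch that formats both PURLs. `mavenPURL` is a hypothetical helper, not a Syft or packageurl-go function, and it assumes its inputs need no percent-encoding.

```go
package main

import "fmt"

// mavenPURL formats a Maven package URL as pkg:maven/<groupId>/<artifactId>@<version>.
// Hypothetical helper: it does no escaping and handles no qualifiers.
func mavenPURL(groupID, artifactID, version string) string {
	return fmt.Sprintf("pkg:maven/%s/%s@%s", groupID, artifactID, version)
}

func main() {
	fromModuleName := mavenPURL("org.lz4.java", "lz4-java", "1.8.0") // what Syft emits
	fromPOM := mavenPURL("org.lz4", "lz4-java", "1.8.0")             // Maven Central coordinates

	fmt.Println(fromModuleName)
	fmt.Println(fromPOM)
	// The strings differ, so exact PURL comparison treats them as different packages.
	fmt.Println(fromModuleName == fromPOM) // false
}
```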

Verification

You can reproduce this issue:

1. Download the JAR from Maven Central

curl -O https://repo1.maven.org/maven2/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.jar

2. Check the MANIFEST.MF

unzip -q -c lz4-java-1.8.0.jar META-INF/MANIFEST.MF | grep "Automatic-Module-Name"
# Output: Automatic-Module-Name: org.lz4.java

3. Verify no Maven metadata exists

unzip -l lz4-java-1.8.0.jar | grep -E "(pom\.|maven)"
# Output: (empty - no pom.properties or pom.xml)

4. Run Syft

syft lz4-java-1.8.0.jar -o json | jq '.artifacts[] | select(.name == "lz4-java") | .purl'
# Output: "pkg:maven/org.lz4.java/lz4-java@1.8.0"
# Expected: "pkg:maven/org.lz4/lz4-java@1.8.0"

5. Confirm correct Maven coordinates

curl -s https://repo1.maven.org/maven2/org/lz4/lz4-java/1.8.0/lz4-java-1.8.0.pom | grep -E "<groupId>|<artifactId>"
# Output:
#   <groupId>org.lz4</groupId>
#   <artifactId>lz4-java</artifactId>

Impact

This bug affects:

  1. Vulnerability Scanning: Tools like Grype won't match CVEs correctly because the PURL doesn't match vulnerability databases
  2. Dependency Management: SBOMs contain incorrect package identifiers

Affected Packages

Any Java package that:

  • Lacks Maven metadata inside the JAR (no pom.properties or pom.xml)
  • Contains an Automatic-Module-Name field in MANIFEST.MF
  • Has no primary manifest fields with valid top-level domain prefixes

Examples:

  • lz4-java (all versions from 1.6.0 to 1.8.0)
  • Potentially many other Java 9+ modularized libraries

Rationale

  1. Semantic Mismatch: Module names are not Maven groupIds
  2. No Reliable Extraction: There is no standard way to derive a Maven groupId from a module name, because Automatic-Module-Name is a module name chosen by the library author for JPMS.

Additional Context

This issue was discovered while investigating SBOM accuracy for Kafka v4.1. The incorrect PURL prevents accurate vulnerability matching and creates false negatives in security scanning (Grype) workflows.

Environment: Latest main branch, as well as the release below:

❯ syft version
Application:   syft
Version:       1.41.2
BuildDate:     2026-02-03T15:29:00Z
GitCommit:     Homebrew
GitDescription: [not provided]
Platform:      darwin/arm64
GoVersion:     go1.25.6
Compiler:      gc
SchemaVersion: 16.1.2
  • OS: All
❯ sw_vers
ProductName:		macOS
ProductVersion:		26.2
BuildVersion:		25C56

Metadata

Labels: bug
Status: Done