Skip to content

GH-1609 Bad metadata naming#1610

Merged
cstamas merged 6 commits intoapache:masterfrom
cstamas:issue-1609
Oct 7, 2025
Merged

GH-1609 Bad metadata naming#1610
cstamas merged 6 commits intoapache:masterfrom
cstamas:issue-1609

Conversation

@cstamas
Copy link
Member

@cstamas cstamas commented Oct 1, 2025

It caused in certain cases that lock names for different metadata collapsed onto same name, and hence, congestion with possible timeouts on resource happened. In case of metadata, Metadata#getType was totally ignored, which turns out was wrong, as today we have more types than the simple maven-metadata.xml.

Fix: We already had means to make sure "string is valid for path segment", but it was buried in RepositoryIdHelper. Pull it out, and make it reusable, as this is essentially very same use case. Also, this extra work really "punishes" only non maven metadata, and that is okay (right now we have only repository prefix and archetype catalog files in play).

Fixes #1609

And hence, congestion with possible timeouts on resource.

Fixes apache#1609
@cstamas cstamas self-assigned this Oct 1, 2025
@cstamas cstamas changed the title GH-1609 Bad metadata naming cause name colllaps GH-1609 Bad metadata naming Oct 1, 2025
@cstamas cstamas added this to the 2.0.13 milestone Oct 1, 2025
@cstamas cstamas marked this pull request as ready for review October 3, 2025 08:43
return metadataType.substring(0, slash)
+ separator
+ getMetadataName(
new DefaultMetadata(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks inefficient to create an object just to call the method recursively. It may be cleaner to define a new method which would have the individual parameters.

if (metadata.getType().contains("/")) {
String metadataType = metadata.getType();
int slash = metadataType.indexOf('/');
return metadataType.substring(0, slash)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we put the first part of the metadata in front while we usually used g/a/v/t ?
What are the types containing a / ?

Copy link
Member Author

@cstamas cstamas Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a bit of history.... originally Maven had no means to (re)use Resolver transport to fetch (or put) non "standard" (not on layout) resources, and similarly, you could not reuse cache/checksum policies to perform such a transport, youd need to redo everything. As everything was either artifact or maven metadata. But with latest Resolver, metadata URI is calculated like this:

Metadata is G, A, V and T. Remote URI is constructed as:

  • remote repository baseUrl
  • if G given, +G.replaceAll(".", "/") -- as artifact groupId
  • if A given, +A -- as artifact artifactId
  • if V given, +V -- as artifact version
  • +type

Type was always "maven-metadata.xml". This is how you can make Resolver download the Maven Repository Metadata from G (plugin) level, GA (versions) level, and GAV (snapshot resolution) level.

Later, we realized that type does not have to be always "maven-metadata.xml", but can be something else, so for example Archetype Catalog (that sits in repo root) can be addressed (and hence, use same transport to get artifacts) as this (this is how it is done in m-archetype-p to get and cache the catalog):

Metadata(G:"", A:"", V:"", Type:"archetype-catalog.xml") and is cached locally as archetype-catalog-$repoID.xml (w/ checksums) by Resolver automatically (this is how "metadata" is handled by resolver).

The subsequent step was to make Metadata be able to address any (out of layout) file on remote repository, or, in other words, make Maven able to get any file (and cache it, with all the whistle and bells of caching/policies/checksums) that are not addressable by GAV/artifact. This change was done in #1491 and it was needed for prefixes (again, to use same transport and all features of local repo like checksum validation/cache retention/etc and maven metadata tracking by origin). Basically prefix files lie on this path:

Metadata(G:"", A:"", V:"", Type:".meta/prefixes.txt") and is cached locally as .meta/prefixes-$repoId.txt (w/ checksums) by resolver automatically.

And currently, they are the ONLY Metadata files (in resolver and/or maven) that contain / (slash).

The point is, that archetype catalog, but also prefixes are repository metadata resources, and they need same checksum validation, cache retention as any other "metadata". And what happens is total reuse of all existing functionality of Resolver to get/cache these files (so -U also updates them, etc).

Copy link
Member Author

@cstamas cstamas Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the typical pain was long time index: indexes are also published within repository, here. And originally, Maven Indexer had to reimplement everything (full transport stack) to get it from same place, from where user was already getting artifacts from (with same possible proxies, mirrors, auth and all the cruft). Now, Resolver would be able even to get indexes for you, no need to reimplement anything, as any file, that is published via repository is reachable to Resolver and Maven (and plugins).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And to answer your original question "we put the first part of the metadata in front"... due local repository layout. If I ask from Central following metadata:

  • Metadata(G:"", A:"", V:"", Type:"prefixes.txt")
  • Metadata(G:"", A:"", V:"", Type:".meta/prefixes.txt")
  • Metadata(G:"", A:"", V:"", Type:".meta/meta/prefixes.txt")

They would call conflict, as Resolver handles Metadata file names in local repository by appending origin repoId, so they would be locally then be named prefix-$repoId.txt and clash.

Oh, but you are right, these are local repo paths, and we talk about lock names. Yes, you are right, we could replace / to _ or anything to keep them "flat", agreed.

Lock names by def should be flat, not follow same layout
as local repo paths.
@cstamas cstamas added the bug Something isn't working label Oct 3, 2025
We already had means to "make sure string is valid for path segment",
but it was buried in RepositoryIdHelper.

Pull it out, and make it reusable, as this is essentially very
same use case. Also, the "expensive" method really punishes
ONLY non maven metadata, and those should be few only.
@cstamas cstamas requested a review from gnodet October 3, 2025 12:06
@cstamas cstamas merged commit a8833e7 into apache:master Oct 7, 2025
8 checks passed
@cstamas cstamas deleted the issue-1609 branch October 7, 2025 08:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Prefix locking issues

2 participants