Add linter checking remote_repository_url #1581
the value should be the git URL to the tool, i.e. it should have a common suffix with the path of the repo that is under consideration
planemo/shed_lint.py
def lint_shed_remote_repository_url(realized_repository: "RealizedRepository", lint_ctx):
    path = realized_repository.real_path.rstrip(" /")
Why is this doing rstrip with a whitespace and slash? Can you add a comment explaining this longest common suffix heuristic?
I appreciate the extra comments, but I'm still lost on what the while loop does, so I asked and this is what I got:
Current Implementation Issues
This code attempts to find a common suffix between a file path and a URL by iterating backwards through both strings. However, there are several problems:
Logic Error: The condition checks if characters match, but path[i:] captures everything from position i to the end, which grows longer as i becomes more negative. This doesn't correctly identify the longest common suffix.
String Comparison Confusion: Comparing individual characters at negative indices doesn't guarantee meaningful path segment matching. For example, /tool in a path might accidentally match ool in "school" in the URL.
Unclear Purpose: The docstring mentions checking for "common prefix" but the code looks for a suffix, creating confusion.
Weak Validation: Only checking for "/" in the common part is insufficient - it could match arbitrary substrings.
This was a replacement suggestion:
from pathlib import PurePosixPath

def lint_shed_remote_repository_url(realized_repository: "RealizedRepository", lint_ctx):
    """
    Verify that remote_repository_url contains the repository path as a suffix.
    Expected URL format: https://gitserver/organisation/tree/main/path
    where 'path' should match the repository's filesystem path.
    """
    path = PurePosixPath(realized_repository.real_path)
    remote_repository_url = realized_repository.config.get("remote_repository_url", "").rstrip(" /")
    if not remote_repository_url:
        return  # No URL to validate
    # Get path parts (segments) excluding empty strings
    path_parts = path.parts
    # Check if URL ends with a reasonable portion of the path
    # Look for at least 2 path segments to avoid false positives
    min_segments = min(2, len(path_parts))
    for i in range(len(path_parts) - min_segments + 1):
        suffix = "/".join(path_parts[i:])
        if remote_repository_url.endswith(suffix):
            # Found a match with at least min_segments
            return
    # If no match found, issue warning
    lint_ctx.warn(
        f"remote_repository_url may be incorrect: expected it to end with "
        f"repository path '{path}' or a significant portion of it"
    )
Logic Error: The condition checks if characters match, but path[i:] captures everything from position i to the end, which grows longer as i becomes more negative. This doesn't correctly identify the longest common suffix.
This is why I'm not yet convinced by AI :) Of course, checking equality for the last, 2nd last, 3rd last ... character will determine the longest common suffix. Even if efficiency is not relevant here, note that it is also more efficient than repeatedly constructing candidate suffixes and comparing them (O(n) vs O(n^2)) ... but I should move longest_common_suffix = path[i:] to the else branch :)
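For readers following along, here is a minimal sketch of the character-wise scan described above. It is not the code in planemo/shed_lint.py; the function name, the paths, and the example.org URLs are hypothetical and chosen only for illustration. It slices once after the scan, and it also reproduces the "ool" in "school" false match raised in the review.

```python
def longest_common_suffix(a: str, b: str) -> str:
    """Return the longest string that both a and b end with."""
    n = 0
    # Walk backwards from the end of both strings while the characters agree.
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    # Slice once after the scan; len(a) - n avoids the a[-0:] pitfall when n == 0.
    return a[len(a) - n:]


# Hypothetical repository path and URL, for illustration only.
path = "/work/repo/tools/bwa"
url = "https://example.org/org/repo/tree/main/tools/bwa"
print(longest_common_suffix(path, url))  # /tools/bwa

# A purely character-based match can stop inside a path segment.
print(longest_common_suffix("/data/tool", "https://example.org/school"))  # ool
```

The scan itself is linear in the length of the shorter string, but the second call shows why a match on characters alone says little about whether whole path segments agree.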
String Comparison Confusion: ...
Unclear Purpose: ...
Weak Validation: ...
This is why I still make use of it: Indeed, checking for the longest common suffix of path segments is a better idea.
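A segment-wise variant could look roughly like the sketch below. This is an assumption about the shape of such a check, not the code that was eventually pushed to this PR; the helper name common_suffix_segments and the example values are hypothetical.

```python
from urllib.parse import urlsplit


def common_suffix_segments(path: str, url: str) -> list:
    """Return the trailing path segments shared by a filesystem path and a URL."""
    # Split both into non-empty segments, so trailing slashes are ignored.
    path_parts = [p for p in path.split("/") if p]
    url_parts = [p for p in urlsplit(url).path.split("/") if p]
    common = []
    # Compare whole segments from the end and stop at the first mismatch.
    for p, u in zip(reversed(path_parts), reversed(url_parts)):
        if p != u:
            break
        common.append(p)
    common.reverse()
    return common


# Hypothetical values, for illustration only.
print(common_suffix_segments("/work/repo/tools/bwa",
                             "https://example.org/org/repo/tree/main/tools/bwa"))
# ['tools', 'bwa']
print(common_suffix_segments("/data/tool", "https://example.org/school"))
# []
```

Because whole segments are compared, "tool" no longer matches the tail of "school", and a linter could warn whenever the returned list is empty.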
3cc97a2 to 35b99d8
35b99d8 to 6c8db05
mvdbeek left a comment
Have you tested this against iuc/devteam? Are there any false positives?
No, but I should.
For IUC, the problems found are all true positives.
In bgruening there are quite a few where the entry is missing. Maybe we should have a better message for these cases?
I was wrong. The cases mix up the remote repo URL and homepage URL. So I'm quite confident and you can merge :)