quirk: remove line number fragments from github URLs#2116
Conversation
we have no hope of checking these and they're probably unlikely to break as long as the file itself continues to exist. big remaining github fragment cases: - `#readme` fragment going to implicit readme file - fragments in implicit readme files (within a directory view) - issue comment numbers - pr comment numbers
There's also an API endpoint to find the URL for a repo's readme. Is that helpful? |
|
oh, yes! that does help. we can add that in another pr |
mre
left a comment
There was a problem hiding this comment.
Code should work fine!
The only thing I'd add would be a few unit tests:
#[rstest]
// Standard single line
#[case("https://github.com/lycheeverse/lychee/blob/master/README.md#L10", true)]
// Standard range with double 'L'
#[case("https://github.com/lycheeverse/lychee/blob/master/src/main.rs#L10-L20", true)]
// Shorthand range (no second 'L')
#[case("https://github.com/lycheeverse/lychee/blob/master/src/lib.rs#L5-15", true)]
// Deeply nested path
#[case("https://github.com/user/repo/blob/feat/branch/path/to/file.txt#L1", true)]
// Should NOT match: Markdown fragment (handled by the other regex)
#[case("https://github.com/user/repo/blob/master/README.md#installation", false)]
// Should NOT match: Raw blob without line numbers
#[case("https://github.com/user/repo/blob/master/src/main.rs", false)]
// Should NOT match: Normal website URL
#[case("https://github.com/user/repo", false)]
fn test_github_blob_line_fragment_regex(#[case] url: &str, #[case] expected: bool) {
assert_eq!(
GITHUB_BLOB_LINE_FRAGMENT_PATTERN.is_match(url),
expected,
"Failed for URL: {}",
url
);
}|
Thanks for the quirk! I also like the new debugging logs.
I've double-checked, it actually does resolve the issue 👍 |
To be honest, they were kind of a lazy job just to see if it was working. Do you think it's worthwhile to add name strings to each quirk to avoid logging the big regex? |
Haha :) but I definitely prefer that we now see when quirks are applied, could help a lot with troubleshooting
Yes that would be quite nice, I also considered that. Let's do it 👍 |
GITHUB_BLOB_MARKDOWN_FRAGMENT_PATTERN for line-number fragments. this is actually not entirely true, because if you go to a raw md file and click permalink, you will get a blob link ending in `README.md?plain=1#L4`. a more accurate quirk would send: - `README.md?plain=1#L4` to a line fragment quirk, and - `README.md#L4` to the general markdown fragment quirk. however, this is not easy in regex without somethign like negative patterns to make sure that the other one *doesn't* match. so i just simplify to line number taking precedence, because people are probably not making sections called `L123`.
|
I had to think about the precedence of the two github fragment quirks. I wrote about it in bd17ccb, lmk wyt. I also wonder if it's time to start thinking about applying multiple quirks. |
No objections. I leave it up to you to decide if you'd like to tackle this in the same PR or a separate one. :) |
it is not valid to use a short link form directly on youtube.com, it will 404
|
Added multiple quirk support in 93eef4a. It isn't actually used yet, because the github line quirk deletes fragments so the github markdown fragment quirk won't apply after it. But it was a very simple change so I just did it here. |
thomas-zahner
left a comment
There was a problem hiding this comment.
Thank you Kait! I like the new debug logs and it makes so much sense that quirks aren't isolated so that it's possible to have multiple quirks applied to a single request 👍
we have no hope of checking these and they're probably unlikely to break as long as the file itself continues to exist.
big remaining github fragment cases:
#readmefragment going to implicit readme file - apiMight close #2113 and also addresses #1729 for this particular case.