Skip to content

quirk: remove line number fragments from github URLs#2116

Merged
thomas-zahner merged 8 commits into
lycheeverse:masterfrom
rina-forks:more-github-quirks
Apr 6, 2026
Merged

quirk: remove line number fragments from github URLs#2116
thomas-zahner merged 8 commits into
lycheeverse:masterfrom
rina-forks:more-github-quirks

Conversation

@katrinafyi

@katrinafyi katrinafyi commented Mar 30, 2026

Copy link
Copy Markdown
Member

we have no hope of checking these and they're probably unlikely to break as long as the file itself continues to exist.

big remaining github fragment cases:

  • #readme fragment going to implicit readme file - api
  • fragments in implicit readme files (within a directory view) - api
  • issue comment numbers - use api
  • pr comment numbers - use api
  • other branch tree views?

Might close #2113 and also addresses #1729 for this particular case.

we have no hope of checking these and they're probably unlikely to
break as long as the file itself continues to exist.

big remaining github fragment cases:
- `#readme` fragment going to implicit readme file
- fragments in implicit readme files (within a directory view)
- issue comment numbers
- pr comment numbers
@mre

mre commented Mar 30, 2026

Copy link
Copy Markdown
Member

#readme fragment going to implicit readme file

There's also an API endpoint to find the URL for a repo's readme. Is that helpful?
https://docs.github.com/en/rest/repos/contents?apiVersion=2026-03-10#get-a-repository-readme

@katrinafyi

Copy link
Copy Markdown
Member Author

oh, yes! that does help. we can add that in another pr

@mre mre left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code should work fine!

The only thing I'd add would be a few unit tests:

#[rstest]
// Standard single line
#[case("https://github.com/lycheeverse/lychee/blob/master/README.md#L10", true)]
// Standard range with double 'L'
#[case("https://github.com/lycheeverse/lychee/blob/master/src/main.rs#L10-L20", true)]
// Shorthand range (no second 'L')
#[case("https://github.com/lycheeverse/lychee/blob/master/src/lib.rs#L5-15", true)]
// Deeply nested path
#[case("https://github.com/user/repo/blob/feat/branch/path/to/file.txt#L1", true)]
// Should NOT match: Markdown fragment (handled by the other regex)
#[case("https://github.com/user/repo/blob/master/README.md#installation", false)]
// Should NOT match: Raw blob without line numbers
#[case("https://github.com/user/repo/blob/master/src/main.rs", false)]
// Should NOT match: Normal website URL
#[case("https://github.com/user/repo", false)]
fn test_github_blob_line_fragment_regex(#[case] url: &str, #[case] expected: bool) {
    assert_eq!(
        GITHUB_BLOB_LINE_FRAGMENT_PATTERN.is_match(url),
        expected,
        "Failed for URL: {}",
        url
    );
}

@thomas-zahner

Copy link
Copy Markdown
Member

Thanks for the quirk! I also like the new debugging logs.

Might close #2113

I've double-checked, it actually does resolve the issue 👍

@katrinafyi

Copy link
Copy Markdown
Member Author

I also like the new debugging logs.

To be honest, they were kind of a lazy job just to see if it was working. Do you think it's worthwhile to add name strings to each quirk to avoid logging the big regex?

@thomas-zahner

Copy link
Copy Markdown
Member

To be honest, they were kind of a lazy job just to see if it was working.

Haha :) but I definitely prefer that we now see when quirks are applied, could help a lot with troubleshooting

Do you think it's worthwhile to add name strings to each quirk to avoid logging the big regex?

Yes that would be quite nice, I also considered that. Let's do it 👍

GITHUB_BLOB_MARKDOWN_FRAGMENT_PATTERN for line-number fragments.

this is actually not entirely true, because if you go to a raw md file
and click permalink, you will get a blob link ending in
`README.md?plain=1#L4`. a more accurate quirk would send:

- `README.md?plain=1#L4` to a line fragment quirk, and
- `README.md#L4` to the general markdown fragment quirk.

however, this is not easy in regex without somethign like negative
patterns to make sure that the other one *doesn't* match.

so i just simplify to line number taking precedence, because people
are probably not making sections called `L123`.
@katrinafyi

Copy link
Copy Markdown
Member Author

I had to think about the precedence of the two github fragment quirks. I wrote about it in bd17ccb, lmk wyt. I also wonder if it's time to start thinking about applying multiple quirks.

@mre

mre commented Apr 3, 2026

Copy link
Copy Markdown
Member

I also wonder if it's time to start thinking about applying multiple quirks.

No objections. I leave it up to you to decide if you'd like to tackle this in the same PR or a separate one. :)

@katrinafyi

Copy link
Copy Markdown
Member Author

Added multiple quirk support in 93eef4a. It isn't actually used yet, because the github line quirk deletes fragments so the github markdown fragment quirk won't apply after it. But it was a very simple change so I just did it here.

@thomas-zahner thomas-zahner left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Kait! I like the new debug logs and it makes so much sense that quirks aren't isolated so that it's possible to have multiple quirks applied to a single request 👍

@thomas-zahner thomas-zahner merged commit 62545ef into lycheeverse:master Apr 6, 2026
7 checks passed
@mre mre mentioned this pull request Apr 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

When link highlights code like #L23-L28 it fails

3 participants