Skip to content

refactor: check glob validity when parsing input source arguments#1869

Merged
thomas-zahner merged 9 commits into
lycheeverse:masterfrom
rina-forks:glob-error
Nov 10, 2025
Merged

refactor: check glob validity when parsing input source arguments#1869
thomas-zahner merged 9 commits into
lycheeverse:masterfrom
rina-forks:glob-error

Conversation

@katrinafyi

Copy link
Copy Markdown
Member

now, when providing an input with glob characters but an invalid glob pattern, this will cause a "cannot parse inputs" error rather than erroring out later during link checking. this makes use of the existing ErrorKind::InvalidGlobPattern error case.

for example, a glob with unbalanced brackets is invalid and now looks like this:

$ cargo run -- 'noasdjfi/['
Error: Cannot parse inputs from arguments

Caused by:
    0: UNIX glob pattern is invalid
    1: Pattern syntax error near position 9: invalid range pattern

i've also added some TODO notes which would make the logic neater. i didn't do those refactors in this PR because i am mindful of another currently-open (and much bigger) PR which touches these lines (https://www.github.com/lycheeverse/lychee/pull/1837)

i wonder if the reason FsGlob stores strings rather than a Pattern is for flexibility to switch to different glob packages.

this PR was prompted by findings in #1864 (comment)

now, when providing an input with glob characters but an invalid glob
pattern, this will cause a "cannot parse inputs" error rather than
erroring out later during link checking. this makes use of the
existing ErrorKind::InvalidGlobPattern error case.

for example, a glob with unbalanced brackets is invalid and now looks
like this:
```console
$ cargo run -- 'noasdjfi/['
Error: Cannot parse inputs from arguments

Caused by:
    0: UNIX glob pattern is invalid
    1: Pattern syntax error near position 9: invalid range pattern
```

i've also added some TODO notes which would make the logic neater.
i didn't do those refactors in this PR because i am mindful of another
currently-open (and much bigger) PR which touches these lines
(https://www.github.com/lycheeverse/lychee/pull/1837)

i wonder if the reason FsGlob stores strings rather than a Pattern is
for flexibility to switch to different glob packages.

this PR was prompted by findings in
lycheeverse#1864 (comment)
@thomas-zahner

Copy link
Copy Markdown
Member

i wonder if the reason FsGlob stores strings rather than a Pattern is for flexibility to switch to different glob packages.

Good question, I wouldn't think so but then I'm not the author. @mre might know the answer to that.
Have you tried to change the type to Pattern? I would much prefer that if it is possible without too much effort.

Otherwise I have no objections to this PR.

@thomas-zahner thomas-zahner force-pushed the master branch 2 times, most recently from fcdf77c to e0912ab Compare October 21, 2025 12:53
@katrinafyi

Copy link
Copy Markdown
Member Author

Have you tried to change the type to Pattern?

If you're asking me, I haven't tried. I think the change would be pretty easy. But it's the kind of thing which is so obvious that if it hasn't been done already, then there's probably a logical reason why it hasn't. That's my reasoning at least.

If you're asking @ mre, then we have to wait for him :)

@mre

mre commented Nov 7, 2025

Copy link
Copy Markdown
Member

Sorry for the late reply.

If you're asking me, I haven't tried. I think the change would be pretty easy. But it's the kind of thing which is so obvious that if it hasn't been done already, then there's probably a logical reason why it hasn't. That's my reasoning at least.

That's a sensible approach in general. In this specific case however, I have to disappoint you. It might just have been an oversight on my side. 😆 Well, perhaps one issue might be serializing/deserializing those values when writing/reading the config. The glob crate does not support this. We'd have to find a workaround.

@mre

mre commented Nov 7, 2025

Copy link
Copy Markdown
Member

i've also added some TODO notes which would make the logic neater. i didn't do those refactors in this PR because i am mindful of another currently-open (and much bigger) PR which touches these lines (github.com/lycheeverse/lychee/pull/1837)

Don't worry about this. My PR is stale anyway, so I'd have to rebase it. So feel free to make any changes you like.

unfortunately, the `glob_with` function (which walks the dirs and
performs the globbing) does _not_ support taking a Pattern as input, so
we have to convert it back to a string at that point. this is a bit
weird, but i think it's still a bit nicer.
ths parsing logic is also changed to use early return so there is less
nesting
Conflicts:
lychee-bin/tests/cli.rs
lychee-lib/src/collector.rs
lychee-lib/src/types/input/input.rs
@katrinafyi

Copy link
Copy Markdown
Member Author

Thanks for your comments! I've done the two extra TODOs within this PR. The InputSource parsing is now in InputSource::new and the FsGlob field is a Pattern. As @mre foreshadowed, we have to do some custom serialisation/deserialisation, but it's not too much trouble.

@thomas-zahner thomas-zahner left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@katrinafyi Thanks, I really like it. This makes the code quite a bit cleaner and usability is also improved. I have just a few small remarks.

Comment thread lychee-lib/src/types/input/source.rs Outdated
fn test_pattern_serialization_is_original_pattern() {
let pat = "asd[f]*";
assert_eq!(
to_json(&InputSource::FsGlob {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and then unwrap both results here when we expect two Oks

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's done ~

Comment thread lychee-lib/src/types/input/source.rs Outdated
Comment thread lychee-lib/src/types/input/source.rs Outdated
katrinafyi and others added 2 commits November 10, 2025 19:48
@thomas-zahner thomas-zahner merged commit 100caf6 into lycheeverse:master Nov 10, 2025
7 checks passed
@mre mre mentioned this pull request Dec 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants