refactor: check glob validity when parsing input source arguments#1869
Conversation
now, when providing an input with glob characters but an invalid glob
pattern, this will cause a "cannot parse inputs" error rather than
erroring out later during link checking. this makes use of the
existing ErrorKind::InvalidGlobPattern error case.
for example, a glob with unbalanced brackets is invalid and now looks
like this:
```console
$ cargo run -- 'noasdjfi/['
Error: Cannot parse inputs from arguments
Caused by:
0: UNIX glob pattern is invalid
1: Pattern syntax error near position 9: invalid range pattern
```
i've also added some TODO notes which would make the logic neater.
i didn't do those refactors in this PR because i am mindful of another
currently-open (and much bigger) PR which touches these lines
(https://www.github.com/lycheeverse/lychee/pull/1837)
i wonder if the reason FsGlob stores strings rather than a Pattern is
for flexibility to switch to different glob packages.
this PR was prompted by findings in
lycheeverse#1864 (comment)
Good question, I wouldn't think so but then I'm not the author. @mre might know the answer to that. Otherwise I have no objections to this PR. |
fcdf77c to
e0912ab
Compare
If you're asking me, I haven't tried. I think the change would be pretty easy. But it's the kind of thing which is so obvious that if it hasn't been done already, then there's probably a logical reason why it hasn't. That's my reasoning at least. If you're asking @ mre, then we have to wait for him :) |
|
Sorry for the late reply.
That's a sensible approach in general. In this specific case however, I have to disappoint you. It might just have been an oversight on my side. 😆 Well, perhaps one issue might be serializing/deserializing those values when writing/reading the config. The |
Don't worry about this. My PR is stale anyway, so I'd have to rebase it. So feel free to make any changes you like. |
unfortunately, the `glob_with` function (which walks the dirs and performs the globbing) does _not_ support taking a Pattern as input, so we have to convert it back to a string at that point. this is a bit weird, but i think it's still a bit nicer.
ths parsing logic is also changed to use early return so there is less nesting
Conflicts: lychee-bin/tests/cli.rs lychee-lib/src/collector.rs lychee-lib/src/types/input/input.rs
|
Thanks for your comments! I've done the two extra TODOs within this PR. The InputSource parsing is now in InputSource::new and the FsGlob field is a Pattern. As @mre foreshadowed, we have to do some custom serialisation/deserialisation, but it's not too much trouble. |
thomas-zahner
left a comment
There was a problem hiding this comment.
@katrinafyi Thanks, I really like it. This makes the code quite a bit cleaner and usability is also improved. I have just a few small remarks.
| fn test_pattern_serialization_is_original_pattern() { | ||
| let pat = "asd[f]*"; | ||
| assert_eq!( | ||
| to_json(&InputSource::FsGlob { |
There was a problem hiding this comment.
and then unwrap both results here when we expect two Oks
Co-authored-by: Thomas Zahner <thomas.zahner@protonmail.ch>
now, when providing an input with glob characters but an invalid glob pattern, this will cause a "cannot parse inputs" error rather than erroring out later during link checking. this makes use of the existing ErrorKind::InvalidGlobPattern error case.
for example, a glob with unbalanced brackets is invalid and now looks like this:
i've also added some TODO notes which would make the logic neater. i didn't do those refactors in this PR because i am mindful of another currently-open (and much bigger) PR which touches these lines (https://www.github.com/lycheeverse/lychee/pull/1837)
i wonder if the reason FsGlob stores strings rather than a Pattern is for flexibility to switch to different glob packages.
this PR was prompted by findings in #1864 (comment)