Skip to content

feat: Add support for ranges in the --accept option / config field#1167

Merged
mre merged 23 commits into
lycheeverse:masterfrom
Techassi:feat/accept-range-selectors
Sep 17, 2023
Merged

feat: Add support for ranges in the --accept option / config field#1167
mre merged 23 commits into
lycheeverse:masterfrom
Techassi:feat/accept-range-selectors

Conversation

@Techassi

Copy link
Copy Markdown
Contributor

Adds support for accept ranges discussed in #1157. This allows the user to specify custom HTTP status codes accepted during checking and thus will report as valid (not broken). The accept option only supports specifying status codes as a comma-separated list. With this PR, the option will accept a list of status code ranges formatted like this:

accept = ["100..=103", "200..=299", "403"]

These combinations will be supported: ..<end>, ..=<end>, <start>..<end> and <start>..=<end>. The behaviour is copied from the Rust Range like concepts:

  • ..<end>, includes 0 to <end> (exclusive)
  • ..=<end>, includes 0 to <end> (inclusive)
  • <start>..<end>, includes <start> to <end> (exclusive)
  • <start>..=<end>, includes <start> to <end> (inclusive)

This draft PR only implements some foundation work for this feature. Once all functionality is added, the PR will be ready to review.

@mre mre left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still a draft, but I thought I might add some early feedback. Keep it up! 😃

Comment thread lychee-lib/src/types/accept/range.rs Outdated
Comment thread lychee-lib/src/types/accept/range.rs Outdated
Comment thread lychee-lib/src/types/accept/range.rs
Comment thread lychee-lib/src/types/accept/selector.rs Outdated
Comment thread lychee-lib/src/types/accept/selector.rs Outdated
Comment thread lychee-lib/src/types/accept/selector.rs Outdated
Comment thread lychee-lib/src/types/accept/selector.rs
@Techassi

Techassi commented Aug 1, 2023

Copy link
Copy Markdown
Contributor Author

The new accept syntax is now included in the CLI. I opted to use a string of comma-separated ranges instead of a Vec of ranges. This allows users to specify ranges like --accept "100..=199,200,404" instead of

--accept "100..=199" --accept 200 --accept 404

One of the tests keeps failing, which boggles my mind because the test case should be unaffected by my changes. Maybe you can take a look @mre

Comment thread fixtures/configs/smoketest.toml Outdated
@Techassi Techassi requested a review from mre August 6, 2023 13:11
@mre

mre commented Aug 17, 2023

Copy link
Copy Markdown
Member

Works for me.
BTW, it also works without quotes, it seems:

lychee --accept 100..=199,200,404 -- README.md

One thing that I would love to support would be a mix of both styles:

lychee --accept 100..=199,200 --accept 404 -- README.md

But I guess that's a trickier change.
We'd have to accept a Vec of AcceptSelector in the options and merge them. 🤔

The reason is, that I'm a bit afraid of breaking changes, but then again I don't know many people who set multiple accept args right now, so it might be fine?

@Techassi

Techassi commented Aug 18, 2023

Copy link
Copy Markdown
Contributor Author

Works for me. BTW, it also works without quotes, it seems:

lychee --accept 100..=199,200,404 -- README.md

Ah that's neat! I usually include quotes just to be sure...

One thing that I would love to support would be a mix of both styles:

lychee --accept 100..=199,200 --accept 404 -- README.md

But I guess that's a trickier change. We'd have to accept a Vec of AcceptSelector in the options and merge them. 🤔

The reason is, that I'm a bit afraid of breaking changes, but then again I don't know many people who set multiple accept args right now, so it might be fine?

Yeah that's definitely a harder change. However, just like you mentioned, the chance people use multiple --accept parameters is slim. So yes, we should be good to go.


I can also open a separate PR, which explores further improvements, like merging multiple AcceptSelectors and improving the internal merging of AcceptRanges in AcceptSelector.

@mre

mre commented Aug 26, 2023

Copy link
Copy Markdown
Member

We didn't add 403 to the range of accepted status codes yet, or?
That was at least my intention when I opened #1157.
Should we add 403 in this pull request?

@Techassi

Copy link
Copy Markdown
Contributor Author

No, not yet - but we can add it. How would we go about doing that? I think there are two ways:

  • Providing a Default impl for AcceptSelector and changing the accept argument to AcceptSelector instead of Option<AcceptSelector>. This also requires us to use the clap default value attribute.
  • We could provide a set of pre-defined accepted status codes here
    None => None,

@mre

mre commented Aug 27, 2023

Copy link
Copy Markdown
Member

Yeah, Default is the idiomatic way to go imho.

@Techassi

Copy link
Copy Markdown
Contributor Author

Okay, I will add this. Then we should be ready to merge!

@Techassi

Techassi commented Sep 3, 2023

Copy link
Copy Markdown
Contributor Author

Added Default and Display for AcceptSelector. The CLI now accepts AcceptSelector and uses default_value_t when the user doesn't provide a custom value.

@mre

mre commented Sep 5, 2023

Copy link
Copy Markdown
Member

Some failing unit tests. Apart from that looks great.

Okay this was quite a dig. Clap uses the default value not as is, but will
use `to_string` (from the Display impl) to convert the default value into
a string and then re-parses it via the `FromStr` trait. The `Display` impl
from `AcceptSelector` included surrounding square brackets `[ ]` which failed
to parse.

cargo expand really helped here, as I had to dig into the code generated by the Clap `Parser` derive macro.

I'm still unsure why Clap handles default values in this way, but there surely is a reason for it. I'm just too lazy to do further research into that.
@Techassi

Techassi commented Sep 9, 2023

Copy link
Copy Markdown
Contributor Author

Okay, this was quite a dig. Clap uses the default value not as is but will use to_string (from the Display impl) to convert it into a string and then re-parses it via the FromStr trait. See generated code here:

let arg = arg
    .help("A List of accepted status codes for valid links")
    .long_help(None)
    .short('a')
    .long("accept")
    .default_value({
        static DEFAULT_VALUE: clap::__derive_refs::once_cell::sync::Lazy<String> =
            clap::__derive_refs::once_cell::sync::Lazy::new(|| {
                let val: AcceptSelector =
                    <AcceptSelector as ::std::default::Default>::default();
                ::std::string::ToString::to_string(&val)
            });
        let s: &'static str = &*DEFAULT_VALUE;
        s
    })
    // ...

The Display impl from AcceptSelector included surrounding square brackets [ ], which failed to parse.

@mre

mre commented Sep 9, 2023

Copy link
Copy Markdown
Member

OMG, I would never have found that. Kudos!

@mre

mre commented Sep 9, 2023

Copy link
Copy Markdown
Member

The config test failed because fixtures/configs/cache.toml didn't have an accept entry.
I've extended the error message to make that easier to troubleshoot in the future:

Running `target/debug/lychee --config fixtures/configs/cache.toml .`
[ERROR] Error while loading config file "fixtures/configs/cache.toml": Failed to parse configuration file: TOML parse error at line 1, column 1
  |
1 | cache = true
  | ^^^^^^^^^^^^
missing field `accept`

The default doesn't get picked up, because there is no default value for serde deserialization.
Notably, this is different from clap's default for the command-line; it has to be set separately.

I removed accept from pub(crate) fn merge(&mut self, toml: Config) { ... } and added the serde default.
My guess is, that we could also remove some other stuff from merge, but I didn't touch that for this PR.

The final failing test will be a satisfying one, so I left it up to you. ❤️
It is about adding the new default to the config file. Check the failing test to see what I mean. 😉 🚀
Good job!

@Techassi

Techassi commented Sep 9, 2023

Copy link
Copy Markdown
Contributor Author

I've extended the error message to make that easier to troubleshoot in the future

Yeah, that's nice. I was kind of clueless about what exactly it complained about.

The final failing test will be a satisfying one, so I left it up to you. ❤️
It is about adding the new default to the config file. Check the failing test to see what I mean. 😉 🚀

Ha! Thanks for saving me the last error 😏

I will look into it!

@Techassi

Copy link
Copy Markdown
Contributor Author

All tests should now run without issues. However, the failing tests (3) succeeded perfectly on my local machine. They are all related to Wayback testing. The test cases even state that they are flaky.

Let me know how we want to proceed.

@mre mre left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This turned out really great!

The flaky tests are a bit of an issue, but let's tackle that at a later point.
I tried to fix that once with multiple retries, but it seems that doesn't always do the trick.
We might have to disable these checks or put them into a different, non-default group. 🤷‍♀️

The changes in this PR make a lot of sense and the code quality is great, so I don't mind merging and releasing it. 🚀 Thanks for the contribution!

@mre mre merged commit 1b1fd0c into lycheeverse:master Sep 17, 2023
@Techassi

Copy link
Copy Markdown
Contributor Author

Thanks! I loved working on this. Can't wait to tackle future improvements :)

@nvuillam

nvuillam commented Jan 7, 2024

Copy link
Copy Markdown

This PR creates a breaking change between lychee 0.13.0 and 0.14.0 :(

#1338

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants