Skip to content

Introduce --root-dir#1576

Merged
mre merged 26 commits into
lycheeverse:masterfrom
trask:root-path
Dec 13, 2024
Merged

Introduce --root-dir#1576
mre merged 26 commits into
lycheeverse:masterfrom
trask:root-path

Conversation

@trask

@trask trask commented Nov 28, 2024

Copy link
Copy Markdown
Contributor

Related to #452 (comment)

I tried a couple of more ambitious refactoring, but they kept spiraling out of (my) control, so tried to keep it minimal.

Comment thread lychee-lib/src/utils/path.rs Outdated
@trask

trask commented Nov 28, 2024

Copy link
Copy Markdown
Contributor Author

Also, I don't understand the (remaining) CI failures if you could give some hint, thanks!

Update: there are only lint failures remaining now which seem unrelated to this PR, though I'm probably missing something

@trask trask marked this pull request as ready for review November 28, 2024 22:13
Comment thread lychee-lib/src/utils/request.rs
Comment thread lychee-lib/src/types/input.rs Outdated
Comment thread lychee-lib/src/lib.rs
Comment thread lychee-lib/src/utils/request.rs Outdated
Comment thread lychee-lib/src/types/base.rs Outdated
@mre

mre commented Nov 30, 2024

Copy link
Copy Markdown
Member

Hey @trask,

this is a very solid attempt and a very long-standing issue of the project, so thanks a lot for looking into that!

I haven't checked all the edge-cases yet, but I think you're on the right track.
However, I wonder if it supports setting base-url and root-path at the same time? They serve a different purpose: base-url is used for relative URLs, while --root-path is used for absolute URLs. Setting both at the same time is expected to be a common scenario.
I've added an explanation to the section in the code that seemed the most relevant.

On another note, we could use the opportunity and deprecate --offline at the same time.
See #668 for more information.
In order to deprecate it, we should add a warning message when it's used and update the help message accordingly to
point people to the new --root-path option.
We can also add a note in the README about the deprecation and create a pull request to update the documentation
at https://github.com/lycheeverse/lycheeverse.github.io.

But we don't have to do that in one go. Let's take it step by step.

@trask

trask commented Dec 1, 2024

Copy link
Copy Markdown
Contributor Author

Thanks for the feedback! I've incorporated your suggestions.

Comment thread lychee-lib/src/utils/path.rs
Comment thread lychee-lib/src/utils/path.rs
Comment thread lychee-lib/src/utils/path.rs
Comment thread lychee-lib/src/utils/path.rs
@trask trask changed the title Introduce --root-path Introduce --root-dir Dec 6, 2024
@mre

mre commented Dec 7, 2024

Copy link
Copy Markdown
Member

Awesome. Thanks!

From my side, this is good to go. Let's wait for the feedback of at least one other person and then we can merge it.

@trask trask mentioned this pull request Dec 10, 2024
@mre

mre commented Dec 10, 2024

Copy link
Copy Markdown
Member

@george-gca, any updates? Did you find the time to give it a try?

@thomas-zahner

thomas-zahner commented Dec 10, 2024

Copy link
Copy Markdown
Member

Code-wise it looks good. But now I'm confused about its usage.

When I'm in the fixtures directory I do:

$ cargo run -- resolve_paths_from_root_dir/ --root-dir resolve_paths_from_root_dir/

Issues found in 1 input. Find details below.

[resolve_paths_from_root_dir/nested/index.html]:
   [ERROR] file:///home/thomas/Projects/lychee/fixtures/resolve_paths_from_root_dir/nested/resolve_paths_from_root_dir/nested/about/index.html#missing | Cannot find file
   [ERROR] file:///home/thomas/Projects/lychee/fixtures/resolve_paths_from_root_dir/nested/resolve_paths_from_root_dir/nested/about/index.html#fragment | Cannot find file
   [ERROR] file:///home/thomas/Projects/lychee/fixtures/resolve_paths_from_root_dir/nested/resolve_paths_from_root_dir/nested/another%20page | Cannot find file
   [ERROR] file:///home/thomas/Projects/lychee/fixtures/resolve_paths_from_root_dir/nested/resolve_paths_from_root_dir/nested | Cannot find file
   [ERROR] file:///home/thomas/Projects/lychee/fixtures/resolve_paths_from_root_dir/nested/resolve_paths_from_root_dir/nested/about | Cannot find file

🔍 7 Total (in 0s) ✅ 2 OK 🚫 5 Errors

But:

$ cargo run -- resolve_paths_from_root_dir/ --root-dir ..

🔍 7 Total (in 0s) ✅ 7 OK 🚫 0 Errors

@trask Why isn't the first one working? Why is the second working?

@trask

trask commented Dec 10, 2024

Copy link
Copy Markdown
Contributor Author

@trask Why isn't the first one working? Why is the second working?

relative --root-dir paths are definitely odd (and not covered currently by the cli.rs tests)

--root-dir is prepended to all absolute local links

and so when --root-dir is a relative path, it effectively turns absolute local links into relative links

e.g. running

cargo run -- resolve_paths_from_root_dir/ --root-dir resolve_paths_from_root_dir/

turns the absolute local link /nested into a relative link resolve_paths_from_root_dir/nested which get resolved against the html file's directory and results in file:///home/thomas/Projects/lychee/fixtures/resolve_paths_from_root_dir/nested/resolve_paths_from_root_dir/nested

and running

cargo run -- resolve_paths_from_root_dir/ --root-dir ..

turns the absolute local link /nested into a relative link ../nested which get resolved against the html file's directory and results in file:///home/thomas/Projects/lychee/fixtures/resolve_paths_from_root_dir/nested

It probably makes more sense to resolve relative --root-dir into an absolute dir, I'll give this a try...

@mre

mre commented Dec 10, 2024

Copy link
Copy Markdown
Member

Good point.

An alternative would be to prevent relative links by throwing an error. It would be simpler to implement while avoiding ambiguity and unwanted side effects.

The help text would need to be updated to indicate that an absolute directory is expected.

We could start with a conservative design and allow for future support of relative paths if we see a use-case.

@trask

trask commented Dec 10, 2024

Copy link
Copy Markdown
Contributor Author

An alternative would be to prevent relative links by throwing an error. It would be simpler to implement while avoiding ambiguity and unwanted side effects.

The help text would need to be updated to indicate that an absolute directory is expected.

done!

@mre

mre commented Dec 11, 2024

Copy link
Copy Markdown
Member

@thomas-zahner, what do you think? Can you try it again?

@george-gca

george-gca commented Dec 11, 2024

Copy link
Copy Markdown

Wouldn't it be better to accept both absolute and relative path, and handle them internally? Maybe doing something like suggested in these answers on stackoverflow? One uses PathBuf itself with std::fs::canonicalize, the other would add a new dependency path-clean.

@george-gca

george-gca commented Dec 11, 2024

Copy link
Copy Markdown

@mre sorry for the late reply, I haven't had the time during the weekend. Now, would this be useful for my case? How should I use it?

@mre

mre commented Dec 11, 2024

Copy link
Copy Markdown
Member

Ah, sorry. It's not applicable for your case because you have a more complex routing setup. root-dir is only if you want to prepend an absolute path before a relative link.

Comment thread lychee-bin/src/main.rs Outdated
let mut collector = Collector::new(opts.config.base.clone())
if let Some(root_dir) = &opts.config.root_dir {
if root_dir.is_relative() {
bail!("`--root_dir` must be an absolute path");

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last thing; I think we should move this to Collector::new. lychee can also be used as library through lychee-lib so that's another entrypoint where this invariant should be enforced

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


impl Collector {
/// Create a new collector with an empty cache
#[must_use]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clippy told me to remove this:

warning: this function has a #[must_use] attribute with no message, but returns a type already marked as #[must_use]

@norswap

norswap commented Dec 13, 2024

Copy link
Copy Markdown

Just tested it locally and it works as expected. Good work! 🥳

@norswap, @dkarlovi, @vanbroup, @george-gca, can any of you test this version and provide some feedback? Does it cover your use-cases? 🙏

Unfortunately I haven't been at the company at which we were using this for more than one year now :') They've also moved the specs this was to be used on, so it's hard to say.

@thomas-zahner

Copy link
Copy Markdown
Member

@trask Awesome, thank you 👍
From my side this looks good @mre

@mre mre merged commit 6d0e94c into lycheeverse:master Dec 13, 2024
@mre

mre commented Dec 13, 2024

Copy link
Copy Markdown
Member

Great! Thanks for your work @trask! 👏

@mre

mre commented Jan 6, 2025

Copy link
Copy Markdown
Member

I changed my mind regarding relative path handling for --root-dir. Toying around with the feature and writing the docs for --root-dir, I found the lack of support for . to be a bit tedious.

# ✅ Good
lychee --root-dir "$(pwd)/public"

# ❌ Bad: will fail
lychee --root-dir ./public/

Maybe we should implement what @george-gca suggested all along: we simply always canonicalize the root-dir:

use std::fs::canonicalize;
use std::path::PathBuf;

fn main() {
        let path = PathBuf::from(".");
        println!("{:?}", canonicalize(path));
}

@trask, @george-gca, @thomas-zahner, I've created a separate issue for that here: #1606.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants