set user-agent when fetching remote input URLs by mre · Pull Request #2099 · lycheeverse/lychee

mre · 2026-03-25T00:11:45Z

This is a veeery conservative attempt at fixing #1886.

Previously, the reqwest::Client used by the UrlContentResolver (which fetches CLI URL inputs) was built without a user-agent. This meant that passing a URL directly as a CLI argument resulted in a request with no user-agent header, unlike links discovered during checking which correctly use the configured user-agent.

This caused subtle bugs such as Wikipedia returning a 403 for CLI inputs (since it requires a user-agent), resulting in lychee finding zero links on pages that actually contain hundreds.

Fixed by storing the configured user-agent on the Collector and using it to build the resolver's reqwest::Client.

Let me say that I believe the collector is already doing way too much and that this is probably not the right approach going forward. The fact that the collector is responsible for building a reqwest::Client for the resolver is a sign of tight coupling and a violation of separation of concerns. The collector should ideally only be responsible for collecting links, while the resolver should be responsible for fetching content.

That said, I opened the PR to get some early feedback and perhaps to fix the immediate issue. I also wanted to avoid a larger refactor that would touch many parts of the codebase and potentially cause more disruption in the short term. (See below.)

As an alternative, I considered to unify the two code paths where we deal with "building request clients." Instead of the Collector using its own bare reqwest::Client, we can make it reuse the same fully-configured lychee_lib::Client (with rate limiting, retries, per-host config, TLS settings, etc.) that's used for link checking. The original issue already notes that unifying these paths would also help with recursion support. The downside is it's a much larger refactor touching the Client API.
But it's probably still worth it? I can create another PR for that and we can compare.
I'm a bit hesitant to do it because large PRs have caused some churn in the past and it took a while to get them merged. But I'm willing to do it if we think it's the right move. I just don't want to cause too much disruption in the short term.

Let me know what you think!

@katrinafyi @thomas-zahner @cristiklein feedback welcome.

Previously, the reqwest::Client used by the UrlContentResolver (which fetches CLI URL inputs) was built without a user-agent. This meant that passing a URL directly as a CLI argument resulted in a request with no user-agent header, unlike links discovered during checking which correctly use the configured user-agent. This caused subtle bugs such as Wikipedia returning a 403 for CLI inputs (since it requires a user-agent), resulting in lychee finding zero links on pages that actually contain hundreds. Fix by storing the configured user-agent on the Collector and using it to build the resolver's reqwest::Client. Fixes #1886

thomas-zahner · 2026-03-26T14:33:53Z

I think we're all agreeing to go your less conservative approach proposed in #2100.

mre added 2 commits March 25, 2026 00:57

test: add CLI tests for user-agent on remote input URLs

85b4774

mre requested review from katrinafyi and thomas-zahner March 25, 2026 00:11

mre mentioned this pull request Mar 25, 2026

refactor: unify input URL fetching with the link-checker's HostPool #2100

Merged

thomas-zahner closed this Mar 26, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

set user-agent when fetching remote input URLs#2099

set user-agent when fetching remote input URLs#2099
mre wants to merge 2 commits into
masterfrom
fix/user-agent-cli-inputs

mre commented Mar 25, 2026

Uh oh!

thomas-zahner commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

mre commented Mar 25, 2026

Uh oh!

thomas-zahner commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants