Check fragments for remote URLs only for certain MIME types #1737

@MichaIng

Currently, with fragment checking enabled, lychee tries to download every remote URL completely into RAM and passes the body to the fragment checker as if it were an HTML document. This causes unnecessary traffic and memory usage, and the fragment checker often fails on binary files.

#1733 aims to solve this for most cases by skipping fragment checking when the URL contains no non-empty fragment. However, it cannot be ruled out that such fragments are intentional, so valid URLs with fragments that point to non-HTML files remain possible. For those cases, it would be great to additionally check the content type of the HTTP response and invoke the fragment checker only if it can actually handle that type.

An additional condition to start with, at https://github.com/lycheeverse/lychee/blob/master/lychee-lib/src/checker/website.rs#L100:

response.headers().get("content-type").is_some_and(|x| x.as_bytes().starts_with(b"text/html"))

text/markdown would also be possible, with the file type passed to the fragment checker adjusted to file_type: crate::FileType::Markdown accordingly.

Since text/markdown does not seem to be widely used (it is not served automatically for the .md file extension by the latest Apache2, nor by GitHub), we could additionally accept text/plain if the URL path ends with .md.
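To make the proposed rules concrete, here is a minimal, standalone sketch of the decision logic. It is not lychee's actual code: `FragmentFileType` is a hypothetical stand-in for `crate::FileType`, and `fragment_file_type` is a made-up helper that takes the `Content-Type` header value as a plain string plus the URL path. It also ignores parameters such as `; charset=utf-8` and compares the media type case-insensitively, which goes slightly beyond the `starts_with` check above.

```rust
/// Hypothetical stand-in for lychee's `FileType`, reduced to the
/// variants the fragment checker would need here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FragmentFileType {
    Html,
    Markdown,
}

/// Returns the file type to hand to the fragment checker, or `None`
/// if fragment checking should be skipped for this response.
///
/// `content_type` is the raw `Content-Type` header value (if any),
/// `url_path` is the path component of the requested URL.
pub fn fragment_file_type(
    content_type: Option<&str>,
    url_path: &str,
) -> Option<FragmentFileType> {
    // Strip parameters like "; charset=utf-8" and normalize the case,
    // since media types are case-insensitive.
    let media_type = content_type?
        .split(';')
        .next()?
        .trim()
        .to_ascii_lowercase();

    match media_type.as_str() {
        "text/html" => Some(FragmentFileType::Html),
        "text/markdown" => Some(FragmentFileType::Markdown),
        // Fallback proposed in this issue: many servers deliver
        // Markdown files as plain text.
        "text/plain" if url_path.ends_with(".md") => Some(FragmentFileType::Markdown),
        // Missing header or any other type: do not run the fragment checker.
        _ => None,
    }
}
```

With this shape, the call site would only need to download and parse the body when the function returns `Some(..)`, and could pass the returned variant on as the `file_type` for the fragment checker.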

Labels: enhancement (New feature or request)