Currently, with fragment checking enabled, every remote URL is downloaded in full (or at least attempted), held in RAM, and passed to the fragment checker as if it were an HTML document. This causes unnecessary traffic and memory usage, and often makes the fragment checker fail on binary files.
#1733 aims to solve this for most cases by skipping fragment checking when the URL has no non-empty fragment. It cannot be ruled out, however, that fragments are handled server-side, so valid URLs may carry intentional fragments pointing to non-HTML files. For such cases, it would be great to additionally check the content-type of the HTTP response and invoke the fragment checker only if it can actually handle that type.
An additional condition to start with, to be added at https://github.com/lycheeverse/lychee/blob/master/lychee-lib/src/checker/website.rs#L100:
response.headers().get("content-type").and_then(|v| v.to_str().ok()).is_some_and(|x| x.starts_with("text/html"))
text/markdown could be supported as well, by setting the file type for the fragment checker to file_type: crate::FileType::Markdown accordingly.
Since text/markdown does not seem to be widely used (it is not served automatically for .md file extensions by the latest Apache2, nor by GitHub), we could additionally accept text/plain if the URL path ends with .md.
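To make the proposed rules concrete, here is a minimal sketch of the decision logic, using only the standard library. The names `FragmentFileType` and `fragment_file_type` are hypothetical and not part of lychee; in the real code the result would map onto crate::FileType, and the header value would first have to be converted to a string. Ignoring parameters such as `; charset=utf-8` and matching case-insensitively are assumptions of this sketch, not something decided in this issue.

```rust
/// Hypothetical stand-in for the file types the fragment checker can handle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FragmentFileType {
    Html,
    Markdown,
}

/// Decide whether (and as which file type) a response should be passed to
/// the fragment checker, based on its Content-Type header and the URL path.
/// Returns `None` when fragment checking should be skipped.
fn fragment_file_type(content_type: Option<&str>, url_path: &str) -> Option<FragmentFileType> {
    // No Content-Type header: we cannot tell, so skip fragment checking.
    let raw = content_type?;

    // Drop parameters such as "; charset=utf-8" and normalize the case.
    let media_type = raw
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();

    match media_type.as_str() {
        "text/html" => Some(FragmentFileType::Html),
        "text/markdown" => Some(FragmentFileType::Markdown),
        // Many servers deliver Markdown as text/plain, so fall back to the
        // file extension in that case only.
        "text/plain" if url_path.to_ascii_lowercase().ends_with(".md") => {
            Some(FragmentFileType::Markdown)
        }
        _ => None,
    }
}
```

With this shape, a response with `application/pdf` or without any content-type yields `None`, so the body would not need to be handed to the fragment checker at all.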