Check text fragments by cristiklein · Pull Request #2138 · lycheeverse/lychee

cristiklein · 2026-04-12T15:39:42Z

This PR adds --include-text-fragments which checks text fragments. --include-fragments's behavior is changed to only check the anchor part.

Hence, for a fragment like #anchor:~:text=some-text, the user may choose between three behaviors:

--include-fragments checks only if #anchor is a valid anchor.
--include-text-fragments checks only if some-text is a valid text.
--include-fragments --include-text-fragments checks both.

Decisions to be challenged:

The text fragment checker uses only html5ever. I found no good reason to also include an implementation based on html5gum.
The text fragment checker extracts all visible text, then checks the text fragment. This is rather wasteful on memory compared to a state-machine-based approach. I opted for the more memory-intense approach, both due to simplicity and because we might want to support multiple text fragments in the future. (If that ever gets requested.)
Extracting visible text assume all tags contain visible text, except head, script, style and template. I believe this is a good start, in particular head also filters out title and meta. However, I see this more like a starting point, than an academically correct response. I'm hoping for more user feedback here.

cristiklein · 2026-04-12T15:55:46Z

@SKalt wanna take this for a spin?

thomas-zahner · 2026-04-13T09:04:14Z

Wow, thank you for this nice PR! 🚀
It looks great already. Just last week I've started to rework #1600 via https://github.com/thomas-zahner/lychee/tree/text-fragment-v2 but now with this great PR it probably makes sense to continue here. (I've also considered to do a full rewrite instead of using #1600 as a starting point)

Ideally we should respect the use_html5ever option/flag to align with the other parsing behaviour.
This indeed sounds quite wasteful. I will take a closer look and will think about how this can be improved.
This sounds sensible.

One more thing that could be debated is if --include-text-fragments makes sense. In my rework on #1600 I've integrated the feature into --include-fragments, after all text fragments are fragments. Do you see a use-case in checking text fragments but not checking any other fragments? I'm currently in favour of a single flag.

katrinafyi · 2026-04-13T10:41:28Z

to make it faster, something like an inverted index: https://swtch.com/~rsc/regexp/regexp4.html. then, to search for a string, you can just go through each word inside the search string, in turn, and look at its indices and check they're contiguous.

but I don't know if we'll see text fragments often enough for preprocessing to make sense. if they're rare, matching on the fly as you go through the html cursor could be fine.

I also agree that the two include fragment options should be folded together.

cristiklein · 2026-04-13T12:23:01Z

@thomas-zahner and @katrinafyi Thank you for your feedback!

Ideally we should respect the use_html5ever option/flag to align with the other parsing behaviour.

Do I understand correctly that you want to also see an implementation based on html5gun before merging this PR?

but I don't know if we'll see text fragments often enough for preprocessing to make sense. if they're rare, matching on the fly as you go through the html cursor could be fine.

This sounds like a fascinating idea, but I agree that text anchor might be too rare for this to make sense.

I also agree that the two include fragment options should be folded together.

Can I challenge this a bit? In my experience, text fragments are less stable than anchor fragments. I could imagine users wanting to perform anchor fragment checking (as they are more stable and can be seen as an "API" of the webpage), but not text fragment checking (as they might be less stable and are definitely not an "API" of the webpage).

Not sure this is a good enough reason to separate the two checks, but I felt a need to point it out. 😄 I'm fine with whatever is decided.

katrinafyi · 2026-04-13T12:37:08Z

In my experience, text fragments are less stable than anchor fragments.

True, that's a really good point, thanks for bringing it up. To be honest, I've never used text fragments myself.

How about we combine it with --include-fragments but make it take an argument string that's none, anchor, text, or all? I think that's possible with clap and I think joining them together makes sense and makes the features more discoverable. You should also be able to write --include-fragments without a value and it would default to.. "anchor", probably.

thomas-zahner · 2026-04-13T13:01:05Z

In my experience, text fragments are less stable than anchor fragments.

Text fragments are also supposed to get/become an API at some point. But it's true that it's a newer feature and maybe still not fully mature.

How about we combine it with --include-fragments but make it take an argument string that's none, anchor, text, or all?

Yes, that would be a middle ground. Something like the following?

enum FragmentMode {
  /// Don't do any fragment checking
  #[default]
  None,
  /// Check text anchors but not text fragments
  Simple,
  /// Check both text anchors and text fragments
  Full,
}

I'm not really sure/happy with the above variant names, but AnchorsWithTextFragments & AnchorsWithoutTextFragments is not really any better. Admittedly, I still feel like the better approach is to make text fragment checking part of include_fragments/--include-fragments while keeping this a boolean type. I keeps UX simpler & doesn't require additional logic. Also we don't really have to fear introducing this as part of include_fragments in terms of stability/correctness as currently lychee reports an error on any text fragment so we would greatly improve correctness even if the implementation might not be fully conformant yet.

katrinafyi · 2026-04-13T12:47:46Z

+        return status;
+    }
+
+    let document = normalize_whitespace(&extract_visible_text(content));


it seems unnecessarily slow to split then join the entire document. can we just avoid emitting redundant whitespace in the callback?

alternatively, the document can emit into a Vec of whitespace-separated words and you just have to split the fragment in the same way. this would avoid all the normalising atm

Fixed in 1fbdfd4

katrinafyi · 2026-04-13T12:48:52Z

+        };
+
+        let [start] = parts.as_slice() else {
+            let [start, end] = parts.as_slice() else {


chained let/else can just be a match statement, I believe.

Fixed in ef333e4

katrinafyi · 2026-04-13T12:51:51Z

+
+fn extract_visible_text(input: &str) -> String {
+    #[derive(Default)]
+    struct TextExtractor {


I think the inner structure is too big. also, I think using default for an initial value is a misuse, just use pub fn new() :)

this feature is kinda interesting outside of lychee, so if possible it would be nice to clean up the API a little bit (even if we don't export it)

I'm not sure to fully understand your comment.

~~Could you explain why new() is better than default() in this case?~~ Nevermind, clippy explained it to me. 😄 Fixed in d68c43e

Also, to me extract_visible_text feels like a clean API (with TextExtractor being an internal implementation detail). Did you mean to clean up the API so that in the future TextExtractor could be used outside of lychee?

I'd just move TextExtractor outside of the function. I'm always surprised by inner structs and its impl is fairly big. You can keep it private and it won't be accessible outside the file.

Thanks for adding the new()!

katrinafyi · 2026-04-13T12:55:46Z

+                    let tag = String::from_utf8_lossy(name).into_owned();
+                    if TextExtractor::is_hidden_tag(&tag) {
+                        self.hidden_stack.push(tag);
+                    }


I was looking at the spec and a JS implementation and a "block" tag should terminate (prevent) any matches across its boundary. this is kinda tricky, so i'm fine with the lychee mplementation accepting too much. I just thought it was worth mentioning and maybe deserves a comment in the code.

in general, Id like to see some refrences to the spec in the comments :) is the matching algorithm based on some reference implementation?

Fixed in many places, e.g., 0f21d09

katrinafyi · 2026-04-13T13:00:39Z

+}
+
+#[cfg(test)]
+mod tests {


you seem to support it in your code, so it would be nice to have a test where the start and end could be overlapping:

start=aa, end=aa should match in aaaa but not aaa or aa

also, where the start is part of a word

start=aa, end=b should match in aab, aaab.

actually, does the spec say anything about whether a fragment match can start in the middle of a word?

I checked and I couldn't get Chrome to generate start=aa,end=aa because that would be rather ambiguous. It seems to prefer something like prefix-,aaaa,-suffix.

Regarding word boundaries, seems like the proposal wants that: https://wicg.github.io/scroll-to-text-fragment/#word-boundaries

What exactly constitutes a word boundary seems to be ... complicated (see link above). I need to think about what lychee could/should do to check if the text fragment is aligned with word boundaries.

Yeah interesting, I'm happy to leave this for now. Thanks for looking into it!

katrinafyi · 2026-04-13T13:04:07Z

+            };
+        };
+
+        let Some((element, text_directive)) = fragment.split_once(":~:text=") else {


duplicated with code in text_fragment.rs?

also I'd use "anchor" instead of element, i makes me think of html elements

duplicated with code in text_fragment.rs?

Not really. This code only splits anchor from text fragment. The code in text_fragment.rs actually parses the text fragment into prefix, start, stop and suffix. I thought to keep this separation, as fully parsing the text directive at this point is not needed.

also I'd use "anchor" instead of element, i makes me think of html elements

Fully agree, nice catch!

Fixed in 83531d9

katrinafyi · 2026-04-13T13:09:51Z

+    /// Whether to check the existence of text fragments in the response HTML files.
+    ///
+    /// Will be disabled if the request method is `HEAD`.
+    include_text_fragments: bool,


this is only in website checker ATM. as far as I can tell, it should be in FileChecker too. is there any way of doing this nicely?

honestly, fragment_checker should be be an Option field rather than carrying the adjacent include_fragments flag, and maybe the text fragments can be hooked into FragmentChecker somehow

Fixed in 239b0ac

cristiklein · 2026-04-13T13:13:18Z

How about we combine it with --include-fragments but make it take an argument string that's none, anchor, text, or all? I think that's possible with clap and I think joining them together makes sense and makes the features more discoverable. You should also be able to write --include-fragments without a value and it would default to.. "anchor", probably.

I love this! I'll produce a PR as soon as I have time.

Text fragments are also supposed to get/become an API at some point. But it's true that it's a newer feature and maybe still not fully mature.

Sorry for not being clearer. What I meant with the "API" analogy is the following: As a website author, if I want to encourage people to link to specific parts of a page, I'd ask them to use anchors. I'd do my best to keep anchor unchanged, even if the section title has changed. So website authors are very likely to include anchors in their stable interface promise.

In contrast, as a website author, I would definitely not want people to expect me to keep text stable, so that their text fragments don't break.

mre · 2026-04-14T12:08:07Z

Do I understand correctly that you want to also see an implementation based on html5gun before merging this PR?

don't know how hard it is to add html5gum support, but I would personally very much appreciate that. The reason is that we'd like to maintain feature parity between both engines. They each have their advantages and disadvantages. html5gum is faster and allows more direct/raw HTML access, while html5ever is very mature and provides some helpful high-level functionality.

mre · 2026-04-14T12:28:17Z

+    fn parse(input: &str) -> Option<Self> {
+        let mut parts: Vec<&str> = input.split(',').collect();
+        if parts.is_empty() {
+            return None;
+        }
+
+        let prefix = if parts.first().is_some_and(|part| part.ends_with('-')) && parts.len() >= 2 {
+            let prefix = parts.remove(0);
+            Some(prefix[..prefix.len() - 1].to_owned())
+        } else {
+            None
+        };
+
+        let suffix = if parts.last().is_some_and(|part| part.starts_with('-')) && parts.len() >= 2 {
+            let suffix = parts.pop().expect("checked length above");
+            Some(suffix[1..].to_owned())
+        } else {
+            None
+        };
+
+        let [start] = parts.as_slice() else {
+            let [start, end] = parts.as_slice() else {
+                return None;
+            };
+
+            return Some(Self {
+                prefix,
+                start: normalize_whitespace(start),
+                end: Some(normalize_whitespace(end)),
+                suffix: suffix.map(|s| normalize_whitespace(&s)),
+            });
+        };
+
+        Some(Self {
+            prefix: prefix.map(|s| normalize_whitespace(&s)),
+            start: normalize_whitespace(start),
+            end: None,
+            suffix: suffix.map(|s| normalize_whitespace(&s)),
+        })
+    }


I personally found this a bit harder to read than it could be.

Tried my hand on a refactoring and added a few comments. Also turned the separator and markers into constants. I feel like some examples are also helpful.

/// Parses a text directive value (the part after `text=`) into its components. /// /// The format is: `[prefix-,] start [,end] [,-suffix]` /// /// # Examples /// /// ``` /// // Exact match /// let d = TextDirective::parse("an example").unwrap(); /// assert_eq!(d.start, "an example"); /// assert!(d.prefix.is_none() && d.end.is_none() && d.suffix.is_none()); /// /// // Range match /// let d = TextDirective::parse("an example,text fragment").unwrap(); /// assert_eq!(d.start, "an example"); /// assert_eq!(d.end.unwrap(), "text fragment"); /// /// // Prefix and suffix context terms /// let d = TextDirective::parse("this is-,an example,-text fragment").unwrap(); /// assert_eq!(d.prefix.unwrap(), "this is"); /// assert_eq!(d.start, "an example"); /// assert_eq!(d.suffix.unwrap(), "text fragment"); /// /// // Invalid: no start term /// assert!(TextDirective::parse("").is_none()); /// ``` fn parse(input: &str) -> Option<Self> { // The parameter separator splits the directive into its components: // [prefix-,] start [,end] [,-suffix] const PARAMETER_SEPARATOR: char = ','; // The context term marker distinguishes prefix/suffix from start/end terms. // e.g. "prefix-,start,-suffix" const CONTEXT_TERM_MARKER: char = '-'; // Literal commas in the text are percent-encoded upstream, so splitting on ',' is safe. let mut parts: Vec<&str> = input.split(PARAMETER_SEPARATOR).collect(); if parts.is_empty() { return None; } // A prefix is indicated by a trailing context term marker on the first token let prefix = if parts.first().is_some_and(|part| part.ends_with(CONTEXT_TERM_MARKER)) && parts.len() >= 2 { let prefix = parts.remove(0); Some(prefix.strip_suffix(CONTEXT_TERM_MARKER).unwrap_or(prefix).to_owned()) } else { None }; // A suffix is indicated by a leading context term marker on the last token let suffix = if parts.last().is_some_and(|part| part.starts_with(CONTEXT_TERM_MARKER)) && parts.len() >= 2 { let suffix = parts.pop()?; Some(suffix.strip_prefix(CONTEXT_TERM_MARKER).unwrap_or(suffix).to_owned()) } else { None }; // After removing prefix/suffix, exactly 1 or 2 tokens must remain (start [,end]) let (start, end) = match parts.as_slice() { [start] => (start, None), [start, end] => (start, Some(normalize_whitespace(end))), _ => return None, }; Some(Self { prefix: prefix.map(|s| normalize_whitespace(&s)), start: normalize_whitespace(start), end, suffix: suffix.map(|s| normalize_whitespace(&s)), }) }

mre · 2026-04-14T12:31:44Z

+    fn matches(&self, document: &str) -> bool {
+        if self.start.is_empty() {
+            return false;
+        }
+
+        let mut start_offset = 0;
+        while let Some(relative_start) = document[start_offset..].find(&self.start) {
+            let match_start = start_offset + relative_start;
+            let mut match_end = match_start + self.start.len();
+
+            if let Some(end) = &self.end {
+                let Some(relative_end) = document[match_end..].find(end) else {
+                    return false;
+                };
+                match_end += relative_end + end.len();
+            }
+
+            let prefix_matches = self
+                .prefix
+                .as_ref()
+                .is_none_or(|prefix| document[..match_start].trim_end().ends_with(prefix));
+            let suffix_matches = self
+                .suffix
+                .as_ref()
+                .is_none_or(|suffix| document[match_end..].trim_start().starts_with(suffix));
+
+            if prefix_matches && suffix_matches {
+                return true;
+            }
+
+            start_offset = match_start + 1;
+        }
+
+        false
+    }
+}


I think there's a bug. When end is not found, it immediately returns false, but it should continue to the next candidate match of start instead, since a later occurrence of start might have a valid end after it.

fn matches(&self, document: &str) -> bool { let mut start_offset = 0; // Find each occurrence of `start` and check whether the surrounding // context (prefix, end, suffix) also matches, per the spec's algorithm. while let Some(relative_start) = document[start_offset..].find(&self.start) { let match_start = start_offset + relative_start; let mut match_end = match_start + self.start.len(); // For range matches, find the first occurrence of `end` after `start` if let Some(end) = &self.end { let Some(relative_end) = document[match_end..].find(end.as_str()) else { // No `end` found after this `start`; try the next occurrence of `start` start_offset = match_start + 1; continue; }; match_end += relative_end + end.len(); } let prefix_matches = self .prefix .as_ref() .is_none_or(|prefix| document[..match_start].trim_end().ends_with(prefix.as_str())); let suffix_matches = self .suffix .as_ref() .is_none_or(|suffix| document[match_end..].trim_start().starts_with(suffix.as_str())); if prefix_matches && suffix_matches { return true; } start_offset = match_start + 1; } false }

I think that this is not a bug, since if there's no end starting at an earlier position, then there will still be no end when starting from a later position.

However, there was a very similar bug where it was failing to try later occurrences of end which might have a valid suffix after them.

mre · 2026-04-14T12:34:45Z

+fn parse_text_directives(url: &Url) -> Vec<TextDirective> {
+    let Some(fragment) = url.fragment() else {
+        return Vec::new();
+    };
+    let Some((_, directive)) = fragment.split_once(":~:") else {
+        return Vec::new();
+    };
+
+    url::form_urlencoded::parse(directive.as_bytes())
+        .filter(|(key, _)| key == "text")
+        .filter_map(|(_, value)| TextDirective::parse(value.as_ref()))
+        .collect()
+}


Can we make :~: a constant? Maybe also the text directive type?

fn parse_text_directives(url: &Url) -> Vec<TextDirective> { // The fragment directive delimiter separates the element fragment from the directive. // e.g. "https://example.com#section:~:text=foo" const FRAGMENT_DIRECTIVE_DELIMITER: &str = ":~:"; // The only directive type defined by the spec const TEXT_DIRECTIVE_PREFIX: &str = "text"; let Some(fragment) = url.fragment() else { return Vec::new(); }; let Some((_, directive)) = fragment.split_once(FRAGMENT_DIRECTIVE_DELIMITER) else { return Vec::new(); }; url::form_urlencoded::parse(directive.as_bytes()) .filter(|(key, _)| key == TEXT_DIRECTIVE_PREFIX) .filter_map(|(_, value)| TextDirective::parse(value.as_ref())) .collect() }

Of course, we could move all constants to the beginning of the file to group them together, but I find reading code with meaningful constants easier in general.

mre · 2026-04-14T12:36:38Z

+}
+
+fn normalize_whitespace(input: &str) -> String {
+    input.split_whitespace().collect::<Vec<_>>().join(" ")


Soon 😉 (This would avoid extra allocations if only it was stable already.)

Fixed in 1fbdfd4

mre · 2026-04-14T12:42:03Z

+            return ParsedFragment {
+                anchor_fragment: None,
+                text_directive: None,
+            };


Could be slightly cleaner if ParsedFragment implemented Default:

ParsedFragment::default()

Fixed in fb2ecb9

Oh, is there a reason why the Default trait can't be used?

https://doc.rust-lang.org/std/default/trait.Default.html

mre · 2026-04-14T12:44:31Z

+            };
+        };
+
+        let Some((element, text_directive)) = fragment.split_once(":~:text=") else {


I like constants. 😉

pub(crate) const TEXT_DIRECTIVE_SEPARATOR: &str = ":~:text=";

mre · 2026-04-14T12:46:00Z

    async fn check_default(&self, request: Request) -> Status {
        let method = request.method().clone();
        let request_url = request.url().clone();
+        let parsed_fragment = Self::parse_fragment(&request_url);


This should technically be parsed_fragments. I was confused for a moment while reading this. Thought it returned a single value and then you did parsed_fragment.anchor_fragment. The plural would make it clearer what's going on.

Outdated, I moved fragment parsing into FragmentChecker.

cristiklein · 2026-04-15T14:16:02Z

@mre and @katrinafyi Thanks tons for reviewing this PR.

I extracted the following high-level comments:

Make lychee accept --include-fragments[=<none|anchor-only|text-only|all>]
Add an html5gum implementation
Check text fragments in FileChecker too.
Somehow avoid splitting and recombining the whole document (or documenting why it's really necessary)
Add comments to link implementation to spec
Add tests for overlapping start and end
Revisit the matching code, make it more readable by adding constants

I'll keep you posted. 😄

mre · 2026-04-16T16:49:43Z

Some more lints to fix (which are unrelated to this specific change, it seems). Looks good from my side.

katrinafyi · 2026-04-18T15:24:50Z

Oh one more thing (sorry!), is there a reason you use the lower level CallbackEmitter rather than the higher level default Tokeniser? It seems to do everything we need: https://docs.rs/html5gum/latest/html5gum/index.html#html5gum

cristiklein · 2026-04-18T18:28:28Z

Oh one more thing (sorry!), is there a reason you use the lower level CallbackEmitter rather than the higher level default Tokeniser? It seems to do everything we need: https://docs.rs/html5gum/latest/html5gum/index.html#html5gum

Indeed, that looks so much nicer (0ea1412). Thanks for pointing me in the right direction.

katrinafyi

@cristiklein I had a look through this in cristiklein#1 and I think that whitespace normalisation by split is necessary and okay for now.

I also found a bug when the end string occurs multiple times. Previously, it only checked the earliest end string for each start position. This would fail to match if the first end had no suffix, even if there was a later end which had the right suffix. This is similar to mre's comment here: #2138 (comment). To fix this, I just changed the loop into a regex match which is less work and possibly faster. Handling this in the manual loop code would've needed an inner loop over all possible end positions for each start position.

I also looked at the behaviour around alternative whitespaces (using   as an example) and fixed a bug there.

And a few more test cases. Hopefully that is okay :)

katrinafyi · 2026-04-19T07:01:23Z

+    fn matches(&self, document: &str) -> bool {
+        if self.start.is_empty() {
+            return false;
+        }
+
+        let mut start_offset = 0;
+        while let Some(relative_start) = document[start_offset..].find(&self.start) {
+            let match_start = start_offset + relative_start;
+            let mut match_end = match_start + self.start.len();
+
+            if let Some(end) = &self.end {
+                let Some(relative_end) = document[match_end..].find(end) else {
+                    return false;
+                };
+                match_end += relative_end + end.len();
+            }
+
+            let prefix_matches = self
+                .prefix
+                .as_ref()
+                .is_none_or(|prefix| document[..match_start].trim_end().ends_with(prefix));
+            let suffix_matches = self
+                .suffix
+                .as_ref()
+                .is_none_or(|suffix| document[match_end..].trim_start().starts_with(suffix));
+
+            if prefix_matches && suffix_matches {
+                return true;
+            }
+
+            start_offset = match_start + 1;
+        }
+
+        false
+    }
+}


I think that this is not a bug, since if there's no end starting at an earlier position, then there will still be no end when starting from a later position.

However, there was a very similar bug where it was failing to try later occurrences of end which might have a valid suffix after them.

Check text fragments: fix matching with repeated end, fix nbsp, justify normalise

cristiklein · 2026-04-19T15:29:44Z

Really love @katrinafyi regex-based matching. So much easier to understand.

I'd say this is as good as it gets. Ready to merge from my end.

Co-authored-by: Kait <39479354+katrinafyi@users.noreply.github.com>

thomas-zahner

@cristiklein @katrinafyi Thank you for your efforts! 🚀 Remarking just a few tiny things I noticed. I'd like to include this in the next release I'm intending to do the next few days.

thomas-zahner · 2026-04-23T14:56:05Z

+    }
+
+    /// This method parses the fragment, separating the element id (if any) from the text directives (if any).
+    pub(super) fn parse(url: &'a Url) -> Self {


Hmm first time seeing pub(super) personally but also in lychee. How is it different to pub(crate) in this context? Is it less visible than pub(crate)?

Indeed, pub(super) is less visible than pub(crate). I think it expresses the intention that this method is internal only a bit better, but we can also stick to pub(crate). It doesn't really have an impact in this case. Let me know what you prefer.

Thank you, let's leave it

thomas-zahner · 2026-04-24T06:55:40Z

@cristiklein @katrinafyi The implementation really looks solid and it's a great feature. Thank you for your hard work! 🚀

thomas-zahner · 2026-04-24T06:58:24Z

One last question, have we decided that an additional html5gum implementation is not worth the effort? I don't think it's a top priority and it leads to additional code but theoretically it would be pretty nice if use_html5ever was respected in this new feature as well. Maybe I have missed a discussion.

cristiklein · 2026-04-24T07:07:28Z

Thanks all! This was fun!

Regarding use_html5ever, we opted to only implement text fragment checking using html5gum, because that's what the anchor fragment checker does.

If it's valuable, I can take a shot at an html5ever implementation.

thomas-zahner · 2026-04-24T10:13:44Z

It's definitely not urgent which is why I've merged and released it. @mre @katrinafyi What are your thoughts on the dual parser implementation?

katrinafyi reviewed Apr 13, 2026

View reviewed changes

mre reviewed Apr 14, 2026

View reviewed changes

Comment thread lychee-lib/src/checker/website.rs Outdated

mre reviewed Apr 14, 2026

View reviewed changes

Comment thread lychee-lib/src/checker/website.rs Outdated

cristiklein force-pushed the check-text-fragments branch from c6440a0 to b4b5a9a Compare April 15, 2026 14:19

thomas-zahner mentioned this pull request Apr 16, 2026

Add support for Text fragment feature (#1545) #1600

Closed

mre reviewed Apr 16, 2026

View reviewed changes

Comment thread lychee-lib/src/checker/website.rs

mre reviewed Apr 16, 2026

View reviewed changes

Comment thread lychee-lib/src/utils/fragment_checker/mod.rs Outdated

cristiklein added 6 commits April 16, 2026 20:31

Add text fragment fixtures

607c7f1

Add test for text fragments

f4c871a

Add --include-text-fragments

739a100

Wired include_text_fragments

f15bfc0

Add include_text_fragment in lychee.example.toml

4504e5d

Fix

1879b4c

Update README.md

2aaef63

Simplify extract_visible_text()

0ea1412

cristiklein and others added 7 commits April 18, 2026 20:34

Make clippy happy

3db6155

Make cargo fmt happy

978df85

add some more tests

62a939e

implement nbsp interpretation in fragment

9d94a24

upgrade is_hidden_tag

05c42f6

use regex for matching to fix bugs

10d0404

touch comment

5a5b1a3

katrinafyi reviewed Apr 19, 2026

View reviewed changes

katrinafyi and others added 3 commits April 19, 2026 17:52

intersperse_whitespace to avoid copy

1cfca11

Merge pull request #1 from rina-forks/check-text-fragments-update

8313b54

Check text fragments: fix matching with repeated end, fix nbsp, justify normalise

Fix typos

5e7a646

katrinafyi reviewed Apr 19, 2026

View reviewed changes

Comment thread lychee-lib/src/utils/fragment_checker/text.rs Outdated

Update lychee-lib/src/utils/fragment_checker/text.rs

e468b10

Co-authored-by: Kait <39479354+katrinafyi@users.noreply.github.com>

thomas-zahner reviewed Apr 23, 2026

View reviewed changes

cristiklein added 4 commits April 24, 2026 05:25

Remove unused test code

1e1b451

Idiomatic Default

118028b

Improve comments in parses_text_directive_with_encoded_utf8

7c2707e

Make clippy happy

ec1466c

thomas-zahner merged commit 290e9d7 into lycheeverse:master Apr 24, 2026
7 checks passed

mre mentioned this pull request Apr 21, 2026

chore: release v0.24.0 #2048

Merged

strickvl mentioned this pull request May 8, 2026

Fix lychee.toml for lychee 0.24+ schema change zenml-io/kitaru#315

Merged

4 tasks

Uh oh!

Conversation

cristiklein commented Apr 12, 2026

Uh oh!

cristiklein commented Apr 12, 2026

Uh oh!

thomas-zahner commented Apr 13, 2026

Uh oh!

katrinafyi commented Apr 13, 2026

Uh oh!

cristiklein commented Apr 13, 2026

Uh oh!

katrinafyi commented Apr 13, 2026

Uh oh!

thomas-zahner commented Apr 13, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cristiklein Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cristiklein commented Apr 13, 2026

Uh oh!

mre commented Apr 14, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cristiklein Apr 17, 2026 •

edited

Loading

cristiklein commented Apr 15, 2026 •

edited

Loading

katrinafyi left a comment •

edited

Loading