Treat broken reference links as links#445
Treat broken reference links as links#445jyn514 wants to merge 4 commits intopulldown-cmark:masterfrom
Conversation
|
I worked around the |
|
@marcusklaas I'm not really sure where to go from here. I know what I want to do, which is:
However I'm not really sure how to implement that, I'm not understanding the current code very well. |
|
Just as I said that I figured out how to do it actually 😆 |
|
No, that just causes more test failures (from https://github.com/jyn514/pulldown-cmark/tree/reference-link-dup): |
marcusklaas
left a comment
There was a problem hiding this comment.
Apologies for the delay in reviewing this, and thanks again for getting started.
If I were to try this, I'd try skip parsing until the end of the sequence. So if we find that [label][ref] is not a valid reference link, we extend the node of the first ] to ][ref] and continue after this. This would effectively skip the parsing of [ref] and hence prevent the double call to the broken link function. This would probably be done here. That line makes the ] simple text. You would need to add some editing of the node end, and manipulation of cur/ cur_ix to make it work, I think.
I fully acknowledge that the link parsing logic is one of the most complex pieces of pulldown, so please don't feel bad if the above suggestion is daunting.
| } | ||
|
|
||
| #[test] | ||
| fn broken_links_called_only_once() { |
src/parse.rs
Outdated
| } else { | ||
| self.tree[cur_ix].item.body = ItemBody::Text; | ||
| } | ||
| // assume the reference link was meant to be valid |
There was a problem hiding this comment.
I don't completely understand what this change does. Is it reordering some operations, and if so, for what reason?
There was a problem hiding this comment.
The idea, which didn't work, was to treat [a][b] as a link even if b didn't resolve. Before it would only be treated as a link if the reference was valid. Now it causes the following markdown to be parsed incorrectly:
[foo][bar][baz]
[baz]: /url
It should be parsed as [foo] and then a link to /url with text bar. Instead it's now parsed as a self-contained reference link with text baz. I think this is because I took [bar] completely out of the tree so it isn't being considered as a label.
I think that would run into the same issues I mentioned in https://github.com/raphlinus/pulldown-cmark/pull/445/files#r435628484: If it's removed from the tree altogether, it's not considered as a label for future links. |
|
Ok, I had an idea that almost worked: I said if |
|
I changed it to only skip the callback and not any other processing. It passes all the test cases, but I'm no longer sure this is implemented correctly. |
|
This seems like a reasonable solution, although I wonder if the reset of [brokenlink1] some other node [brokenlink2] |
|
Good catch, I added a test for that. I tried fixing it by resetting $ diff --git a/src/parse.rs b/src/parse.rs
index d43b81f..34a60d9 100644
--- a/src/parse.rs
+++ b/src/parse.rs
@@ -1997,6 +1997,8 @@ impl<'a> Parser<'a> {
let block_text = &self.text[..block_end];
while let Some(mut cur_ix) = cur {
+ let last_broken = saw_broken;
+ saw_broken = false;
match self.tree[cur_ix].item.body {
ItemBody::MaybeHtml => {
let next = self.tree[cur_ix].next;
@@ -2200,8 +2202,7 @@ impl<'a> Parser<'a> {
(link_type, url, title)
})
.or_else(|| {
- if saw_broken {
- saw_broken = false;
+ if last_broken { |
|
Yeah, it looks like it executes three times on a single reference link: Do you know why it has that behavior? |
|
Closing in favor of #469. |
This ensures that broken_link_callback will only see broken links once.
Closes #444
Outdated: trouble with FnMut
I tried to add the following test, but it fails because callbacks are only allowed to be
Fn, notFnMut:I tried allowing FnMut closures, but that had all sorts of other things that broke, including needing to pass in
Some(&mut closure)instead ofSome(&closure)and removing theCloneimpl forParser. Let me know if you want me to follow up with that, I'm not sure what the best approach is there.