Strong with nested emphasis parses as "**" #657

@mgeisler

pulldown-cmark parses the string "****foo*bar*baz****" into the following events:

"****foo*bar*baz****" -> [
  Start(Paragraph)
    Text(Borrowed("*"))
    Text(Borrowed("*"))
    Start(Emphasis)
      Start(Emphasis)
        Text(Borrowed("foo"))
      End(Emphasis)
      Text(Borrowed("bar"))
    End(Emphasis)
    Text(Borrowed("baz"))
    Text(Borrowed("*"))
    Text(Borrowed("*"))
    Text(Borrowed("*"))
    Text(Borrowed("*"))
  End(Paragraph)
]

This differs from the commonmark.js reference implementation, which produces the following AST:

  <paragraph>
    <strong>
      <emph>
        <emph>
          <text>foo</text>
        </emph>
        <text>bar</text>
      </emph>
      <text>baz</text>
    </strong>
    <text>**</text>
  </paragraph>
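
For comparison, here is a rough sketch of the event stream that would mirror the commonmark.js AST, printed in the same indented style as the dump above. The `Ev` enum is a local stand-in I made up for illustration, not the pulldown_cmark `Event` type, and whether the trailing "**" would arrive as one text event or two is an assumption.

```rust
// Local stand-in for illustration only; this is NOT pulldown_cmark::Event.
#[derive(Debug, Clone, PartialEq)]
enum Ev {
    Start(&'static str),
    End(&'static str),
    Text(&'static str),
}

/// Event stream mirroring the commonmark.js AST for "****foo*bar*baz****".
fn expected_events() -> Vec<Ev> {
    use Ev::*;
    vec![
        Start("Paragraph"),
        Start("Strong"),
        Start("Emphasis"),
        Start("Emphasis"),
        Text("foo"),
        End("Emphasis"),
        Text("bar"),
        End("Emphasis"),
        Text("baz"),
        End("Strong"),
        // Assumption: the unmatched delimiters come out as a single text event.
        Text("**"),
        End("Paragraph"),
    ]
}

/// Indent nested events by two spaces per level, dedenting before each End.
fn render(events: &[Ev]) -> String {
    let mut out = String::new();
    let mut width = 0usize;
    for ev in events {
        if let Ev::End(_) = ev {
            width = width.saturating_sub(2);
        }
        out.push_str(&format!("{:width$}{ev:?}\n", ""));
        if let Ev::Start(_) = ev {
            width += 2;
        }
    }
    out
}

fn main() {
    print!("{}", render(&expected_events()));
}
```

The difference from the actual output is that the opening "****" contributes to a `Strong` wrapper around the nested emphasis, and only two of the closing asterisks are left over as literal text.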

I found this while fuzzing pulldown-cmark-to-cmark to check that it round-trips Markdown text. For now I'm only trying to make a single paragraph of text work. The fuzz target looks like this:

#![no_main]

use libfuzzer_sys::fuzz_target;
use pulldown_cmark::Parser;
use pulldown_cmark::{Event, Tag};
use pulldown_cmark_to_cmark::cmark;

fn round_trip(text: &str) -> String {
    let mut result = String::with_capacity(text.len());
    // Collect the events up front so they can be dumped before serializing.
    let events = Parser::new(text).collect::<Vec<_>>();

    if std::env::var("EVENTS").is_ok() {
        let mut width = 0;
        eprintln!("{text:?} -> [");
        for event in &events {
            if let Event::End(_) = event {
                width -= 2;
            }
            eprintln!("  {:width$}{event:?}", "");
            if let Event::Start(_) = event {
                width += 2;
            }
        }
        eprintln!("]");
    }

    cmark(events.into_iter(), &mut result).unwrap();
    result
}

fuzz_target!(|text: String| {
    if text.contains(&['\n', '\r', '`']) {
        return;
    }

    let round_trip_1 = round_trip(&text);
    // Only keep inputs whose first round trip parses as a single paragraph.
    match Parser::new(&round_trip_1).collect::<Vec<_>>().as_slice() {
        [Event::Start(Tag::Paragraph), .., Event::End(Tag::Paragraph)] => {}
        _ => return,
    }

    // The second and third round trips must produce identical Markdown.
    let round_trip_2 = round_trip(&round_trip_1);
    let round_trip_3 = round_trip(&round_trip_2);
    assert_eq!(round_trip_2, round_trip_3);
});
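
The assertion at the end compares the second and third round trips, so changes made by the first two passes are tolerated and only a change in the third pass counts as a failure. A minimal sketch of that property in isolation, using a made-up toy pass (`strip_one_leading_space` is hypothetical and has nothing to do with pulldown-cmark-to-cmark):

```rust
/// Mirrors the fuzz target's assertion: the 2nd and 3rd passes must agree.
fn stable_after_two_passes(pass: impl Fn(&str) -> String, text: &str) -> bool {
    let first = pass(text);
    let second = pass(&first);
    let third = pass(&second);
    second == third
}

/// Hypothetical toy pass: strips at most one leading space per call.
fn strip_one_leading_space(text: &str) -> String {
    text.strip_prefix(' ').unwrap_or(text).to_string()
}

fn main() {
    // Two leading spaces are gone after two passes, so pass 3 changes nothing.
    println!("{}", stable_after_two_passes(strip_one_leading_space, "  x"));
    // Three leading spaces: pass 3 still changes the text, so the check fails.
    println!("{}", stable_after_two_passes(strip_one_leading_space, "   x"));
}
```

In the fuzzer, `round_trip` plays the role of `pass`, and an input like the one in this issue shows up as the second case: the output keeps changing on the third pass.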
