renderMarkdown doesn't do what we all think it does #15285

@moonmeister

Astro Info

Astro                    v5.16.0
Node                     v22.22.0
System                   macOS (arm64)
Package Manager          pnpm
Output                   static
Adapter                  @astrojs/node (v9.5.1)
Integrations             @astrojs/starlight (v0.36.2)

If this issue only occurs in one browser, which browser is a problem?

No response

Describe the Bug

I think I found a bug in how renderMarkdown works, or at least in how it's portrayed.

Summary

The RenderedContent object that it returns is supposed to look like

Promise<{
    html: string;
    metadata?: {
        [key: string]: unknown;
        imagePaths?: string[] | undefined;
        headings?: MarkdownHeading[] | undefined;
        frontmatter?: Record<string, any>;
    } | undefined;
}>

It does return this structure, but the contents are mangled. For example:

{
  "html": "<hr>\n<h2 id=\"title-headless-wordpress-toolkitdescription-a-modern-framework-agnostic-collection-of-plugins-and-packages-for-building-headless-wordpress-applications\">title: “Headless WordPress Toolkit”\ndescription: “A modern, framework-agnostic collection of plugins and packages for building headless [...]",
  "metadata": {
    "headings": [
      {
        "depth": 2,
        "slug": "title-headless-wordpress-toolkitdescription-a-modern-framework-agnostic-collection-of-plugins-and-packages-for-building-headless-wordpress-applications",
        "text": "title: “Headless WordPress Toolkit”\ndescription: “A modern, framework-agnostic collection of plugins and packages for building headless WordPress applications.”"
      },
      //...
    ],
    //...
    "frontmatter": {}
  }
}

Important

Notice that the metadata => frontmatter field is empty, and both the html and metadata => headings fields include frontmatter content, which they shouldn't.
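
To make the symptom easy to spot in a loader, here is a rough heuristic check. It is only a sketch: `RenderedLike` and `looksMangled` are hypothetical names, not part of Astro, and the check assumes the mangling always takes the form shown above (the opening `---` fence rendered as `<hr>` and the YAML lines rendered as a heading).

```typescript
// Minimal subset of the RenderedContent shape shown above.
interface RenderedLike {
  html: string;
  metadata?: {
    headings?: { depth: number; slug: string; text: string }[];
    frontmatter?: Record<string, unknown>;
  };
}

// Hypothetical helper: heuristically reports whether frontmatter was
// rendered as Markdown instead of being parsed into metadata.frontmatter.
function looksMangled(rendered: RenderedLike): boolean {
  const frontmatter = rendered.metadata?.frontmatter ?? {};
  const headings = rendered.metadata?.headings ?? [];
  const frontmatterEmpty = Object.keys(frontmatter).length === 0;
  // The opening `---` fence is parsed as a thematic break, hence a leading <hr>.
  const startsWithRule = rendered.html.trimStart().startsWith("<hr");
  // Heading text such as `title: ...` suggests a YAML line became a heading.
  const yamlLikeHeading = headings.some((h) => /^[A-Za-z_][\w-]*:\s/.test(h.text));
  return frontmatterEmpty && (startsWithRule || yamlLikeHeading);
}
```

A document that genuinely starts with a horizontal rule and has no frontmatter would be a false positive, so this is a debugging aid rather than a validation step.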

Premise

Given the docs' promise that "it works like glob" and the fact that it returns this structured object, I'd expect it to populate that object correctly. However, after digging into what glob actually does, this function is either not very useful or inaccurately documented.

  1. Glob separately parses the MD file with, I believe, parseFrontmatter from @astrojs/markdown-remark here.
  2. Then passes the result of that function + the raw code to the processor generated by createMarkdownProcessor here.
  3. That processor takes 2 arguments, the raw code and the result of parseFrontmatter. It then combines those 2 things into a single structured response in the format of RenderedContent.
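
The first step above can be illustrated with a dependency-free sketch. This is a simplified stand-in and not Astro's parseFrontmatter: `splitFrontmatter` and `SplitResult` are hypothetical names, and the YAML is left unparsed. It only shows why the body handed to the processor must no longer contain the `---` block.

```typescript
interface SplitResult {
  rawFrontmatter: string; // text between the `---` fences, left unparsed
  content: string; // Markdown body with the frontmatter block removed
}

// Hypothetical helper: separates a leading `---` fenced block from the body.
function splitFrontmatter(raw: string): SplitResult {
  const match = /^---[ \t]*\r?\n(?:([\s\S]*?)\r?\n)?---[ \t]*(?:\r?\n|$)/.exec(raw);
  // No leading frontmatter block: the whole input is body content.
  if (!match) return { rawFrontmatter: "", content: raw };
  return { rawFrontmatter: match[1] ?? "", content: raw.slice(match[0].length) };
}

const demoSplit = splitFrontmatter("---\ntitle: Demo\n---\n## Intro\n");
// demoSplit.rawFrontmatter === "title: Demo"
// demoSplit.content === "## Intro\n"
```

Only `content` would then go to the Markdown processor as the first argument, with the parsed frontmatter travelling alongside it as the second.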

Conclusion

The problem is that in the renderMarkdown implementation, only the raw MD is passed, not the parsed MD resulting from parseFrontmatter. Thus the result is incomplete data and a mangled response.

Some experimenting shows that passing the parsed body (i.e. content) from the parseFrontmatter response as the first argument, and the rest of the parseFrontmatter result as the second, to the remark processor correctly generates a response, e.g.:

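const rawFile = await fetchFile();
// Split the body (content) from the rest of the parseFrontmatter result.
const { content, ...parsedFile } = parseFrontmatter(rawFile);
// Note: this calls a modified renderMarkdown (applied via pnpm patch), not the stock function.
const rendered = await renderMarkdown({ content, data: parsedFile });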
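
For anyone who cannot patch the package, a possible workaround is to do the split outside renderMarkdown and merge the frontmatter back in afterwards. The following is a sketch under assumptions, not a confirmed fix: `renderWithFrontmatter`, `Parsed`, and `Rendered` are hypothetical names, and `parse` and `render` are injected so the example does not depend on any particular Astro version. I have not verified that this matches glob's output in every field (images, for instance).

```typescript
interface Rendered {
  html: string;
  metadata?: { [key: string]: unknown; frontmatter?: Record<string, any> };
}

interface Parsed {
  frontmatter: Record<string, any>; // assumed field name on the parser result
  content: string; // Markdown body without the frontmatter block
}

// Hypothetical wrapper: parse first, render only the body, then attach
// the parsed frontmatter to the metadata of the rendered result.
async function renderWithFrontmatter(
  raw: string,
  parse: (raw: string) => Parsed,
  render: (markdown: string) => Promise<Rendered>,
): Promise<Rendered> {
  const { frontmatter, content } = parse(raw);
  const rendered = await render(content);
  return { ...rendered, metadata: { ...rendered.metadata, frontmatter } };
}
```

In a loader, `parse` could presumably be parseFrontmatter from @astrojs/markdown-remark and `render` the unpatched renderMarkdown from the loader context, assuming the parser result exposes `frontmatter` and `content` as used here.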

Solutions

I see two possible explanations or solutions.

Option 1

I'm wrong, and renderMarkdown was never intended to parse frontmatter, only to convert clean MD content into HTML. While not impossible, I find this unlikely for several reasons:

  1. The documentation sets the expectation that this works "just like glob". Glob correctly handles frontmatter and more.
  2. If all this function was designed to do was convert an MD string to an HTML string, it'd return a string, not a complex structured object identical to what glob produces.

If this is the case, the documentation should be updated to say so and to explain how to actually mimic glob.

Option 2

I'm right, and due to either unnoticed breaking changes or bugs, this function needs fixing.

In that case, I believe we could update the internals of renderMarkdown to correctly execute parseFrontmatter.

Outstanding Questions

  1. glob handles parsing for "Markdown, MDX, Markdoc, JSON, YAML, and TOML". I'd expect this function to not handle the latter 3. But what about MDX and Markdoc? Are those separate parsers or included within renderMarkdown? If not, how can we solve them? (Answered below.)

Related Issues

When looking for existing issues I stumbled across #14620. While it relates to images instead of frontmatter, it seems likely to be caused by the same or a related bug.

Contribution

I'm happy to contribute something as this is actively hindering our ability to implement remote Markdown docs integration with Starlight.

What's the expected result?

I would expect the RenderedContent response from renderMarkdown to correctly parse my markdown and return a valid response with all relevant data.

Link to Minimal Reproducible Example

https://stackblitz.com/edit/github-fwjqules?file=src%2FexampleLoader.ts

Participation

  • I am willing to submit a pull request for this issue.

Labels

- P2: nice to have (priority): Not breaking anything but nice to have
- feat: markdown (scope): Related to Markdown
