display math newlines removed in Markdown output (3.7 vs 3.6.4) #11384

@cderv

When converting native -> markdown (with tex_math_dollars), a DisplayMath element whose content begins and ends with a newline is now emitted as a single-line $$...$$ in pandoc 3.7+, whereas pandoc 3.6.4 kept those newlines inside the delimiters:

  • pandoc 3.6.4 (previous behavior)

    $$
    e = mc^2
    $$
    
  • pandoc 3.7+ (current behavior)

    $$e = mc^2$$
    

This breaks use-cases where consumers expect a multi-line display-math block (for example: producing Markdown for tools like remark/remark-math + KaTeX or Docusaurus MDX that treat $$\n...\n$$ as a block).

Steps to reproduce:

  1. Run pandoc with the native reader and type the test element on stdin (or use a file containing display math whose content starts and ends with a newline):

    ❯ pandoc -f native -t markdown
    [ Para [ Math DisplayMath "\ne = mc^2\n" ] ]
    ^D
    
  2. Observe output in 3.6.4:

    $$
    e = mc^2
    $$
    

  3. Observe output in 3.7:

    $$e = mc^2$$
    

Looking at the history, I found the change responsible.

Before 3.7 the Markdown writer used simple concatenation for $$ display math (effectively "$$" <> literal str <> "$$"). In 3.7 the writer now uses the new delimited helper which pulls leading/trailing whitespace (including newlines) outside the opener/closer.

    -- | Add an opener and closer to a Doc. If the Doc begins or ends
    -- with whitespace, export this outside the opener or closer.
    -- This is used for formats, like Markdown, which don't allow spaces
    -- after opening or before closing delimiters.
    delimited :: Doc Text -> Doc Text -> Doc Text -> Doc Text
    delimited opener closer content =
      mconcat initialWS <> opener <> mconcat middle <> closer <> mconcat finalWS
     where
      contents = toList content
      (initialWS, rest) = span isWS contents
      (reverseFinalWS, reverseMiddle) = span isWS (reverse rest)
      finalWS = reverse reverseFinalWS
      middle = reverse reverseMiddle
      isWS NewLine = True
      isWS CarriageReturn = True
      isWS BreakingSpace = True
      isWS BlankLines{} = True
      isWS _ = False
      toList (Concat (Concat a b) c) = toList (Concat a (Concat b c))
      toList (Concat a b) = a : toList b
      toList x = [x]

    inlineToMarkdown opts (Math DisplayMath str) = do
      variant <- asks envVariant
      case () of
        _ | variant == Markua -> do
            let attributes = attrsToMarkua opts (addKeyValueToAttr ("",[],[])
                                                     ("format", "latex"))
            return $ blankline <> attributes <> cr <> literal "```" <> cr
                <> literal str <> cr <> literal "```" <> blankline
          | otherwise -> case writerHTMLMathMethod opts of
              WebTeX url ->
                let str' = T.strip str
                 in (\x -> blankline <> x <> blankline) `fmap`
                     inlineToMarkdown opts (Image nullAttr [Str str']
                          (url <> urlEncode str', str'))
              _ | isEnabled Ext_tex_math_gfm opts ->
                    return $ cr <> (literal "``` math"
                        $$ literal (T.dropAround (=='\n') str)
                        $$ literal "```") <> cr
                | isEnabled Ext_tex_math_dollars opts ->
                    return $ delimited "$$" "$$" (literal str)
                | isEnabled Ext_tex_math_single_backslash opts ->
                    return $ "\\[" <> literal str <> "\\]"
                | isEnabled Ext_tex_math_double_backslash opts ->
                    return $ "\\\\[" <> literal str <> "\\\\]"
                | otherwise -> (\x -> cr <> x <> cr) `fmap`
                    (texMathToInlines DisplayMath str >>= inlineListToMarkdown opts)

The delimited function intentionally extracts leading and trailing whitespace (NewLine, CarriageReturn, BreakingSpace, BlankLines) and places it outside the opener/closer. That behavior is useful for inline delimiters (emphasis, code markers, etc.), but on the tex_math_dollars path, delimited "$$" "$$" (literal str) has the side effect of moving the newlines at the start and end of the math content outside the $$ delimiters, so the math ends up on a single line. So I wonder whether this is an expected change for display math blocks.
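
To make the difference concrete, here is a small self-contained sketch. It does not use doclayout: Piece, literalToy, concatToy, delimitedToy and renderToy are hypothetical stand-ins made up for illustration, the document is modelled as a flat list instead of a Doc tree, and only the newline case of isWS is modelled. It is meant to show the shape of the problem, not pandoc's actual rendering.

```haskell
module Main where

import Data.List (dropWhileEnd)

-- Toy stand-in for a Doc: a flat list of text and newline pieces.
-- (Hypothetical type, not part of doclayout.)
data Piece = Txt String | NL
  deriving (Eq, Show)

-- Split a string into text and newline pieces, loosely mirroring what
-- `literal` does with embedded newlines. Empty text pieces are dropped.
literalToy :: String -> [Piece]
literalToy s = case break (== '\n') s of
  (t, [])       -> txt t
  (t, _ : rest) -> txt t ++ NL : literalToy rest
  where
    txt "" = []
    txt t  = [Txt t]

-- Only newlines count as whitespace in this toy model.
isWS :: Piece -> Bool
isWS NL = True
isWS _  = False

-- Pre-3.7 style: plain concatenation, whitespace stays inside.
concatToy :: String -> String -> [Piece] -> [Piece]
concatToy opener closer content = [Txt opener] ++ content ++ [Txt closer]

-- 3.7 style: leading and trailing whitespace is moved outside
-- the opener and closer.
delimitedToy :: String -> String -> [Piece] -> [Piece]
delimitedToy opener closer content =
  initialWS ++ [Txt opener] ++ middle ++ [Txt closer] ++ finalWS
  where
    (initialWS, rest) = span isWS content
    finalWS = reverse (takeWhile isWS (reverse rest))
    middle  = dropWhileEnd isWS rest

-- Flatten the pieces back into a string.
renderToy :: [Piece] -> String
renderToy = concatMap go
  where
    go (Txt t) = t
    go NL      = "\n"

main :: IO ()
main = do
  let math = literalToy "\ne = mc^2\n"
  print (renderToy (concatToy "$$" "$$" math))    -- prints "$$\ne = mc^2\n$$"
  print (renderToy (delimitedToy "$$" "$$" math)) -- prints "\n$$e = mc^2$$\n"
```

In the second result the newlines still exist, but they now sit outside the $$ delimiters, where the surrounding paragraph layout can absorb them. That would be consistent with the single-line output seen in 3.7.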

This is a problem for some Markdown toolchains that distinguish between:

  • block display math: $$\n<math>\n$$ (treated as a block)
  • inline math: $<math>$ or single-line $$<math>$$ (treated inline)

Converting a display math block to $$<math>$$ can change rendering from block-centered math to inline math, breaking layout.
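
As a rough illustration of that distinction, the sketch below classifies a snippet the way such a toolchain might. MathKind and classify are hypothetical names, and the rule is a deliberate simplification of the two bullet points above, not the actual parsing logic of remark-math or any other tool.

```haskell
module Main where

import Data.List (isPrefixOf, isSuffixOf)

-- Hypothetical classification of a math snippet in Markdown output.
data MathKind = BlockMath | InlineMath | NotMath
  deriving (Eq, Show)

-- Block math needs "$$" followed by a newline at the start and a newline
-- followed by "$$" at the end. Anything else wrapped in dollar signs is
-- treated as inline, including single-line "$$...$$".
classify :: String -> MathKind
classify s
  | length s >= 6 && "$$\n" `isPrefixOf` s && "\n$$" `isSuffixOf` s = BlockMath
  | length s >= 2 && "$" `isPrefixOf` s && "$" `isSuffixOf` s       = InlineMath
  | otherwise                                                       = NotMath

main :: IO ()
main = do
  print (classify "$$\ne = mc^2\n$$") -- prints BlockMath  (3.6.4 output)
  print (classify "$$e = mc^2$$")     -- prints InlineMath (3.7 output)
  print (classify "$e = mc^2$")       -- prints InlineMath
```

Under this model the 3.6.4 output is a block and the 3.7 output is inline, which matches the rendering change described above.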

Context

  • I originally noticed this while generating Markdown for Docusaurus / MDX with remark-math + KaTeX. The single-line $$...$$ is interpreted as inline math by the toolchain and not displayed as a centered block.
  • I believe previous behavior in 3.6.4 preserved the internal newlines for DisplayMath; the change to delimited in 3.7 appears to have caused this regression.
