Converting captioned figures to DOCX and back produces raw HTML, or doubled captions #10755

Using pandoc 3.6.4.

With the following Markdown input as `foo.md`:

```markdown
![An image.](media/image.jpg)
```

I convert to DOCX:

```sh
pandoc --to=docx foo.md --output=foo.docx
```

This works fine. Converting `foo.md` to HTML and PDF also produces good results.

I then convert the DOCX back to Markdown:

```sh
pandoc --to=markdown --output=bar.md --extract-media=. foo.docx
```

This produces the following Markdown:

```markdown
<figure>
<img src="./media/rId20.jpg" style="width:5.83333in;height:3.91413in"
alt="An image." />
<figcaption aria-hidden="true"><p>An image.</p></figcaption>
</figure>
```

This looks OK in principle, but while converting it to HTML produces a good result, converting it to PDF omits the image.
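The exact commands used for the HTML and PDF checks aren't shown here. One plausible way to reproduce them is sketched below. `render_outputs` is a hypothetical helper name, and the PDF step assumes a LaTeX engine is installed (pandoc uses `pdflatex` by default). The LaTeX writer does not pass raw HTML blocks through, which would account for the missing image in the PDF.

```sh
# render_outputs: hypothetical helper that renders one Markdown file to
# both HTML and PDF so the two results can be compared side by side.
render_outputs() {
    base=${1%.md}
    # Output format is inferred from the --output file extension
    pandoc --standalone "$1" --output="$base.html" || return 2
    # PDF goes through LaTeX; raw HTML blocks are dropped on this path
    pandoc "$1" --output="$base.pdf" || return 2
}
```

Running `render_outputs bar.md` writes `bar.html` and `bar.pdf` next to the input.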

I also tried:

```sh
pandoc --to=markdown-raw_html --output=bar.md --extract-media=. foo.docx
```

This produces:

```markdown
:::: figure
![An image.](./media/rId20.jpg){width="5.833333333333333in"
height="3.9141338582677165in"}

::: caption
An image.
:::
::::
```

Here, the PDF output is fine, but the HTML has two copies of the figure caption.
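Because the Markdown writer only sees pandoc's document tree, any difference between the original `![An image.](media/image.jpg)` and the raw-HTML or div-based output has to come from a difference between the tree read from `foo.md` and the one read from `foo.docx`. A minimal sketch for comparing them follows. `compare_ast` is a hypothetical helper, and it relies on `--to=native`, pandoc's dump of its internal AST. Judging from the output above, things worth looking for include the `width`/`height` attributes and the caption being a paragraph (the `<p>` inside `<figcaption>`).

```sh
# compare_ast: hypothetical helper that writes pandoc's native AST for
# each of two inputs next to it, then diffs the two trees.
compare_ast() {
    for f in "$1" "$2"; do
        # Input format is inferred from the extension (.md, .docx)
        pandoc --to=native --output="$f.native" "$f" || return 2
    done
    # diff exits 0 if the trees are identical, 1 if they differ
    diff -u "$1.native" "$2.native"
}
```

For example: `compare_ast foo.md foo.docx`.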

Ideally, it would be possible to produce the same Markdown from the DOCX as the original, but I'd be quite happy with an equivalent that worked as well. I'm hoping to be able to do round-trip conversion, so that I and others can make (disciplined!) edits to the DOCX and then convert it back to Markdown.
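A round-trip check along these lines can be scripted with the same two commands used above. This is a minimal sketch; `roundtrip_check` and the `.roundtrip.md` output name are hypothetical choices for illustration.

```sh
# roundtrip_check: hypothetical helper that converts Markdown -> DOCX ->
# Markdown with the options used above, then diffs against the original.
roundtrip_check() {
    src=$1                      # e.g. foo.md
    base=${src%.md}
    pandoc --to=docx "$src" --output="$base.docx" || return 2
    pandoc --to=markdown --output="$base.roundtrip.md" --extract-media=. "$base.docx" || return 2
    # diff exits 0 when the files match, 1 when they differ
    diff -u "$src" "$base.roundtrip.md"
}
```

Even in the best case the diff won't be empty, because the image comes back under a new name (`./media/rId20.jpg` rather than `media/image.jpg`) with explicit `width`/`height` attributes. The point is to make differences in the figure markup easy to spot.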
