org-reader: begin_block is parsed incorrectly #4287

@hrehfeld

Description

$ pandoc -t html test.org 
<iframe></iframe>

$ cat test.org

#+BEGIN_HTML
<iframe></iframe>
#+END_HTML

$ pandoc -t html test.org
<p>&lt;iframe&gt;&lt;/iframe&gt;</p>
<p>#+END<sub>HTML</sub></p>
$ cat test.org

#+BEGIN_HTML:
<iframe></iframe>
#+END_HTML

$ pandoc -t html test.org
<p>&lt;iframe&gt;&lt;/iframe&gt;</p>
$ cat test.org

#+BEGIN_HTML:
<iframe></iframe>
#+END_HTML:

At least the output for the third version is slightly incorrect according to the Org syntax spec ( https://orgmode.org/worg/dev/org-syntax.html#Greater_Blocks ), which says:

NAME can contain any non-whitespace character.

So to be correct, we would need output of

<html:><p>&lt;iframe&gt;&lt;/iframe&gt;</p></html:>

where the : in the element name is replaced by some legal character.
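
To make the reading of the spec above concrete, here is a minimal sketch of how a block opener could be matched if NAME really is "any non-whitespace character". The regex and the helper names (`BEGIN_RE`, `block_name`, `closing_line`) are hypothetical and written only for illustration; this is not how pandoc's org reader is implemented.

```python
import re

# Hypothetical pattern for a greater-block opener, based only on the quoted
# rule that NAME may contain any non-whitespace character.
BEGIN_RE = re.compile(r"^[ \t]*#\+begin_(\S+)(?:[ \t]+(.*))?$", re.IGNORECASE)


def block_name(line):
    """Return the block NAME from a '#+BEGIN_NAME' line, or None if no match."""
    m = BEGIN_RE.match(line)
    return m.group(1) if m else None


def closing_line(name):
    """Return the '#+END_NAME' line that would close a block with this NAME."""
    return "#+END_" + name


print(block_name("#+BEGIN_HTML"))                 # HTML
print(block_name("#+BEGIN_HTML:"))                # HTML:
print(closing_line(block_name("#+BEGIN_HTML:")))  # #+END_HTML:
```

Under this reading, `#+BEGIN_HTML:` opens a block whose NAME is `HTML:`, so only `#+END_HTML:` would close it. That would make the second test file an unterminated block and the third a block named `HTML:`.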

The second version is what I'm personally running into. I would expect one of the following:

  • ignore the : after begin_block
  • output the (escaped) incorrect block markup, including #+begin_block: so that you can see what's going on
  • (optionally issue a warning)
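
As a rough sketch of the first and last options combined (ignore the trailing : and warn about it), a pre-processing step could look like the following. The function `normalize_block_line` and its regex are assumptions made up for this example, not part of pandoc. Note that this deliberately deviates from the spec, where `HTML:` is a legal NAME.

```python
import re
import warnings

# Hypothetical pattern: a '#+BEGIN_...' or '#+END_...' token whose last
# character is ':', optionally followed by whitespace-separated parameters.
BLOCK_LINE_RE = re.compile(
    r"^([ \t]*#\+(?:begin|end)_\S*?):([ \t].*)?$", re.IGNORECASE
)


def normalize_block_line(line):
    """Drop a trailing ':' from a block delimiter and warn; else return line."""
    m = BLOCK_LINE_RE.match(line)
    if m is None:
        return line  # not a block delimiter ending in ':', leave untouched
    warnings.warn("ignoring trailing ':' in block delimiter: %r" % line.strip())
    return m.group(1) + (m.group(2) or "")


for raw in ["#+BEGIN_HTML:", "<iframe></iframe>", "#+END_HTML"]:
    print(normalize_block_line(raw))
```

With the second test file as input, the delimiters come out as `#+BEGIN_HTML` and `#+END_HTML`, so the block would be parsed like the first (working) version, and the warning tells the user that the : was dropped.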

This also happens with -t json.

I'm using pandoc 2.0.6-20.

P.S. I know, orgmode special attributes vs. blocks are hard to parse. :-(
