org-parser is a small streaming Org reader and minimal Org→HTML exporter.
I built it so I can read my Org files on devices where I can’t install Emacs.
Building blocks:
- org_reader.py: reads Org files line-by-line and expands #+INCLUDE depth-first
- org_parser.py: streaming parser emitting events (OrgEvent)
- org_to_html.py: minimal renderer
- webapp.py: web viewer (Flask)
- math_renderer.py: LaTeX → SVG (cache)
Design goals:
- streaming / lazy
- “good enough” Org support
- clear state + events pipeline
Requirements:
- Python ≥ 3.12 recommended
- for math SVG: latex and dvisvgm in PATH
Minimal:
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install flask gunicorn pyyaml
With uv:
uv venv
source .venv/bin/activate
uv pip install flask gunicorn pyyaml
Build an image and run it with (podman-)compose.
Benefits:
- reproducible
- no Python setup needed on the host
podman build -t org-viewer:latest -f Containerfile .
podman-compose up -d
podman logs -f org-viewer
For TLS in the container:
- mount cert/key to /certs
- set CERT_FILE and KEY_FILE
Behind a reverse proxy, the container serves HTTP internally; let Caddy/Nginx/Traefik handle TLS.
The renderer writes to /app/.math-cache.
With a bind mount:
- use Podman :U so ownership matches inside the container
- with SELinux add :Z
Example:
./.math-cache:/app/.math-cache:rw,Z,U
(better than chmod 777 on the host)
Shows how the reader resolves #+INCLUDE directives.
python3 org_reader.py
Exports an Org file to HTML.
python3 org_to_html.py org/90-feature-demo.org -o out.html
Starts the Flask web app.
python3 webapp.py
# then: http://localhost:5000
HTTP:
gunicorn -w 4 -b 0.0.0.0:5000 webapp:app
TLS:
gunicorn -w 4 -b 0.0.0.0:5000 webapp:app --certfile /certs/tls.crt --keyfile /certs/tls.key
This section aggregates the documentation of the internal org-parser
modules.
Each module has its own file. They are included here using #+INCLUDE
so they stay modular, but can also be read as one continuous document.
The config_loader.py module encapsulates all configuration for the Org reader
(regexes, block types, header keys, etc.) in a dedicated class.
OrgReaderConfig is the container for regexes and parser settings.
Key fields (excerpt):
- verbatim_blocks: set[str]
- skip_header_keys: set[str]
- quotes: dict[str,str]
- block_re: re.Pattern
- header_kv_re: re.Pattern
- include_keyword_re: re.Pattern
- section_heading_re: re.Pattern
- comment_begin_re, comment_end_re
- latex_macro_re
An instance is passed around to the reader/parser as a configuration object.
DEFAULT_CONFIG is a preconfigured OrgReaderConfig instance with compiled
regexes for “normal” Org files.
Intended as:
- a sensible default configuration
- a reference for how the YAML config file is structured
Reads a YAML file (e.g. config.yml) and creates an OrgReaderConfig instance.
Typical behavior:
- parses YAML
- compiles regex strings
- converts fields like verbatim_blocks / skip_header_keys to proper types (set, dict, …)
Used to override or customize the default configuration.
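The loading steps listed above (compile regex strings, convert lists to sets) can be sketched roughly as follows. This is a minimal sketch, not the project's config_loader.py: the class SketchConfig, the function build_config, and the sample regex are hypothetical, and a plain dict stands in for what a YAML parser would return.

```python
import re
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for OrgReaderConfig with only three fields."""
    verbatim_blocks: set[str]
    skip_header_keys: set[str]
    include_keyword_re: re.Pattern


def build_config(raw: dict) -> SketchConfig:
    """Turn a parsed YAML mapping into typed, compiled settings."""
    return SketchConfig(
        # YAML lists become sets for fast membership tests.
        verbatim_blocks={name.lower() for name in raw.get("verbatim_blocks", [])},
        skip_header_keys={key.upper() for key in raw.get("skip_header_keys", [])},
        # Regexes arrive as strings and are compiled once, up front.
        include_keyword_re=re.compile(raw["include_keyword_re"], re.IGNORECASE),
    )


# A dict shaped like the output of a YAML parser (assumed layout).
raw = {
    "verbatim_blocks": ["example", "src"],
    "skip_header_keys": ["title", "author"],
    "include_keyword_re": r"^\s*#\+INCLUDE:\s*(?P<target>.+)$",
}
cfg = build_config(raw)
match = cfg.include_keyword_re.match('#+include: "other.org"')
print(match.group("target"))
```

Compiling once at load time keeps the per-line work in the reader and parser down to plain pattern matching.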
org_parser.py contains the streaming parser that turns lines into OrgEvent
objects and maintains the stateful OrgState.
OrgEvent(type: str, data: dict[str, Any])
- generic event object
- type describes the kind of event (e.g. "heading", "block_begin", …)
- data holds context-specific information (level, text, options, …)
OrgPreamble(headers: dict[str,str])
- represents preamble headers (#+TITLE, #+AUTHOR, …)
- convenient properties for title/author/date/options
OrgState
- mutable streaming state
- tracks e.g.:
  - whether we are in the preamble
  - whether we are inside a block / src block / comment block
  - context for lists, tables, etc.
parse_org_line(line, cfg, state) -> (state, events)
Core parser function:
- takes a single line
- uses regexes from cfg
- updates state
- returns a list of events (often empty or 1–2 entries)
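The line-in, events-out contract described above can be illustrated with a toy version. This is only a sketch under assumptions: Event, State, parse_line, parse_lines and HEADING_RE are hypothetical names, the pattern is hard-coded instead of coming from cfg, and only headings and src blocks are handled.

```python
from dataclasses import dataclass, field
from typing import Any
import re


@dataclass
class Event:
    """Toy counterpart of OrgEvent: an event type plus a data dict."""
    type: str
    data: dict[str, Any] = field(default_factory=dict)


@dataclass
class State:
    """Toy counterpart of OrgState: only tracks src-block nesting."""
    in_src_block: bool = False


HEADING_RE = re.compile(r"^(\*+)\s+(.*)$")  # assumed pattern, not taken from cfg


def parse_line(line: str, state: State) -> tuple[State, list[Event]]:
    """Consume one line, update the state, and return zero or more events."""
    stripped = line.strip().lower()
    if stripped.startswith("#+begin_src"):
        state.in_src_block = True
        return state, [Event("src_begin")]
    if stripped.startswith("#+end_src"):
        state.in_src_block = False
        return state, [Event("src_end")]
    if state.in_src_block:
        return state, []  # block body: no events in this toy version
    m = HEADING_RE.match(line)
    if m:
        return state, [Event("heading", {"level": len(m.group(1)), "text": m.group(2)})]
    return state, []


def parse_lines(lines):
    """Drive the line parser lazily over any iterable of lines."""
    state = State()
    for line in lines:
        state, events = parse_line(line, state)
        yield from events


sample = ["* Intro", "#+begin_src python", "* not a heading", "#+end_src", "** Details"]
events = list(parse_lines(sample))
print([e.type for e in events])
```

Because parse_lines is a generator, it can be fed directly from a lazy line iterator without loading the whole file.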
Typical event types:
- heading
- block_begin / block_end
- src_begin / src_end
- list_item / ordered_list_item
- table_row / table_hline
- tblfm
- name, caption, attr_html
- comment, comment_block
- latex_macro
- line_tokens (inline tokenization of text lines)
tokenize_inline_org_markup(text) -> list[(type, text)]
Minimal inline tokenizer supporting:
- plaintext
- bold_text
- italic_text
- code
- link (combined url/desc)
- math_inline (for \(..\) and $..$)
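A tokenizer producing the token types listed above might look roughly like this. It is a sketch, not the project's tokenize_inline_org_markup: the regex, the name tokenize_sketch, and the "url|desc" encoding of link tokens are assumptions, and real Org markup rules (word boundaries, nesting, bare URLs containing slashes) are ignored.

```python
import re

# Assumed token patterns; order matters because the first alternative wins.
TOKEN_RE = re.compile(
    r"\[\[(?P<link>[^\]]+)\](?:\[(?P<desc>[^\]]+)\])?\]"  # [[url][desc]] or [[url]]
    r"|\\\((?P<math_paren>.+?)\\\)"                        # \( .. \)
    r"|\$(?P<math_dollar>[^$\n]+)\$"                       # $ .. $
    r"|\*(?P<bold_text>[^*\n]+)\*"                         # *bold*
    r"|/(?P<italic_text>[^/\n]+)/"                         # /italic/
    r"|[=~](?P<code>[^=~\n]+)[=~]"                         # =code= or ~code~
)


def tokenize_sketch(text: str) -> list[tuple[str, str]]:
    """Split text into (type, text) tokens, keeping unmatched runs as plaintext."""
    tokens: list[tuple[str, str]] = []
    pos = 0
    for m in TOKEN_RE.finditer(text):
        if m.start() > pos:
            tokens.append(("plaintext", text[pos:m.start()]))
        if m.group("link") is not None:
            # Combine url and description into one token (encoding is assumed).
            desc = m.group("desc") or m.group("link")
            tokens.append(("link", f"{m.group('link')}|{desc}"))
        elif m.group("math_paren") is not None or m.group("math_dollar") is not None:
            tokens.append(("math_inline", m.group("math_paren") or m.group("math_dollar")))
        else:
            # bold_text, italic_text or code: the group name is the token type.
            tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    if pos < len(text):
        tokens.append(("plaintext", text[pos:]))
    return tokens


tokens = tokenize_sketch("See *bold* and $x^2$ in [[https://orgmode.org][Org]].")
for token in tokens:
    print(token)
```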
parse_src_block_options(arg_string) -> dict[str,str]
Parses the arguments of the #+begin_src line:
- language (e.g. python, bash)
- header args (e.g. :results, :session, :tangle, :var …)
parse_html_attr_args(arg_string) -> dict[str,str]
Parses lines like:
#+ATTR_HTML: :width 50% :class foo
into a dictionary that the renderer turns into HTML attributes.
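As an illustration of that round trip, the following sketch parses the argument string into a dict and serializes it back as HTML attributes. The helper names parse_attrs_sketch and to_html_attrs are hypothetical, and the splitting rule (every word starting with ":" opens a new key) is an assumption about the format, not the project's actual parser.

```python
import html


def parse_attrs_sketch(arg_string: str) -> dict[str, str]:
    """Collect ':key value ...' pairs; values may span several words."""
    attrs: dict[str, str] = {}
    key = None
    for part in arg_string.split():
        if part.startswith(":") and len(part) > 1:
            key = part[1:]
            attrs[key] = ""
        elif key is not None:
            attrs[key] = f"{attrs[key]} {part}".strip()
    return attrs


def to_html_attrs(attrs: dict[str, str]) -> str:
    """Serialize the dict as escaped HTML attributes with a leading space each."""
    return "".join(
        f' {name}="{html.escape(value, quote=True)}"' for name, value in attrs.items()
    )


attrs = parse_attrs_sketch(":width 50% :class foo bar")
print(attrs)
print("<img" + to_html_attrs(attrs) + ">")
```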
The parser currently emits, among others, the following event types:
- preamble_kv, preamble_end
- heading
- block_begin, block_end
- src_begin, src_end
- list_item, ordered_list_item
- table_row, table_hline, tblfm
- name, caption, attr_html
- comment, comment_block
- latex_macro
- line_tokens
org_reader.py is responsible for reading Org files with
#+INCLUDE support and preamble handling. The result is an iterator
over lines (with includes expanded).
un_quote_string(string, cfg) -> str
- removes quotes based on the rules in cfg.quotes
- useful for paths/strings from headers or INCLUDE lines
resolve_include(line, path, cfg) -> Path
- evaluates a #+INCLUDE: line
- resolves the file path relative to path (current file)
- returns a Path instance
is_include(line, cfg) -> bool
- checks whether a line is an include directive
- uses regexes from the configuration
should_skip_header_line(line, cfg) -> bool
- decides whether a preamble line should be skipped
- driven by settings like skip_header_keys
preamble_decision(line, cfg) -> (skip: bool, still_in_preamble: bool)
- central logic for “Are we still in the preamble?”
- determines:
  - whether the current line is ignored as preamble
  - whether the preamble ends at this line
read_with_includes(path, cfg, *, is_root=True) -> Iterator[str]
- main entry point of the reader
- reads a file line by line
- expands #+INCLUDE: directives depth-first
- respects preamble handling and skip rules
Include behavior:
- depth-first include expansion
- no expansion inside:
  - blocks (state.is_inside_block)
  - drawers
  - comment blocks
- for included files:
  - preamble is skipped until the first “real” content line
  - the preamble_decision logic defines that behavior
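Depth-first expansion with a "not inside blocks" rule can be sketched with a recursive generator. This is not the project's read_with_includes: expand_includes and the three regexes are hypothetical, the cycle guard is an addition of this sketch, and preamble skipping and drawers are left out.

```python
import re
import tempfile
from pathlib import Path
from typing import Iterator

# Assumed patterns; the real reader takes its regexes from the config object.
INCLUDE_RE = re.compile(r'^\s*#\+INCLUDE:\s*"?([^"\s]+)"?', re.IGNORECASE)
BLOCK_BEGIN_RE = re.compile(r"^\s*#\+begin_", re.IGNORECASE)
BLOCK_END_RE = re.compile(r"^\s*#\+end_", re.IGNORECASE)


def expand_includes(path: Path, _seen: frozenset = frozenset()) -> Iterator[str]:
    """Yield the lines of `path`, replacing include lines depth-first."""
    resolved = path.resolve()
    if resolved in _seen:
        return  # guard against include cycles (an addition of this sketch)
    inside_block = False
    with resolved.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if BLOCK_BEGIN_RE.match(line):
                inside_block = True
            elif BLOCK_END_RE.match(line):
                inside_block = False
            m = INCLUDE_RE.match(line)
            if m and not inside_block:
                # Paths are resolved relative to the including file.
                yield from expand_includes(resolved.parent / m.group(1), _seen | {resolved})
            else:
                yield line


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "main.org").write_text(
        '* Main\n#+INCLUDE: "a.org"\n#+begin_example\n'
        '#+INCLUDE: "a.org"\n#+end_example\n* End\n',
        encoding="utf-8",
    )
    (root / "a.org").write_text('** A\n#+INCLUDE: "b.org"\n', encoding="utf-8")
    (root / "b.org").write_text("*** B\n", encoding="utf-8")
    lines = list(expand_includes(root / "main.org"))

print("\n".join(lines))
```

The nested include of b.org is fully emitted before main.org continues, and the include line inside the example block is passed through unchanged.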
org_to_html.py contains the minimal Org→HTML renderer. It consumes
OrgEvent streams and produces a complete HTML document.
render_org_to_html_document(input_path, cfg) -> str
- reads an Org file (via reader + parser)
- renders the event stream into an HTML string
- includes a basic HTML header/body scaffold
org_to_html(input_path, output_path, cfg) -> None
- convenience function
- calls render_org_to_html_document
- writes the result to output_path
It currently supports:
- headings (h1..h6) with tags
- paragraphs
- unordered/ordered lists
- verbatim blocks + src blocks
  - data-language attribute
  - additional data-* attributes from src header args
- inline markup:
  - bold, italic, code, links
- image-only lines:
  - rendered as <figure> with caption + ATTR_HTML
- tables:
  - simple tables + a subset of TBLFM
- comments + comment blocks:
  - rendered as collapsible sections
- :noexport: headings:
  - treated like comment sections (collapsible)
- verse blocks:
  - own container preserving line breaks
- inline math:
  - $..$ and \(..\) are rendered as SVG images
  - URLs: /math/<digest>.svg
render_inline_tokens(tokens, *, preamble_macros="") -> str
- renders a list of inline tokens to HTML
- optionally applies LaTeX macros from the preamble
flush_paragraph(..., preamble_macros="") -> None
- writes the currently accumulated paragraph into the output stream
- ensures clean paragraph separation
math_image_url(math_src, *, preamble_macros="") -> str
- builds the URL/path for the SVG math image
- delegates to the math renderer / cache
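How token rendering and paragraph flushing fit together can be shown with a small sketch. The names render_tokens_sketch and flush_paragraph_sketch, the chosen HTML tags, and the buffer/output lists are assumptions for illustration; links, math and LaTeX macros are omitted.

```python
import html


def render_tokens_sketch(tokens: list[tuple[str, str]]) -> str:
    """Render (type, text) tokens to HTML, escaping all text content."""
    wrappers = {"bold_text": "b", "italic_text": "i", "code": "code"}  # assumed tags
    parts = []
    for kind, text in tokens:
        safe = html.escape(text)
        tag = wrappers.get(kind)
        parts.append(f"<{tag}>{safe}</{tag}>" if tag else safe)
    return "".join(parts)


def flush_paragraph_sketch(buffer: list[str], out: list[str]) -> None:
    """Emit the buffered lines as one <p> element, then reset the buffer."""
    if buffer:
        out.append("<p>" + " ".join(buffer) + "</p>")
        buffer.clear()


out: list[str] = []
buffer = [
    render_tokens_sketch([("plaintext", "a < b and "), ("bold_text", "bold")]),
    render_tokens_sketch([("code", "x & y")]),
]
flush_paragraph_sketch(buffer, out)
flush_paragraph_sketch(buffer, out)  # empty buffer: nothing is emitted
print(out)
```

Flushing on an empty buffer is a no-op, which is what keeps consecutive block boundaries from producing empty paragraphs.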
render_math_to_svg(math_src, out_path, *, preamble_macros="") -> None
- writes a temporary standalone LaTeX file
- runs latex → dvi → dvisvgm
- the result is an SVG file at out_path
- uses a cache so the same formula isn’t rendered repeatedly
Typical usage:
- called indirectly from the /math/<digest>.svg endpoint
- digest is based on the math source string + optional macros
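The digest-keyed cache described above could work along these lines. This is a sketch under assumptions: SHA-256, the NUL separator, and the names math_digest, cached_svg_path and get_or_render are not taken from the project, and fake_render stands in for the real latex/dvisvgm run.

```python
import hashlib
import tempfile
from pathlib import Path


def math_digest(math_src: str, preamble_macros: str = "") -> str:
    """Hash macros and formula together so either one changes the cache key."""
    payload = preamble_macros + "\0" + math_src  # separator is an assumption
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_svg_path(cache_dir: Path, math_src: str, preamble_macros: str = "") -> Path:
    return cache_dir / f"{math_digest(math_src, preamble_macros)}.svg"


def get_or_render(cache_dir: Path, math_src: str, render, preamble_macros: str = "") -> Path:
    """Return the cached SVG path, calling `render` only on a cache miss."""
    out_path = cached_svg_path(cache_dir, math_src, preamble_macros)
    if not out_path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        render(math_src, out_path, preamble_macros=preamble_macros)
    return out_path


calls = []


def fake_render(math_src, out_path, *, preamble_macros=""):
    """Stand-in for a real LaTeX run: just writes a placeholder SVG."""
    calls.append(math_src)
    out_path.write_text("<svg/>", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / ".math-cache"
    first = get_or_render(cache, r"\sum_i x_i", fake_render)
    second = get_or_render(cache, r"\sum_i x_i", fake_render)
    same_path = first == second

print(len(calls), same_path)
```

Including the macros in the digest means a changed preamble yields a new file instead of reusing a stale rendering.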
index()
- lists README.org and files under org/*.org
- provides entry points to view Org documents
view_file(filename)
- renders an Org file via reader + parser + renderer into HTML
- returns the HTML view in the browser
assets(subpath)
- serves static assets under /assets/... (CSS, images, …)
math_image(digest)
- creates (or returns from cache) an SVG image for a math expression
- uses math_renderer.render_math_to_svg
Important:
- math cache must be writable: /app/.math-cache
- in containers, a volume mount with :Z,U is recommended
Cause:
- the container user cannot write into the bind mount
Recommended fix (Podman):
- mount the volume with :U:
./.math-cache:/app/.math-cache:rw,Z,U
Why not chmod 777?
- it works
- but it’s messy and potentially unsafe
Typical causes:
- you call the service via HTTPS, but the container serves HTTP
- or the other way around
Checklist:
podman logs org-viewer
ss -tlpn | grep 5000
curl http://localhost:5000
Possible reasons:
- CERT_FILE / KEY_FILE incorrectly set
- certificate volume not mounted to /certs
- wrong filenames or ownership inside the container
Debug:
ls -al /certs
Features:
- preamble parsing
- INCLUDE (depth-first)
- headings & tags
- lists
- inline markup & links
- images + caption + ATTR_HTML + NAME anchors
- verbatim and src blocks
- tables + TBLFM
- inline math + LaTeX macros
Example: \(∑superscript\) and
| Name | German | Math | Average |
|---|---|---|---|
| Student1 | 2 | 3 | |
| Student2 | 1 | 1 | |
This is a hobby project, partly created with help from ChatGPT, and not meant for production use. License: GPLv3