
Markdown content extractors: md_inline_text, md_headings, md_links, md_blocks #367

@aallan

Problem

The Markdown stdlib (md_parse, md_render, md_has_heading, md_has_code_block, md_extract_code_blocks) covers the boolean-question and code-extraction use cases well, but leaves structural traversal entirely to the user. Extracting a heading's text, iterating document blocks, or collecting all links requires deeply nested pattern matches over the full ADT tree:

MdDocument([MdBlock...]) → MdHeading(Int, [MdInline...]) → MdText(String)

This is the same boilerplate as the JSON field-extraction problem: every Markdown-consuming program will end up rediscovering it.

Proposed additions

-- Unwrap MdDocument into its block list; identity on other constructors
md_blocks(@MdBlock.0)                 -- Array<MdBlock>

-- Extract plain text from a single MdInline node
md_inline_text(@MdInline.0)           -- String

-- Extract plain text from a single MdBlock (concatenates inline text)
md_block_text(@MdBlock.0)             -- String

-- Extract all headings as (level, text) pairs
md_extract_headings(@MdBlock.0)       -- Array<Tuple<Int, String>>

-- Extract all links as (url, display_text) pairs
md_extract_links(@MdBlock.0)          -- Array<Tuple<String, String>>

-- Filter a document to blocks of a given constructor name
md_filter_blocks(@MdBlock.0, "MdHeading")   -- Array<MdBlock>
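For illustration only, here is a rough Python model of how the two block-level helpers (md_blocks and md_filter_blocks) might behave. This is not Vera and not the proposed implementation. The dataclasses are stand-ins for the ADT, MdParagraph is a hypothetical constructor that this issue does not name, and headings carry a plain string instead of [MdInline...] to keep the sketch short. It also assumes that "identity on other constructors" means a non-document block comes back as a one-element list, since the stated return type is an array.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class MdHeading:
    level: int
    text: str  # simplified: the real ADT holds [MdInline...]


@dataclass
class MdParagraph:  # hypothetical constructor, not named in this issue
    text: str


@dataclass
class MdDocument:
    blocks: List["MdBlock"]


MdBlock = Union[MdDocument, MdHeading, MdParagraph]


def md_blocks(block):
    """Unwrap a document into its block list.

    Assumption: any other block is returned as a one-element list.
    """
    if isinstance(block, MdDocument):
        return list(block.blocks)
    return [block]


def md_filter_blocks(block, constructor):
    """Keep only the blocks whose constructor name matches the given string."""
    return [b for b in md_blocks(block) if type(b).__name__ == constructor]


doc = MdDocument([
    MdHeading(1, "Intro"),
    MdParagraph("Some text."),
    MdHeading(2, "Details"),
])

headings = md_filter_blocks(doc, "MdHeading")
```

With this model, `headings` holds the two MdHeading values in document order, and filtering on a constructor name that never occurs gives an empty list.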

Motivation

The three dominant Markdown agent tasks are: (1) extract structured content (headings, links, code); (2) check content properties; (3) render to string. The stdlib covers (2) and (3) well. md_extract_code_blocks shows the right pattern for (1) but only covers one constructor. The proposed additions complete the set for the most commonly needed constructors.

Implementation

Pure Vera prelude functions: recursive traversal over the MdBlock/MdInline ADTs, with no new WASM host imports. The tricky case is md_inline_text for MdEmph/MdStrong, which wrap Array<MdInline>: their text has to be built by folding over the child array and concatenating each child's text.
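The fold can be sketched in Python as below. This is a model of the idea, not Vera code. MdText, MdEmph, MdStrong, MdHeading and MdDocument follow the shapes mentioned in this issue; MdLink's fields (children plus url) and MdParagraph are assumptions made for the example. The document is assumed to be flat (no nested MdDocument), and md_block_text only handles blocks that carry inlines.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Union


@dataclass
class MdText:
    text: str


@dataclass
class MdEmph:
    children: List["MdInline"]


@dataclass
class MdStrong:
    children: List["MdInline"]


@dataclass
class MdLink:  # hypothetical shape: display children plus a URL
    children: List["MdInline"]
    url: str


MdInline = Union[MdText, MdEmph, MdStrong, MdLink]


@dataclass
class MdHeading:
    level: int
    inlines: List[MdInline]


@dataclass
class MdParagraph:  # hypothetical constructor, not named in this issue
    inlines: List[MdInline]


@dataclass
class MdDocument:
    blocks: list


def md_inline_text(node):
    """Plain text of one inline node; wrappers fold over their children."""
    if isinstance(node, MdText):
        return node.text
    return reduce(lambda acc, child: acc + md_inline_text(child), node.children, "")


def md_block_text(block):
    """Concatenated inline text of a heading or paragraph."""
    return "".join(md_inline_text(i) for i in block.inlines)


def md_extract_headings(doc):
    """(level, text) pairs for every heading in a flat document."""
    return [(b.level, md_block_text(b)) for b in doc.blocks if isinstance(b, MdHeading)]


def _inline_links(node):
    """(url, display_text) pairs found in one inline subtree."""
    if isinstance(node, MdLink):
        return [(node.url, md_inline_text(node))]
    if isinstance(node, MdText):
        return []
    return [link for child in node.children for link in _inline_links(child)]


def md_extract_links(doc):
    """(url, display_text) pairs for every link in a flat document."""
    return [link for b in doc.blocks for i in b.inlines for link in _inline_links(i)]


doc = MdDocument([
    MdHeading(1, [MdText("Hello "),
                  MdEmph([MdText("brave "), MdStrong([MdText("new")])]),
                  MdText(" world")]),
    MdParagraph([MdText("See "),
                 MdLink([MdText("the "), MdEmph([MdText("docs")])], "https://example.com/docs"),
                 MdText(".")]),
])

print(md_extract_headings(doc))  # [(1, 'Hello brave new world')]
print(md_extract_links(doc))     # [('https://example.com/docs', 'the docs')]
```

Once md_inline_text handles the wrappers, the heading and link extractors reduce to list comprehensions over it, which suggests the fold is the one piece worth getting right first.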

Labels: enhancement
