Skip to content

HTML standard library type #311

@aallan

Description

@aallan

Add an Html standard library type for parsing and generating HTML documents.

Motivation

Web scraping and HTML generation are core agent tasks. With the browser target (vera compile --target browser) already shipping, HTML is the natural document format for data extraction and rendered output. Agents frequently need to:

  • Parse HTML responses from web pages (extract structured data)
  • Generate HTML for email bodies, reports, and rendered output
  • Query HTML documents by element type, class, or CSS selector

Proposed type

data HtmlNode {
  Element(@String, @Array<Tuple<String, String>>, @Array<HtmlNode>),
  Text(@String),
  Comment(@String)
}

Operations

  • html_parse(@String -> Result<HtmlNode, String>) — parse an HTML string into a tree
  • html_to_string(@HtmlNode -> String) — serialise back to HTML
  • html_query(@HtmlNode, @String -> Array<HtmlNode>) — CSS selector query
  • html_text(@HtmlNode -> String) — extract text content (strips tags)
  • html_attr(@HtmlNode, @String -> Result<String, String>) — get attribute value

Usage

public fn extract_titles(@String -> Array<String>) {
  match html_parse(@String.0) {
    Ok(doc) -> html_query(doc, "h1")
      |> array_map(html_text),
    Err(_) -> []
  }
}

Verification opportunities

Contracts could verify:

  • Generated HTML is well-formed (matching open/close tags)
  • Required elements are present (e.g., every <img> has an alt attribute for accessibility)
  • Output conforms to a structural schema

Dependencies

Implementation notes

Host bindings to Python's html.parser (built-in, no new dependencies). The tree representation is a recursive ADT — same pattern as Markdown (#147, already implemented in v0.0.84).

Follows the same pattern as Markdown: built-in type registered in environment.py, constructor layouts in registration.py, host-side parsing in codegen/api.py.

Metadata

Metadata

Assignees

No one assigned

    Labels

    C9C9 — Language designenhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions