Add an Html standard library type for parsing and generating HTML documents.
Motivation
Web scraping and HTML generation are core agent tasks. With the browser target (vera compile --target browser) already shipping, HTML is the natural document format for data extraction and rendered output. Agents frequently need to:
- Parse HTML responses from web pages (extract structured data)
- Generate HTML for email bodies, reports, and rendered output
- Query HTML documents by element type, class, or CSS selector
Proposed type
data HtmlNode {
Element(@String, @Array<Tuple<String, String>>, @Array<HtmlNode>),
Text(@String),
Comment(@String)
}
Operations
html_parse(@String -> Result<HtmlNode, String>) — parse an HTML string into a tree
html_to_string(@HtmlNode -> String) — serialise back to HTML
html_query(@HtmlNode, @String -> Array<HtmlNode>) — CSS selector query
html_text(@HtmlNode -> String) — extract text content (strips tags)
html_attr(@HtmlNode, @String -> Result<String, String>) — get attribute value
Usage
public fn extract_titles(@String -> Array<String>) {
match html_parse(@String.0) {
Ok(doc) -> html_query(doc, "h1")
|> array_map(html_text),
Err(_) -> []
}
}
Verification opportunities
Contracts could verify:
- Generated HTML is well-formed (matching open/close tags)
- Required elements are present (e.g., every
<img> has an alt attribute for accessibility)
- Output conforms to a structural schema
Dependencies
Implementation notes
Host bindings to Python's html.parser (built-in, no new dependencies). The tree representation is a recursive ADT — same pattern as Markdown (#147, already implemented in v0.0.84).
Follows the same pattern as Markdown: built-in type registered in environment.py, constructor layouts in registration.py, host-side parsing in codegen/api.py.
Add an
Htmlstandard library type for parsing and generating HTML documents.Motivation
Web scraping and HTML generation are core agent tasks. With the browser target (
vera compile --target browser) already shipping, HTML is the natural document format for data extraction and rendered output. Agents frequently need to:Proposed type
Operations
html_parse(@String -> Result<HtmlNode, String>)— parse an HTML string into a treehtml_to_string(@HtmlNode -> String)— serialise back to HTMLhtml_query(@HtmlNode, @String -> Array<HtmlNode>)— CSS selector queryhtml_text(@HtmlNode -> String)— extract text content (strips tags)html_attr(@HtmlNode, @String -> Result<String, String>)— get attribute valueUsage
Verification opportunities
Contracts could verify:
<img>has analtattribute for accessibility)Dependencies
Map<String, String>(optional, can useArray<Tuple<String, String>>initially)Implementation notes
Host bindings to Python's
html.parser(built-in, no new dependencies). The tree representation is a recursive ADT — same pattern as Markdown (#147, already implemented in v0.0.84).Follows the same pattern as Markdown: built-in type registered in
environment.py, constructor layouts inregistration.py, host-side parsing incodegen/api.py.