HTML convenience accessors: html_query_one, html_tag, html_children #368

@aallan

Problem

The HTML stdlib is well designed around CSS selectors (html_query, html_text, html_attr), but html_query always returns Array<HtmlNode>, even when only one element is expected. The common "get the first matching element" pattern therefore needs an extra array_get call followed by an Option unwrap:

let @Array<HtmlNode> = html_query(@HtmlNode.0, "h1");
let @Option<HtmlNode> = array_get(@Array<HtmlNode>.0, 0);
-- now unwrap the Option...

Every HTML-scraping program ends up writing this same boilerplate.

Proposed additions

-- First CSS selector match — Option<HtmlNode> instead of Array<HtmlNode>
html_query_one(@HtmlNode.0, "h1")     -- Option<HtmlNode>

-- Tag name of an element node (None for text/comment nodes)
html_tag(@HtmlNode.0)                  -- Option<String>

-- Direct children of a node
html_children(@HtmlNode.0)             -- Array<HtmlNode>

Impact

html_query_one covers the dominant scraping pattern: get the page title, the first paragraph, or the canonical link. html_tag makes it possible to dispatch on element type while walking a list of nodes. html_children allows manual tree traversal in cases where CSS selectors aren't expressive enough.
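To make the traversal case concrete, here is a rough sketch in Python rather than Vera. It models HtmlNode as three dataclasses and gives stand-in versions of html_tag and html_children. The field names, the outline helper, and the choice to return an empty list for text and comment nodes are all assumptions made for illustration and are not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class HtmlElement:
    tag: str
    attributes: dict = field(default_factory=dict)
    children: List["HtmlNode"] = field(default_factory=list)


@dataclass
class HtmlText:
    text: str


@dataclass
class HtmlComment:
    text: str


HtmlNode = Union[HtmlElement, HtmlText, HtmlComment]


def html_tag(node: HtmlNode) -> Optional[str]:
    # Tag name for element nodes, None for text/comment nodes.
    return node.tag if isinstance(node, HtmlElement) else None


def html_children(node: HtmlNode) -> List[HtmlNode]:
    # Assumption: text and comment nodes simply have no children.
    return list(node.children) if isinstance(node, HtmlElement) else []


def outline(node: HtmlNode, depth: int = 0) -> List[str]:
    # Hypothetical helper: indented list of element tags, depth first,
    # skipping text and comment nodes by dispatching on html_tag.
    tag = html_tag(node)
    if tag is None:
        return []
    lines = ["  " * depth + tag]
    for child in html_children(node):
        lines.extend(outline(child, depth + 1))
    return lines


page = HtmlElement("body", children=[
    HtmlElement("h1", children=[HtmlText("Title")]),
    HtmlComment("nav goes here"),
    HtmlElement("p", children=[
        HtmlText("Hello "),
        HtmlElement("em", children=[HtmlText("world")]),
    ]),
])

print("\n".join(outline(page)))
```

A nesting outline like this depends on the depth of each element, which a flat selector match does not expose, so it is the kind of job html_children is meant for.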

Implementation

All three can be implemented as pure Vera prelude functions or as trivial extensions of existing host imports. html_query_one is html_query followed by array_get(_, 0). html_tag and html_children match on the HtmlNode constructor (HtmlElement carries tag + attributes + children; HtmlText/HtmlComment do not).
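The html_query_one composition might look like the sketch below, written in Python as a model of the semantics rather than as Vera source. Option is modelled as Optional (None when there is no match). fake_html_query is a hypothetical stand-in that only understands bare tag names over a flat list, so it shows the wiring without a real CSS selector engine.

```python
from typing import Any, Callable, List, Optional, Tuple


def array_get(items: List[Any], index: int) -> Optional[Any]:
    # Bounds-checked lookup: None instead of an error when out of range.
    return items[index] if 0 <= index < len(items) else None


def html_query_one(
    html_query: Callable[[Any, str], List[Any]],
    node: Any,
    selector: str,
) -> Optional[Any]:
    # First selector match, or None when html_query returns an empty array.
    return array_get(html_query(node, selector), 0)


# Hypothetical stand-in for html_query: the "document" is a flat list of
# (tag, text) pairs and the selector is treated as a bare tag name.
def fake_html_query(doc: List[Tuple[str, str]], selector: str) -> List[Tuple[str, str]]:
    return [item for item in doc if item[0] == selector]


doc = [("h1", "Title"), ("p", "First"), ("p", "Second")]

print(html_query_one(fake_html_query, doc, "p"))
print(html_query_one(fake_html_query, doc, "table"))
```

Because array_get is already bounds-checked, the no-match case needs no special handling: an empty result array yields None.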


Labels: enhancement (New feature or request)
