Computer Use + Browser Use + RPA Nodes (Workflow Recording → Generated Flow) #462

@felix-schultz

Scope

Add a full node suite for desktop and browser automation, covering reliability/RPA primitives, template matching, and LLM-assisted self-healing. Recording is a Workflow feature, not a node: the user records actions, and Flow-Like generates a workflow that uses the nodes below.


Workflow Feature: Recording → Workflow (non-node)

  • Record browser + desktop actions (mouse/keyboard/navigation).
  • On each click: extract an ElementFingerprint (DOM/AX + role/name/text + bbox + nearby text + screenshot crop/template).
  • Auto-generate a flow with: primary selector + fallbacks (template match → LLM resolve).
  • Persist artifacts (templates, snapshots) to object store; attach diagnostics on failure.
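To make the recording output concrete, here is a minimal sketch of what an ElementFingerprint and one generated step could look like. The field names, the `generate_step` helper, and the dictionary layout of the step are assumptions for illustration only; the issue does not define a schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ElementFingerprint:
    """Signals captured on each recorded click (hypothetical field names)."""
    role: Optional[str]
    name: Optional[str]
    text: Optional[str]
    bbox: Tuple[int, int, int, int]          # x, y, width, height
    nearby_text: Tuple[str, ...] = ()
    dom_path: Optional[str] = None           # DOM-derived selector, if any
    ax_path: Optional[str] = None            # accessibility-tree path, if any
    template_ref: Optional[str] = None       # key of the stored screenshot crop


def generate_step(action: str, fp: ElementFingerprint) -> dict:
    """Build one flow step: primary selector first, then the fallbacks
    in the order template match -> LLM resolve."""
    if fp.dom_path:
        primary = fp.dom_path
    elif fp.role and fp.name:
        primary = f'role={fp.role}[name="{fp.name}"]'
    else:
        primary = None

    fallbacks = []
    if fp.template_ref:
        fallbacks.append({"node": "vision.template_match",
                          "template": fp.template_ref})
    fallbacks.append({"node": "llm.resolve_element", "strategy": "hybrid"})

    return {"node": "rpa.locate", "action": action,
            "primary": primary, "fallbacks": fallbacks}
```

Keeping the fingerprint immutable means `fingerprint.update` would return a new value rather than mutate a recorded one, which fits the `-> ElementFingerprint` return type listed under Selectors + Element Fingerprints.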

Node List

1) Browser Use

  • browser.open(context_options) -> BrowserContextHandle
  • browser.close(context)
  • browser.new_page(context) -> BrowserPageHandle
  • browser.close_page(page)
  • browser.goto(page, url, wait_until)
  • browser.back(page) / browser.forward(page) / browser.reload(page)
  • browser.wait_for(page, selector|state, timeout)
  • browser.wait_for_navigation(page, timeout)
  • browser.wait_for_network_idle(page, timeout)
  • browser.click(page, selector|TargetRef, options)
  • browser.double_click(page, selector|TargetRef)
  • browser.hover(page, selector|TargetRef)
  • browser.focus(page, selector|TargetRef)
  • browser.fill(page, selector|TargetRef, value)
  • browser.type(page, selector|TargetRef, text, delay_ms?)
  • browser.press(page, selector|TargetRef, key_combo)
  • browser.select(page, selector|TargetRef, value|label|index)
  • browser.check(page, selector|TargetRef) / browser.uncheck(page, selector|TargetRef)
  • browser.upload(page, selector|TargetRef, ArtifactRef)
  • browser.download_wait(page, timeout) -> ArtifactRef
  • browser.screenshot(page, full|element, selector?) -> ArtifactRef
  • browser.pdf(page) -> ArtifactRef
  • browser.extract(page, selector, kind=text|html|attr|table|json)
  • browser.evaluate(page, js, typed_schema?)
  • browser.cookies_get/set(context)
  • browser.storage_get/set(context, local|session)
  • browser.auth_basic(context, user, secret_ref)
  • browser.auth_cookie_jar_load/save(context, secret_ref)
  • browser.observe_console(page) / browser.observe_network(page)
  • browser.get_dom_snapshot(page) -> DomSnapshotRef
  • browser.get_accessibility_tree(page) -> AxSnapshotRef

2) Computer Use (Desktop)

  • computer.session_start(options) -> ComputerSessionHandle
  • computer.session_stop(session)
  • computer.list_displays()
  • computer.list_windows()
  • computer.get_active_window()
  • computer.focus_window(app|title|handle)
  • computer.launch_app(path|bundle_id, args?)
  • computer.close_app(app|pid)
  • computer.mouse_move(x,y) / computer.mouse_click(x,y, button) / computer.mouse_double_click(...)
  • computer.mouse_drag(from_x,from_y,to_x,to_y, button)
  • computer.scroll(dx,dy)
  • computer.key_press(key_combo)
  • computer.key_type(text)
  • computer.clipboard_get() / computer.clipboard_set(text|ArtifactRef)
  • computer.screenshot(full|display|window|region) -> ArtifactRef
  • computer.wait(ms)
  • computer.wait_for_window(title|app, timeout)
  • computer.get_accessibility_tree(window?) -> AxSnapshotRef

3) Selectors + Element Fingerprints (Shared)

  • selector.build(from=dom|ax|role|text|xpath|css|image, options) -> Selector
  • selector.rank(SelectorSet, context) -> RankedSelectorSet
  • fingerprint.create(context_signals) -> ElementFingerprint
  • fingerprint.match(fingerprint, context, strategy=dom|ax|vision|hybrid) -> TargetRef
  • fingerprint.update(fingerprint, new_observation) -> ElementFingerprint
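One way `fingerprint.match` could combine signals is a weighted score over role, name, and bounding-box overlap. The sketch below is an assumption about how such a hybrid strategy might work, not the specified behavior; the weights, the `min_score` cutoff, and the plain-dict element representation are all hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Hypothetical weights: semantic signals outweigh position so that an
# element that moved still beats a different element in the old spot.
WEIGHTS = {"role": 0.3, "name": 0.4, "bbox": 0.3}


def score(fp, candidate):
    """Score one candidate element against a recorded fingerprint."""
    total = 0.0
    if fp.get("role") and fp["role"] == candidate.get("role"):
        total += WEIGHTS["role"]
    if fp.get("name") and fp["name"] == candidate.get("name"):
        total += WEIGHTS["name"]
    total += WEIGHTS["bbox"] * iou(fp["bbox"], candidate["bbox"])
    return total


def match(fp, candidates, min_score=0.5):
    """Return the best-scoring candidate, or None if nothing clears min_score."""
    best = max(candidates, key=lambda c: score(fp, c), default=None)
    if best is None or score(fp, best) < min_score:
        return None
    return best
```

Returning `None` below the cutoff is what lets a caller such as `rpa.locate` move on to the next fallback instead of acting on a weak match.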

4) Vision / Template Matching

  • vision.template_capture(source=ArtifactRef|bbox) -> TemplateRef
  • vision.template_match(image|ArtifactRef, TemplateRef, thresholds, region?) -> MatchResult
  • vision.template_match_all(image, TemplateRef, thresholds, max_hits) -> MatchResult[]
  • vision.wait_for_template(TemplateRef, timeout, poll_ms) -> MatchResult
  • vision.click_template(TemplateRef, click_offset?, retries)
  • vision.crop(image, bbox) -> ArtifactRef
  • vision.ocr_read(image|region) -> text
  • vision.find_text(image, query) -> bbox[]

5) LLM-Assisted Self-Healing (SoTA)

  • llm.find_element(prompt, context=DOM|AX|screenshot, constraints) -> ElementFingerprint
  • llm.resolve_element(ElementFingerprint, context, strategy=hybrid) -> TargetRef
  • llm.rank_candidates(candidates, goal, context) -> ranked
  • llm.heal_selector(failure, context) -> SelectorSetCandidate
  • llm.heal_template(failure, context) -> TemplateUpdateCandidate
  • llm.plan_actions(goal, context, policy) -> ActionPlan (typed calls)
  • llm.classify_screen(context) -> ScreenState
  • llm.extract_structured(page|image, schema) -> json
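Because `llm.plan_actions` returns typed calls, a generated plan can be checked against policy before anything executes. The following sketch assumes a plan is a list of `{"node": ..., "args": {...}}` dictionaries and that the policy is an allowlist mapping node names to required argument names; neither structure is defined in this issue, and the argument name `target` stands in for the `selector|TargetRef` parameter.

```python
# Hypothetical policy: permitted nodes and the arguments each requires.
ALLOWED_NODES = {
    "browser.goto": {"page", "url"},
    "browser.click": {"page", "target"},
    "browser.fill": {"page", "target", "value"},
}


def validate_plan(plan, allowed=ALLOWED_NODES):
    """Return a list of human-readable errors; an empty list means the
    plan only uses permitted nodes with all required arguments present."""
    errors = []
    for index, step in enumerate(plan):
        node = step.get("node")
        if node not in allowed:
            errors.append(f"step {index}: node {node!r} is not permitted by policy")
            continue
        missing = allowed[node] - set(step.get("args", {}))
        if missing:
            errors.append(f"step {index}: missing args {sorted(missing)}")
    return errors
```

Rejecting the whole plan on any error, rather than skipping bad steps, avoids executing a partial sequence that the model never intended.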

6) RPA Reliability / Control

  • rpa.locate(input=ElementFingerprint|SelectorSet|TemplateRef, context, fallbacks) -> TargetRef
  • rpa.act(TargetRef, action=click|type|hover|drag|scroll, options)
  • rpa.wait_for(TargetRef|condition, timeout)
  • rpa.assert(TargetRef|condition, message?)
  • rpa.retry(policy) { subgraph }
  • rpa.timeout(ms) { subgraph }
  • rpa.on_error(strategy=catch|finally|branch) { subgraph }
  • rpa.diagnose(level=light|full) -> ArtifactBundle
  • rpa.log(event, fields)
  • rpa.metrics(counter|timer, labels)
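The fallback behavior of `rpa.locate` and the wrapping behavior of `rpa.retry` can be sketched as two small functions. This is an illustration under assumptions: strategies are modeled as `(name, callable)` pairs that return a target or `None`, the retry policy is reduced to an attempt count with linear backoff, and the attempt log is a hypothetical stand-in for what `rpa.diagnose` might collect.

```python
import time


def locate(strategies, context):
    """Try each (name, fn) strategy in order and return (target, attempts).

    A strategy that returns None or raises is recorded and skipped, so the
    chain selector -> template -> LLM degrades gracefully.
    """
    attempts = []
    for name, fn in strategies:
        try:
            target = fn(context)
        except Exception as exc:  # a failing strategy must not abort the chain
            attempts.append((name, f"error: {exc}"))
            continue
        if target is not None:
            attempts.append((name, "hit"))
            return target, attempts
        attempts.append((name, "miss"))
    return None, attempts


def retry(fn, max_attempts=3, backoff_s=0.0, sleep=time.sleep):
    """Call fn until it succeeds or max_attempts is reached, then re-raise
    the last exception. Waits backoff_s * attempt between tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_exc = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
            if attempt < max_attempts:
                sleep(backoff_s * attempt)
    raise last_exc
```

The `attempts` list records which strategy finally produced the target, which is the kind of evidence a diagnostics bundle would need when a flow starts relying on its fallbacks.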

Acceptance (high-level)

  • “Record → Generate workflow” produces flows that remain stable under moderate UI drift using selector→template→LLM fallback.
  • Template matching can re-find targets with confidence thresholds and bounded search regions.
  • Failures emit a diagnostics bundle (screenshot + DOM/AX snapshots + logs + template hits) under policy control.

Status

Done
