Computer Use + Browser Use + RPA Nodes (Workflow Recording → Generated Flow) #462
Closed
Description
Scope
Add a full node suite for desktop and browser automation, reliability/RPA primitives, template matching, and LLM-assisted self-healing. Recording is a Workflow feature, not a node: the user records actions, and Flow-Like generates a workflow that uses the nodes below.
Workflow Feature: Recording → Workflow (non-node)
- Record browser + desktop actions (mouse/keyboard/navigation).
- On each click: extract an ElementFingerprint (DOM/AX + role/name/text + bbox + nearby text + screenshot crop/template).
- Auto-generate a flow with: primary selector + fallbacks (template match → LLM resolve).
- Persist artifacts (templates, snapshots) to object store; attach diagnostics on failure.
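As a rough sketch of what the recorder could emit per click, the snippet below models an ElementFingerprint and turns it into one generated step with the selector → template → LLM fallback order. The field names, the `generate_click_step` helper, and the step dictionary layout are illustrative assumptions, not a defined Flow-Like schema.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class ElementFingerprint:
    # Hypothetical field set, loosely following the signals listed above.
    role: str
    name: str
    text: str
    bbox: tuple[int, int, int, int]  # x, y, width, height
    nearby_text: list[str] = field(default_factory=list)
    css_selector: Optional[str] = None   # primary selector, if one was captured
    template_ref: Optional[str] = None   # object-store key of the screenshot crop


def generate_click_step(fp: ElementFingerprint) -> dict:
    """Build one generated step: primary selector first, then fallbacks."""
    locators = []
    if fp.css_selector:
        locators.append({"kind": "selector", "value": fp.css_selector})
    if fp.template_ref:
        locators.append({"kind": "template", "value": fp.template_ref})
    # The LLM fallback always comes last and receives the whole fingerprint.
    locators.append({"kind": "llm", "value": asdict(fp)})
    return {"node": "rpa.act", "action": "click", "locators": locators}
```

A recorded click on a submit button with both a selector and a template would yield three locators in that order; a click with neither would fall straight through to the LLM locator.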
Node List
1) Browser Use
- browser.open(context_options) -> BrowserContextHandle
- browser.close(context)
- browser.new_page(context) -> BrowserPageHandle
- browser.close_page(page)
- browser.goto(page, url, wait_until)
- browser.back(page) / browser.forward(page) / browser.reload(page)
- browser.wait_for(page, selector|state, timeout)
- browser.wait_for_navigation(page, timeout)
- browser.wait_for_network_idle(page, timeout)
- browser.click(page, selector|TargetRef, options)
- browser.double_click(page, selector|TargetRef)
- browser.hover(page, selector|TargetRef)
- browser.focus(page, selector|TargetRef)
- browser.fill(page, selector|TargetRef, value)
- browser.type(page, selector|TargetRef, text, delay_ms?)
- browser.press(page, selector|TargetRef, key_combo)
- browser.select(page, selector|TargetRef, value|label|index)
- browser.check(page, selector|TargetRef) / browser.uncheck(page, selector|TargetRef)
- browser.upload(page, selector|TargetRef, ArtifactRef)
- browser.download_wait(page, timeout) -> ArtifactRef
- browser.screenshot(page, full|element, selector?) -> ArtifactRef
- browser.pdf(page) -> ArtifactRef
- browser.extract(page, selector, kind=text|html|attr|table|json)
- browser.evaluate(page, js, typed_schema?)
- browser.cookies_get/set(context)
- browser.storage_get/set(context, local|session)
- browser.auth_basic(context, user, secret_ref)
- browser.auth_cookie_jar_load/save(context, secret_ref)
- browser.observe_console(page) / browser.observe_network(page)
- browser.get_dom_snapshot(page) -> DomSnapshotRef
- browser.get_accessibility_tree(page) -> AxSnapshotRef
2) Computer Use (Desktop)
- computer.session_start(options) -> ComputerSessionHandle
- computer.session_stop(session)
- computer.list_displays()
- computer.list_windows()
- computer.get_active_window()
- computer.focus_window(app|title|handle)
- computer.launch_app(path|bundle_id, args?)
- computer.close_app(app|pid)
- computer.mouse_move(x,y) / computer.mouse_click(x,y, button) / computer.mouse_double_click(...)
- computer.mouse_drag(from_x,from_y,to_x,to_y, button)
- computer.scroll(dx,dy)
- computer.key_press(key_combo)
- computer.key_type(text)
- computer.clipboard_get() / computer.clipboard_set(text|ArtifactRef)
- computer.screenshot(full|display|window|region) -> ArtifactRef
- computer.wait(ms)
- computer.wait_for_window(title|app, timeout)
- computer.get_accessibility_tree(window?) -> AxSnapshotRef
3) Selectors + Element Fingerprints (Shared)
- selector.build(from=dom|ax|role|text|xpath|css|image, options) -> Selector
- selector.rank(SelectorSet, context) -> RankedSelectorSet
- fingerprint.create(context_signals) -> ElementFingerprint
- fingerprint.match(fingerprint, context, strategy=dom|ax|vision|hybrid) -> TargetRef
- fingerprint.update(fingerprint, new_observation) -> ElementFingerprint
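One way `fingerprint.match` could work for the DOM/AX strategies is to score each candidate element against the stored fingerprint and accept the best one above a cutoff. The sketch below is only an illustration: the weights (0.4 / 0.4 / 0.2), the `min_score` default, and the dictionary shape of candidates are assumptions, not part of this proposal.

```python
from typing import Optional


def score_candidate(fingerprint: dict, candidate: dict) -> float:
    """Weighted similarity between a stored fingerprint and a live element."""
    score = 0.0
    if fingerprint.get("role") == candidate.get("role"):
        score += 0.4
    if fingerprint.get("name") == candidate.get("name"):
        score += 0.4
    # Nearby text survives small layout changes, so reward partial overlap.
    expected = set(fingerprint.get("nearby_text", []))
    if expected:
        seen = set(candidate.get("nearby_text", []))
        score += 0.2 * len(expected & seen) / len(expected)
    return score


def match_fingerprint(fingerprint: dict, candidates: list[dict],
                      min_score: float = 0.7) -> Optional[dict]:
    """Return the best-scoring candidate, or None if nothing is close enough."""
    best = max(candidates, key=lambda c: score_candidate(fingerprint, c), default=None)
    if best is None or score_candidate(fingerprint, best) < min_score:
        return None
    return best
```

Returning `None` rather than a weak match lets the caller move on to the next fallback (template or LLM) instead of acting on the wrong element.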
4) Vision / Template Matching
- vision.template_capture(source=ArtifactRef|bbox) -> TemplateRef
- vision.template_match(image|ArtifactRef, TemplateRef, thresholds, region?) -> MatchResult
- vision.template_match_all(image, TemplateRef, thresholds, max_hits) -> MatchResult[]
- vision.wait_for_template(TemplateRef, timeout, poll_ms) -> MatchResult
- vision.click_template(TemplateRef, click_offset?, retries)
- vision.crop(image, bbox) -> ArtifactRef
- vision.ocr_read(image|region) -> text
- vision.find_text(image, query) -> bbox[]
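To make the threshold and `region?` parameters of `vision.template_match` concrete, here is a minimal pure-Python sketch. It uses a normalized sum of absolute differences on grayscale pixel grids as the similarity score; that metric, the function name, and the result shape are assumptions for illustration, and a real node would more likely delegate to an image library.

```python
from typing import Optional

Image = list[list[int]]  # grayscale rows, pixel values 0..255


def template_match(image: Image, template: Image, threshold: float = 0.9,
                   region: Optional[tuple[int, int, int, int]] = None) -> Optional[dict]:
    """Find the best template position inside an optional (x, y, w, h) region."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    rx, ry, rw, rh = region if region is not None else (0, 0, iw, ih)
    # Clamp the search window so the template always fits inside the image.
    x0, y0 = max(rx, 0), max(ry, 0)
    x1 = min(rx + rw, iw) - tw
    y1 = min(ry + rh, ih) - th
    best = None
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            diff = sum(abs(image[y + j][x + i] - template[j][i])
                       for j in range(th) for i in range(tw))
            score = 1.0 - diff / (255.0 * th * tw)  # 1.0 means identical pixels
            if best is None or score > best["score"]:
                best = {"x": x, "y": y, "score": score}
    # Reject low-confidence hits instead of returning the least-bad position.
    if best is None or best["score"] < threshold:
        return None
    return best
```

Bounding the search region keeps the scan cheap and avoids matching a look-alike element elsewhere on the screen; the threshold turns "closest position" into "confident match or nothing".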
5) LLM-Assisted Self-Healing (SoTA)
- llm.find_element(prompt, context=DOM|AX|screenshot, constraints) -> ElementFingerprint
- llm.resolve_element(ElementFingerprint, context, strategy=hybrid) -> TargetRef
- llm.rank_candidates(candidates, goal, context) -> ranked
- llm.heal_selector(failure, context) -> SelectorSetCandidate
- llm.heal_template(failure, context) -> TemplateUpdateCandidate
- llm.plan_actions(goal, context, policy) -> ActionPlan (typed calls)
- llm.classify_screen(context) -> ScreenState
- llm.extract_structured(page|image, schema) -> json
6) RPA Reliability / Control
- rpa.locate(input=ElementFingerprint|SelectorSet|TemplateRef, context, fallbacks) -> TargetRef
- rpa.act(TargetRef, action=click|type|hover|drag|scroll, options)
- rpa.wait_for(TargetRef|condition, timeout)
- rpa.assert(TargetRef|condition, message?)
- rpa.retry(policy) { subgraph }
- rpa.timeout(ms) { subgraph }
- rpa.on_error(strategy=catch|finally|branch) { subgraph }
- rpa.diagnose(level=light|full) -> ArtifactBundle
- rpa.log(event, fields)
- rpa.metrics(counter|timer, labels)
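The interplay of `rpa.locate` fallbacks and `rpa.retry` can be sketched as two small helpers. This is a hypothetical illustration: the `locate` and `retry` signatures, the `(name, callable)` locator pairs, and the linear backoff are assumptions, since the issue does not define the retry policy format.

```python
import time
from typing import Callable, Optional

Locator = Callable[[], Optional[dict]]  # returns a target dict, or None for no match


def locate(locators: list[tuple[str, Locator]]) -> dict:
    """Try each strategy in order and return the first target found."""
    errors = []
    for name, find in locators:
        try:
            target = find()
        except Exception as exc:  # a broken strategy should not block the fallbacks
            errors.append(f"{name}: {exc}")
            continue
        if target is not None:
            return {"strategy": name, **target}
        errors.append(f"{name}: no match")
    raise LookupError("; ".join(errors))


def retry(action: Callable[[], object], attempts: int = 3, backoff_s: float = 0.0):
    """Run the action up to `attempts` times, sleeping longer after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(backoff_s * attempt)
    raise last_error
```

Recording which strategy succeeded (`"strategy"` in the result) gives `rpa.log` and `rpa.metrics` something concrete to report, for example how often a flow falls back from selector to template.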
Acceptance (high-level)
- “Record → Generate workflow” produces flows that remain stable under moderate UI drift using selector→template→LLM fallback.
- Template matching can re-find targets with confidence thresholds and bounded search regions.
- Failures emit a diagnostics bundle (screenshot + DOM/AX snapshots + logs + template hits) under policy control.
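A diagnostics bundle "under policy control" could look like the sketch below, where the `light` level keeps only a screenshot and a log tail while `full` adds the DOM/AX snapshots and template hits. The split between levels, the artifact kind names, and the `build_bundle` helper are assumptions made for illustration.

```python
# Hypothetical policy: which artifact kinds each diagnostics level keeps.
LEVEL_KINDS = {
    "light": {"screenshot"},
    "full": {"screenshot", "dom_snapshot", "ax_snapshot", "template_hits"},
}


def build_bundle(level: str, error: str, artifacts: dict[str, str],
                 logs: list[str], max_log_lines: int = 50) -> dict:
    """Assemble a failure bundle from artifact references and recent logs."""
    if level not in LEVEL_KINDS:
        raise ValueError(f"unknown diagnostics level: {level}")
    keep = LEVEL_KINDS[level]
    return {
        "level": level,
        "error": error,
        # Values are object-store references, not the raw bytes.
        "artifacts": {kind: ref for kind, ref in artifacts.items() if kind in keep},
        "logs": logs[-max_log_lines:] if max_log_lines > 0 else [],
    }
```

Storing references instead of payloads keeps the bundle small enough to attach to a failed run while the heavy artifacts stay in the object store.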