Automation API

The Automation API is available from the browser.webfuseSession.automation namespace.

Perception methods (‘see’ 👁️) are grouped under automation.see.
Actuation methods (‘act’ 👆️) are grouped under automation.act.
Auxiliary tool methods (‘tool’ 🛠️) are grouped under automation.tool.
Other methods and properties are available right from automation.

Targeting

The Automation API can be used with both DOM- and vision-based agents. To this end, a target element in the page can be specified in several ways:

type Target = HTMLElement | CSSSelector | Point | WebfuseID | MetaTarget;

type CSSSelector = string;
type Point = [number, number]; // [x, y];
type WebfuseID = string; // Unique ID of an element per Tab in a Webfuse Session
enum MetaTarget {
  POINTER, // Current virtual pointer position
  FOCUS    // Currently focused element
}

// By element reference
browser.webfuseSession
  .automation
  .act
  .click(document.getElementById('cta'))

// By CSS selector
browser.webfuseSession
  .automation
  .act
  .click('main > button.cta')

// By point coordinate
browser.webfuseSession
  .automation
  .act
  .click([420, 890])

// By meta target
browser.webfuseSession
  .automation
  .act
  .click(
    browser.webfuseSession.automation.Target.POINTER
  )

Cross-Shadow and -Frame Targeting Webfuse Exclusive

Relevant elements may be hidden inside shadow DOM or iframes, for example when the agent-enhanced web page embeds a checkout component from a third-party provider. Targeting with the Automation API can pierce both shadow root and iframe boundaries. A point coordinate that lands on an iframe descends into that iframe, with coordinates normalized to the frame, until a non-frame element is found. Standard CSS has no cross-shadow or cross-frame selectors. To enable such targeting via CSS selectors, Webfuse treats shadow roots and frames as ordinary container elements, and introduces the shadow-root pseudo tag name for shadow DOM. As a result, shadow roots and frames act as implicit container tags, much like div or section.

Cross-Shadow Targeting

Shadow DOM subtrees in the browser are usually invisible upon parent DOM serialization (.outerHTML/.innerHTML). The position of a shadow root is visualized with # Shadow in the following example:

<div>
  <custom-element>
    # Shadow
    <b>Slotted</b>
  </custom-element>
</div>

Webfuse, however, implies a shadow root element, which is reflected in DOM snapshots taken with Webfuse-native perception. For the example above, this looks as follows:

<div>
  <custom-element>
    <shadow-root>
      <strong>Shadow</strong>
      <p>
        <slot></slot>
      </p>
    </shadow-root>
    <b>Slotted</b>
  </custom-element>
</div>

Now, the shadow root and elements within the shadow DOM can be targeted with valid CSS selector syntax:

// Cross-shadow
browser.webfuseSession
  .automation
  .act
  .click('body my-component shadow-root button#submit')

Cross-Frame Targeting

To isolate embedded DOMs from each other, subtrees beneath iframes are usually hidden upon parent DOM serialization (.outerHTML/.innerHTML):

<html>
  <head></head>
  <body>
    <h1>Parent</h1>
    <iframe src="/child"></iframe>
  </body>
</html>

Webfuse treats frames as native container elements of the parent DOM. Optionally, DOM snapshots taken with Webfuse-native perception inline iframe contents. For the example above, this looks as follows:

<html>
  <head></head>
  <body>
    <h1>Parent</h1>
    <iframe src="/child">
      <html>
        <h1>Child</h1>
      </html>
    </iframe>
  </body>
</html>

Now, elements within the frame can be targeted with valid CSS selector syntax:

// Cross-frame
browser.webfuseSession
  .automation
  .act
  .click('body iframe button#submit')

IMPORTANT: Without constraints, a cross-shadow and cross-frame CSS selector paradigm bloats the target scope, for instance when IDs are redundant across frames (e.g., three times id="submit"). For that reason, Webfuse constrains cross-shadow and -frame selectors by mandatory explicitness: every shadow root or frame to be pierced must explicitly be part of the selector. This way, each boundary is selected and pierced progressively. For frames, the iframe tag must be stated for each iframe. For shadow roots, each element that has a shadow root attached must be stated by its element selector, followed by the shadow-root pseudo tag.

Take this example of a DOM (snapshot):

<body>
  <iframe id="my-frame" src="/foo">
    <html>
      <iframe src="/bar">
        <html></html>
      </iframe>
      <iframe src="/baz">
        <qux-quux>
          <shadow-root>
            <button id="submit">Deep Submit</button>
          </shadow-root>
        </qux-quux>
      </iframe>
    </html>
  </iframe>
  <button id="submit">Submit</button>
</body>

Suppose the button deep in the frames (‘Deep Submit’) shall be targeted. Here’s how different selector targets would resolve:

✔️ iframe iframe:nth-of-type(2) qux-quux shadow-root #submit
❌ #submit → Resolves button in parent (‘Submit’)
❌ qux-quux shadow-root #submit → Missing frame tags
❌ iframe iframe qux-quux shadow-root #submit → Wrong second frame selector
❌ #my-frame iframe qux-quux shadow-root #submit → Missing explicit iframe tag
❌ iframe iframe:nth-of-type(2) shadow-root #submit → Missing shadow parent tag
❌ iframe iframe:nth-of-type(2) qux-quux #submit → Missing shadow root pseudo tag
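
The explicitness rule above can be enforced mechanically when selectors are assembled in code. The following is a minimal sketch, not part of the Automation API: the Step type and the buildCrossSelector helper are hypothetical names introduced only for illustration. It always emits the iframe tag for a frame step and always appends the shadow-root pseudo tag after a shadow host.

```typescript
// Hypothetical description of one step on the way to the target element.
type Step =
  | { kind: "frame"; qualifier?: string } // an iframe to descend into
  | { kind: "shadow"; host: string } // an element with a shadow root attached
  | { kind: "element"; selector: string }; // an ordinary descendant selector

// Hypothetical helper: join steps into one explicit cross-boundary selector.
function buildCrossSelector(steps: Step[]): string {
  return steps
    .map((step) => {
      switch (step.kind) {
        case "frame":
          // The iframe tag is always stated; a qualifier only narrows it.
          return `iframe${step.qualifier ?? ""}`;
        case "shadow":
          // Host selector followed by the shadow-root pseudo tag.
          return `${step.host} shadow-root`;
        case "element":
          return step.selector;
      }
    })
    .join(" ");
}

// Path to the 'Deep Submit' button from the example DOM above.
const deepSubmit = buildCrossSelector([
  { kind: "frame" },
  { kind: "frame", qualifier: ":nth-of-type(2)" },
  { kind: "shadow", host: "qux-quux" },
  { kind: "element", selector: "#submit" },
]);
```

The resulting string could then be passed as the target of an actuation call such as act.click.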

Webfuse IDs Recommended

Every element in a Webfuse Session is associated with two internal unique IDs: a per-frame ID and a per-Tab ID (the Webfuse ID). The Webfuse ID is a dash-concatenation of the IDs of the nested frames, ending with the ID of the target element: <WEBFUSE_ID> = <FRAME:0:ID>-...-<FRAME:n:ID>-<ELEMENT:ID>. It can be used with DOM snapshots, in particular cross-shadow and -frame snapshots. With the webfuseIDs option, the Webfuse ID of each element is inlined via the HTML pseudo attribute wf-id. It can subsequently be resolved in actuation calls:

DOM snapshot with inlined Webfuse IDs:

<button wf-id="2-1-54">Submit</button>
<span id="result" wf-id="2-1-55"></span>

Actuation via Webfuse ID:

// By Webfuse ID
browser.webfuseSession
  .automation
  .act
  .click('2-1-54') // 2nd iframe > 1st iframe > 54th element
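
For logging or debugging, it can help to split a Webfuse ID back into its frame path and element part. The sketch below follows the concatenation scheme described above; parseWebfuseId and ParsedWebfuseId are hypothetical names, and the code assumes that the individual IDs never contain a dash themselves, which this reference does not state explicitly.

```typescript
// Hypothetical result shape: frame IDs from outermost to innermost, then the element ID.
interface ParsedWebfuseId {
  frameIds: string[];
  elementId: string;
}

// Hypothetical helper: split '<FRAME:0:ID>-...-<FRAME:n:ID>-<ELEMENT:ID>' into its parts.
// Assumption: no individual ID contains a dash.
function parseWebfuseId(id: string): ParsedWebfuseId {
  const frameIds = id.split("-");
  const elementId = frameIds.pop(); // the last segment is the element itself
  if (elementId === undefined || elementId === "" || frameIds.some((part) => part === "")) {
    throw new Error(`Malformed Webfuse ID: "${id}"`);
  }
  return { frameIds, elementId };
}

const parsed = parseWebfuseId("2-1-54");
// parsed.frameIds is ["2", "1"] and parsed.elementId is "54"
```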

`.act` Actuation Scope

Actuation refers to the process of interacting with a web page to change its state. For human users, this typically involves a few common actions, such as clicking, typing, or scrolling.

In the browser, an action is actually a sequence of multiple events fired on a specific target element. For example, a single “click” triggers a chain of events including:

mousedown
mouseup
click

Webfuse emulates human actuation as closely as possible. When an action is performed via Webfuse, all corresponding events are dispatched not only on the target element but also along the trajectory of the mouse pointer movement, ensuring the web application reacts exactly as it would for a real user.

act.mouseMove()

Move the virtual mouse pointer.

browser.webfuseSession.automation.act.mouseMove(
  target: Target,
  options?: {
    persistent?: boolean;
  }
): Promise<void>

Besides positional arguments, as described in this reference, the Automation API also supports passing the same arguments folded into a single object. The object keys correspond to the argument names used here. For instance, act.mouseMove could also be called as follows:

browser.webfuseSession.automation.act.mouseMove(args: {
  target: Target;
  options?: {
    persistent?: boolean;
  };
}): Promise<void>
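
A call in the folded form might look like the following sketch. To keep the snippet self-contained, the act scope is passed in as a parameter; MouseMover is a hypothetical, reduced view of automation.act that covers only the object-style signature shown above, and moveAndKeepPointer is an illustrative helper name.

```typescript
type Point = [number, number];
type Target = string | Point; // reduced Target union, sufficient for this sketch

// Hypothetical minimal view of automation.act (object-style overload only).
interface MouseMover {
  mouseMove(args: { target: Target; options?: { persistent?: boolean } }): Promise<void>;
}

// Move the virtual pointer and keep it visible, using the single-object form.
async function moveAndKeepPointer(act: MouseMover, target: Target): Promise<void> {
  await act.mouseMove({ target, options: { persistent: true } });
}
```

In a Webfuse Session, browser.webfuseSession.automation.act would presumably be passed as the act argument.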

Parameters

target

Mouse pointer target.

[options]

Mouse move options:
- [persistent] Whether to keep the pointer on screen after it was moved. By default, the pointer fades out after some time.

Returns

A promise that resolves once the mouse was moved.

Example

// Move the virtual mouse pointer to pixel position x=100 and y=400:
await browser.webfuseSession
  .automation
  .act
  .mouseMove([100, 400]);

act.scroll()

Scrolls the deepest scrollable element under the target by the given amount in the given direction.

browser.webfuseSession.automation.act.scroll(
  target: Target,
  direction: 'vertical' | 'horizontal',
  amount: number
): Promise<void>

Parameters

target

Scroll(able) target.

direction

The direction to scroll.

amount

The amount of pixels to scroll.

Returns

A promise that resolves once scroll ended.

Example

// Scroll the body element 100 pixels up:
await browser.webfuseSession
  .automation
  .act
  .scroll('body', 'vertical', 100);
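
Long distances can be covered in several smaller scroll calls, for example to give lazily loaded content a chance to appear between steps. This is a sketch under assumptions: Scroller is a hypothetical, reduced view of automation.act, scrollInSteps is an illustrative helper, and only positive amounts are handled.

```typescript
type Target = string | [number, number]; // reduced Target union for this sketch

// Hypothetical minimal view of automation.act, limited to scroll().
interface Scroller {
  scroll(target: Target, direction: "vertical" | "horizontal", amount: number): Promise<void>;
}

// Scroll vertically by `total` pixels in chunks of at most `step` pixels.
// Returns the number of scroll calls that were issued.
async function scrollInSteps(
  act: Scroller,
  target: Target,
  total: number,
  step: number
): Promise<number> {
  if (step <= 0) throw new RangeError("step must be positive");
  let calls = 0;
  for (let remaining = total; remaining > 0; remaining -= step) {
    await act.scroll(target, "vertical", Math.min(step, remaining));
    calls++;
  }
  return calls;
}
```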

act.click()

Perform a mouse click (left, i.e., primary, button by default).

browser.webfuseSession.automation.act.click(
  target: Target,
  options?: {
    button?: 'left' | 'middle' | 'right';
    moveMouse?: boolean;
    scrollIntoView?: boolean;
  }
): Promise<void>

Parameters

target

Click target.

[options]

Click options:
- [button] Mouse button to click (left by default).
- [moveMouse] Whether to move the virtual mouse pointer to the target center before performing the action (false by default).
- [scrollIntoView] Whether to scroll the target element into view before performing the action (true by default).

Returns

A promise that resolves once click was performed.

Example

// Click the fourth element in the second frame.
// Move the virtual mouse pointer to the center of the target element beforehand:
await browser.webfuseSession
  .automation
  .act
  .click('2-4', {
    moveMouse: true,
  });

act.type()

browser.webfuseSession.automation.act.type(
  target: Target,
  text: string,
  options?: {
    followFocus?: boolean;
    overwrite?: boolean;
    timePerChar?: number;
    moveMouse?: boolean;
    scrollIntoView?: boolean;
  }
): Promise<void>

Type text to an element. Typing is natural, i.e. as if a human presses a sequence of keys.

Parameters

target

Typing target.

text

Text to type.

[options]

Type options:
- [followFocus] Whether to type to the element that has focus even if it changed to a different target (true by default).
- [overwrite] Whether to overwrite the current contents of the target input (true by default).
- [timePerChar] Expected mean time to press each character in ms (100 by default).
- [moveMouse] Whether to move the virtual mouse pointer to the target center before performing the action (false by default).
- [scrollIntoView] Whether to scroll the target element into view before performing the action (true by default).

Returns

A promise that resolves once text was typed.

Example

// Type 'Jane Doe' to the currently focused element (suppose it is an input field):
await browser.webfuseSession
  .automation
  .act
  .type(browser.webfuseSession.automation.Target.FOCUS, 'Jane Doe');

act.select()

browser.webfuseSession.automation.act.select(
  target: Target,
  value: string,
  options?: {
    moveMouse?: boolean;
    scrollIntoView?: boolean;
  }
): Promise<void>

Select an option of a dropdown element by value.

Parameters

target

Select target.

value

Value to select (according to value attribute).

[options]

Select options:
- [moveMouse] Whether to move the virtual mouse pointer to the target center before performing the action (false by default).
- [scrollIntoView] Whether to scroll the target element into view before performing the action (true by default).

Returns

A promise that resolves once value was selected.

Example

// Select 'Netherlands' from a country dropdown list:
await browser.webfuseSession
  .automation
  .act
  .select('select#country', 'netherlands', { scrollIntoView: true });

act.keyPress()

browser.webfuseSession.automation.act.keyPress(
  target: Target,
  key: 'a' | 'b' | ... | 'Z' | '.' | '!' | ... | 'Enter' | 'ArrowUp' | ...,
  options?: {
    altKey?: boolean;
    ctrlKey?: boolean;
    metaKey?: boolean;
    shiftKey?: boolean;
    moveMouse?: boolean;
    scrollIntoView?: boolean;
  }
): Promise<void>

Press a key on an element. The key argument must be equivalent to a supported KeyboardEvent.key property, which is either:

a control key, e.g., Enter to submit forms, or ArrowUp to scroll up a page, or
a printable key, i.e., a typeable symbol that could be typed to an input field, e.g., a, B, !, or >.

Control Keys

Tab, Enter, Backspace, Delete, Insert
ArrowUp, ArrowDown, ArrowLeft, ArrowRight
Home, End, PageUp, PageDown
Escape, CapsLock, Shift, Control, Alt, Meta
F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12
Numpad0, Numpad1, Numpad2, Numpad3, Numpad4, Numpad5, Numpad6, Numpad7, Numpad8, Numpad9
NumpadAdd, NumpadSubtract, NumpadMultiply, NumpadDivide, NumpadDecimal, NumpadEnter

Printable Keys

abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789
,<.>/?;:
!@#$%^&*
~-_=+
()[]{}\|
`'"
␠\t\n\r

Parameters

target

Key press target.

key

Key to press.

[options]

Key press options:
- [altKey], [ctrlKey], [metaKey], [shiftKey] Whether to hold down the respective modifier key (alt, ctrl, meta, or shift) during the press.
- [moveMouse] Whether to move the virtual mouse pointer to the target center before performing the action (false by default).
- [scrollIntoView] Whether to scroll the target element into view before performing the action (true by default).

Returns

A promise that resolves once key was pressed.

Example

// Hit 'Enter' key on an identifiable submit button:
await browser.webfuseSession
  .automation
  .act
  .keyPress('#submit', 'Enter');
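
Typing and key presses are often combined, for instance to fill a search field and submit it with Enter. The following sketch assumes a hypothetical, reduced view of automation.act (Typist) and an illustrative helper name (typeAndSubmit); only the type() and keyPress() signatures are taken from this reference.

```typescript
type Target = string | [number, number]; // reduced Target union for this sketch

// Hypothetical minimal view of automation.act, limited to type() and keyPress().
interface Typist {
  type(target: Target, text: string, options?: { overwrite?: boolean }): Promise<void>;
  keyPress(target: Target, key: string): Promise<void>;
}

// Replace the field contents with `text`, then press Enter on the same target.
async function typeAndSubmit(act: Typist, target: Target, text: string): Promise<void> {
  await act.type(target, text, { overwrite: true });
  await act.keyPress(target, "Enter");
}
```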

act.textSelect()

browser.webfuseSession.automation.act.textSelect(
  target: Target,
  text: string,
  options?: {
    occurrence?: number;
    moveMouse?: boolean;
    scrollIntoView?: boolean;
  }
): Promise<void>

Select continuous text in the page.

Parameters

target

Text content selection target.

text

Text to select (empty text also removes any existing selection).

[options]

Text select options:
- [occurrence] Zero-based index of the occurrence to select if the text occurs more than once (0, i.e., the first occurrence, by default).
- [moveMouse] Whether to move the virtual mouse pointer to the target center before performing the action (false by default).
- [scrollIntoView] Whether to scroll the target element into view before performing the action (true by default).

Returns

A promise that resolves once the selection was applied.

Example

// Select the third occurrence of the text 'ipsum' in the main element:
await browser.webfuseSession
  .automation
  .act
  .textSelect('main', 'ipsum', {
    occurrence: 2
  });

`.see` Perception Scope

Perception is the process of interpreting the current state of a web page or, in the case of web automation, of a web application. While human users primarily perceive websites through graphical user interfaces (GUIs), computer agents can understand web application state by more than visual means. A prominent technical representation of non-visual state is the document object model (DOM), a web browser’s runtime model of a web application.

By analyzing the DOM, the agent can programmatically understand:

Structural hierarchy (how elements are nested).
Element attributes (IDs, classes, and accessibility labels).
Metadata not immediately visible to the naked eye.

see.textSelection()

browser.webfuseSession.automation.see.textSelection(): Promise<string>

Get the currently selected text in the page.

Returns

A promise that resolves with the currently selected text (or empty string if nothing is selected).

see.domSnapshot()

browser.webfuseSession.automation.see.domSnapshot(options?: {
  crossFrame?: boolean;
  crossShadow?: boolean;
  interactiveOnly?: boolean;
  revealMaskedElements?: boolean;
  root?: Target;
  webfuseIDs?: boolean;
}): Promise<string>

Take a web page snapshot, which is a time-sensitive serialization of the current web page state.

Parameters

[options]

DOM snapshot options:
- [crossFrame] (Webfuse Exclusive) Whether to include iframe subtrees (false by default).
- [crossShadow] (Webfuse Exclusive) Whether to include shadow DOM subtrees (true by default).
- [interactiveOnly] Whether to include only interactive DOM subtrees in the snapshot, which acts as a noise filter.
- [revealMaskedElements] Whether to include masked elements (false by default).
- [root] Snapshot root element that scopes the DOM snapshot to a specific subtree (integrates with targeting, body by default).
- [webfuseIDs] Whether to assign each element its Webfuse ID via the HTML pseudo attribute wf-id (false by default).

Returns

A promise that resolves with the DOM snapshot, i.e., DOM state serialized as HTML.

Example

// Take a DOM snapshot, i.e., serialize the current state of the DOM as HTML, descending only from the main element.
// To each element, assign its unique Webfuse ID via `wf-id` pseudo attribute.
// Do not include shadow DOMs in snapshot:
await browser.webfuseSession
  .automation
  .see
  .domSnapshot({
    root: 'main',
    crossShadow: false,
    webfuseIDs: true,
  });
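
When a snapshot was taken with webfuseIDs enabled, the inlined IDs can be collected from the resulting HTML string, for example to validate an ID suggested by a model before acting on it. The helper below is a hypothetical sketch (extractWebfuseIds is not part of the API) and uses a simple regular expression; a real HTML parser would be more robust against unusual attribute quoting.

```typescript
// Hypothetical helper: list all wf-id attribute values found in a snapshot string.
function extractWebfuseIds(snapshot: string): string[] {
  const ids: string[] = [];
  // Matches wf-id="..." case-insensitively; assumes double-quoted attribute values.
  const pattern = /\bwf-id="([^"]+)"/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(snapshot)) !== null) {
    ids.push(match[1]);
  }
  return ids;
}

const ids = extractWebfuseIds(
  '<button wf-id="2-1-54">Submit</button><span id="result" wf-id="2-1-55"></span>'
);
// ids is ["2-1-54", "2-1-55"]
```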

Cross-Frame Snapshots

Cross-frame snapshots (crossFrame = true) will have all iframe contents inlined, e.g.:

<html>
  <head></head>
  <body>
    <h1>Parent</h1>
    <iframe src="/child">
      <html>
        <h1>Child</h1>
      </html>
    </iframe>
  </body>
</html>

Cross-Shadow Snapshots

Cross-shadow snapshots (crossShadow = true) will have all shadow root contents inlined, e.g.:

<div>
  <custom-element>
    <shadow-root>
      <strong>Shadow</strong>
      <p>
        <slot></slot>
      </p>
    </shadow-root>
    <b>Slotted</b>
  </custom-element>
</div>

Snapshots and Agentic AI

The web AI agent lifecycle models a constant loop of perceiving the current state of a website, prompting the model for actuation suggestions, and acting out these suggestions. Snapshots paired with Webfuse IDs represent a robust means of targeting, even after destructive snapshot processing.

Prompt:

Book a flight to Amsterdam.

Snapshot:

<body wf-id="1">
  <h1 wf-id="2">Book Flight</h1>
  <p>Please confirm your booking information.</p>
  <checkout-form wf-id="3">
    <shadow-root>
      <button type="button" wf-id="4">Confirm</button>
    </shadow-root>
  </checkout-form>
</body>

Based on the AI model’s suggestions, actuation can target elements via Webfuse ID:

browser.webfuseSession
  .automation
  .act
  .click('4')
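
The perceive, prompt, act loop described above could be sketched as follows. Everything apart from the domSnapshot option webfuseIDs and the click call is an assumption made for illustration: AutomationLike is a reduced view of the automation namespace, SuggestClick stands in for an arbitrary model call, and runAgent with its maxSteps limit is a hypothetical helper.

```typescript
// Hypothetical reduced view of browser.webfuseSession.automation.
interface AutomationLike {
  see: { domSnapshot(options?: { webfuseIDs?: boolean }): Promise<string> };
  act: { click(target: string): Promise<void> };
}

// Hypothetical model call: returns the Webfuse ID to click next, or null when done.
type SuggestClick = (prompt: string, snapshot: string) => Promise<string | null>;

// Loop: perceive the page, ask the model, act on its suggestion.
async function runAgent(
  automation: AutomationLike,
  suggest: SuggestClick,
  prompt: string,
  maxSteps: number = 5
): Promise<string[]> {
  const clicked: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const snapshot = await automation.see.domSnapshot({ webfuseIDs: true });
    const id = await suggest(prompt, snapshot);
    if (id === null) break; // the model considers the task finished
    await automation.act.click(id);
    clicked.push(id);
  }
  return clicked;
}
```

The step limit is a design choice of this sketch: it keeps a misbehaving model from clicking indefinitely.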

see.guiSnapshot()

Serialize the GUI for various processing purposes, such as for LLM input. Serialized GUI corresponds to a screenshot. Hence, this is an alias of webfuseSession.takeScreenshot().

browser.webfuseSession.automation.see.guiSnapshot(): Promise<ImageBitmap>

Returns

A promise that resolves with the GUI snapshot, i.e., GUI state serialized as an image bitmap.

`.tool` Tool Scope

tool.computeAccessibilityTree()

browser.webfuseSession.automation.tool.computeAccessibilityTree(domSnapshot: string): object

Translate a DOM snapshot to its accessibility tree object representation.

Parameters

domSnapshot

DOM snapshot to compute an accessibility tree representation from.

Example

<form
    role="form"
    aria-describedby="recipe-hint"
    aria-labelledby="recipe-form-title">
    <div
        role="group"
        aria-labelledby="checkbox-group">
        <h3 id="checkbox-group">Recipe Preferences</h3>
        <label for="notifications"
            aria-describedby="notifications-description">
            <input type="checkbox" id="notifications"
                name="notifications"
                aria-label="Enable recipe update notifications">
            Receive recipe updates
        </label>
        <p id="notifications-description">I would like to receive updates.</p>
    </div>
</form>

const snapshot = await browser.webfuseSession
  .automation
  .see
  .domSnapshot({
    root: 'form',
  });

await browser.webfuseSession
  .automation
  .tool
  .computeAccessibilityTree(snapshot);

{
  "role": "RootWebArea",
  "source": "html",
  "children": [
    {
      "name": "Recipe Preferences",
      "properties": {
        "level": 3
      },
      "role": "heading",
      "source": "#checkbox-group"
    },
    {
      "children": [
        {
          "name": "Enable recipe update notifications",
          "properties": {
            "aria-label": "Enable recipe update notifications"
          },
          "role": "checkbox",
          "source": "#notifications",
          "states": {
            "checked": false
          }
        }
      ],
      "properties": {
        "aria-describedby": "notifications-description"
      },
      "role": "generic",
      "source": "html > body > section > form > div > label",
      "description": "I would like to receive updates."
    }
  ]
}
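
A tree like the one above can be traversed to find nodes of interest, for example all checkboxes. The sketch below is not part of the API: the AXNode shape is inferred from the example output only (further fields may exist), and findByRole is a hypothetical helper.

```typescript
// Node shape inferred from the example output above (assumption, not a documented type).
interface AXNode {
  role: string;
  name?: string;
  source?: string;
  children?: AXNode[];
}

// Hypothetical helper: depth-first collection of all nodes with the given role.
function findByRole(node: AXNode, role: string, found: AXNode[] = []): AXNode[] {
  if (node.role === role) found.push(node);
  for (const child of node.children ?? []) {
    findByRole(child, role, found);
  }
  return found;
}

// Abbreviated version of the example tree.
const tree: AXNode = {
  role: "RootWebArea",
  source: "html",
  children: [
    { role: "heading", name: "Recipe Preferences", source: "#checkbox-group" },
    {
      role: "generic",
      children: [
        { role: "checkbox", name: "Enable recipe update notifications", source: "#notifications" },
      ],
    },
  ],
};

const checkboxes = findByRole(tree, "checkbox");
// checkboxes[0].source is "#notifications"
```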

tool.applyD2Snap()

Apply the D2Snap DOM downsampling algorithm to a DOM snapshot. This reduces its size while retaining its overall structure and most of its inherent UI features. The D2Snap algorithm was developed to mitigate the token size disadvantage of DOM snapshots, which are typically large when used as LLM input.

browser.webfuseSession.automation.tool.applyD2Snap(
  domSnapshot: string,
  hierarchyRatio: number = 0.4, textRatio: number = 0.6, attributeRatio: number = 0.8,
  options?: {
    keepUnknownElements?: boolean;
    skipMarkdownTranslation?: boolean;
  }
): Promise<string>

Parameters

domSnapshot

DOM snapshot to downsample.

[hierarchyRatio]

Hierarchy (nesting) compression ratio of the result.

[textRatio]

Text (length) compression ratio of the result.

[attributeRatio]

Attribute (count) compression ratio of the result.

[options]

Snapshot options:
- [keepUnknownElements] Whether to keep unknown (custom) elements in the downsampled DOM (false by default).
- [skipMarkdownTranslation] Whether to skip content HTML to Markdown translation (false by default).

Returns

A promise that resolves with the downsampled DOM snapshot.

Example

Raw:

<section class="container" tabindex="3" required="true" type="example">
  <div class="mx-auto" data-topic="products" required="false">
    <h1>Our Pizza</h1>
    <div>
      <div class="shadow-lg">
        <h2>Margherita</h2>
        <p>
          A simple classic: mozzarella, tomatoes and basil.
          An everyday choice!
        </p>
        <button type="button">Add</button>
      </div>
      <div class="shadow-lg">
        <h2>Capricciosa</h2>
        <p>
          A rich taste: mozzarella, ham, mushrooms, artichokes, and olives.
          A true favourite!
        </p>
        <button type="button">Add</button>
      </div>
    </div>
  </div>
</section>

const snapshot = await browser.webfuseSession
  .automation
  .see
  .domSnapshot({
    root: 'section.container:first-of-type',
  });

// Downsample ('compress') the raw DOM snapshot.
// Compress hierarchy, i.e., element nesting depth by about 40%.
// Compress text, i.e., paragraph sentence length by about 60%.
// Compress attributes, i.e., attribute amount by about 80%:
await browser.webfuseSession
  .automation
  .tool
  .applyD2Snap(snapshot, 0.4, 0.6, 0.8);

Downsampled:

<!-- hierarchyRatio = .4, textRatio = .6, attributeRatio = .8 -->
<section>
  # Our Pizza
  <div>
    ## Margherita
    A simple classic:
    <button>Add</button>
    ## Capricciosa
    A rich taste:
    <button>Add</button>
  </div>
</section>

tool.applyAdaptiveD2Snap()

Alias: tool.downsample()

Apply the AdaptiveD2Snap DOM downsampling algorithm to a DOM snapshot. This is an adaptive version of the D2Snap algorithm that does not require explicit parameters.

browser.webfuseSession.automation.tool.applyAdaptiveD2Snap(
  domSnapshot: string,
  maxTokens: number = 2**15, // ≈ 32K
  maxIterations: number = 3,
  options?: {
    webfuseIDs?: boolean;
    keepUnknownElements?: boolean;
    skipMarkdownTranslation?: boolean;
  }
): Promise<string>

Parameters

domSnapshot

DOM snapshot to downsample.

[maxTokens]

Maximum expected snapshot size in estimated LLM input tokens (1 token ≈ 4 bytes/symbols).

[maxIterations]

Maximum number of attempts to downsample with increasing compression ratio parameters in order to obtain a snapshot below the given token limit (throws an error otherwise).

[options]

Snapshot options:
- [webfuseIDs] Whether to add a unique data attribute wf-id to every element in the DOM in order to allow identification of equivalent elements across the original and the downsampled DOM. For example, <button class="btn btn-primary" wf-id="27">Click here!</button> (false by default).
- [keepUnknownElements] Whether to keep unknown (custom) elements in the downsampled DOM (false by default).
- [skipMarkdownTranslation] Whether to skip content HTML to Markdown translation (false by default).

Returns

A promise that resolves with the downsampled DOM snapshot.

Example

The recommended way to get started with downsampled DOM snapshots is by simply calling the adaptive D2Snap tool alias with the default arguments:

const snapshot = await browser.webfuseSession
  .automation
  .see
  .domSnapshot();

await browser.webfuseSession
  .automation
  .tool
  .downsample(snapshot); // .applyAdaptiveD2Snap(snapshot)
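
Downsampling can be skipped when a snapshot already fits the token budget. The sketch below applies the rule of thumb from the maxTokens description (1 token ≈ 4 bytes/symbols). It is an assumption-laden illustration: estimateTokens and fitToBudget are hypothetical helpers, Downsampler is a reduced view of automation.tool, and string length is used as a rough stand-in for byte size.

```typescript
const SYMBOLS_PER_TOKEN = 4; // rule of thumb from the maxTokens description

// Rough token estimate; string length only approximates the byte size.
function estimateTokens(snapshot: string): number {
  return Math.ceil(snapshot.length / SYMBOLS_PER_TOKEN);
}

// Hypothetical minimal view of automation.tool, limited to the downsample alias.
interface Downsampler {
  downsample(domSnapshot: string, maxTokens?: number): Promise<string>;
}

// Return the snapshot unchanged if it fits the budget, otherwise downsample it.
async function fitToBudget(
  tool: Downsampler,
  snapshot: string,
  maxTokens: number = 2 ** 15
): Promise<string> {
  if (estimateTokens(snapshot) <= maxTokens) return snapshot;
  return tool.downsample(snapshot, maxTokens);
}
```

Since the adaptive algorithm throws when the limit cannot be met within maxIterations, callers may want to wrap fitToBudget in a try/catch block.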

Other

navigate()

browser.webfuseSession.automation.navigate(newUrl: string): void

Parameters

newUrl

URL to navigate to.

wait()

browser.webfuseSession.automation.wait(ms: number): Promise<void>

Parameters

ms

The amount of milliseconds to wait.

Returns

A promise that resolves once the given wait time has passed.

Example

await browser.webfuseSession
  .automation
  .act
  .mouseMove([230, 1215]);

// Wait for 500ms:
await browser.webfuseSession
  .automation
  .wait(500);

await browser.webfuseSession
  .automation
  .act
  .type(browser.webfuseSession.automation.Target.POINTER, 'Amsterdam');
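
Instead of one fixed delay, wait() can also serve as the pause between repeated checks, for example until a snapshot contains an expected element. This is a minimal polling sketch: Waiter is a hypothetical, reduced view of the automation namespace, and waitUntil with its intervalMs and maxAttempts parameters is an illustrative helper, not an API method.

```typescript
// Hypothetical minimal view of browser.webfuseSession.automation, limited to wait().
interface Waiter {
  wait(ms: number): Promise<void>;
}

// Poll `check` until it reports true or the attempts are used up.
// Resolves with true on success and false on timeout.
async function waitUntil(
  automation: Waiter,
  check: () => Promise<boolean>,
  intervalMs: number = 250,
  maxAttempts: number = 20
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (await check()) return true;
    await automation.wait(intervalMs);
  }
  return false;
}
```

A check function could, for instance, take a DOM snapshot and test whether it includes a particular wf-id or text.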
