
browse: full-page screenshots silently exceed Anthropic vision API 2000px limit, bricking sessions #1214

@raffoz


Summary

screenshot (and snapshot -a / snapshot -H) default to fullPage: true. On any page whose scroll height exceeds ~2000px, the resulting PNG exceeds the Anthropic vision API's 2000px-per-side limit for many-image requests. The failing image stays in conversation history, so every subsequent turn fails again on the same request: the entire session is bricked, not just one tool call.

This is a soft footgun: the defaults are otherwise excellent (viewport 1280×720, deviceScaleFactor 1), so the user gets no signal that a long page is the boundary condition until they hit it.
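To make the boundary concrete: assuming the full-page PNG is the page's CSS size multiplied by deviceScaleFactor, the default 1280px-wide viewport at deviceScaleFactor 1 produces an image exactly as tall as the document. The helpers below (expectedPngDims, exceedsVisionLimit) are hypothetical illustrations, not part of gstack.

```typescript
// Hypothetical helpers: predict the pixel size of a full-page PNG and check it
// against the per-side limit. Assumes image size = CSS size * deviceScaleFactor.
interface Dims {
  width: number;
  height: number;
}

const VISION_MAX_SIDE = 2000; // per-side limit quoted in this issue

function expectedPngDims(scrollWidth: number, scrollHeight: number, deviceScaleFactor = 1): Dims {
  return {
    width: Math.ceil(scrollWidth * deviceScaleFactor),
    height: Math.ceil(scrollHeight * deviceScaleFactor),
  };
}

function exceedsVisionLimit(d: Dims, maxSide = VISION_MAX_SIDE): boolean {
  return d.width > maxSide || d.height > maxSide;
}

// A 5000px-tall document that fits the 1280px-wide viewport, deviceScaleFactor 1:
const dims = expectedPngDims(1280, 5000);
console.log(dims, exceedsVisionLimit(dims)); // { width: 1280, height: 5000 } true
```

The viewport-only capture (1280×720) stays well under the limit; only the full-page height grows with the document.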

Repro

  1. browse goto https://example.com/some-long-doc (any page with body height > ~2000px)
  2. browse screenshot /tmp/shot.png
  3. Have Claude Read the resulting PNG
  4. Claude returns: image exceeds 2000 pixels on the longest edge
  5. The PNG is now stuck in the transcript, and every following turn fails

Where in code

Affected sites (refs at commit 6209163):

  • browse/src/snapshot.ts:419: await page.screenshot({ path, fullPage: true }) (annotate)
  • browse/src/snapshot.ts:539: same call (heatmap)
  • (plus the bare screenshot command — didn't trace the exact line, but the docstring
    at commands.ts:138 documents it as full-page by default)

Proposed fix

Three options, in increasing order of invasiveness:

  1. Cheapest: in browser-manager.ts add a post-capture guard. After every page.screenshot(...), read PNG dims (zero-dep: parse IHDR chunk, ~20 lines) and either downscale or split if either side > 1800px. Emit a [browse] log line so the agent sees what happened. No new deps.

  2. Behavioral: flip the default. Make screenshot capture the visible viewport unless --full-page is passed explicitly. The current --viewport behavior becomes the default, and --full-page is added as an opt-in. This mirrors Playwright's own API, where fullPage defaults to false.

  3. Config knob: ~/.gstack/config.yaml exposes screenshot_max_height: 1800 (default), and the screenshot command auto-splits above it. Lets power users opt out by setting it to 0 / Infinity.
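A minimal sketch of the guard in option (1), with its split path shaped like option (3)'s auto-split (0 or Infinity disables splitting). readPngSize and planTiles are hypothetical names, not existing gstack functions; the IHDR layout (8-byte signature, then width and height as big-endian uint32 at byte offsets 16 and 20) follows the PNG specification. Only height is split here; a page wider than the limit would still need the downscale path, which is not shown.

```typescript
// Hypothetical post-capture guard helpers (zero-dep).
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

interface Size {
  width: number;
  height: number;
}

interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Reads width/height from the IHDR chunk that must immediately follow the signature.
function readPngSize(buf: Uint8Array): Size {
  if (buf.length < 24) throw new Error("too short to be a PNG");
  for (let i = 0; i < PNG_SIGNATURE.length; i++) {
    if (buf[i] !== PNG_SIGNATURE[i]) throw new Error("not a PNG");
  }
  const type = String.fromCharCode(buf[12], buf[13], buf[14], buf[15]);
  if (type !== "IHDR") throw new Error("first chunk is not IHDR");
  // DataView reads big-endian by default, which matches PNG byte order.
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

// Splits a tall image into horizontal strips no taller than maxHeight.
// maxHeight <= 0, NaN, or Infinity disables splitting (the opt-out in option 3).
function planTiles(size: Size, maxHeight: number): Rect[] {
  if (!(maxHeight > 0) || !Number.isFinite(maxHeight) || size.height <= maxHeight) {
    return [{ x: 0, y: 0, width: size.width, height: size.height }];
  }
  const tiles: Rect[] = [];
  for (let y = 0; y < size.height; y += maxHeight) {
    tiles.push({ x: 0, y, width: size.width, height: Math.min(maxHeight, size.height - y) });
  }
  return tiles;
}

// Build a fake 24-byte PNG header for a 1280x5000 image to exercise the guard.
const header = new Uint8Array(24);
header.set(PNG_SIGNATURE, 0);
const dv = new DataView(header.buffer);
dv.setUint32(8, 13); // IHDR data length
header.set([0x49, 0x48, 0x44, 0x52], 12); // "IHDR"
dv.setUint32(16, 1280);
dv.setUint32(20, 5000);

const size = readPngSize(header);
const tiles = planTiles(size, 1800);
console.log(size, tiles.map((t) => t.height)); // { width: 1280, height: 5000 } [ 1800, 1800, 1400 ]
```

Reading only the first 24 bytes keeps the check cheap enough to run after every capture; how the strips are then cut from the image is left open here.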

I'd vote for (1) + (2) together: agents can't accidentally produce poison images, and --full-page stays available for users who explicitly want it.
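Combining (1) and (2) might look roughly like this at the argument-parsing layer: viewport capture unless --full-page is present, with the size guard threshold carried along in both cases. resolveScreenshotOptions and the fallback path /tmp/screenshot.png are hypothetical; the real parser behind commands.ts may be structured differently.

```typescript
// Hypothetical option resolution for (1) + (2).
interface ScreenshotOptions {
  path: string;
  fullPage: boolean; // false unless explicitly requested
  maxSide: number; // guard threshold from option (1), applied either way
}

function resolveScreenshotOptions(args: string[], maxSide = 1800): ScreenshotOptions {
  const fullPage = args.includes("--full-page"); // default: visible viewport only
  const path = args.find((a) => !a.startsWith("--")) ?? "/tmp/screenshot.png";
  return { path, fullPage, maxSide };
}

console.log(resolveScreenshotOptions(["/tmp/shot.png"]).fullPage); // false
console.log(resolveScreenshotOptions(["--full-page", "/tmp/shot.png"]).fullPage); // true
```

The resolved path and fullPage map directly onto Playwright's page.screenshot({ path, fullPage }); an explicit --viewport could stay accepted as a no-op for compatibility.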

Why this matters

Once one bad screenshot lands in conversation history, the user has to either clear context or ask Claude to forget it. Both add friction. Since gstack is explicitly designed for "agent QA-ing a site", the failure mode lands almost entirely on agent users, which is precisely the audience it targets.

Happy to send a PR if the maintainers agree on which option to take.

— Reported via Claude Opus 4.7, gstack v1.12.2.0
