Overview
Inspiration: Vercel's dogfood skill in vercel-labs/agent-browser (PR #538, merged Feb 24, 2026). Also adopted by callstackincubator/agent-device for mobile QA.
Give Hermes Agent the ability to systematically explore and test web applications, find bugs, and produce structured QA reports with reproduction evidence (screenshots, video, console errors). The agent acts as a QA tester: point it at a URL, and it explores every page, clicks every button, tests forms with edge cases, checks console errors, captures visual evidence, and writes a structured bug report.
How Vercel's Dogfood Skill Works
Workflow (5 Phases)
- Initialize: create output dirs (`screenshots/`, `videos/`), copy the report template, open the target URL in `agent-browser`
- Authenticate: if login is required, perform the auth flow and save browser state for session persistence
- Orient & Explore: take annotated screenshots of each page, test interactive elements, check empty/error states, monitor the browser console
- Document Issues: capture evidence immediately as bugs are found
  - Interactive/behavioral bugs: video recording + step-by-step screenshots
  - Static bugs (typos, alignment): single annotated screenshot
  - Console errors: captured from browser
- Wrap Up: aim for 5-10 well-documented issues, update severity counts in the report header, close the session
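The Initialize phase can be sketched in Python as follows. This is a minimal illustration, not Vercel's implementation: the function name `init_session`, the `report.md` filename, and the template path argument are assumptions. Only the `screenshots/` and `videos/` directory names come from the workflow above, and opening the target URL is left out.

```python
import shutil
from pathlib import Path


def init_session(output_dir: Path, template: Path) -> Path:
    """Create the evidence directories and copy the report template.

    Returns the path of the working report (hypothetical name: report.md).
    """
    for sub in ("screenshots", "videos"):
        # exist_ok lets an interrupted session resume in the same directory
        (output_dir / sub).mkdir(parents=True, exist_ok=True)

    report = output_dir / "report.md"
    if not report.exists():
        # Never overwrite a report that may already hold findings
        shutil.copyfile(template, report)
    return report
```

Skipping the copy when the report already exists keeps the step safe to re-run after a crash, which fits the incremental-writing approach.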
Issue Taxonomy (7 Categories)
| Category | What to Look For |
|----------|------------------|
| Visual/UI | Layout breakage, rendering issues, missing assets, animation glitches |
| Functional | Broken links (404s), buttons that do nothing, form validation errors, state not persisting, silent failures |
| UX | Missing loading indicators, slow interactions (>300ms), confusing navigation, dead ends, missing confirmation for destructive actions |
| Content | Typos, grammatical errors, placeholder/Lorem Ipsum text, truncated text without tooltips |
| Performance | Slow page loads (>3s), janky scrolling, large layout shifts (CLS), excessive network requests, memory leaks |
| Console/Errors | JS exceptions, unhandled promise rejections, failed network requests (4xx/5xx), CORS errors, deprecation warnings |
| Accessibility | Missing alt text, unlabeled form inputs, poor keyboard navigation, focus traps, insufficient color contrast |
Severity Levels
| Severity | Definition |
|----------|------------|
| Critical | Blocks a core workflow, causes data loss, or crashes the app |
| High | Major feature broken or unusable; no workaround exists |
| Medium | Feature works but with noticeable problems; workaround exists |
| Low | Minor cosmetic or polish issue |
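One way to keep findings inside the seven categories and four severity levels is to model them as enums and validate every label before it reaches the report. The sketch below is an assumption about how Hermes might represent an issue; the `Issue` fields and the `parse_issue` helper are hypothetical names, not part of Vercel's skill.

```python
from dataclasses import dataclass
from enum import Enum


# Assumption: lowercase labels mirror the Severity/Category fields
# used in the report template.
class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class Category(Enum):
    VISUAL = "visual"
    FUNCTIONAL = "functional"
    UX = "ux"
    CONTENT = "content"
    PERFORMANCE = "performance"
    CONSOLE = "console"
    ACCESSIBILITY = "accessibility"


@dataclass(frozen=True)
class Issue:
    title: str
    severity: Severity
    category: Category
    url: str


def parse_issue(title: str, severity: str, category: str, url: str) -> Issue:
    """Build an Issue, raising ValueError on labels outside the taxonomy."""
    return Issue(
        title,
        Severity(severity.strip().lower()),
        Category(category.strip().lower()),
        url,
    )
```

Rejecting unknown labels early means a model-invented severity such as "urgent" fails loudly instead of silently skewing the summary counts.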
Exploration Checklist (Per Page)
- Visual scan — take annotated screenshot; check layout and alignment
- Interactive elements — click every button and link; verify feedback
- Forms — test empty submissions, invalid inputs, and edge cases
- Navigation — verify breadcrumbs, back buttons, and deep links
- States — explicitly check empty, loading, error, and overflow states
- Console — monitor for JS errors and failed network requests
- Responsiveness — test at various viewport sizes
- Auth boundaries — test behavior for logged-out users or different roles
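To make "per page" enforceable, the agent could record which checklist items it has completed on each URL and ask what is still outstanding before moving on. The following is a rough sketch under that assumption; the `Coverage` class and the snake_case item names are hypothetical and simply mirror the eight items above.

```python
# Hypothetical identifiers, one per checklist item above, in the same order.
CHECKLIST = (
    "visual_scan", "interactive_elements", "forms", "navigation",
    "states", "console", "responsiveness", "auth_boundaries",
)


class Coverage:
    """Track which checklist items were completed on each page."""

    def __init__(self) -> None:
        self._done: dict[str, set[str]] = {}

    def mark(self, page_url: str, item: str) -> None:
        """Record that one checklist item was completed for a page."""
        if item not in CHECKLIST:
            raise ValueError(f"unknown checklist item: {item}")
        self._done.setdefault(page_url, set()).add(item)

    def remaining(self, page_url: str) -> list[str]:
        """Items not yet marked for this page, in checklist order."""
        done = self._done.get(page_url, set())
        return [item for item in CHECKLIST if item not in done]
```

A page that has never been visited reports all eight items as remaining, so the same call also answers whether a page was explored at all.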
Report Template Structure
# Dogfood Report
| Field | Value |
|-------|-------|
| Date | {DATE} |
| App URL | {URL} |
| Session | {SESSION_NAME} |
| Scope | {SCOPE} |
## Summary
| Severity | Count |
|----------|-------|
| Critical | 0 |
| High | 0 |
| Medium | 0 |
| Low | 0 |
## ISSUE-001: {Short title}
- **Severity:** critical / high / medium / low
- **Category:** visual / functional / ux / content / performance / console / accessibility
- **URL:** {page URL}
- **Repro Video:** {path or N/A}
**Description:** {what is wrong, expected vs actual}
**Repro Steps:**
1. Navigate to {URL}
2. {Action}
3. **Observe:** {Result}
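Filling the template by hand invites drift between the Summary counts and the issues actually listed. As a sketch, and assuming issues are held as plain strings, the two helpers below render one issue section and recompute the Summary table from the logged severities. `render_issue` and `render_summary` are hypothetical names; the layout follows the template above.

```python
from collections import Counter

SEVERITIES = ("critical", "high", "medium", "low")


def render_summary(severities: list[str]) -> str:
    """Render the Summary table from the severity of every logged issue."""
    counts = Counter(severities)
    rows = ["| Severity | Count |", "|----------|-------|"]
    rows += [f"| {s.capitalize()} | {counts[s]} |" for s in SEVERITIES]
    return "\n".join(rows)


def render_issue(number: int, title: str, severity: str, category: str,
                 url: str, description: str, steps: list[str],
                 video: str = "N/A") -> str:
    """Render one issue section in the layout of the report template."""
    lines = [
        f"## ISSUE-{number:03d}: {title}",
        f"- **Severity:** {severity}",
        f"- **Category:** {category}",
        f"- **URL:** {url}",
        f"- **Repro Video:** {video}",
        f"**Description:** {description}",
        "**Repro Steps:**",
    ]
    # Number the repro steps starting at 1, as in the template
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines)
```

Deriving the counts from the issue list, rather than incrementing them separately, keeps the report header consistent during Wrap Up.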

Key Design Decisions from Vercel's Implementation
- "Repro is everything": match evidence type to issue type (video for interaction bugs, screenshot for static bugs)
- Human-pace recording: `sleep 1s` between actions so videos are watchable by humans
- Use `type`, not `fill`, during recording for character-by-character input visibility
- Incremental writing: append to the report as you go for crash resilience
- Test as a user, not a developer: do not read application source code or config files
- Testable: includes structural tests (run in CI, no API key needed) plus full E2E evals against a fixture page with intentional bugs
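Two of these decisions translate directly into small helpers: choosing evidence by issue type, and appending each issue to the report as soon as it is documented. The sketch below is illustrative only. The mapping of categories to "static" versus "behavioral" is an assumption (the source gives only typos and alignment as static examples), and `evidence_plan` and `append_issue` are hypothetical names.

```python
import os
from pathlib import Path

# Assumption: which categories count as "static" is a judgment call;
# the source only names typos and alignment as examples.
STATIC_CATEGORIES = {"visual", "content"}


def evidence_plan(category: str) -> list[str]:
    """Pick the evidence types to capture for an issue category."""
    if category == "console":
        return ["console_log"]
    if category in STATIC_CATEGORIES:
        return ["annotated_screenshot"]
    # Everything else is treated as behavioral: record the interaction
    return ["video", "step_screenshots"]


def append_issue(report: Path, issue_markdown: str) -> None:
    """Append one finished issue section and force it to disk."""
    with report.open("a", encoding="utf-8") as fh:
        fh.write("\n" + issue_markdown.rstrip() + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # keep the finding even if the session dies next
```

Opening in append mode and syncing after every issue means a crash loses at most the issue currently being written, not the whole session.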
Implementation for Hermes Agent
Browser Automation Tool Options
| Option | Pros | Cons |
|--------|------|------|
| `agent-browser` (recommended) | Purpose-built for AI agents; Rust CLI + Playwright backend; annotated screenshots with numbered refs (@e1, @e2); security features (domain allowlist, content boundaries, auth vault); `snapshot` command gives accessibility tree | npm dependency; needs Chromium install |
| Playwright directly | No wrapper dependency; full browser control | More raw commands; no agent-optimized features like ref-based interaction |
| `web_extract` + `vision_analyze` | Zero setup; already available | No real interaction (can't click/fill); limited to content extraction |
Recommendation: agent-browser — it's the tool Vercel's skill was built for. Key commands:
agent-browser open <url> # Navigate
agent-browser snapshot -i # Accessibility tree (interactive elements)
agent-browser screenshot --annotate output.png # Numbered screenshot
agent-browser click @e2 # Click by ref
agent-browser fill @e3 "text" # Fill input by ref
agent-browser console # Get console output
agent-browser errors # Get JS errors
agent-browser record start output.webm # Start video recording
agent-browser record stop # Stop recording
agent-browser scroll down 300 # Scroll
agent-browser close # End session
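If Hermes drives these commands from Python rather than a raw terminal, a thin wrapper keeps the calls uniform and testable. The sketch below uses only subcommands listed above and treats their output as opaque text, since the source does not specify output formats. The names `ab`, `shell_runner`, and `capture_page` are hypothetical, and the injectable `run` parameter is an assumption made so the logic can be exercised without a browser.

```python
import subprocess
import time
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def shell_runner(argv: Sequence[str]) -> str:
    """Run one command and return its stdout; raises on a non-zero exit."""
    done = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return done.stdout


def ab(*args: str, run: Runner = shell_runner) -> str:
    """Invoke `agent-browser <args...>` through the given runner."""
    return run(["agent-browser", *args])


def capture_page(url: str, shot_path: str, run: Runner = shell_runner,
                 pace_s: float = 0.0) -> dict[str, str]:
    """Open a page, then collect the snapshot, a screenshot, and JS errors."""
    ab("open", url, run=run)
    time.sleep(pace_s)  # raise to about 1.0 while recording so videos stay watchable
    snapshot = ab("snapshot", "-i", run=run)
    ab("screenshot", "--annotate", shot_path, run=run)
    errors = ab("errors", run=run)
    return {"snapshot": snapshot, "errors": errors}
```

Passing a fake `run` callable in tests records the exact argument lists without launching Chromium, which suits the structural tests that need no API key.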
Dogfood Skill Components
- SKILL.md — Main instructions teaching the agent the 5-phase workflow
- references/issue-taxonomy.md — Full issue category definitions and severity levels
- templates/dogfood-report-template.md — Report template to copy into output dir
- Optional: scripts/ — Helper scripts for setup/teardown
The skill should teach Hermes to:
- Accept a target URL and optional scope/auth parameters
- Systematically explore using `agent-browser` commands via terminal
- Capture evidence (screenshots saved to disk, optional video)
- Optionally use `vision_analyze` to verify screenshots show real issues
- Write a structured report using the template
- Categorize and severity-rate each issue
Report Output
Markdown report with embedded screenshot paths, suitable for:
- Saving to disk for review
- GitHub issue creation (one issue per finding, or summary issue)
- Telegram/Discord summary delivery
- Comparison against previous runs
Use Cases
- Self-QA: Test our own landing page, docs, or web properties
- PR review augmentation: When a PR changes UI, run dogfood against the preview deployment to catch visual regressions before merge
- Scheduled QA: Cronjob that runs dogfood against production URLs periodically and reports issues via Telegram/Discord
- User-facing: "Hey Hermes, QA test this website for me" — instant value for any web project the user is working on
Phased Rollout
Phase 1: Install agent-browser + basic skill
- `npm install -g agent-browser && agent-browser install`
- Create a dogfood skill adapted from Vercel's version
- Support: open URL → explore pages → take screenshots → basic markdown report
- Verify headless operation on our server (no display)
Phase 2: Full evidence pipeline
- Video recording for interaction bugs
- Console error capture and reporting
- Issue taxonomy + severity classification
- Structured report template with per-issue evidence
- Form testing with edge cases (empty, invalid, overflow inputs)
Phase 3: Integration with Hermes workflows
- Auto-create GitHub issues from findings
- Send summary to Telegram/Discord
- Compare against previous runs (regression detection)
- Scheduled dogfood cronjobs for monitored URLs
- Integration with PR review workflow (test preview deployments)
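Comparing against previous runs needs some notion of issue identity across reports. One possible scheme, offered purely as an assumption, is to fingerprint each issue from its category, URL, and title, then diff the fingerprint sets. `fingerprint` and `diff_runs` are hypothetical names.

```python
import hashlib


def fingerprint(category: str, url: str, title: str) -> str:
    """Stable short ID for an issue across runs (hypothetical scheme)."""
    key = "|".join(part.strip().lower() for part in (category, url, title))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def diff_runs(previous: set[str], current: set[str]) -> dict[str, set[str]]:
    """Split fingerprints into new, fixed, and still-present issues."""
    return {
        "new": current - previous,
        "fixed": previous - current,
        "persisting": current & previous,
    }
```

Exact-title matching is fragile because the agent may phrase the same bug differently on each run, so a real implementation would likely need fuzzier matching. The set arithmetic for new, fixed, and persisting issues would stay the same.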
Open Questions
- Should we vendor `agent-browser` or treat it as an optional dependency that gets installed on first use?
- How to handle sites requiring authentication? (`agent-browser` has an auth vault; is that sufficient, or do we need our own credential storage?)
- Should `vision_analyze` validate every screenshot, or trust `agent-browser`'s accessibility tree for issue detection? (Possible hybrid approach: use the accessibility tree for functional issues and vision for visual issues.)
- What's the right scope limit per session? Vercel targets 5-10 issues. Too many = noise, too few = incomplete.
- Can we run this fully headless on our server (no display)? `agent-browser` uses headless Chromium so likely yes, but video recording may need `xvfb`.
- Should the skill support custom checklists per-project (e.g., "always test the checkout flow with these 3 scenarios")?