# Feature: Dogfood Skill — Agent-Driven Exploratory QA for Web Applications (#315)

## Overview

**Inspiration:** Vercel's `dogfood` skill in [vercel-labs/agent-browser](https://github.com/vercel-labs/agent-browser) ([PR #538](https://github.com/vercel-labs/agent-browser/pull/538), merged Feb 24, 2026). Also adopted by [callstackincubator/agent-device](https://github.com/callstackincubator/agent-device) for mobile QA.

Give Hermes Agent the ability to systematically explore and test web applications, find bugs, and produce structured QA reports with reproduction evidence (screenshots, video, console errors). The agent becomes its own QA tester: point it at a URL, and it explores every page, clicks every button, tests forms with edge cases, checks the console for errors, captures visual evidence, and writes a professional bug report.

---

## How Vercel's Dogfood Skill Works

### Workflow (5 Phases)

1. **Initialize** — Create output dirs (`screenshots/`, `videos/`), copy report template, open target URL in `agent-browser`
2. **Authenticate** — If login required, perform auth flow and save browser state for session persistence
3. **Orient & Explore** — Take annotated screenshots of each page, test interactive elements, check empty/error states, monitor browser console
4. **Document Issues** — Capture evidence immediately as bugs are found:
   - Interactive/behavioral bugs: video recording + step-by-step screenshots
   - Static bugs (typos, alignment): single annotated screenshot
   - Console errors: captured from browser
5. **Wrap Up** — Aim for 5-10 well-documented issues, update severity counts in report header, close session
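
The Initialize phase is mostly filesystem setup, so it can be sketched in a few lines of Python. This is a minimal sketch rather than Vercel's implementation: the `init_session` helper and the `report.md` file name are assumptions made for illustration, and only the `screenshots/` and `videos/` directory names come from the workflow above.

```python
import shutil
import tempfile
from pathlib import Path


def init_session(output_dir: Path, template: Path) -> Path:
    """Create the evidence directories and copy the report template into place."""
    for sub in ("screenshots", "videos"):
        (output_dir / sub).mkdir(parents=True, exist_ok=True)
    report = output_dir / "report.md"  # hypothetical file name, not taken from the skill
    if not report.exists():
        # Keep an existing report so an interrupted session can be resumed.
        shutil.copyfile(template, report)
    return report


# Usage: run against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    template = Path(tmp) / "template.md"
    template.write_text("# Dogfood Report\n", encoding="utf-8")
    report = init_session(Path(tmp) / "run-1", template)
    print(sorted(p.name for p in report.parent.iterdir()))
```

Skipping the copy when a report already exists is a design choice that pairs with incremental writing: a restarted session appends to the earlier findings instead of discarding them.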

### Issue Taxonomy (7 Categories)

| Category | What to Look For |
|----------|-----------------|
| **Visual/UI** | Layout breakage, rendering issues, missing assets, animation glitches |
| **Functional** | Broken links (404s), buttons that do nothing, form validation errors, state not persisting, silent failures |
| **UX** | Missing loading indicators, slow interactions (>300ms), confusing navigation, dead ends, missing confirmation for destructive actions |
| **Content** | Typos, grammatical errors, placeholder/Lorem Ipsum text, truncated text without tooltips |
| **Performance** | Slow page loads (>3s), janky scrolling, large layout shifts (CLS), excessive network requests, memory leaks |
| **Console/Errors** | JS exceptions, unhandled promise rejections, failed network requests (4xx/5xx), CORS errors, deprecation warnings |
| **Accessibility** | Missing alt text, unlabeled form inputs, poor keyboard navigation, focus traps, insufficient color contrast |
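
Two rows of the taxonomy carry numeric thresholds (interactions slower than 300ms, page loads slower than 3s), which makes them easy to check mechanically. The sketch below assumes the agent has already measured the timings somehow; the `timing_findings` function and its return shape are hypothetical.

```python
from typing import Optional

INTERACTION_BUDGET_MS = 300  # "slow interactions (>300ms)" from the UX row
PAGE_LOAD_BUDGET_MS = 3000   # "slow page loads (>3s)" from the Performance row


def timing_findings(interaction_ms: Optional[float] = None,
                    page_load_ms: Optional[float] = None) -> list:
    """Return (category, note) pairs for every timing that exceeds its budget."""
    findings = []
    if interaction_ms is not None and interaction_ms > INTERACTION_BUDGET_MS:
        findings.append(("ux", f"interaction took {interaction_ms:.0f}ms"))
    if page_load_ms is not None and page_load_ms > PAGE_LOAD_BUDGET_MS:
        findings.append(("performance", f"page load took {page_load_ms:.0f}ms"))
    return findings


print(timing_findings(interaction_ms=450, page_load_ms=1200))
```

The comparisons are strict, matching the ">" in the table, so a measurement exactly at the budget is not reported.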

### Severity Levels

| Severity | Definition |
|----------|-----------|
| **Critical** | Blocks a core workflow, causes data loss, or crashes the app |
| **High** | Major feature broken or unusable; no workaround exists |
| **Medium** | Feature works but with noticeable problems; workaround exists |
| **Low** | Minor cosmetic or polish issue |
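
The severity definitions read as an ordered decision rule, which can be sketched as follows. The boolean flags and the `rate_severity` name are assumptions for illustration; in practice the agent would judge these from what it observed, and the table does not say how to rate a broken feature that has a workaround (this sketch treats it as Medium).

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def rate_severity(*, blocks_core_workflow: bool = False, data_loss: bool = False,
                  crashes: bool = False, feature_unusable: bool = False,
                  has_workaround: bool = True, cosmetic_only: bool = False) -> Severity:
    """Apply the severity table from most to least severe."""
    if blocks_core_workflow or data_loss or crashes:
        return Severity.CRITICAL
    if feature_unusable and not has_workaround:
        return Severity.HIGH
    if cosmetic_only:
        return Severity.LOW
    # Assumption: everything else, including "broken but has a workaround", is Medium.
    return Severity.MEDIUM


print(rate_severity(feature_unusable=True, has_workaround=False).value)
```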

### Exploration Checklist (Per Page)

1. Visual scan — take annotated screenshot; check layout and alignment
2. Interactive elements — click every button and link; verify feedback
3. Forms — test empty submissions, invalid inputs, and edge cases
4. Navigation — verify breadcrumbs, back buttons, and deep links
5. States — explicitly check empty, loading, error, and overflow states
6. Console — monitor for JS errors and failed network requests
7. Responsiveness — test at various viewport sizes
8. Auth boundaries — test behavior for logged-out users or different roles
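
To keep exploration systematic, the agent could track which of the eight checks it has completed on each page. The following is a hypothetical bookkeeping sketch; the short check names and the `PageCoverage` class are made up here and are not part of Vercel's skill.

```python
from dataclasses import dataclass, field

# Short names for the eight per-page checks, in checklist order (names are assumptions).
CHECKS = ("visual", "interactive", "forms", "navigation",
          "states", "console", "responsiveness", "auth")


@dataclass
class PageCoverage:
    url: str
    done: set = field(default_factory=set)

    def mark(self, check: str) -> None:
        """Record a completed check, rejecting names outside the checklist."""
        if check not in CHECKS:
            raise ValueError(f"unknown check: {check}")
        self.done.add(check)

    def remaining(self) -> list:
        """Checks still to run, in checklist order."""
        return [c for c in CHECKS if c not in self.done]


page = PageCoverage("https://example.com/settings")
page.mark("visual")
page.mark("console")
print(page.remaining())
```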

### Report Template Structure

```markdown
# Dogfood Report

| Field | Value |
|-------|-------|
| Date | {DATE} |
| App URL | {URL} |
| Session | {SESSION_NAME} |
| Scope | {SCOPE} |

## Summary
| Severity | Count |
|----------|-------|
| Critical | 0 |
| High | 0 |
| Medium | 0 |
| Low | 0 |

## ISSUE-001: {Short title}
- **Severity:** critical / high / medium / low
- **Category:** visual / functional / ux / content / performance / console / accessibility
- **URL:** {page URL}
- **Repro Video:** {path or N/A}

**Description:** {what is wrong, expected vs actual}

**Repro Steps:**
1. Navigate to {URL}
   ![Step 1](screenshots/issue-001-step1.png)
2. {Action}
   ![Step 2](screenshots/issue-001-step2.png)
3. **Observe:** {Result}
   ![Result](screenshots/issue-001-result.png)
```
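
A per-issue section in this shape can be generated from structured data. The sketch below follows the template's field order and screenshot naming; the `Issue` dataclass and `render_issue` function are hypothetical names, and how Hermes would actually store findings is an open design choice.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    number: int
    title: str
    severity: str
    category: str
    url: str
    description: str
    actions: list  # steps performed after the initial navigation
    result: str
    video: str = "N/A"


def render_issue(issue: Issue) -> str:
    """Render one finding as a Markdown section matching the report template."""
    tag = f"issue-{issue.number:03d}"
    lines = [
        f"## {tag.upper()}: {issue.title}",
        f"- **Severity:** {issue.severity}",
        f"- **Category:** {issue.category}",
        f"- **URL:** {issue.url}",
        f"- **Repro Video:** {issue.video}",
        "",
        f"**Description:** {issue.description}",
        "",
        "**Repro Steps:**",
    ]
    steps = [f"Navigate to {issue.url}", *issue.actions]
    for n, text in enumerate(steps, start=1):
        lines.append(f"{n}. {text}")
        lines.append(f"   ![Step {n}](screenshots/{tag}-step{n}.png)")
    lines.append(f"{len(steps) + 1}. **Observe:** {issue.result}")
    lines.append(f"   ![Result](screenshots/{tag}-result.png)")
    return "\n".join(lines) + "\n"


print(render_issue(Issue(1, "Save button does nothing", "high", "functional",
                         "https://example.com/settings",
                         "Expected a confirmation; nothing happens.",
                         ["Click Save"], "No request is sent")))
```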

### Key Design Decisions from Vercel's Implementation

- **"Repro is everything"** — match evidence type to issue type (video for interaction bugs, screenshot for static bugs)
- **Human-pace recording** — `sleep 1s` between actions so videos are watchable by humans
- **Use `type`, not `fill`,** during recording so that input appears character by character in the video
- **Incremental writing** — append to report as you go for crash resilience
- **Test as a user, not a developer** — do not read application source code or config files
- **Testable** — includes structural tests (CI, no API key needed) + full E2E evals (against a fixture page with intentional bugs)
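
The incremental-writing decision can be sketched as an append that is flushed to disk after every finding, with the severity summary derived from the file afterwards. This is an illustrative sketch only: `append_issue`, `severity_counts`, and the regular expression are assumptions, not code from Vercel's skill.

```python
import os
import re
import tempfile
from collections import Counter
from pathlib import Path

# Matches a filled-in severity line, not the template's "critical / high / ..." placeholder.
SEVERITY_LINE = re.compile(
    r"^- \*\*Severity:\*\* (critical|high|medium|low)[ \t]*$", re.MULTILINE)


def append_issue(report: Path, issue_markdown: str) -> None:
    """Append one finding and force it to disk so a crash cannot lose it."""
    with report.open("a", encoding="utf-8") as fh:
        fh.write("\n" + issue_markdown.rstrip("\n") + "\n")
        fh.flush()
        os.fsync(fh.fileno())


def severity_counts(report: Path) -> Counter:
    """Count findings per severity by scanning the report text."""
    return Counter(SEVERITY_LINE.findall(report.read_text(encoding="utf-8")))


with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.md"
    report.write_text("# Dogfood Report\n", encoding="utf-8")
    append_issue(report, "## ISSUE-001: Save button does nothing\n- **Severity:** high")
    append_issue(report, "## ISSUE-002: Typo in footer\n- **Severity:** low")
    print(dict(severity_counts(report)))
```

Deriving the counts from the report itself, rather than keeping them in memory, means the summary table can still be filled in correctly after a restart.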

---

## Implementation for Hermes Agent

### Browser Automation Tool Options

| Option | Pros | Cons |
|--------|------|------|
| **`agent-browser`** (recommended) | Purpose-built for AI agents; Rust CLI + Playwright backend; annotated screenshots with numbered refs (@e1, @e2); security features (domain allowlist, content boundaries, auth vault); snapshot command gives accessibility tree | npm dependency; needs Chromium install |
| **Playwright directly** | No wrapper dependency; full browser control | More raw commands; no agent-optimized features like ref-based interaction |
| **`web_extract` + `vision_analyze`** | Zero setup; already available | No real interaction (can't click/fill); limited to content extraction |

**Recommendation:** `agent-browser` — it's the tool Vercel's skill was built for. Key commands:
```bash
agent-browser open <url>                              # Navigate
agent-browser snapshot -i                             # Accessibility tree (interactive elements)
agent-browser screenshot --annotate output.png        # Numbered screenshot
agent-browser click @e2                               # Click by ref
agent-browser fill @e3 "text"                         # Fill input by ref
agent-browser console                                 # Get console output
agent-browser errors                                  # Get JS errors
agent-browser record start output.webm                # Start video recording
agent-browser record stop                             # Stop recording
agent-browser scroll down 300                         # Scroll
agent-browser close                                   # End session
```
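
If Hermes drives the CLI from Python rather than a raw shell, a thin wrapper might look like the sketch below. It uses only the subcommands listed above. The helper names are hypothetical, and the assumptions that each command prints its result to stdout and exits non-zero on failure have not been verified against `agent-browser`.

```python
import shutil
import subprocess


def ab_argv(*args: str) -> list:
    """Build an argv list for one agent-browser invocation (no shell quoting needed)."""
    return ["agent-browser", *args]


def page_evidence_plan(url: str, screenshot_path: str) -> list:
    """Commands to open a page and collect baseline evidence for it."""
    return [
        ab_argv("open", url),
        ab_argv("snapshot", "-i"),
        ab_argv("screenshot", "--annotate", screenshot_path),
        ab_argv("console"),
        ab_argv("errors"),
    ]


def run_plan(plan: list, timeout: float = 60.0) -> list:
    """Run each command in order and collect stdout (assumed to hold the result)."""
    if shutil.which("agent-browser") is None:
        raise RuntimeError("agent-browser is not installed or not on PATH")
    outputs = []
    for argv in plan:
        done = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout, check=True)
        outputs.append(done.stdout)
    return outputs


for argv in page_evidence_plan("https://example.com", "screenshots/home.png"):
    print(" ".join(argv))
```

Separating the plan from its execution keeps the command sequence inspectable and testable on machines where `agent-browser` is not installed.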

### Dogfood Skill Components

1. **SKILL.md** — Main instructions teaching the agent the 5-phase workflow
2. **references/issue-taxonomy.md** — Full issue category definitions and severity levels
3. **templates/dogfood-report-template.md** — Report template to copy into output dir
4. **Optional: scripts/** — Helper scripts for setup/teardown

The skill should teach Hermes to:
- Accept a target URL and optional scope/auth parameters
- Systematically explore using `agent-browser` commands via terminal
- Capture evidence (screenshots saved to disk, optional video)
- Optionally use `vision_analyze` to verify screenshots show real issues
- Write a structured report using the template
- Categorize and severity-rate each issue

### Report Output

Markdown report with embedded screenshot paths, suitable for:
- Saving to disk for review
- GitHub issue creation (one issue per finding, or summary issue)
- Telegram/Discord summary delivery
- Comparison against previous runs
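
Comparison against a previous run reduces to a set difference once each finding has a stable key. The sketch below assumes findings are available as dictionaries with `category`, `url`, and `title` fields; that shape and the `fingerprint` scheme are hypothetical. Because the agent may word the same bug differently between runs, a title-based key is a crude starting point at best.

```python
def fingerprint(issue: dict) -> tuple:
    """Reduce a finding to a comparable key: category, URL, normalized title."""
    title = " ".join(issue["title"].lower().split())
    return (issue["category"], issue["url"].rstrip("/"), title)


def compare_runs(previous: list, current: list) -> dict:
    """Split findings into new, fixed, and persisting relative to the previous run."""
    before = {fingerprint(i) for i in previous}
    after = {fingerprint(i) for i in current}
    return {
        "new": sorted(after - before),
        "fixed": sorted(before - after),
        "persisting": sorted(after & before),
    }


old_run = [{"category": "content", "url": "https://example.com/", "title": "Typo in footer"}]
new_run = [
    {"category": "content", "url": "https://example.com", "title": "Typo in  Footer"},
    {"category": "functional", "url": "https://example.com/login", "title": "Login button does nothing"},
]
print(compare_runs(old_run, new_run))
```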

---

## Use Cases

- **Self-QA:** Test our own landing page, docs, or web properties
- **PR review augmentation:** When a PR changes UI, run dogfood against the preview deployment to catch visual regressions before merge
- **Scheduled QA:** Cronjob that runs dogfood against production URLs periodically and reports issues via Telegram/Discord
- **User-facing:** "Hey Hermes, QA test this website for me" — instant value for any web project the user is working on

---

## Phased Rollout

### Phase 1: Install agent-browser + basic skill
- `npm install -g agent-browser && agent-browser install`
- Create a dogfood skill adapted from Vercel's version
- Support: open URL → explore pages → take screenshots → basic markdown report
- Verify headless operation on our server (no display)

### Phase 2: Full evidence pipeline
- Video recording for interaction bugs
- Console error capture and reporting
- Issue taxonomy + severity classification
- Structured report template with per-issue evidence
- Form testing with edge cases (empty, invalid, overflow inputs)
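
For the form-testing item, the agent needs a small set of inputs covering the three cases named above (empty, invalid, overflow). The values below are illustrative assumptions, as is the default field limit of 255 characters; a real run would pick them per field.

```python
def form_edge_cases(max_length: int = 255) -> dict:
    """Return named edge-case inputs; max_length is a guessed field limit."""
    return {
        "empty": "",
        "whitespace_only": "   ",
        "invalid_email": "not-an-email",
        "overflow": "a" * (max_length + 1),  # one character past the assumed limit
    }


for name, value in form_edge_cases(max_length=20).items():
    print(f"{name}: {len(value)} chars")
```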

### Phase 3: Integration with Hermes workflows
- Auto-create GitHub issues from findings
- Send summary to Telegram/Discord
- Compare against previous runs (regression detection)
- Scheduled dogfood cronjobs for monitored URLs
- Integration with PR review workflow (test preview deployments)

---

## Open Questions

- Should we vendor `agent-browser` or treat it as an optional dependency that gets installed on first use?
- How should we handle sites that require authentication? `agent-browser` has an auth vault; is that sufficient, or do we need our own credential storage?
- Should `vision_analyze` validate every screenshot, or should we trust `agent-browser`'s accessibility tree for issue detection? A hybrid is possible: use the accessibility tree for functional issues and vision for visual issues.
- What's the right scope limit per session? Vercel targets 5-10 issues. Too many findings create noise; too few leave the report incomplete.
- Can we run this fully headless on our server (no display)? `agent-browser` uses headless Chromium so likely yes, but video recording may need `xvfb`.
- Should the skill support custom checklists per-project (e.g., "always test the checkout flow with these 3 scenarios")?

---

## References

- [Vercel agent-browser repo](https://github.com/vercel-labs/agent-browser)
- [Dogfood skill PR #538](https://github.com/vercel-labs/agent-browser/pull/538)
- [Dogfood SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skills/dogfood/SKILL.md)
- [Issue taxonomy reference](https://github.com/vercel-labs/agent-browser/blob/main/skills/dogfood/references/issue-taxonomy.md)
- [Report template](https://github.com/vercel-labs/agent-browser/blob/main/skills/dogfood/templates/dogfood-report-template.md)
- [callstackincubator/agent-device](https://github.com/callstackincubator/agent-device) (mobile adaptation)

