feat(skills): dogfood skill for agent-driven exploratory QA #321
mehmetkr-31 wants to merge 1 commit into
Closing this: the skill needs more fleshing out before it can be merged. The current version only references the built-in […]. We're putting together a more detailed plan that addresses:
Will reopen or create a new PR once the plan is finalized. Thanks for the contribution.
✅ Completed: Profile Form Progressive Disclosure
Implemented accordion-based progressive disclosure for the profile form to reduce cognitive overload and create a more guided onboarding experience.
Changes Made
1. Accordion Section Structure
2. Quick Start Mode (First-Run)
3. Progress Feedback
4. Smart Default Expansion
Acceptance Criteria Status
Technical Notes
Verification
Deployed to Railway: https://vantage-production-b8d9.up.railway.app
shizhewanglu left a comment
Code Review Summary
Verdict: Changes Requested (3 critical security issues, 2 code quality issues)
🔴 Critical
- `skills/dogfood/SKILL.md:58`: The skill instructs the agent to visit arbitrary URLs without validating the scheme or checking a domain allowlist. A malicious actor could craft prompts that redirect the agent to phishing pages or internal infrastructure. Implement URL scheme validation (only `https://`) and, optionally, a domain allowlist before any browser navigation.
- `skills/dogfood/SKILL.md:112`: The evidence collection phase auto-downloads and saves files referenced in pages without MIME type verification. This allows executable payloads (e.g., `.exe`, `.sh`, `.dmg`) to be written to disk, creating a remote code execution surface if the user later runs those files. Add MIME type allowlisting before any write.
- `skills/dogfood/references/issue-taxonomy.md:34`: The severity-to-category mapping includes "Console" errors but does not exclude credentials or tokens that may appear in browser console output. PII or secrets in bug reports could be exposed to anyone with repo read access. Add an explicit redaction step for `Authorization`, `Cookie`, `token`, `api_key`, and similar patterns before any report is finalized.
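Because the skill itself is Markdown, guards like these would have to live in whatever tool layer performs navigation, downloads, and report writing. A minimal Python sketch of the three checks requested above, assuming such a layer exists; the allowlists, the function names `is_navigable`, `is_allowed_download`, and `redact_secrets`, and the exact secret patterns are all hypothetical and not part of the skill:

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlists: the skill defines none; tune them per deployment.
ALLOWED_HOSTS = {"staging.example.com", "app.example.com"}
ALLOWED_DOWNLOAD_MIME_TYPES = {"image/png", "image/jpeg", "text/plain", "application/json"}

# Header-style secrets: redact everything after "Authorization:" / "Cookie:" on that line.
_HEADER_SECRET = re.compile(r"(?i)\b(authorization|cookie)(\s*:\s*)[^\r\n]*")
# key=value or JSON-style secrets, e.g. token=abc, "api_key": "abc", password: abc.
_KV_SECRET = re.compile(r"""(?i)(token|api_?key|secret|password)(["']?\s*[:=]\s*["']?)[^\s"'&,;]+""")


def is_navigable(url: str) -> bool:
    """Allow browser navigation only to https:// URLs on an allowlisted host."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # .hostname drops userinfo and port and is lowercased,
    # so "https://good.com@evil.com" yields "evil.com".
    return (parts.hostname or "") in ALLOWED_HOSTS


def is_allowed_download(content_type: str) -> bool:
    """Allow a file write only for allowlisted MIME types (parameters like charset ignored)."""
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in ALLOWED_DOWNLOAD_MIME_TYPES


def redact_secrets(text: str) -> str:
    """Mask credentials in console output before it is copied into a report."""
    text = _HEADER_SECRET.sub(r"\1\2[REDACTED]", text)
    return _KV_SECRET.sub(r"\1\2[REDACTED]", text)
```

The `Content-Type` value checked here is declared by the server, so a stricter version might also sniff the file's leading bytes before writing it. Regex redaction is a best-effort filter, not a guarantee that no secret survives.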
⚠️ Code Quality
- `skills/dogfood/templates/dogfood-report-template.md:12`: The repro steps template lacks a required "Expected Behavior" section. Without it, developers cannot distinguish a bug from a misfeature, which wastes triage time.
- `skills/dogfood/SKILL.md:89`: The 5-phase workflow has no explicit error-handling section. If any phase fails (e.g., the page fails to load or a selector is not found), the skill does not specify recovery behavior. A silent failure in phase 2 (Exploration) could result in an empty or partial report being generated and submitted as if complete.
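One way to address both code-quality points, sketched in Python under the assumption that phases run as callables in some tool layer: record the first failing phase, stop, and label the report instead of submitting it silently. The section headings, the `Phase` shape, and the names `run_phases` and `report_status` are hypothetical; the real template may word its sections differently.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical heading names; the real report template may word them differently.
REQUIRED_SECTIONS = (
    "## Steps to Reproduce",
    "## Expected Behavior",
    "## Actual Behavior",
    "## Evidence",
)

Phase = tuple[str, Callable[[], list[str]]]  # (phase name, callable returning findings)


@dataclass
class RunResult:
    findings: list[str] = field(default_factory=list)
    failures: list[str] = field(default_factory=list)

    @property
    def complete(self) -> bool:
        return not self.failures


def run_phases(phases: list[Phase]) -> RunResult:
    """Run phases in order; on the first failure, record it and stop."""
    result = RunResult()
    for name, phase in phases:
        try:
            result.findings.extend(phase())
        except Exception as exc:  # e.g. page load timeout, selector not found
            result.failures.append(f"{name}: {exc}")
            break  # later phases consume earlier output, so do not continue blindly
    return result


def missing_sections(report_md: str) -> list[str]:
    """Return the required headings that the rendered report does not contain."""
    return [s for s in REQUIRED_SECTIONS if s not in report_md]


def report_status(result: RunResult, report_md: str) -> str:
    """Label the report so a partial run is never submitted as if complete."""
    if not result.complete:
        return "PARTIAL"
    if missing_sections(report_md):
        return "INCOMPLETE"
    return "COMPLETE"
```

With this shape, a timeout during exploration produces a `PARTIAL` status that names the failing phase, rather than an empty report that looks finished. Stopping at the first failure is a design choice; a runner could instead skip only the phases that depend on the failed one.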
✅ Looks Good
- The issue-taxonomy.md severity/category matrix is well-structured and covers all major bug classifications.
- The dogfood-report-template.md includes a good checklist of required fields (repro steps, actual vs expected, evidence).
- Splitting skill content into SKILL.md + references/ + templates/ follows the agentskills.io standard cleanly.
Reviewed by Hermes Agent
Summary
Resolves #315 (Feature: Dogfood Skill — Agent-Driven Exploratory QA for Web Applications)
Problem:
Hermes currently lacks a formalized, built-in skill for structured Quality Assurance (QA) exploration. Without a standardized approach, exploratory web testing relies heavily on ad-hoc prompts, which leads to inconsistent bug reports, unclassified severity levels, and non-reproducible evidence gathering.
Implementation:
This PR introduces the new dogfood skill in the skills directory, directly adapted from the agent-browser methodologies proposed by Vercel Labs. It formalizes Hermes as an automated, exploratory QA tester capable of producing structured Markdown bug reports.
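The skill itself ships no code. Purely to illustrate what a "structured" bug report entry means here, the following Python sketch renders one issue into Markdown. The severity levels, every category other than "Console", the field names, and the heading format are assumptions, not taken from `issue-taxonomy.md` or the report template.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):  # hypothetical levels
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class Category(Enum):  # hypothetical; only "Console" is named in the review
    FUNCTIONAL = "Functional"
    VISUAL = "Visual"
    CONSOLE = "Console"
    ACCESSIBILITY = "Accessibility"


@dataclass
class Issue:
    title: str
    severity: Severity
    category: Category
    steps: list[str]
    expected: str
    actual: str
    evidence: list[str]  # e.g. screenshot paths


def render_issue(issue: Issue, number: int) -> str:
    """Render one issue as a Markdown block with numbered repro steps."""
    lines = [
        f"### Issue {number}: {issue.title}",
        f"**Severity:** {issue.severity.value} | **Category:** {issue.category.value}",
        "",
        "**Steps to Reproduce**",
        *[f"{i}. {step}" for i, step in enumerate(issue.steps, start=1)],
        "",
        f"**Expected Behavior:** {issue.expected}",
        f"**Actual Behavior:** {issue.actual}",
        "**Evidence:** " + ", ".join(issue.evidence),
    ]
    return "\n".join(lines)
```

Keeping severity and category as enums means a report cannot contain a level or class outside the taxonomy, which is the consistency problem the PR sets out to fix.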
Key additions:
Because the skill is purely textual (Markdown files only), it adds no Python code and cannot break the Python runtime, while significantly expanding Hermes's capability set as an integrated web tester.