feat(agent): MVE Experiment Designer #976
- feat(instructions): introduce MVE coaching conventions for Experiment Designer
- chore(collections): include Experiment Designer in experimental collections
- chore(collections): update experimental collection YAML to reference new agent and instructions
🔧 - Generated by Copilot
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #976 +/- ##
==========================================
- Coverage 88.01% 86.91% -1.11%
==========================================
Files 45 31 -14
Lines 7886 5409 -2477
==========================================
- Hits 6941 4701 -2240
+ Misses 945 708 -237
Flags with carried forward coverage won't be shown.
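The percentages in the coverage diff follow from hits divided by lines, and misses are lines minus hits. A minimal sketch that reproduces the reported figures, assuming Codecov's default settings of two-decimal precision with `round: down`; the helper names are hypothetical, and computing the delta from the unrounded values is an assumption about how Codecov derives it:

```python
from decimal import Decimal, ROUND_FLOOR

# Assumption: Codecov defaults (precision 2, round: down).
STEP = Decimal("0.01")


def coverage_pct(hits: int, lines: int) -> Decimal:
    """Unrounded coverage percentage: hits / lines * 100."""
    return Decimal(hits) * 100 / Decimal(lines)


def round_down(value: Decimal) -> Decimal:
    """Round toward negative infinity at two decimal places."""
    return value.quantize(STEP, rounding=ROUND_FLOOR)


base = coverage_pct(6941, 7886)  # main
head = coverage_pct(4701, 5409)  # #976

print(round_down(base), round_down(head), round_down(head - base))
# prints: 88.01 86.91 -1.11

# Misses are simply lines that are not hits.
print(7886 - 6941, 5409 - 4701)
# prints: 945 708
```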
@mattdot, can you look at the hi-fi and lo-fi prototype builders in Design Thinking and see whether they cover your needs first?
…mum Viable Experiments
@WilliamBerryiii Not quite. It kind of proposes testing assumptions, but it doesn't test them with the scientific rigor I'd expect from a true MVE. It feels more like a vibe check of the assumptions than an experiment result we can have rock-solid confidence in.
One last set of questions (I should have asked earlier but had to think about it): where do you think this goes from a collections perspective after it's run in the experimental phase? More coding-focused? Data science too? Should this agent's artifact (the experiment.md) be handed off to the PRD-builder and/or Task Researcher for the implementation phase? You've got more experience in this space: are the experiments you're running more of a "rough PRD" scale, or more of an "if we had enough tokens, we could probably get this through a Task Researcher run" scale 😂? This really comes down to whether you want the experiment to go PRD -> *-Backlog-Manager for entry into the backlog, or go right to coding (or both).
The output of this is really a plan and hypothesis to go do an experiment on. Once you actually do the experiment, the results of the experiment would be used much like other research could be used, as inputs to PRD or ADR. For the collections, I could see this in the Data Science and Project Planning collections. |
@mattdot, should I update this to exit with a handoff document for the ADO and GH backlog managers? Do you anticipate that the experiment generates work items, or do we go right to task researcher/planner/implementor/reviewer for workflow execution?
I kind of feel like the backlog might be the way to go, since you could come out with several hypotheses to test, and it would be good to track and work them independently.
- add optional Phase 6 generating backlog-brief.md from mve-plan.md
- add backlog-brief.md template to session artifacts and instructions
- add usage guide and end-to-end example for Phase 6 workflow
- enable experiment-to-backlog transition via bridge document
🔬 - Generated by Copilot
Changes Pushed: Backlog Bridge Phase
Hey @mattdot, I pushed a commit to your branch that adds Phase 6 (Backlog Bridge) to the Experiment Designer. Here's a summary of what changed and why. Let me know if you're OK with these changes and I'll get the merge going.
What's New
Phase 6: Backlog Bridge, an optional phase that converts completed MVE outputs into a
Files Changed (2 files, +148 / -24)
Prompt Builder Review
These changes went through a Prompt Builder evaluation pass (test + evaluate + fix cycle). Key findings addressed:
All linting ( Commit |
- fix ADO backlog manager intent classification to route structured briefs to Discovery instead of PRD Planning
- add disambiguation heuristics separating PRDs from backlog-brief.md inputs
- add backlog brief keyword signal to GitHub backlog manager Discovery row
- add Backlog Brief document type to GitHub discovery parsing guidelines
🔗 - Generated by Copilot
Discovery Path B Alignment (
| File | Change |
|---|---|
| .github/agents/ado/ado-backlog-manager.agent.md | Added "backlog brief" keyword and "structured requirement briefs" indicator to Discovery row; refined disambiguation heuristics to separate PRDs (→ PRD Planning) from structured briefs (→ Discovery Path B) |
| .github/agents/github/github-backlog-manager.agent.md | Added "backlog brief" to Discovery keyword signals and contextual indicators |
| .github/instructions/github/github-backlog-discovery.instructions.md | Added Backlog Brief rows to Document Parsing Guidelines table (experiment requirements → User story, non-functional constraints → Task) |
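The PRD-versus-brief disambiguation in this table can be pictured as a keyword router. This is only an illustrative sketch: the backlog managers encode their heuristics as natural-language instructions in the .agent.md files, and the signal lists, the `route_input` name, and the "Unclassified" fallback here are assumptions drawn from the table, not the agents' actual rules.

```python
import re

# Hypothetical signal lists inspired by the Discovery row changes above.
BRIEF_SIGNALS = ("backlog brief", "backlog-brief.md", "structured requirement brief")
PRD_PATTERN = re.compile(r"\bprd\b|product requirements document")


def route_input(text: str) -> str:
    """Return the workflow a request would be routed to under these assumptions."""
    lowered = text.lower()
    # Brief signals are checked first so a brief that also mentions a PRD
    # still lands in Discovery Path B instead of PRD Planning.
    if any(signal in lowered for signal in BRIEF_SIGNALS):
        return "Discovery (Path B)"
    if PRD_PATTERN.search(lowered):
        return "PRD Planning"
    return "Unclassified"


print(route_input("Create work items from this backlog-brief.md"))
print(route_input("Plan the PRD for the checkout redesign"))
```

Checking the more specific brief signals before the generic PRD signal mirrors the intent of the fix: structured briefs should no longer fall through to PRD Planning.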
Design Note
ADO's ado-wit-discovery.instructions.md was intentionally not modified — it uses generic extraction that handles backlog briefs adequately. The GitHub version has a structured Document Parsing Guidelines table that needed explicit Backlog Brief entries.
This completes the end-to-end path: Experiment Designer → backlog-brief.md → Backlog Manager → Discovery Path B → work items.
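The last hop of that path, turning brief content into work items, can be sketched from the two Document Parsing Guidelines rows (experiment requirements → User story, non-functional constraints → Task). The `## ...` section headings, the bullet-per-item layout, the `brief_to_work_items` helper, and the sample brief are all hypothetical; the real backlog-brief.md template may be structured differently.

```python
# Mapping taken from the GitHub Document Parsing Guidelines rows;
# the section heading spellings are assumptions.
SECTION_TO_TYPE = {
    "experiment requirements": "User story",
    "non-functional constraints": "Task",
}


def brief_to_work_items(markdown: str) -> list[tuple[str, str]]:
    """Turn bullet items under known brief sections into (type, title) pairs."""
    items: list[tuple[str, str]] = []
    current_type = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # New section: look up its work item type (None if unmapped).
            current_type = SECTION_TO_TYPE.get(line[3:].strip().lower())
        elif line.startswith("- ") and current_type:
            items.append((current_type, line[2:].strip()))
    return items


# Illustrative sample only, not taken from the PR.
SAMPLE_BRIEF = """\
## Experiment Requirements
- Collect 200 labeled support tickets
## Non-Functional Constraints
- Keep customer data inside the tenant
## Notes
- Ignored: not a mapped section
"""

for work_type, title in brief_to_work_items(SAMPLE_BRIEF):
    print(f"{work_type}: {title}")
```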
🤖 I have created a release *beep* *boop*

---

## [3.2.0](hve-core-v3.1.46...hve-core-v3.2.0) (2026-03-20)

### ✨ Features

* add -OutputPath parameter to Validate-MarkdownFrontmatter.ps1 ([#1134](#1134)) ([fdf1bcf](fdf1bcf)), closes [#1006](#1006)
* add action version consistency scan workflow ([#1127](#1127)) ([4229df1](4229df1))
* **agent:** MVE Experiment Designer ([#976](#976)) ([70f86ca](70f86ca))
* **agents:** add ADO Backlog Manager orchestrator agent ([#800](#800)) ([fae3987](fae3987))
* **agents:** add meeting analyst agent for transcript analysis using work-iq ([#502](#502)) ([5345b5b](5345b5b))
* **agents:** add quick-reference line to RPI Phase 5 suggestions ([#897](#897)) ([9a90f39](9a90f39))
* **agents:** add RAI Planner, enhance SSSC Planner, and redesign Security Planner ([#979](#979)) ([06f826c](06f826c))
* **agents:** add symmetric cross-system handoff to GitHub Backlog Manager ([#952](#952)) ([ba34a35](ba34a35))
* **agents:** Functional Code Review Agent — pre-PR functional correctness reviewer ([#733](#733)) ([9cf63b7](9cf63b7))
* **build:** add Python extensions and uv 0.10.8 to devcontainer ([#920](#920)) ([9ca0579](9ca0579))
* **build:** add uv ecosystem to Dependabot configuration ([#913](#913)) ([2a4bd39](2a4bd39))
* **build:** enable npm pinning enforcement in dependency scan ([#838](#838)) ([4e9e31f](4e9e31f))
* **build:** migrate attestation actions to v4.1.0 and add SBOM verification docs ([#841](#841)) ([ca1e65b](ca1e65b))
* **collections:** add four new validator checks (orphan, duplicate, companion, coverage) ([#869](#869)) ([1a96b73](1a96b73))
* **devcontainer,security:** add enterprise artifact hub configuration ([#1032](#1032)) ([1d56d25](1d56d25))
* **docs:** add Rust coding standards and guidelines ([#809](#809)) ([d4c4899](d4c4899))
* **extension:** add Microsoft logo icon to VS Code Marketplace listings ([#906](#906)) ([82aca41](82aca41))
* **github:** add declarative label management ([#953](#953)) ([a1a6845](a1a6845))
* **instructions:** add ADO backlog shared infrastructure ([#786](#786)) ([1914078](1914078))
* **instructions:** add ADO backlog sprint planning and capacity tracking ([#788](#788)) ([d6fb77d](d6fb77d))
* **instructions:** add ADO triage workflow and prompt ([#787](#787)) ([cde0190](cde0190))
* **instructions:** add shared story quality conventions and sprint planning ([#803](#803)) ([a2f18e3](a2f18e3))
* **prompts:** add ADO discovery and work item prompts with agent routing ([#790](#790)) ([7e74523](7e74523))
* **prompts:** add security review prompts ([#1118](#1118)) ([ad30967](ad30967))
* **scripts:** add dynamic Python skill discovery for lint/test ([#957](#957)) ([0a90f57](0a90f57))
* **scripts:** add Get-StandardTimestamp utility to CIHelpers module ([#1126](#1126)) ([b273a4b](b273a4b))
* **scripts:** add Python copyright header validation ([#905](#905)) ([67df902](67df902))
* **scripts:** add Python skill support to Validate-SkillStructure ([#903](#903)) ([68479d9](68479d9))
* **scripts:** add workflow npm command scanning to dependency pinning ([#837](#837)) ([6b5ae06](6b5ae06))
* **security:** add basic security reviewer agent with owasp skills ([#1008](#1008)) ([cb1fd05](cb1fd05))
* **security:** add sigstore attestation bundles and fix component-detection action ([#1148](#1148)) ([f79c272](f79c272))
* **skills:** add Atheris fuzz harness with CI workflow integration ([#1102](#1102)) ([d337e1d](d337e1d))
* **skills:** add PowerPoint automation skill with YAML-driven deck generation ([#868](#868)) ([00465cd](00465cd))
* **skills:** convert hve-core-installer agent to self-contained skill ([#846](#846)) ([1d821fb](1d821fb))
* **skills:** enhance pr-reference skill with flexible filtering and base branch detection ([#1095](#1095)) ([26a32ea](26a32ea))
* **workflows:** add devcontainer infrastructure change log workflow ([#899](#899)) ([8aca446](8aca446))
* **workflows:** add milestone auto-close on stable and pre-release publishes ([#834](#834)) ([79362b1](79362b1))
* **workflows:** add ms.date documentation freshness checking ([#969](#969)) ([3ed441c](3ed441c))
* **workflows:** add Python linting CI workflow with Ruff ([#951](#951)) ([f89f0eb](f89f0eb))
* **workflows:** add Python testing CI workflow with pytest and Codecov ([#934](#934)) ([5e8306f](5e8306f))
* **workflows:** add uv and Python package sync to copilot-setup-steps ([#921](#921)) ([45d517d](45d517d))

### 🐛 Bug Fixes

* **build:** override Linguist vendored flag for Python skill files ([#1155](#1155)) ([0eee5b6](0eee5b6))
* **build:** override serialize-javascript to >=7.0.3 for RCE fix ([#876](#876)) ([e49039a](e49039a))
* **build:** resolve Pinned-Dependencies alerts for vsce npm commands in extension workflows ([#782](#782)) ([89dad9d](89dad9d))
* **build:** update undici and yauzl overrides for security audit ([#1030](#1030)) ([2c2f92f](2c2f92f))
* **docs:** add CLI Plugins to install.md navigation surfaces ([#902](#902)) ([79d6595](79d6595))
* **docs:** add sidebar ordering for Design Thinking documentation ([#832](#832)) ([551fddc](551fddc)), closes [#830](#830)
* **docs:** graduate design-thinking to preview and correct stale collection references ([#831](#831)) ([5110e35](5110e35))
* **docs:** include project-planning in UX Designer install guidance ([#908](#908)) ([e7aa9bc](e7aa9bc))
* **docs:** remediate writing-style convention violations ([#865](#865)) ([68b04bc](68b04bc))
* **docs:** remove draft content announcement banner ([#825](#825)) ([b45de80](b45de80))
* **docs:** remove unbounded path-to-regexp override breaking SSG ([#1153](#1153)) ([d810018](d810018))
* **docs:** use actual clone paths instead of folder display names in multi-root workspace settings ([#984](#984)) ([5dbab82](5dbab82))
* **instructions:** replace black with ruff in uv-projects ([#898](#898)) ([b0c06d9](b0c06d9))
* **scripts:** cover .github/ skill files in copyright header validation ([#1055](#1055)) ([#1098](#1098)) ([27fbd33](27fbd33))
* **scripts:** eliminate phantom git changes from plugin generation ([#1035](#1035)) ([e49a1b5](e49a1b5))
* **scripts:** enable JSON log output for lint:version-consistency ([#1033](#1033)) ([52b0885](52b0885))
* **security:** calculate compliance score from total scanned dependencies ([#930](#930)) ([c112c3d](c112c3d))
* **skills:** add AST validation and namespace restriction for content-extra.py ([#1027](#1027)) ([c50c7a3](c50c7a3))
* **skills:** add depth limits to recursive PowerPoint processing functions ([#1028](#1028)) ([bf08994](bf08994))
* **skills:** harden XML parsing and blob writes in powerpoint extract ([#1053](#1053)) ([89d24b1](89d24b1))
* **skills:** resolve ruff lint and format violations in powerpoint skill ([#1048](#1048)) ([17bbe7a](17bbe7a))
* **workflows:** add uv.lock dependencies submission have fork-skip condition ([#1109](#1109)) ([dec56ac](dec56ac))
* **workflows:** automate weekly SHA staleness check with issue creation ([#975](#975)) ([1ea4caa](1ea4caa))
* **workflows:** close Codecov integration gaps for Pester and pytest flags ([#1106](#1106)) ([cca29b7](cca29b7))
* **workflows:** propagate uv sync errors in copilot-setup-steps ([#961](#961)) ([df88d7c](df88d7c))
* **workflows:** resolve release-please skip cascade and Python project discovery ([#1043](#1043)) ([79993e2](79993e2))
* **workflows:** scan only commit subjects for breaking change detection ([#1157](#1157)) ([a38a657](a38a657))

### 📚 Documentation

* clarify HVE Core Extension vs Installer messaging across documentation ([#965](#965)) ([0fceb8f](0fceb8f))
* **docs:** add ADO integration user documentation ([#935](#935)) ([ec89302](ec89302))
* **docs:** add Project Planning agent documentation ([#936](#936)) ([3a3a0fd](3a3a0fd))
* **onboarding:** overhaul marketplace onboarding and documentation site ([#982](#982)) ([4309e10](4309e10))

### ♻️ Refactoring

* **build:** merge code-review collection into coding-standards ([#863](#863)) ([8027e7b](8027e7b))
* **workflows:** rename release pipeline workflows and add marketplace automation triggers ([#829](#829)) ([b6397f4](b6397f4))

### 🔧 Maintenance

* **build:** add clean:logs npm script ([#1122](#1122)) ([f85fe02](f85fe02)), closes [#988](#988)
* **build:** add JSON reporter for cspell ([#1123](#1123)) ([6d59f67](6d59f67))
* **ci:** add multi-arch support to copilot-setup-steps binary downloads ([#955](#955)) ([8d0c706](8d0c706))
* **deps-dev:** bump cspell from 9.6.4 to 9.7.0 in the npm-dependencies group ([#839](#839)) ([3fa16ff](3fa16ff))
* **deps:** bump actions/dependency-review-action from 4.8.3 to 4.9.0 in the github-actions group across 1 directory ([#942](#942)) ([1a9b858](1a9b858))
* **deps:** bump cairosvg from 2.8.2 to 2.9.0 in /.github/skills/experimental/powerpoint ([#1025](#1025)) ([f4deda7](f4deda7))
* **deps:** bump dompurify from 3.3.1 to 3.3.2 in /docs/docusaurus ([#924](#924)) ([d2060d6](d2060d6))
* **deps:** bump svgo from 3.3.2 to 3.3.3 in /docs/docusaurus ([#880](#880)) ([6dc2406](6dc2406))
* **deps:** bump the github-actions group across 1 directory with 4 updates ([#1100](#1100)) ([2290dc0](2290dc0))
* **deps:** bump the github-actions group with 6 updates ([#840](#840)) ([f57bc01](f57bc01))
* **docs:** correct New-MsDateReport table rendering and refresh stale docs ([#1114](#1114)) ([c2b806f](c2b806f))
* **settings:** remove orphaned Checkov config and stale gitignore entries ([#870](#870)) ([98fcd74](98fcd74))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: hve-core-release-please[bot] <254602402+hve-core-release-please[bot]@users.noreply.github.com>
Co-authored-by: Bill Berry <wberry@microsoft.com>
Pull Request
Description
Adds a new conversational coaching agent that guides users through designing a Minimum Viable Experiment (MVE). The agent follows a structured, phase-based process — from problem discovery and hypothesis formation through viability vetting to a complete experiment plan. It helps users translate unknowns and assumptions into crisp, testable hypotheses, evaluates experiment feasibility, and produces actionable MVE plans with session tracking via .copilot-tracking. Includes the agent definition (experiment-designer.agent.md) and companion instructions (experiment-designer.instructions.md) covering MVE domain knowledge, vetting criteria, and experiment type reference.
Related Issue(s)
Closes #973
Type of Change
Select all that apply:
Code & Documentation:
Infrastructure & Configuration:
AI Artifacts:
prompt-builder agent and addressed all feedback
.github/instructions/*.instructions.md)
.github/prompts/*.prompt.md)
.github/agents/*.agent.md)
.github/skills/*/SKILL.md)
Other:
.ps1, .sh, .py)
Sample Prompts (for AI Artifact Contributions)
User Request:
Execution Flow:
Phase 1 (Problem & Context Discovery): The agent asks probing questions about the problem statement, customer context, business case, unknowns, and constraints. It creates a tracking directory at .copilot-tracking/mve/{date}/{experiment-name}/ and writes context.md.
Phase 2 (Hypothesis Formation): The agent guides the user to translate unknowns into testable hypotheses using the format "We believe [assumption]. We will test this by [method]. We will know we are right/wrong when [measurable outcome]." It prioritizes hypotheses by risk and impact and writes hypotheses.md.
Phase 3 (MVE Vetting & Red Flag Check): The agent applies four vetting criteria (business sense, crisp problem statement, Responsible AI, clear next steps) and checks against nine red flag patterns (demos, skipping ahead, solved problems, mini-MVP, etc.). It writes vetting.md. If it finds fundamental problems, it returns to Phase 1 or 2.
Phase 4 (Experiment Design): The agent helps choose the experiment type, define the technical approach, set measurable success/failure criteria per hypothesis, scope the timeline to weeks, and plan post-experiment evaluation. It writes experiment-design.md.
Phase 5 (MVE Plan Output): The agent consolidates all phase outputs into a single mve-plan.md document for stakeholder review, then iterates based on user feedback, returning to earlier phases if needed.
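The phase flow above, including its return paths, can be modeled as a small transition table. This is an illustrative sketch, not how the agent works internally (it follows its instructions, not code); the `TRANSITIONS` and `can_move` names are hypothetical, and optional Phase 6 comes from the later Backlog Bridge commit.

```python
# Illustrative model of the Execution Flow; not part of the agent itself.
PHASES = {
    1: "Problem & Context Discovery",
    2: "Hypothesis Formation",
    3: "MVE Vetting & Red Flag Check",
    4: "Experiment Design",
    5: "MVE Plan Output",
    6: "Backlog Bridge (optional)",
}

TRANSITIONS = {
    1: {2},
    2: {3},
    3: {4, 1, 2},        # fundamental problems send the user back to 1 or 2
    4: {5},
    5: {1, 2, 3, 4, 6},  # feedback may reopen earlier phases; 6 is optional
    6: set(),            # terminal: the brief is handed to a backlog manager
}


def can_move(current: int, target: int) -> bool:
    """True if the flow allows going from `current` to `target`."""
    return target in TRANSITIONS.get(current, set())


print(can_move(3, 1), can_move(1, 4))
# prints: True False
```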
Output Artifacts:
context.md — Problem statement, customer context, business justification
hypotheses.md — Prioritized testable hypotheses with assumption/method/outcome
vetting.md — Vetting criteria results and red flag assessment
experiment-design.md — Approach, scope, timeline, resources, success criteria
mve-plan.md — Consolidated plan document for stakeholder review
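The session layout described in Phase 1 and the artifact list above can be sketched as a path builder plus a completeness check. Two details are assumptions not stated in the PR: that `{date}` is an ISO 8601 date (YYYY-MM-DD) and that `{experiment-name}` is a lowercase kebab-case slug. The helper names are hypothetical.

```python
import re
from datetime import date
from pathlib import Path

# Artifact written by each phase, per the Output Artifacts list.
ARTIFACTS = {
    1: "context.md",
    2: "hypotheses.md",
    3: "vetting.md",
    4: "experiment-design.md",
    5: "mve-plan.md",
}


def slugify(name: str) -> str:
    """Assumed naming rule: lowercase kebab-case."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def session_dir(experiment: str, day: date, root: Path = Path(".")) -> Path:
    # Assumption: {date} is rendered as YYYY-MM-DD.
    return root / ".copilot-tracking" / "mve" / day.isoformat() / slugify(experiment)


def missing_artifacts(directory: Path) -> list[str]:
    """Expected artifacts not yet present in the session directory, in phase order."""
    return [name for _, name in sorted(ARTIFACTS.items())
            if not (directory / name).is_file()]


print(session_dir("Ticket Triage Accuracy", date(2026, 3, 20)).as_posix())
# prints: .copilot-tracking/mve/2026-03-20/ticket-triage-accuracy
```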
Business Case
{Why this experiment matters, what decision it informs}
Success Indicators:
The .copilot-tracking/mve/{date}/{experiment-name}/ directory contains all five markdown artifacts (context.md, hypotheses.md, vetting.md, experiment-design.md, mve-plan.md)
Each hypothesis follows the three-part format: assumption, test method, measurable outcome
Hypotheses are prioritized by risk and impact with clear rationale
Vetting results explicitly address all four criteria and flag any red flags encountered
Success and failure criteria are defined per hypothesis with quantitative thresholds
The experiment is scoped to weeks (not months) with explicit out-of-scope boundaries
mve-plan.md includes next steps for both validated and invalidated outcomes
The agent challenged vague problem statements or untestable hypotheses rather than accepting them uncritically
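Two of these indicators (the three-part hypothesis format and quantitative thresholds) lend themselves to a mechanical check. A minimal sketch, assuming hypotheses are single-line statements in the Phase 2 template; treating "right/wrong" as either word or the literal pair, and using "contains a number" as a proxy for a quantitative threshold, are rough assumptions. The example hypotheses are illustrative only.

```python
import re

# Three-part template from Phase 2; the right/wrong flexibility is an assumption.
HYPOTHESIS = re.compile(
    r"We believe (?P<assumption>.+?)\.\s+"
    r"We will test this by (?P<method>.+?)\.\s+"
    r"We will know we are (?:right/wrong|right|wrong) when (?P<outcome>.+?)\.?"
)


def check_hypothesis(text: str) -> list[str]:
    """Return the problems found; an empty list means the statement passes."""
    match = HYPOTHESIS.fullmatch(text.strip())
    if match is None:
        return ["does not follow the assumption/method/outcome template"]
    problems = []
    # Rough heuristic for a quantitative threshold: the outcome names a number.
    if not re.search(r"\d", match["outcome"]):
        problems.append("outcome has no quantitative threshold")
    return problems


good = ("We believe agents misroute briefs. We will test this by replaying "
        "50 past requests. We will know we are right when at least 40 route to Discovery.")
vague = ("We believe users like it. We will test this by asking them. "
         "We will know we are right when they seem happy.")

print(check_hypothesis(good))    # prints: []
print(check_hypothesis(vague))   # prints: ['outcome has no quantitative threshold']
```

A failing check is a prompt to push back on the hypothesis, in the same spirit as the last indicator, not proof that it is untestable.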
For detailed contribution requirements, see:
Testing
I've used it for a few MVE opportunities to help refine our hypotheses and plan our MVE.
Checklist
Required Checks
AI Artifact Contributions
/prompt-analyze to review contribution
prompt-builder review
Required Automated Checks
The following validation commands must pass before merging:
- npm run lint:md
- npm run spell-check
- npm run lint:frontmatter
- npm run validate:skills
- npm run lint:md-links
- npm run lint:ps
- npm run plugin:generate
(I can't run the dev container, so I'm hoping the CI/CD pipeline checks these :) )
Security Considerations
Additional Notes