feat: support pre-written follow-up prompts in eval YAML by spboyer · Pull Request #209 · microsoft/waza

spboyer · 2026-04-21T19:54:00Z

Summary

Closes #189

Adds support for follow_up_prompts in eval YAML task definitions, enabling multi-turn evaluation scenarios where each follow-up reuses the same session and workspace.

Working as Linus (Backend Developer)

Changes

Core

internal/models/testcase.go: Add FollowUps []string to TestStimulus (yaml: follow_up_prompts)
internal/execution/engine.go: Add WorkspaceDir to ExecutionRequest for workspace reuse
internal/execution/copilot.go: Skip setupWorkspace when WorkspaceDir is set
internal/execution/mock.go: Support workspace reuse and SessionID passthrough
internal/orchestration/runner.go: Add executeFollowUps() with result aggregation (events, tool calls, usage, duration)

Tests

3 YAML parsing subtests (present, omitted, single)
5 orchestration tests (no follow-ups, single, multiple, error mid-sequence, full benchmark)

Documentation

schemas/task.schema.json: Add follow_up_prompts property
site/src/content/docs/guides/eval-yaml.mdx: Add follow-up prompts guide section
site/src/content/docs/reference/schema.mdx: Add schema reference entry

Example

inputs:
  prompt: "Create a Python function that reads a CSV file"
  follow_up_prompts:
    - "Add error handling for missing files"
    - "Write unit tests for the function"

How it works

Initial prompt executes normally, creating a workspace and session
Each follow-up reuses the same WorkspaceDir and SessionID
Results are aggregated: events and tool calls are appended, durations are summed, and the last output wins
If any follow-up fails, the remaining follow-ups are skipped and the run is marked as an error
Graders evaluate only the final state after all prompts complete

Add follow_up_prompts field to TestStimulus for multi-turn eval scenarios. Follow-up prompts reuse the same session and workspace, preserving conversation history and file changes across turns.

Changes:
- Add FollowUps []string to TestStimulus (yaml: follow_up_prompts)
- Add WorkspaceDir to ExecutionRequest for workspace reuse
- Update CopilotEngine to skip setupWorkspace when WorkspaceDir is set
- Update MockEngine to support workspace reuse and SessionID passthrough
- Add executeFollowUps() to orchestration runner with result aggregation
- Add 3 YAML parsing tests and 5 orchestration tests
- Update JSON schema, eval-yaml guide, and schema reference docs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer requested a review from chlowell as a code owner April 21, 2026 19:54

spboyer added the squad:linus Assigned to Linus (Backend Developer) label Apr 21, 2026

spboyer requested review from richardpark-msft and wbreza as code owners April 21, 2026 19:54

github-actions Bot enabled auto-merge (squash) April 21, 2026 19:54

spboyer force-pushed the squad/189-follow-up-prompts branch from 41244f5 to 9abb021 April 21, 2026 21:05

spboyer merged commit c37f404 into main Apr 21, 2026
3 of 5 checks passed

spboyer deleted the squad/189-follow-up-prompts branch April 21, 2026 21:05

spboyer merged 1 commit into main from squad/189-follow-up-prompts
