feat: support pre-written follow-up prompts in eval YAML #209

Merged: spboyer merged 1 commit into main from squad/189-follow-up-prompts on Apr 21, 2026

@spboyer (Member) commented Apr 21, 2026

Summary

Closes #189

Adds support for follow_up_prompts in eval YAML task definitions, enabling multi-turn evaluation scenarios in which each follow-up reuses the same session and workspace as the initial prompt.

Working as Linus (Backend Developer)

Changes

Core

  • internal/models/testcase.go: Add FollowUps []string to TestStimulus (yaml: follow_up_prompts)
  • internal/execution/engine.go: Add WorkspaceDir to ExecutionRequest for workspace reuse
  • internal/execution/copilot.go: Skip setupWorkspace when WorkspaceDir is set
  • internal/execution/mock.go: Support workspace reuse and SessionID passthrough
  • internal/orchestration/runner.go: Add executeFollowUps() with result aggregation (events, tool calls, usage, duration)

Tests

  • 3 YAML parsing subtests (present, omitted, single)
  • 5 orchestration tests (no follow-ups, single, multiple, error mid-sequence, full benchmark)

Documentation

  • schemas/task.schema.json: Add follow_up_prompts property
  • site/src/content/docs/guides/eval-yaml.mdx: Add follow-up prompts guide section
  • site/src/content/docs/reference/schema.mdx: Add schema reference entry

Example

inputs:
  prompt: "Create a Python function that reads a CSV file"
  follow_up_prompts:
    - "Add error handling for missing files"
    - "Write unit tests for the function"
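
Read against the model change, this example yields three turns in a fixed order. A sketch, assuming TestStimulus is what the inputs block decodes into and using a hypothetical turns helper; the struct literal stands in for YAML decoding so the snippet needs only the standard library.

```go
package main

import "fmt"

// TestStimulus is a trimmed-down stand-in for the model in
// internal/models/testcase.go; the yaml tags mirror the example's keys.
type TestStimulus struct {
	Prompt    string   `yaml:"prompt"`
	FollowUps []string `yaml:"follow_up_prompts"`
}

// turns (hypothetical helper) lists prompts in the order they are sent:
// the initial prompt first, then each follow-up.
func turns(s TestStimulus) []string {
	out := make([]string, 0, 1+len(s.FollowUps))
	out = append(out, s.Prompt)
	return append(out, s.FollowUps...)
}

func main() {
	s := TestStimulus{
		Prompt: "Create a Python function that reads a CSV file",
		FollowUps: []string{
			"Add error handling for missing files",
			"Write unit tests for the function",
		},
	}
	for i, p := range turns(s) {
		fmt.Printf("turn %d: %s\n", i+1, p)
	}
}
```

When follow_up_prompts is omitted, FollowUps is nil and the task runs as a single turn, exactly as before.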

How it works

  1. Initial prompt executes normally, creating a workspace and session
  2. Each follow-up reuses the same WorkspaceDir and SessionID
  3. Results are aggregated: events and tool calls are appended, usage and duration are summed, and the final output is taken from the last prompt
  4. If any follow-up fails, the remaining follow-ups are skipped and the run is marked as an error
  5. Graders evaluate only the final state after all prompts complete

@spboyer requested a review from @chlowell as a code owner, April 21, 2026 19:54
@spboyer added the squad:linus label (Assigned to Linus (Backend Developer)), April 21, 2026
@github-actions (bot) enabled auto-merge (squash), April 21, 2026 19:54
Add follow_up_prompts field to TestStimulus for multi-turn eval scenarios.
Follow-up prompts reuse the same session and workspace, preserving
conversation history and file changes across turns.

Changes:
- Add FollowUps []string to TestStimulus (yaml: follow_up_prompts)
- Add WorkspaceDir to ExecutionRequest for workspace reuse
- Update CopilotEngine to skip setupWorkspace when WorkspaceDir is set
- Update MockEngine to support workspace reuse and SessionID passthrough
- Add executeFollowUps() to orchestration runner with result aggregation
- Add 3 YAML parsing tests and 5 orchestration tests
- Update JSON schema, eval-yaml guide, and schema reference docs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer force-pushed the squad/189-follow-up-prompts branch from 41244f5 to 9abb021, April 21, 2026 21:05
@spboyer merged commit c37f404 into main, Apr 21, 2026 (3 of 5 checks passed)
@spboyer deleted the squad/189-follow-up-prompts branch, April 21, 2026 21:05

Labels

squad:linus Assigned to Linus (Backend Developer)

Development

Successfully merging this pull request may close these issues.

feature request: support pre-written follow up prompts