fix: guard empty sandbox prompts by spboyer · Pull Request #278 · microsoft/waza

spboyer · 2026-05-22T20:48:34Z

Closes #273

Summary

Fail fast when prompts mention relative paths but no workspace files were loaded.
Update the quick-start example to use a valid, more specific contains check and clarify that --context-dir only resolves inputs.files.
Add regression coverage for the empty-sandbox path.

Validation

go test ./...
cd site && npm run build

Docs impact

Updated site/src/content/docs/quick-start.mdx to correct the grader example and document sandbox/file-loading behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR addresses a failure mode where waza run can proceed with an effectively empty sandbox when prompts reference relative paths, leading to misleading “passed” results under permissive graders. It adds a fast-fail guard in the runner, updates the Quick Start grader example to be less permissive, and adds a regression test for the new guard.
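
To illustrate why a permissive grader can hide an empty sandbox, here is a minimal sketch comparing a `\w+` regex check with a contains check. The helper names, the sample agent reply, and the expected substring are assumptions made for illustration; they are not taken from waza's grader implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// permissive mirrors a `\w+` check: it matches any output that
// contains at least one word character.
var permissive = regexp.MustCompile(`\w+`)

// passesRegex reports whether output satisfies the permissive pattern.
func passesRegex(output string) bool {
	return permissive.MatchString(output)
}

// passesContains reports whether output includes the expected substring.
func passesContains(output, want string) bool {
	return strings.Contains(output, want)
}

func main() {
	// Hypothetical agent reply when the sandbox holds no files.
	output := "I could not find any files in the workspace."

	fmt.Println(passesRegex(output))                 // true
	fmt.Println(passesContains(output, "func main")) // false
}
```

Almost any non-empty reply satisfies the regex, including one that admits it found nothing, while the contains check passes only if the reply includes the specific text the test expects.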

Changes:

Add a runner-level validation that errors when the prompt mentions ./ or ../ but no workspace resources were loaded.
Update Quick Start documentation to use a contains-based text grader instead of a permissive \w+ regex and clarify sandbox/file-loading behavior.
Add a unit test covering the empty-sandbox + relative-path prompt rejection.

Summary per file

File	Description
site/src/content/docs/quick-start.mdx	Tightens the Quick Start grader example and documents how sandbox file loading works.
internal/orchestration/runner.go	Adds a preflight guard to reject relative-path prompts when no resources were loaded.
internal/orchestration/runner_test.go	Adds regression coverage for the new empty-sandbox guard.

Copilot's findings

Comments suppressed due to low confidence (1)

internal/orchestration/runner.go:1216

rejectRelativePathPromptWithEmptySandbox validates tc.Stimulus.Message, but follow-up execution overwrites followReq.Message after buildExecutionRequest returns. This allows relative-path prompts in follow-ups to bypass the guard entirely. Consider validating against the actual request message (or re-validating after followReq.Message = prompt).

func (r *EvalRunner) executeFollowUps(ctx context.Context, tc *models.TestCase, resp *execution.ExecutionResponse) {
	for i, prompt := range tc.Stimulus.FollowUps {
		followReq, err := r.buildExecutionRequest(tc)
		if err != nil {
			resp.ErrorMsg = fmt.Sprintf("follow-up %d/%d setup failed: %v", i+1, len(tc.Stimulus.FollowUps), err)
			break
		}
		followReq.Message = prompt
		followReq.SessionID = resp.SessionID
		followReq.WorkspaceDir = resp.WorkspaceDir
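
One way to address this finding is to validate every prompt that will actually be sent, follow-ups included, before execution starts. The sketch below is an assumption-laden illustration, not the project's fix: `stimulus` is a simplified stand-in type that borrows only the `Message` and `FollowUps` field names from the snippet above, and `validatePrompts` and `loadedResources` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// stimulus is a simplified, hypothetical stand-in for the test case's
// stimulus; only the Message and FollowUps names come from the snippet.
type stimulus struct {
	Message   string
	FollowUps []string
}

// validatePrompts checks the initial message and every follow-up, so a
// relative path in a follow-up cannot slip past the empty-sandbox guard.
func validatePrompts(s stimulus, loadedResources int) error {
	if loadedResources > 0 {
		return nil // files were loaded, so relative paths can resolve
	}
	prompts := append([]string{s.Message}, s.FollowUps...)
	for i, p := range prompts {
		// "../" contains "./", so one substring check covers both forms.
		if strings.Contains(p, "./") {
			return fmt.Errorf("prompt %d/%d references a relative path but no workspace files were loaded", i+1, len(prompts))
		}
	}
	return nil
}

func main() {
	s := stimulus{
		Message:   "Summarize the project",
		FollowUps: []string{"Now refactor ./main.go"},
	}
	fmt.Println(validatePrompts(s, 0) != nil) // true: the follow-up is caught
	fmt.Println(validatePrompts(s, 2) != nil) // false: files were loaded
}
```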

Files reviewed: 3/3 changed files
Comments generated: 4

fix: guard empty sandbox prompts #273

d7e95b3

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 22, 2026 20:48

Copilot started reviewing on behalf of spboyer May 22, 2026 20:48

github-actions Bot enabled auto-merge (squash) May 22, 2026 20:51

Copilot AI reviewed May 22, 2026

Comment thread internal/orchestration/runner.go

Comment thread internal/orchestration/runner.go

Comment thread internal/orchestration/runner_test.go

Comment thread site/src/content/docs/quick-start.mdx

github-actions Bot merged commit de9b2b6 into main May 23, 2026
9 checks passed

spboyer mentioned this pull request May 23, 2026

Empty sandbox + permissive default grader silently produces passing-but-useless runs #273

Closed

github-actions[bot] merged 1 commit into main from spboyer/fix-issue-273-empty-sandbox-grader
