fix: guard empty sandbox prompts #278

Merged: github-actions[bot] merged 1 commit into main from spboyer/fix-issue-273-empty-sandbox-grader on May 23, 2026

@spboyer (Member) commented May 22, 2026

Closes #273

Summary

  • Fail fast when prompts mention relative paths but no workspace files were loaded.
  • Update the quick-start example to use a valid, more specific contains check and clarify that --context-dir only resolves inputs.files.
  • Add regression coverage for the empty-sandbox path.
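
The fail-fast behavior in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function name `guardEmptySandbox`, the `loadedFiles` count, the error text, and the regular expression used to spot relative paths are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// relativePathPattern is a hypothetical heuristic: it matches "./" or "../"
// at the start of the prompt or right after whitespace, a quote, or "(".
var relativePathPattern = regexp.MustCompile(`(^|[\s"'(])\.{1,2}/`)

// errEmptySandbox is an illustrative sentinel error, not the PR's message.
var errEmptySandbox = errors.New("prompt references a relative path but no workspace files were loaded")

// guardEmptySandbox returns an error when the prompt mentions a relative
// path while the sandbox holds no files; otherwise it returns nil.
func guardEmptySandbox(prompt string, loadedFiles int) error {
	if loadedFiles > 0 {
		return nil // files are present, so relative paths can resolve
	}
	if relativePathPattern.MatchString(prompt) {
		return fmt.Errorf("%w: %q", errEmptySandbox, prompt)
	}
	return nil
}

func main() {
	// Empty sandbox plus a relative path: the guard rejects the run.
	fmt.Println(guardEmptySandbox("Summarize ./README.md", 0))
	// Same prompt with one loaded file: the guard lets it through.
	fmt.Println(guardEmptySandbox("Summarize ./README.md", 1))
}
```

Failing before execution, rather than letting the grader decide, keeps a run from being reported as passed when the agent never saw the files the prompt refers to.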

Validation

  • go test ./...
  • cd site && npm run build

Docs impact

  • Updated site/src/content/docs/quick-start.mdx to correct the grader example and document sandbox/file-loading behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 22, 2026 20:48
@github-actions github-actions Bot enabled auto-merge (squash) May 22, 2026 20:51

Copilot AI (Contributor) left a comment

Pull request overview

This PR addresses a failure mode in which waza run proceeds with an effectively empty sandbox even though the prompt references relative paths, so permissive graders report misleading “passed” results. It adds a fail-fast guard in the runner, updates the Quick Start grader example to be less permissive, and adds a regression test for the new guard.

Changes:

  • Add a runner-level validation that errors when the prompt mentions ./ or ../ but no workspace resources were loaded.
  • Update Quick Start documentation to use a contains-based text grader instead of a permissive \w+ regex and clarify sandbox/file-loading behavior.
  • Add a unit test covering the empty-sandbox + relative-path prompt rejection.
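
To illustrate why the documentation change matters, the sketch below compares a `\w+` regex check with a substring check on the kind of output an agent might produce when it finds no files. The helper names and the expected substring `func main` are hypothetical and are not taken from waza's grader implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// permissiveMatch mimics a grader that passes whenever the output
// contains at least one word character, which almost any reply does.
func permissiveMatch(output string) bool {
	return regexp.MustCompile(`\w+`).MatchString(output)
}

// containsMatch passes only when the output includes the expected text.
func containsMatch(output, want string) bool {
	return strings.Contains(output, want)
}

func main() {
	unhelpful := "I could not find any files in the workspace."

	fmt.Println(permissiveMatch(unhelpful))            // true: the regex matches "I"
	fmt.Println(containsMatch(unhelpful, "func main")) // false: expected text is absent
}
```

A contains check is still a coarse grader, but it at least fails when the reply has nothing to do with the file the prompt asked about.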
Summary per file:

| File | Description |
| --- | --- |
| site/src/content/docs/quick-start.mdx | Tightens the Quick Start grader example and documents how sandbox file loading works. |
| internal/orchestration/runner.go | Adds a preflight guard to reject relative-path prompts when no resources were loaded. |
| internal/orchestration/runner_test.go | Adds regression coverage for the new empty-sandbox guard. |

Copilot's findings

Comments suppressed due to low confidence (1)

internal/orchestration/runner.go:1216

  • rejectRelativePathPromptWithEmptySandbox validates tc.Stimulus.Message, but follow-up execution overwrites followReq.Message after buildExecutionRequest returns. This allows relative-path prompts in follow-ups to bypass the guard entirely. Consider validating against the actual request message (or re-validating after followReq.Message = prompt).
func (r *EvalRunner) executeFollowUps(ctx context.Context, tc *models.TestCase, resp *execution.ExecutionResponse) {
	for i, prompt := range tc.Stimulus.FollowUps {
		followReq, err := r.buildExecutionRequest(tc)
		if err != nil {
			resp.ErrorMsg = fmt.Sprintf("follow-up %d/%d setup failed: %v", i+1, len(tc.Stimulus.FollowUps), err)
			break
		}
		followReq.Message = prompt
		followReq.SessionID = resp.SessionID
		followReq.WorkspaceDir = resp.WorkspaceDir
  • Files reviewed: 3/3 changed files
  • Comments generated: 4
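
One way to address the suppressed finding is to validate every message that will actually be sent, follow-ups included, rather than only the initial stimulus. The sketch below shows that idea in isolation. `validateMessages` and `mentionsRelativePath` are hypothetical names, and the substring check is a simplified stand-in for whatever detection the real guard uses.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// mentionsRelativePath is a simplified stand-in for the real detection.
// Checking for "./" also covers "../", since "../" contains "./".
func mentionsRelativePath(msg string) bool {
	return strings.Contains(msg, "./")
}

// validateMessages checks the initial prompt and each follow-up prompt,
// so a relative path in a follow-up cannot slip past the guard.
func validateMessages(initial string, followUps []string, loadedFiles int) error {
	if loadedFiles > 0 {
		return nil // workspace has files, nothing to reject
	}
	if mentionsRelativePath(initial) {
		return errors.New("initial prompt references a relative path but the sandbox is empty")
	}
	for i, msg := range followUps {
		if mentionsRelativePath(msg) {
			return fmt.Errorf("follow-up %d/%d references a relative path but the sandbox is empty",
				i+1, len(followUps))
		}
	}
	return nil
}

func main() {
	followUps := []string{"Now review ../pkg/util.go"}
	// The initial prompt is clean, but the follow-up is rejected.
	fmt.Println(validateMessages("Explain closures", followUps, 0))
}
```

In the runner shown above, the equivalent change would be to run the check on `followReq.Message` after it is assigned, as the review comment suggests.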

Comment thread internal/orchestration/runner.go
Comment thread internal/orchestration/runner.go
Comment thread internal/orchestration/runner_test.go
Comment thread site/src/content/docs/quick-start.mdx
@github-actions github-actions Bot merged commit de9b2b6 into main May 23, 2026
9 checks passed

Development

Successfully merging this pull request may close these issues.

Empty sandbox + permissive default grader silently produces passing-but-useless runs

3 participants