fix: guard empty sandbox prompts #278
Merged
github-actions[bot] merged 1 commit on May 23, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR addresses a failure mode where waza run can proceed with an effectively empty sandbox when prompts reference relative paths, leading to misleading “passed” results under permissive graders. It adds a fast-fail guard in the runner, updates the Quick Start grader example to be less permissive, and adds a regression test for the new guard.
Changes:
- Add a runner-level validation that errors when the prompt mentions `./` or `../` but no workspace resources were loaded.
- Update Quick Start documentation to use a `contains`-based text grader instead of a permissive `\w+` regex and clarify sandbox/file-loading behavior.
- Add a unit test covering the empty-sandbox + relative-path prompt rejection.
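
To make the first change concrete, here is a minimal sketch of what such a preflight guard might look like. The function name `rejectRelativePathPrompt`, its parameters, and the error text are assumptions for illustration only; the actual code in `internal/orchestration/runner.go` is not shown in this overview and may differ.

```go
package main

import (
	"errors"
	"strings"
)

// errEmptySandbox is a hypothetical sentinel error used only in this sketch.
var errEmptySandbox = errors.New("prompt references a relative path but no workspace resources were loaded")

// rejectRelativePathPrompt returns an error when the prompt mentions "./" or
// "../" while the sandbox holds no resources. resourceCount stands in for
// however the runner tracks loaded workspace files (an assumption).
func rejectRelativePathPrompt(prompt string, resourceCount int) error {
	if resourceCount > 0 {
		// Something was loaded, so relative paths may resolve.
		return nil
	}
	// Plain substring checks: this is a heuristic, not a path parser.
	if strings.Contains(prompt, "./") || strings.Contains(prompt, "../") {
		return errEmptySandbox
	}
	return nil
}
```

Because this is a substring heuristic, it fails fast on any prompt containing `./`, including ones where the text is not really a file path. That trade-off seems reasonable for a guard whose purpose is to surface a likely misconfiguration early.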
Show a summary per file
| File | Description |
|---|---|
| site/src/content/docs/quick-start.mdx | Tightens the Quick Start grader example and documents how sandbox file loading works. |
| internal/orchestration/runner.go | Adds a preflight guard to reject relative-path prompts when no resources were loaded. |
| internal/orchestration/runner_test.go | Adds regression coverage for the new empty-sandbox guard. |
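
The grader change in the first row can be illustrated with a small comparison. This is a sketch, not waza's grader implementation: `passesPermissive` and `passesContains` are hypothetical helpers that only mimic the two matching strategies named in this PR (a `\w+` regex versus a substring check).

```go
package main

import (
	"regexp"
	"strings"
)

// anyWord mirrors the permissive `\w+` pattern mentioned in the PR.
var anyWord = regexp.MustCompile(`\w+`)

// passesPermissive reports whether output contains at least one word
// character, which is true for almost any non-empty reply.
func passesPermissive(output string) bool {
	return anyWord.MatchString(output)
}

// passesContains reports whether output includes the expected substring.
func passesContains(output, want string) bool {
	return strings.Contains(output, want)
}
```

With an output such as "I could not find the file", `passesPermissive` returns true, while `passesContains` with an expected string that only appears in the real file returns false. That difference is why an empty sandbox could previously yield a misleading pass.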
Copilot's findings
Comments suppressed due to low confidence (1)
internal/orchestration/runner.go:1216
`rejectRelativePathPromptWithEmptySandbox` validates `tc.Stimulus.Message`, but follow-up execution overwrites `followReq.Message` after `buildExecutionRequest` returns. This allows relative-path prompts in follow-ups to bypass the guard entirely. Consider validating against the actual request message (or re-validating after `followReq.Message = prompt`).
```go
func (r *EvalRunner) executeFollowUps(ctx context.Context, tc *models.TestCase, resp *execution.ExecutionResponse) {
	for i, prompt := range tc.Stimulus.FollowUps {
		followReq, err := r.buildExecutionRequest(tc)
		if err != nil {
			resp.ErrorMsg = fmt.Sprintf("follow-up %d/%d setup failed: %v", i+1, len(tc.Stimulus.FollowUps), err)
			break
		}
		followReq.Message = prompt
		followReq.SessionID = resp.SessionID
		followReq.WorkspaceDir = resp.WorkspaceDir
```
- Files reviewed: 3/3 changed files
- Comments generated: 4
Closes #273
Summary
- `contains` check and clarify that `--context-dir` only resolves `inputs.files`.

Validation
- `go test ./...`
- `cd site && npm run build`

Docs impact
- `site/src/content/docs/quick-start.mdx` to correct the grader example and document sandbox/file-loading behavior.