waza suggest swallows engine errors — reports "response is not valid suggestion YAML" instead of the real failure #316

@joshuaferguson

Summary

When the pass-2 LLM request in waza suggest fails at the engine level, the command always exits with:

parsing suggest response: response is not valid suggestion YAML

regardless of what actually went wrong. The real error is captured in ExecutionResponse.ErrorMsg but never surfaced, making the command appear flaky/broken when the underlying cause is a transient request failure.

Environment

  • waza v0.33.0 (binary install via install.sh)
  • macOS (arm64)
  • Default copilot-sdk executor / bundled Copilot CLI

Repro

waza suggest .github/skills/<any-skill> --debug --format json

Fails most of the time in our repo (23 skills; reproduced with several different skills on several different days). It occasionally succeeds, which is consistent with a transient/server-side rejection rather than a YAML formatting problem.

Debug evidence

With --debug, the event stream shows pass 1 (grader selection) completing normally, then pass 2 (eval generation, ~47 KB prompt) dying ~130 ms after the turn starts — no assistant.message, no assistant.turn_end — far too fast for any generation to have happened:

15:14:27.223 type=user.message        (pass-2 implementation prompt, ~47KB)
15:14:27.225 type=session.title_changed
15:14:27.424 type=assistant.turn_start turnID=0
15:14:27.424 type=session.usage_info
15:14:27.556 type=session.info
parsing suggest response: response is not valid suggestion YAML

For comparison, pass 1 in the same run produced assistant.message (graders: [trigger, file, text, prompt]) and assistant.turn_end before session.idle.

Root cause

CopilotEngine.Execute intentionally reports inline conversation errors via the response rather than a Go error (internal/execution/copilot.go#L478-L509):

// errors that are returned inline, as part of the conversation, also come back
// in the returned error. Rather than having one of those fun functions that returns
// both an error and a result, I'll just put the error message in the ExecutionResponse.
errMsg = err.Error()
...
ErrorMsg: errMsg,
Success:  err == nil,

But suggest.Generate only checks the Go error and resp == nil (internal/suggest/suggest.go#L93-L107):

resp, err := engine.Execute(implCtx, &execution.ExecutionRequest{Message: implPrompt})
...
if resp == nil {
    return nil, errors.New("empty engine response")
}
suggestion, err := ParseResponse(resp.FinalOutput)

When the engine fails, resp.FinalOutput is empty, ParseResponse("") falls through to the catch-all at internal/suggest/suggest.go#L217, and the actual resp.ErrorMsg is discarded. The same gap applies to the pass-1 call at internal/suggest/suggest.go#L73 (a pass-1 failure silently degrades to "no grader docs").
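
To make the failure path concrete, here is a minimal, self-contained sketch of the control flow described above. It is not the waza source: `ExecutionResponse` is cut down to the three fields quoted in this issue, and `failingExecute`, `parseResponse`, `generate`, and the error text "request rejected (hypothetical)" are stand-ins invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ExecutionResponse is a simplified stand-in for the real type; only the
// fields quoted in this issue are modelled.
type ExecutionResponse struct {
	FinalOutput string
	ErrorMsg    string
	Success     bool
}

// failingExecute (hypothetical) mimics an engine call that fails inline:
// the Go error is nil and the failure travels in the response instead.
func failingExecute() (*ExecutionResponse, error) {
	return &ExecutionResponse{
		ErrorMsg: "request rejected (hypothetical)",
		Success:  false,
	}, nil
}

// parseResponse (hypothetical) stands in for ParseResponse: empty input
// falls through to the catch-all error.
func parseResponse(out string) error {
	if out == "" {
		return errors.New("response is not valid suggestion YAML")
	}
	return nil
}

// generate mirrors the checks quoted above: the Go error and resp == nil
// are inspected, but resp.Success and resp.ErrorMsg are not.
func generate() error {
	resp, err := failingExecute()
	if err != nil {
		return err
	}
	if resp == nil {
		return errors.New("empty engine response")
	}
	if err := parseResponse(resp.FinalOutput); err != nil {
		return fmt.Errorf("parsing suggest response: %w", err)
	}
	return nil
}

func main() {
	// The hypothetical engine error never appears in the output.
	fmt.Println(generate())
}
```

Running this prints the same misleading parse error as the real command, even though the stand-in engine reported a different failure in `ErrorMsg`.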

Suggested fix

In suggest.Generate, after each engine.Execute call (and after the existing resp == nil check, so that resp is safe to dereference):

if !resp.Success {
    return nil, fmt.Errorf("engine execution failed: %s", resp.ErrorMsg)
}

(or include resp.ErrorMsg in the parse-failure message when FinalOutput is empty). Bonus: distinguishing "model returned nothing" from "model returned unparseable YAML" in ParseResponse would make the remaining genuine parse failures easier to debug too.
