Stream Foundation Models responses for cancellable decode #339
Merged
…ro-snapshot streams

Two narrow corrections to the FM streaming decode loop:

- Move `rawSuggestion = partial.content` ahead of `try Task.checkCancellation()` so a late cancel between the final cumulative snapshot and its assignment can't drop fully decoded text on the floor. Cancellation still throws; saving the best-available text first just makes the ordering's intent obvious.
- Track `didReceiveSnapshot` and throw `SuggestionClientError.generationFailed` when the stream completes without yielding any snapshots. Apple's documented contract is at least one snapshot on a successful generation, so the zero-snapshot path is a runtime anomaly worth surfacing instead of letting an empty suggestion silently reach the overlay.

Both functional paths (cancellation behavior on the normal decode, generation results when at least one snapshot arrives) are unchanged.
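As a rough illustration of the two corrections in this commit message, here is a minimal sketch of the loop shape. It does not use the real Foundation Models API: `Snapshot`, `finalText(from:)`, and `makeStream(_:)` are hypothetical stand-ins, an `AsyncThrowingStream` simulates the cumulative snapshot stream, and the `SuggestionClientError` enum is reduced to the two cases named in this PR.

```swift
// Hypothetical stand-ins for the app's types. Only the names `rawSuggestion`,
// `didReceiveSnapshot`, and the two error cases come from the PR text.
enum SuggestionClientError: Error, Equatable {
    case cancelled
    case generationFailed
}

struct Snapshot {
    let content: String  // cumulative text so far, not a delta
}

/// Drains a stream of cumulative snapshots and returns the final text.
func finalText(from stream: AsyncThrowingStream<Snapshot, Error>) async throws -> String {
    var rawSuggestion = ""
    var didReceiveSnapshot = false

    for try await partial in stream {
        // Save the best-available text first, then honor cancellation.
        rawSuggestion = partial.content
        didReceiveSnapshot = true
        try Task.checkCancellation()
    }

    // Catch a cancel that lands after the last snapshot.
    try Task.checkCancellation()

    // A stream that finished without a single snapshot is treated as a failure.
    guard didReceiveSnapshot else {
        throw SuggestionClientError.generationFailed
    }
    return rawSuggestion
}

/// Helper for experiments: an already-finished stream of canned snapshots.
func makeStream(_ texts: [String]) -> AsyncThrowingStream<Snapshot, Error> {
    AsyncThrowingStream { continuation in
        for text in texts { continuation.yield(Snapshot(content: text)) }
        continuation.finish()
    }
}
```

Because each snapshot is cumulative, the last assignment to `rawSuggestion` holds the whole response, and an empty stream falls through to the `guard` instead of returning an empty string.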
FuJacob added a commit that referenced this pull request on May 28, 2026
Summary
Swaps the FM engine's `session.respond` call for `session.streamResponse` and iterates the cumulative snapshots. The external `SuggestionResult` shape is unchanged (we still return after the final snapshot), but the inner loop now checks `Task.checkCancellation()` between snapshots, so a coordinator cancel (typing past the in-flight suggestion, switching apps) can abort mid-decode instead of waiting for the model to finish.

Stacked on `perf/fm-session-reuse-prewarm`. No prompt-policy change, no UI plumbing change yet; pushing partials all the way to the overlay is the natural follow-up and intentionally out of scope here.

Validation
- `xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' build CODE_SIGNING_ALLOWED=NO` → `** BUILD SUCCEEDED **`
- `xcodebuild test ... -only-testing:CotabbyTests/FoundationModelPromptRendererTests -only-testing:CotabbyTests/SuggestionEngineRouterTests CODE_SIGNING_ALLOWED=NO` → `Executed 9 tests, with 0 failures (0 unexpected)`
- `swiftlint lint --quiet` → no violations.
- Manual: type fast in a real app with Apple Intelligence selected and watch the suggestion debug panel. Latency on completed requests should be unchanged; CPU/latency on requests that get cancelled mid-flight (fast typer, browser AX flicker) should drop because the decode loop stops sooner.
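The "decode loop stops sooner" expectation in the manual check can be illustrated without the real model. The sketch below is an assumption-laden simulation, not the engine's code: `Snapshot` and `consumedCount(of:cancelAfter:)` are hypothetical, an `AsyncStream` stands in for the snapshot stream, and the task cancels itself after a fixed number of snapshots to mimic a coordinator cancel deterministically.

```swift
// Hypothetical stand-in for a cumulative snapshot.
struct Snapshot {
    let content: String
}

/// Consumes canned snapshots in a child task that cancels itself after
/// `cancelAfter` snapshots, and returns how many snapshots were consumed
/// before the loop stopped.
func consumedCount(of texts: [String], cancelAfter: Int) async -> Int {
    let stream = AsyncStream<Snapshot> { continuation in
        for text in texts { continuation.yield(Snapshot(content: text)) }
        continuation.finish()
    }

    let task = Task { () -> Int in
        var consumed = 0
        do {
            for await _ in stream {
                consumed += 1
                if consumed == cancelAfter {
                    // Simulates the coordinator cancelling this request mid-decode.
                    withUnsafeCurrentTask { (current: UnsafeCurrentTask?) -> Void in
                        current?.cancel()
                    }
                }
                // Per-snapshot check: throws CancellationError once cancelled.
                try Task.checkCancellation()
            }
        } catch {
            // Cancellation ends the loop early; remaining snapshots are never read.
        }
        return consumed
    }
    return await task.value
}
```

With four snapshots available and a cancel after the second, the loop reads only two; with no cancel in range, it reads all four.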
Linked issues
Refs the FM-quality investigation.
Risk / rollout notes
- `Snapshot.content` is the entire response so far, so the final iteration carries the full text. If a future Apple update flips to delta semantics, this code would degrade to producing only the last delta, which would be easy to spot in the eval suite from the parent stack PR.
- `Task.checkCancellation()` is called both inside the loop (every snapshot) and once after the loop, so a late cancellation between the last snapshot and result construction also throws.
- `SuggestionGenerating`'s signature is unchanged.

Greptile Summary
This PR swaps `session.respond` for `session.streamResponse` in `FoundationModelSuggestionEngine`, iterating cumulative snapshots so `Task.checkCancellation()` can abort mid-decode instead of waiting for the full response. The external `SuggestionResult` shape is unchanged.

- `rawSuggestion = partial.content` is assigned before calling `try Task.checkCancellation()`, preserving the best available text even on a late cancel (addresses previous review feedback on ordering).
- A `guard didReceiveSnapshot` after the loop converts a zero-snapshot stream completion into an explicit `generationFailed` error rather than silently returning an empty suggestion (addresses previous review feedback on zero-iteration paths).

Confidence Score: 5/5
Safe to merge — the streaming loop is logically correct, both previously flagged concerns have been addressed, and the change is fully scoped to the FM generation path.
The snapshot-before-checkCancellation ordering is correct, the zero-snapshot guard is in place, all existing error cases (CancellationError, GenerationError, SuggestionClientError) continue to be caught and re-mapped appropriately, and the external SuggestionResult contract is unchanged.
No files require special attention.
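The review above says the existing error cases continue to be caught and re-mapped, but the PR page does not show the mapping itself. The following is only a guess at what such a mapping could look like: `mapToClientError(_:)` is a hypothetical helper, the enum is reduced to the two cases named in this PR, and the fallback to `generationFailed` for unknown errors is an assumption.

```swift
// Reduced, hypothetical version of the app's error type.
enum SuggestionClientError: Error, Equatable {
    case cancelled
    case generationFailed
}

/// Hypothetical mapping from arbitrary errors to the client-facing error type.
/// The real engine's rules are not shown in the PR; this is an assumed shape.
func mapToClientError(_ error: Error) -> SuggestionClientError {
    switch error {
    case is CancellationError:
        // Thrown by Task.checkCancellation() when the task was cancelled.
        return .cancelled
    case let clientError as SuggestionClientError:
        // Already in the client-facing shape; pass it through untouched.
        return clientError
    default:
        // Assumption: anything else is reported as a failed generation.
        return .generationFailed
    }
}
```

Under this shape, a cancel raised by the per-snapshot check and a zero-snapshot failure both reach the coordinator as `SuggestionClientError` values.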
Important Files Changed
Replaces `session.respond` with `session.streamResponse` + cumulative-snapshot iteration loop; adds `didReceiveSnapshot` guard and preserves all error-mapping and cancellation handling.

Sequence Diagram
sequenceDiagram
    participant Coordinator
    participant Engine as FoundationModelSuggestionEngine
    participant Session as LanguageModelSession
    participant FM as Apple FM Framework
    Coordinator->>Engine: generateSuggestion(request)
    Engine->>Session: ensureSession(request, model)
    Session-->>Engine: session (cached or new)
    Engine->>FM: session.streamResponse(prompt, options)
    FM-->>Engine: "AsyncSequence<Snapshot>"
    loop for each cumulative snapshot
        FM->>Engine: partial.content (cumulative text so far)
        Engine->>Engine: "rawSuggestion = partial.content"
        Engine->>Engine: "didReceiveSnapshot = true"
        Engine->>Engine: Task.checkCancellation()
        alt Task cancelled
            Engine-->>Coordinator: throw SuggestionClientError.cancelled
        end
    end
    Engine->>Engine: Task.checkCancellation() (post-loop)
    Engine->>Engine: guard didReceiveSnapshot
    alt Zero snapshots received
        Engine-->>Coordinator: throw SuggestionClientError.generationFailed
    end
    Engine->>Engine: SuggestionTextNormalizer.normalize(rawSuggestion)
    Engine-->>Coordinator: SuggestionResult(text, rawText, latency)

Reviews (2): Last reviewed commit: "Address Greptile: save snapshot before c..."