Stream Foundation Models responses for cancellable decode by FuJacob · Pull Request #339 · FuJacob/cotabby

FuJacob · 2026-05-28T08:39:21Z

Summary

Swaps the FM engine's session.respond call for session.streamResponse and iterates the cumulative snapshots. The external SuggestionResult shape is unchanged — we still return after the final snapshot — but the inner loop now checks Task.checkCancellation() between snapshots, so a coordinator cancel (typing past the in-flight suggestion, switching apps) can abort mid-decode instead of waiting for the model to finish.
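A minimal sketch of the loop shape described above, not the engine's actual code. `Snapshot` and `decode` are hypothetical names, and an `AsyncThrowingStream` stands in for whatever sequence `session.streamResponse` returns, so the example runs without the Foundation Models framework.

```swift
// Hypothetical stand-in for the framework's snapshot type.
// Assumption: `content` holds the entire response decoded so far.
struct Snapshot: Sendable {
    let content: String
}

// Iterates cumulative snapshots and checks for cancellation between them.
// `stream` stands in for the sequence returned by session.streamResponse.
func decode(_ stream: AsyncThrowingStream<Snapshot, Error>) async throws -> String {
    var rawSuggestion = ""
    for try await partial in stream {
        rawSuggestion = partial.content   // cumulative: replace, never append
        try Task.checkCancellation()      // a coordinator cancel aborts the loop here
    }
    try Task.checkCancellation()          // covers a cancel after the last snapshot
    return rawSuggestion
}
```

Because the check runs once per snapshot, a cancelled task stops at the next snapshot boundary rather than after the full response.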

Stacked on perf/fm-session-reuse-prewarm. No prompt-policy change, no UI plumbing change yet — pushing partials all the way to the overlay is the natural follow-up and intentionally out of scope here.

Validation

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' build CODE_SIGNING_ALLOWED=NO
→ ** BUILD SUCCEEDED **

xcodebuild test ... -only-testing:CotabbyTests/FoundationModelPromptRendererTests -only-testing:CotabbyTests/SuggestionEngineRouterTests CODE_SIGNING_ALLOWED=NO
→ Executed 9 tests, with 0 failures (0 unexpected)

swiftlint lint --quiet → no violations.

Manual: type fast in a real app with Apple Intelligence selected and watch the suggestion debug panel. Latency on completed requests should be unchanged; CPU/latency on requests that get cancelled mid-flight (fast typer, browser AX flicker) should drop because the decode loop stops sooner.

Linked issues

Refs the FM-quality investigation.

Risk / rollout notes

Behavior change is scoped to the FM critical path; llama is untouched.
We rely on Apple's documented cumulative-snapshot streaming semantics: each yielded Snapshot.content is the entire response so far, so the final iteration carries the full text. If a future Apple update flips to delta semantics, this code would degrade to producing only the last delta, which would be easy to spot in the eval suite from the parent stack PR.
Task.checkCancellation() is called both inside the loop (every snapshot) and once after the loop, so a late cancellation between the last snapshot and result construction also throws.
No protocol change; SuggestionGenerating's signature is unchanged.
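To make the cumulative-versus-delta risk concrete, here is a small illustration with made-up strings. `lastContent(of:)` is a hypothetical helper that mirrors the "keep the most recent content" behaviour of the loop; it is not code from the PR.

```swift
// Returns what a "keep the latest content" loop ends up with.
func lastContent(of contents: [String]) -> String {
    var rawSuggestion = ""
    for content in contents {
        rawSuggestion = content   // overwrite with the most recent item
    }
    return rawSuggestion
}

// Cumulative semantics: each item is the whole response so far.
let cumulative = ["The", "The quick", "The quick fox"]
// Hypothetical delta semantics: each item is only the new text.
let deltas = ["The", " quick", " fox"]

let fromCumulative = lastContent(of: cumulative)   // "The quick fox"
let fromDeltas = lastContent(of: deltas)           // " fox" (only the last delta survives)
```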

Greptile Summary

This PR swaps session.respond for session.streamResponse in FoundationModelSuggestionEngine, iterating cumulative snapshots so Task.checkCancellation() can abort mid-decode instead of waiting for the full response. The external SuggestionResult shape is unchanged.

The streaming loop saves rawSuggestion = partial.content before calling try Task.checkCancellation(), preserving the best available text even on a late cancel (addresses previous review feedback on ordering).
A guard didReceiveSnapshot after the loop converts a zero-snapshot stream completion into an explicit generationFailed error rather than silently returning an empty suggestion (addresses previous review feedback on zero-iteration paths).
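The two points above (save before checking, and guard against a zero-snapshot stream) can be sketched as follows. This is an illustration under stated assumptions: `Snapshot`, `collectSuggestion`, and the payload-free `SuggestionClientError` cases are stand-ins, the real error type may carry associated values, and an `AsyncThrowingStream` replaces the framework stream.

```swift
// Hypothetical stand-ins; the real types live in the app and the framework.
struct Snapshot: Sendable {
    let content: String
}

enum SuggestionClientError: Error, Equatable {
    case cancelled
    case generationFailed
}

func collectSuggestion(
    from stream: AsyncThrowingStream<Snapshot, Error>
) async throws -> String {
    var rawSuggestion = ""
    var didReceiveSnapshot = false
    do {
        for try await partial in stream {
            rawSuggestion = partial.content   // save the snapshot first ...
            didReceiveSnapshot = true
            try Task.checkCancellation()      // ... then check for cancellation
        }
        try Task.checkCancellation()          // late cancel after the final snapshot
    } catch is CancellationError {
        // Assumed mapping, following the sequence diagram in this PR.
        throw SuggestionClientError.cancelled
    }
    // A stream that finishes without yielding anything is treated as a failure
    // instead of silently returning an empty suggestion.
    guard didReceiveSnapshot else {
        throw SuggestionClientError.generationFailed
    }
    return rawSuggestion
}
```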

Confidence Score: 5/5

Safe to merge — the streaming loop is logically correct, both previously flagged concerns have been addressed, and the change is fully scoped to the FM generation path.

The snapshot-before-checkCancellation ordering is correct, the zero-snapshot guard is in place, all existing error cases (CancellationError, GenerationError, SuggestionClientError) continue to be caught and re-mapped appropriately, and the external SuggestionResult contract is unchanged.

No files require special attention.

Important Files Changed

Filename	Overview
Cotabby/Services/Runtime/FoundationModelSuggestionEngine.swift	Replaces `session.respond` with `session.streamResponse` + cumulative-snapshot iteration loop; adds `didReceiveSnapshot` guard and preserves all error-mapping and cancellation handling.

Sequence Diagram

sequenceDiagram
    participant Coordinator
    participant Engine as FoundationModelSuggestionEngine
    participant Session as LanguageModelSession
    participant FM as Apple FM Framework

    Coordinator->>Engine: generateSuggestion(request)
    Engine->>Session: ensureSession(request, model)
    Session-->>Engine: session (cached or new)
    Engine->>FM: session.streamResponse(prompt, options)
    FM-->>Engine: "AsyncSequence<Snapshot>"

    loop for each cumulative snapshot
        FM->>Engine: partial.content (cumulative text so far)
        Engine->>Engine: "rawSuggestion = partial.content"
        Engine->>Engine: "didReceiveSnapshot = true"
        Engine->>Engine: Task.checkCancellation()
        alt Task cancelled
            Engine-->>Coordinator: throw SuggestionClientError.cancelled
        end
    end

    Engine->>Engine: Task.checkCancellation() (post-loop)
    Engine->>Engine: guard didReceiveSnapshot
    alt Zero snapshots received
        Engine-->>Coordinator: throw SuggestionClientError.generationFailed
    end
    Engine->>Engine: SuggestionTextNormalizer.normalize(rawSuggestion)
    Engine-->>Coordinator: SuggestionResult(text, rawText, latency)

Reviews (2): Last reviewed commit: "Address Greptile: save snapshot before c..."

…ro-snapshot streams

Two narrow corrections to the FM streaming decode loop:

- Move `rawSuggestion = partial.content` ahead of `try Task.checkCancellation()` so a late cancel between the final cumulative snapshot and its assignment can't drop fully decoded text on the floor. Cancellation still throws — saving the best-available text first just makes the ordering's intent obvious.
- Track `didReceiveSnapshot` and throw `SuggestionClientError.generationFailed` when the stream completes without yielding any snapshots. Apple's documented contract is at least one snapshot on a successful generation, so the zero-snapshot path is a runtime anomaly worth surfacing instead of letting an empty suggestion silently reach the overlay.

Both functional paths (cancellation behavior on the normal decode, generation results when at least one snapshot arrives) are unchanged.

Stream Foundation Models responses for cancellable decode

d8b9537

greptile-apps Bot reviewed May 28, 2026

Comment thread Cotabby/Services/Runtime/FoundationModelSuggestionEngine.swift

Comment thread Cotabby/Services/Runtime/FoundationModelSuggestionEngine.swift

FuJacob merged commit a3dbc9a into perf/fm-session-reuse-prewarm May 28, 2026

FuJacob added a commit that referenced this pull request May 28, 2026

Stream Foundation Models responses for cancellable decode (#339)

83d7cf2

FuJacob merged 2 commits into perf/fm-session-reuse-prewarm from feat/fm-streaming
