Stream Foundation Models responses for cancellable decode #339

Merged: FuJacob merged 2 commits into perf/fm-session-reuse-prewarm from feat/fm-streaming on May 28, 2026
FuJacob (Owner) commented May 28, 2026

Summary

Swaps the FM engine's session.respond call for session.streamResponse and iterates the cumulative snapshots. The external SuggestionResult shape is unchanged — we still return after the final snapshot — but the inner loop now checks Task.checkCancellation() between snapshots, so a coordinator cancel (typing past the in-flight suggestion, switching apps) can abort mid-decode instead of waiting for the model to finish.
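
The cancellable loop described above can be sketched without the framework. This is a minimal illustration, not the engine's actual code: `DemoSnapshot`, `makeDemoStream`, and `collectFinalText` are hypothetical names, and an `AsyncStream` stands in for the value returned by `session.streamResponse`.

```swift
// Stand-in for a streamed snapshot. The real type comes from the framework;
// only the cumulative `content` field described in this PR is modeled here.
struct DemoSnapshot: Sendable {
    let content: String
}

// Hypothetical helper: builds a finite stream of cumulative snapshots.
func makeDemoStream(_ contents: [String]) -> AsyncStream<DemoSnapshot> {
    AsyncStream { continuation in
        for content in contents {
            continuation.yield(DemoSnapshot(content: content))
        }
        continuation.finish()
    }
}

// Hypothetical helper: returns the last cumulative snapshot, or throws
// CancellationError if the surrounding task is cancelled during the stream.
func collectFinalText(from snapshots: AsyncStream<DemoSnapshot>) async throws -> String {
    var latest = ""
    for await partial in snapshots {
        latest = partial.content
        try Task.checkCancellation()   // cancellation point between snapshots
    }
    try Task.checkCancellation()       // covers a cancel that lands after the last snapshot
    return latest
}
```

With a single blocking `respond`-style call there is no point between tokens at which the caller's cancellation can be observed; iterating snapshots gives one such point per snapshot.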

Stacked on perf/fm-session-reuse-prewarm. No prompt-policy change, no UI plumbing change yet — pushing partials all the way to the overlay is the natural follow-up and intentionally out of scope here.

Validation

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' build CODE_SIGNING_ALLOWED=NO
** BUILD SUCCEEDED **

xcodebuild test ... -only-testing:CotabbyTests/FoundationModelPromptRendererTests -only-testing:CotabbyTests/SuggestionEngineRouterTests CODE_SIGNING_ALLOWED=NO
Executed 9 tests, with 0 failures (0 unexpected)

swiftlint lint --quiet → no violations.

Manual: type fast in a real app with Apple Intelligence selected and watch the suggestion debug panel. Latency on completed requests should be unchanged; CPU/latency on requests that get cancelled mid-flight (fast typer, browser AX flicker) should drop because the decode loop stops sooner.

Linked issues

Refs the FM-quality investigation.

Risk / rollout notes

  • Behavior change is scoped to the FM critical path; llama is untouched.
  • We rely on Apple's documented cumulative-snapshot streaming semantics: each yielded Snapshot.content is the entire response so far, so the final iteration carries the full text. If a future Apple update flips to delta semantics this code would degrade to producing only the last delta — easy to spot in the eval suite from the parent stack PR.
  • Task.checkCancellation() is called both inside the loop (every snapshot) and once after the loop, so a late cancellation between the last snapshot and result construction also throws.
  • No protocol change; SuggestionGenerating's signature is unchanged.
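
To make the cumulative-versus-delta risk concrete, here is a small sketch using plain string arrays in place of a real stream. `lastSnapshotText` and `looksCumulative` are hypothetical helpers, and the prefix check is only one possible heuristic: it assumes each snapshot extends the previous one verbatim.

```swift
// Hypothetical helper mirroring "keep whatever the last snapshot carried".
func lastSnapshotText(_ snapshots: [String]) -> String {
    var latest = ""
    for snapshot in snapshots {
        latest = snapshot
    }
    return latest
}

// Heuristic check: under cumulative semantics each snapshot is assumed to
// start with the text of the one before it.
func looksCumulative(_ snapshots: [String]) -> Bool {
    zip(snapshots, snapshots.dropFirst()).allSatisfy { pair in
        pair.1.hasPrefix(pair.0)
    }
}

let cumulative = ["The", "The quick", "The quick fox"]
let deltas = ["The", " quick", " fox"]

// Cumulative semantics: the last snapshot is the full response.
let fromCumulative = lastSnapshotText(cumulative)   // "The quick fox"
// Delta semantics: the same logic keeps only the final fragment.
let fromDeltas = lastSnapshotText(deltas)           // " fox"
```

A check along the lines of `looksCumulative` could serve as a debug-only assertion if the semantics ever need to be verified at runtime, though the eval suite mentioned above is the detection path this PR relies on.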

Greptile Summary

This PR swaps session.respond for session.streamResponse in FoundationModelSuggestionEngine, iterating cumulative snapshots so Task.checkCancellation() can abort mid-decode instead of waiting for the full response. The external SuggestionResult shape is unchanged.

  • The streaming loop saves rawSuggestion = partial.content before calling try Task.checkCancellation(), preserving the best available text even on a late cancel (addresses previous review feedback on ordering).
  • A guard didReceiveSnapshot after the loop converts a zero-snapshot stream completion into an explicit generationFailed error rather than silently returning an empty suggestion (addresses previous review feedback on zero-iteration paths).
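
The zero-snapshot guard can be illustrated as follows. This is a sketch under stated assumptions: `DemoStreamError` and `finalTextOrThrow` are hypothetical stand-ins, and the stream yields plain strings rather than framework snapshots. It shows why a separate flag is used: checking for an empty string could not tell "no snapshots" apart from "one snapshot with empty content".

```swift
// Stand-in error type for this sketch; the project's own error enum may differ.
enum DemoStreamError: Error, Equatable {
    case generationFailed
}

// Hypothetical helper: returns the last streamed text, and treats a stream
// that finishes without yielding anything as a failure.
func finalTextOrThrow(from contents: AsyncStream<String>) async throws -> String {
    var rawSuggestion = ""
    var didReceiveSnapshot = false
    for await content in contents {
        rawSuggestion = content        // save the best-available text first
        didReceiveSnapshot = true
        try Task.checkCancellation()   // then honor cancellation
    }
    try Task.checkCancellation()
    // `rawSuggestion.isEmpty` could not tell these two cases apart:
    // zero snapshots versus one snapshot whose content is "".
    guard didReceiveSnapshot else {
        throw DemoStreamError.generationFailed
    }
    return rawSuggestion
}
```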

Confidence Score: 5/5

Safe to merge — the streaming loop is logically correct, both previously flagged concerns have been addressed, and the change is fully scoped to the FM generation path.

The snapshot-before-checkCancellation ordering is correct, the zero-snapshot guard is in place, all existing error cases (CancellationError, GenerationError, SuggestionClientError) continue to be caught and re-mapped appropriately, and the external SuggestionResult contract is unchanged.
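
The error re-mapping mentioned above might look roughly like the sketch below. The types are hypothetical stand-ins (`DemoGenerationError` for the framework's generation error, `DemoClientError` for the project's client error), so the real case names and payloads may differ.

```swift
// Stand-ins for this sketch; the real framework and project error types differ.
struct DemoGenerationError: Error {
    let reason: String
}

enum DemoClientError: Error, Equatable {
    case cancelled
    case generationFailed(String)
}

// Hypothetical mapper: funnels every failure into one client-facing error type.
func mapToClientError(_ error: Error) -> DemoClientError {
    switch error {
    case let alreadyMapped as DemoClientError:
        return alreadyMapped                          // pass through unchanged
    case is CancellationError:
        return .cancelled                             // thrown by Task.checkCancellation()
    case let generation as DemoGenerationError:
        return .generationFailed(generation.reason)
    default:
        return .generationFailed(String(describing: error))
    }
}
```

Mapping in one place keeps the caller-facing contract stable when the decode strategy changes, which is consistent with the unchanged `SuggestionResult` shape noted in this review.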

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| Cotabby/Services/Runtime/FoundationModelSuggestionEngine.swift | Replaces session.respond with session.streamResponse + cumulative-snapshot iteration loop; adds didReceiveSnapshot guard and preserves all error-mapping and cancellation handling. |

Sequence Diagram

sequenceDiagram
    participant Coordinator
    participant Engine as FoundationModelSuggestionEngine
    participant Session as LanguageModelSession
    participant FM as Apple FM Framework

    Coordinator->>Engine: generateSuggestion(request)
    Engine->>Session: ensureSession(request, model)
    Session-->>Engine: session (cached or new)
    Engine->>FM: session.streamResponse(prompt, options)
    FM-->>Engine: "AsyncSequence<Snapshot>"

    loop for each cumulative snapshot
        FM->>Engine: partial.content (cumulative text so far)
        Engine->>Engine: "rawSuggestion = partial.content"
        Engine->>Engine: "didReceiveSnapshot = true"
        Engine->>Engine: Task.checkCancellation()
        alt Task cancelled
            Engine-->>Coordinator: throw SuggestionClientError.cancelled
        end
    end

    Engine->>Engine: Task.checkCancellation() (post-loop)
    Engine->>Engine: guard didReceiveSnapshot
    alt Zero snapshots received
        Engine-->>Coordinator: throw SuggestionClientError.generationFailed
    end
    Engine->>Engine: SuggestionTextNormalizer.normalize(rawSuggestion)
    Engine-->>Coordinator: SuggestionResult(text, rawText, latency)

Reviews (2): Last reviewed commit: "Address Greptile: save snapshot before c..."

Comment thread Cotabby/Services/Runtime/FoundationModelSuggestionEngine.swift
Comment thread Cotabby/Services/Runtime/FoundationModelSuggestionEngine.swift
…ro-snapshot streams

Two narrow corrections to the FM streaming decode loop:

- Move `rawSuggestion = partial.content` ahead of `try Task.checkCancellation()`
  so a late cancel between the final cumulative snapshot and its assignment can't
  drop fully decoded text on the floor. Cancellation still throws — saving the
  best-available text first just makes the ordering's intent obvious.
- Track `didReceiveSnapshot` and throw `SuggestionClientError.generationFailed`
  when the stream completes without yielding any snapshots. Apple's documented
  contract is at least one snapshot on a successful generation, so the
  zero-snapshot path is a runtime anomaly worth surfacing instead of letting an
  empty suggestion silently reach the overlay.

Both functional paths (cancellation behavior on the normal decode, generation
results when at least one snapshot arrives) are unchanged.
@FuJacob FuJacob merged commit a3dbc9a into perf/fm-session-reuse-prewarm May 28, 2026